The name “Tool Calling” (or “Function Calling”) is itself often confusing, because it suggests that the neural network (the LLM) executes code on its own.
How exactly does Tool Calling work in OpenClaw or GitHub Copilot?
All the magic of an “Agent” (whether it’s OpenClaw, NanoClaw, ZeroClaw, Hermes, GitHub Copilot, or ChatGPT) is born from a dialogue (ping-pong) between the Model and the Executor (let’s take OpenClaw as an example).
The model never executes the code itself. An LLM is simply a text generator. It has no idea how to run Bash, Node.js, or Python, how to read a file, make a database request, or log in to a website.
Imagine a user writes in Telegram: “How much sugar is needed for the charlotte recipe?”.
Under the hood, Telegram formats this request as a JSON object and passes it to OpenClaw:
{
"message": {
"text": "How much sugar is needed for the charlotte recipe?",
"chat": { "id": 123456789 }
}
}
Step 1: OpenClaw prepares the ground (System prompt)
OpenClaw itself is dumb: it doesn’t know what to do with this request on its own. It simply takes the user’s message, “attaches” a list of its tools in JSON Schema format to it, and sends the request to the model. In essence, it asks the model: “Here is what I can do, and here is the user’s request. What should I do with it?”
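In code, this step looks roughly like the sketch below. It is only a minimal Python illustration, assuming an OpenAI-compatible chat completions endpoint; the real OpenClaw internals, the model name, and the full tool list (abbreviated here) will differ:

import os
import requests

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_files",
            "description": "Searches for files by name or pattern",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    # ... read_file, list_files, run_terminal_command, http_request ...
]

def ask_model(messages: list[dict]) -> dict:
    # The agent always sends the dialogue history PLUS the tool schemas.
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4o", "messages": messages, "tools": TOOLS},
        timeout=60,
    )
    return response.json()["choices"][0]["message"]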
It sends the following request to the model:
{
"messages": [
{
"role": "user",
"content": "How much sugar is needed for the charlotte recipe?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "search_files",
"description": "Searches for files by name or pattern",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string" }
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "read_file",
"description": "Reads file contents",
"parameters": {
"type": "object",
"properties": {
"path": { "type": "string" }
},
"required": ["path"]
}
}
},
{
"type": "function",
"function": {
"name": "list_files",
"description": "Shows the list of files in a directory",
"parameters": {
"type": "object",
"properties": {
"path": { "type": "string" }
}
}
}
},
{
"type": "function",
"function": {
"name": "run_terminal_command",
"description": "Runs a command in the terminal",
"parameters": {
"type": "object",
"properties": {
"command": { "type": "string" }
},
"required": ["command"]
}
}
},
{
"type": "function",
"function": {
"name": "http_request",
"description": "Makes an HTTP request to an external API",
"parameters": {
"type": "object",
"properties": {
"url": { "type": "string" },
"method": { "type": "string" }
},
"required": ["url", "method"]
}
}
}
]
}
Step 2: The model understands what to do
The model (for example, GPT-4o, Claude 3.7 Sonnet, or Gemini 1.5 Pro) receives this request. It sees the user’s question and sees a whole set of tools right in front of it. Then it matches the user’s task with the available tools and tries to choose an appropriate one.
Since the user is asking about a recipe, the model has to guess where to look for it. And here lies a critical nuance: the model can make a mistake and choose the wrong tool.
For example, it might think: “Aha, a recipe! I’ll look it up on the internet!” and try to make an http_request to Google. Or the model might try to “guess” the file name and immediately call read_file with path="charlotte.txt", which will fail if that path doesn’t exist. But given a good search tool (and perhaps proper system instructions), the model will realize that it’s better to first search the local disk, and will choose search_files.
In practice, the choice is made not by “magic”, but by combining multiple signals: what the user asks for, how the tools are named, what their description says, what arguments they accept, and what has already happened earlier in this dialogue.
Therefore, solving a task often involves a whole chain of calls. For example, the model first asks the agent to call search_files to find the exact file path, and only then asks to call read_file to read its content.
Our example chain:
- The user writes: “How much sugar is needed for the charlotte recipe?”.
- The model forms a response and asks OpenClaw (GitHub Copilot, Antigravity, or ChatGPT) to call the search_files tool with the parameter query="charlotte", so that it returns search results, since the model doesn’t know the exact path to the recipe yet.
- OpenClaw searches the disk and returns a few options to the model, for example: [{"path":"/old_archives/grandmas_charlotte.txt"}, {"path":"/documents/charlotte_recipe.txt"}]. The model analyzes this list and decides which file is best.
- The model, having selected the right path from the list, sends a second request to OpenClaw with a new tool call, this time read_file with the argument path="/documents/charlotte_recipe.txt". In other words, it tells OpenClaw: “read this file using read_file”.
- OpenClaw reads the file and sends its contents back to the model. Only after studying the recipe text does the model write the final response to the user.
So, tool calling isn’t necessarily a single action, but a small chain: found -> read -> answered.
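Sketched as code, this whole ping-pong is just a loop: ask the model, execute whatever tool it requests, append the result to the history, and repeat until the model replies with plain text. A minimal sketch, reusing the hypothetical ask_model from Step 1 (run_tool is the executor dispatch, sketched in Step 3 below):

import json

def agent_loop(user_text: str) -> str:
    messages = [{"role": "user", "content": user_text}]
    while True:
        message = ask_model(messages)      # the model decides: tool call or final text
        messages.append(message)           # the history must include the model's own turn
        if not message.get("tool_calls"):  # no tool requested -> this is the final answer
            return message["content"]
        for call in message["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = run_tool(call["function"]["name"], args)  # executed locally, never by the LLM
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })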
The model DOES NOT read the file and DOES NOT search the disk or the internet itself — it simply chooses a tool from the list provided by the agent. Having chosen, it tells the agent: “use this tool”. The agent physically uses it and returns the execution result back to the model.
Technically, at the very first step the model forms a special response (JSON) in which it says: “Hey, agent (OpenClaw, GitHub Copilot, or ChatGPT), call your search_files function with the argument query="charlotte" and return the result to me. I’ll take a look.”
Here is what that first response from the LLM in OpenClaw looks like:
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_search_6c3bc42b",
"type": "function",
"function": {
"name": "search_files",
"arguments": "{"query":"charlotte"}"
}
}
]
}
Step 3: OpenClaw (The Executor) applies the first tool
OpenClaw receives this JSON on the server. It sees: “The model asks to run search_files with the parameter query="charlotte"”.
At this exact moment, the OpenClaw code on the server physically executes a file search for the word “charlotte” on the disk. OpenClaw doesn’t think: it simply runs the function and returns the search result (an array of found paths) back to the model, appending it to the message history.
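The executor side can be pictured as a plain dispatch function. This is a toy sketch with hypothetical internals (a real agent adds sandboxing, path restrictions, and error handling):

import json
from pathlib import Path

SEARCH_ROOT = Path("/")  # assumption: the directory OpenClaw is allowed to search

def run_tool(name: str, args: dict) -> str:
    # Ordinary local code runs here; no neural network is involved at this point.
    if name == "search_files":
        matches = SEARCH_ROOT.rglob(f"*{args['query']}*")
        return json.dumps([{"path": str(p)} for p in matches])
    if name == "read_file":
        return Path(args["path"]).read_text()
    return f"Unknown tool: {name}"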
Step 4: The model analyzes the result and asks to read the file
The model receives the list of found files: [{"path":"/old_archives/grandmas_charlotte.txt"}, {"path":"/documents/charlotte_recipe.txt"}].
It looks at them and makes a decision: “Aha, the recipe I need is most likely in the second file”. And now it outputs a new JSON response with a request to call the second tool — read_file:
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_read_d88b07ff",
"type": "function",
"function": {
"name": "read_file",
"arguments": "{"path":"/documents/charlotte_recipe.txt"}"
}
}
]
}
Step 5: OpenClaw reads the file and returns the result
OpenClaw receives this new instruction. The script runs, physically reads the file charlotte_recipe.txt from the disk, and gets the text: "Apples - 1 kg, flour - 200g, sugar - 1 cup".
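In code, “returning the result” is nothing more than appending one more message to the history and calling the model again (continuing the earlier sketches; the id and text are taken from this example):

messages.append({
    "role": "tool",
    "tool_call_id": "call_read_d88b07ff",
    "content": "Apples - 1 kg, flour - 200g, sugar - 1 cup",
})
next_message = ask_model(messages)  # the full history and tool schemas go over the wire again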
Now OpenClaw forms a new request to the LLM, passing it the entire accumulated history of this chain:
{
"messages": [
{
"role": "user",
"content": "How much sugar is needed for the charlotte recipe?"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_search_6c3bc42b",
"type": "function",
"function": {
"name": "search_files",
"arguments": "{"query":"charlotte"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_search_6c3bc42b",
"content": "[{"path":"/old_archives/grandmas_charlotte.txt"}, {"path":"/documents/charlotte_recipe.txt"}]"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_read_d88b07ff",
"type": "function",
"function": {
"name": "read_file",
"arguments": "{"path":"/documents/charlotte_recipe.txt"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_read_d88b07ff",
"content": "Apples - 1 kg, flour - 200g, sugar - 1 cup"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "search_files",
"description": "Searches for files by name or pattern",
"parameters": { "type": "object", "properties": { "query": { "type": "string" } }, "required": ["query"] }
}
},
{
"type": "function",
"function": {
"name": "read_file",
"description": "Reads file contents",
"parameters": { "type": "object", "properties": { "path": { "type": "string" } }, "required": ["path"] }
}
},
"..."
]
}
This request flies back to the LLM provider (e.g., OpenAI).
Important nuance: memory and context window
Notice how the messages list keeps growing? Classic LLMs (at least the vast majority of models prior to 2025-2026) are inherently stateless: they operate “without memory”. Therefore, OpenClaw (and any other agent) attaches all past messages (user requests, tool calls, and their results) to each new request.

These messages accumulate and are passed again and again as long as they fit into the context window (the memory limit of a specific model, e.g., 128 thousand tokens). As soon as the dialogue history grows too large to fit within this limit, the agent begins trimming the oldest messages from the beginning of the history (which is why the model starts to “forget”), to free up space for new steps and stay within the model’s context window.
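A crude version of that trimming logic might look like this. It is a sketch only: it counts characters instead of tokens, whereas real agents use a proper tokenizer and usually keep the system prompt pinned:

def trim_history(messages: list[dict], max_chars: int = 400_000) -> list[dict]:
    # Drop the oldest messages first until the history fits the (rough) budget.
    while sum(len(str(m)) for m in messages) > max_chars and len(messages) > 1:
        messages.pop(0)  # this is the moment the model starts to "forget"
    return messages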
P.S. The AI industry is evolving rapidly. Models with built-in memory (stateful APIs) are emerging, as are agent models that can store dialogue context on their side or even have their own tools “under the hood”. Nevertheless, the stateless mechanism described above (where the client itself passes the entire history and tool schemas in JSON) is the foundation upon which 99% of current AI applications are built.
Step 6: Final response
The LLM (e.g., GPT-4o) looks at the updated history. It sees:
- The user asked about the amount of sugar in the charlotte recipe.
- I asked to call the search function.
- I was returned a list of two files.
- I asked to call the read function for the second file.
- The function read the file and returned the recipe text.
The model analyzes this text and generates the final human response: “To prepare the charlotte, you will need 1 cup of sugar”.
Here is what this final JSON from the model looks like technically:
{
"role": "assistant",
"content": "To prepare the charlotte, you will need 1 cup of sugar"
}
This JSON is returned to OpenClaw, and OpenClaw extracts the text from the content field and displays it to the user in the interface.
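On the agent side, this is simply the exit condition of the loop from earlier: no tool_calls in the response means the content text goes straight to the user (send_to_telegram is a hypothetical delivery function):

if not message.get("tool_calls"):
    send_to_telegram(chat_id=123456789, text=message["content"])  # hypothetical delivery step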
OpenClaw, GitHub Copilot, and others all work exactly the same way: they provide the LLM with a list of tools, ask it to make a decision, and then perform the execution locally themselves.