March 4, 2026 6 min read

Tool Calling Explained: Turn Your LLM into an AI Agent

Tool calling (function calling) is the bridge that transforms static LLMs into autonomous agents capable of invoking external APIs, databases, and services. Learn the six-step process that powers ChatGPT plugins, AI assistants, and production agent systems—and how to implement it yourself.

Tool Calling Explained — How to Turn Your LLM into an AI Agent That Actually Does Things

Out-of-the-box LLMs can’t check your calendar, pull live weather data, or query your database. They’re brilliant conversationists trapped in a sensory deprivation chamber, completely isolated from the real world. The result? You get impressive prose about what to do, but zero ability to actually do it.

Tool calling (also called function calling) changes everything. It’s the bridge that transforms a chatbot into an agent—an LLM that can invoke external functions and APIs. Yet most explanations overcomplicate it.

Let’s fix that.

The Core Idea: Your LLM Is a Smart Intern With a Rolodex

Think of tool calling like delegating to a brilliant intern who has no direct system access. When you ask, “What’s the temperature in Paris right now?”, they can’t just know the answer. But here’s what they can do:

  1. Recognize they need the weather API
  2. Write down exactly how to call it: getWeather(city: "Paris")
  3. Hand you that note
  4. Wait for you to run it and give back the result: “65°F, sunny”
  5. Use that data to craft a proper response: “It’s 65 degrees and sunny in Paris”

That’s tool calling. The LLM doesn’t execute functions—it identifies when it needs one, formats the call correctly, and uses the result to complete its answer.

The crucial insight: the model never runs anything. It’s a decision-maker and instruction writer. You, the developer, handle the actual execution.

The Six-Step Dance

Here’s the complete flow:

Step 1: You describe your tools upfront

Before the conversation starts, you give the LLM a structured description of available tools. For a weather API:

{
  "name": "getWeather",
  "description": "Fetches current weather for a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name"
      }
    },
    "required": ["city"]
  }
}

This travels with every request. The LLM sees both the user’s question and the available tools simultaneously.

Step 2: User asks a question

“What’s the weather in Paris?” gets sent as a standard message. The tool description is included in the same API request.

Step 3: LLM decides to use a tool

Instead of answering directly, the LLM responds with structured instructions:

{
  "tool_call": {
    "function": "getWeather",
    "arguments": {"city": "Paris"}
  }
}

The LLM is saying: “I recognize I need external data, and here’s exactly how to get it.” Nothing has executed yet.

Step 4: You execute the function

Your application code intercepts that tool call, parses it, and invokes the actual weather API. You get back real data:

{"temp": 65, "condition": "sunny"}

This is where the magic happens—your code bridges the LLM’s instructions to actual APIs, databases, or services.

Step 5: You feed the result back

Here’s the part that confuses people: you send a new message to the LLM containing:

  • The original question
  • The tool call the LLM made
  • The actual result from the API

This becomes part of the conversation history. You’re giving the LLM its own notes back with the answer filled in.

Step 6: LLM generates the final answer

Now with real data in context, the LLM responds naturally:

“The temperature in Paris is 65°F and it’s sunny.”

It synthesizes the raw API data into a conversational response.

Why This Matters

This is how ChatGPT Plugins, GPTs, and every AI agent you’ve heard of actually works.

When you see ChatGPT browse the web, analyze data files, or book reservations, it’s using tool calling. The model doesn’t have a built-in web browser—it has access to a web_search tool that it can invoke when needed.

Real-world applications you can build today:

  • Customer support bots that check order status by querying your database
  • Data analysis assistants that run SQL queries and plot results
  • Scheduling agents that check calendar APIs and book meetings
  • DevOps helpers that restart services or check logs when asked

Without tool calling, you’re limited to whatever knowledge was in the training data. With it, your LLM becomes a natural language interface to your entire tech stack.

In production systems, this pattern makes the difference between a research demo and something users actually rely on. I’ve seen radiology assistants query PACS systems, pull patient history, and cross-reference findings—all through tool calling. The LLM provides the intelligence layer; tools provide the muscle.

The Three Things That Trip People Up

1. The LLM doesn’t automatically detect when it needs a tool

The biggest mistake: assuming the LLM “knows” when external data is required. It doesn’t. The decision is based entirely on how you describe your tools.

Vague description: “gets data” → LLM ignores it
Too specific: “only for Paris weather on Tuesdays” → LLM won’t generalize
Just right: “Fetches current weather conditions for any city worldwide” → LLM uses it appropriately

The tool description is a contract between you and the LLM. Write it like you’re explaining to that smart intern exactly when and how to use each tool.

2. Tool calling adds latency

Each tool invocation is a round-trip: LLM → your code → API → your code → LLM. For complex queries needing multiple tools, this stacks up.

Some frameworks now support parallel tool calling to mitigate this. Design your tools to be efficient—one well-designed tool that returns comprehensive data beats three separate calls.

3. The LLM can get the arguments wrong

If your tool expects ISO date strings but the description isn’t clear, the LLM might pass “next Tuesday” instead. Always validate and sanitize tool call arguments before executing them. Treat tool calls like user input—because that’s essentially what they are.

Code Example: Weather Tool in Python

Here’s a minimal implementation using OpenAI’s API:

import openai
import requests

# Step 1: Define your tool
tools = [{
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Fetches current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
}]

# Step 2: User asks a question
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Step 3: LLM decides to use a tool
response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tools
)

tool_call = response.choices[0].message.tool_calls[0]

# Step 4: You execute the function
def get_weather(city):
    # Actual API call here
    return {"temp": 65, "condition": "sunny"}

result = get_weather(tool_call.function.arguments["city"])

# Step 5: Feed result back
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": str(result)
})

# Step 6: LLM generates final answer
final_response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages
)

print(final_response.choices[0].message.content)

Summary

Three takeaways:

  1. Tool calling is structured conversation, not magic execution—the LLM formats function calls as JSON, you run them, then feed results back as context for the final answer

  2. Tool descriptions determine success—vague descriptions lead to ignored tools; clear, purposeful descriptions with explicit parameters lead to reliable invocation

  3. This is the foundation of AI agents—every system that lets an LLM “take action” uses this pattern; it’s the difference between a chatbot and an agent


What’s the first tool you’d want to give an LLM access to in your work? A database query function? A calendar API? Something else? Reply and let me know—I’m curious what real-world problems you’re trying to solve.


What’s the first real-world tool calling use case you’d build for your application?

Comments