Published: August 02, 2025
Last updated: August 02, 2025

Building AI Agents with Tool Calling

Learn to build AI agents with tool calling using Vercel AI SDK. Implement function calling, structured outputs, and agentic loops for autonomous tasks.

Tags

AI, Agents, Tool Calling, Vercel AI SDK, Function Calling
5 min read

This is Part 5 of the AI Automation Engineer Roadmap series. This post builds on everything from Part 1 through Part 4.

TL;DR

Tool calling transforms LLMs from text generators into autonomous agents that can interact with APIs, databases, and external systems to complete complex tasks. This post covers how tool calling works with OpenAI and Anthropic, defining tools with Zod schemas, implementing ReAct loops, building multi-step agents with the Vercel AI SDK, error handling, and knowing when agents are overkill.

Why This Matters

Everything we have covered in this series -- LLM fundamentals, context engineering, RAG, and vector databases -- converges here. An AI agent combines all of these capabilities: it reasons about a task (context engineering), retrieves information (RAG), and takes actions (tool calling) in a loop until the task is complete. This is the pattern behind GitHub Copilot's workspace features, customer support bots that actually resolve tickets, and internal tools that automate multi-step workflows. Understanding how to build agents -- and when not to -- is the capstone skill for an AI automation engineer.

Core Concepts

What Are AI Agents?

An AI agent is an LLM that can:

  1. Reason about what needs to be done
  2. Act by calling tools (functions)
  3. Observe the results of those actions
  4. Repeat until the task is complete

The key difference between a chatbot and an agent is autonomy. A chatbot responds to a message. An agent pursues a goal across multiple steps, deciding which tools to use and when to stop.

How Tool Calling Works

When you define tools for an LLM, you are giving it a menu of functions it can call. The model does not execute the functions -- it outputs a structured request saying "I want to call function X with these arguments." Your code executes the function and feeds the result back to the model.

typescript
// The flow:
// 1. You define tools (name, description, parameters)
// 2. User sends a message
// 3. Model decides to call a tool (returns tool name + arguments as JSON)
// 4. Your code executes the tool
// 5. You send the result back to the model
// 6. Model either calls another tool or responds to the user

Tool Calling with OpenAI

typescript
import OpenAI from "openai";
 
const openai = new OpenAI();
 
// Define tools with JSON Schema
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g., 'San Francisco, CA'",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature unit",
          },
        },
        required: ["location"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "search_database",
      description: "Search the product database for items matching a query",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Search query" },
          category: { type: "string", description: "Product category filter" },
          limit: { type: "number", description: "Max results to return" },
        },
        required: ["query"],
      },
    },
  },
];
 
// Execute a tool call
async function executeTool(
  name: string,
  args: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case "get_weather":
      // In production, call a real weather API
      return JSON.stringify({
        location: args.location,
        temperature: 72,
        unit: args.unit || "fahrenheit",
        conditions: "sunny",
      });
    case "search_database":
      // In production, query your actual database
      return JSON.stringify({
        results: [
          { name: "Widget Pro", price: 29.99, inStock: true },
          { name: "Widget Basic", price: 9.99, inStock: true },
        ],
        total: 2,
      });
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
 
// The agent loop
async function runAgent(userMessage: string): Promise<string> {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    {
      role: "system",
      content: "You are a helpful assistant. Use the available tools when needed.",
    },
    { role: "user", content: userMessage },
  ];
 
  const MAX_ITERATIONS = 10;
 
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools,
    });
 
    const choice = response.choices[0];
 
    // If the model wants to call tools
    if (choice.finish_reason === "tool_calls" && choice.message.tool_calls) {
      // Add the assistant's message (with tool calls) to history
      messages.push(choice.message);
 
      // Execute each tool call and add results
      for (const toolCall of choice.message.tool_calls) {
        const result = await executeTool(
          toolCall.function.name,
          JSON.parse(toolCall.function.arguments)
        );
 
        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: result,
        });
      }
 
      continue; // Let the model process the results
    }
 
    // Model is done -- return the final response
    return choice.message.content || "No response generated.";
  }
 
  return "Agent reached maximum iterations without completing the task.";
}

Tool Calling with Anthropic

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
const anthropicTools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description: "Get the current weather for a location",
    input_schema: {
      type: "object" as const,
      properties: {
        location: {
          type: "string",
          description: "City name, e.g., 'San Francisco, CA'",
        },
        unit: {
          type: "string",
          enum: ["celsius", "fahrenheit"],
        },
      },
      required: ["location"],
    },
  },
];
 
async function runClaudeAgent(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  const MAX_ITERATIONS = 10;
 
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      tools: anthropicTools,
      messages,
    });
 
    // Check if the model wants to use a tool
    if (response.stop_reason === "tool_use") {
      const toolUseBlocks = response.content.filter(
        (block) => block.type === "tool_use"
      );
 
      // Add assistant response to messages
      messages.push({ role: "assistant", content: response.content });
 
      // Execute tools and build results
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const toolUse of toolUseBlocks) {
        if (toolUse.type === "tool_use") {
          const result = await executeTool(
            toolUse.name,
            toolUse.input as Record<string, unknown>
          );
          toolResults.push({
            type: "tool_result",
            tool_use_id: toolUse.id,
            content: result,
          });
        }
      }
 
      messages.push({ role: "user", content: toolResults });
      continue;
    }
 
    // Model is done
    const textBlock = response.content.find((b) => b.type === "text");
    return textBlock?.text || "No response generated.";
  }
 
  return "Agent reached maximum iterations.";
}

Hands-On Implementation

Defining Tools with Zod and the Vercel AI SDK

The Vercel AI SDK provides the cleanest developer experience for building agents. It handles the ReAct loop, tool execution, and streaming automatically:

typescript
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
 
// Define tools with Zod schemas -- type-safe and self-documenting
const weatherTool = tool({
  description: "Get the current weather for a location",
  parameters: z.object({
    location: z.string().describe("City name, e.g., 'San Francisco, CA'"),
    unit: z.enum(["celsius", "fahrenheit"]).default("fahrenheit"),
  }),
  execute: async ({ location, unit }) => {
    // Call your weather API here
    const response = await fetch(
      `https://api.weather.example.com/current?q=${encodeURIComponent(location)}`
    );
    const data = await response.json();
    return {
      location,
      temperature: unit === "celsius" ? data.temp_c : data.temp_f,
      unit,
      conditions: data.conditions,
    };
  },
});
 
const databaseTool = tool({
  description: "Query the product database. Use this to find products, check inventory, or look up prices.",
  parameters: z.object({
    query: z.string().describe("SQL-like search query"),
    category: z.string().optional().describe("Filter by product category"),
    limit: z.number().default(10).describe("Maximum results to return"),
  }),
  execute: async ({ query, category, limit }) => {
    // Your database query logic here
    const results = await searchProducts({ query, category, limit });
    return results;
  },
});
 
const calculatorTool = tool({
  description: "Perform mathematical calculations",
  parameters: z.object({
    expression: z.string().describe("Mathematical expression to evaluate, e.g., '(29.99 * 3) + 5.99'"),
  }),
  execute: async ({ expression }) => {
    // Caution: new Function is NOT a sandbox. It is acceptable for a demo,
    // but never evaluate untrusted input this way -- use a math expression
    // parser (e.g., mathjs) in production.
    const result = new Function(`return ${expression}`)();
    return { expression, result };
  },
});

Multi-Step Agents with maxSteps

The maxSteps parameter in the Vercel AI SDK enables true agentic behavior -- the model can call tools multiple times in sequence:

typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
 
async function runShoppingAgent(userMessage: string) {
  const result = await generateText({
    model: openai("gpt-4o"),
    system: `You are a shopping assistant. Help users find products, compare prices,
and calculate totals. Use the available tools to look up real data.
Never make up prices or availability -- always check the database.`,
    prompt: userMessage,
    tools: {
      searchProducts: databaseTool,
      calculate: calculatorTool,
      getWeather: weatherTool,
    },
    maxSteps: 5, // Allow up to 5 tool-calling rounds
  });
 
  // The result includes the full conversation with all tool calls
  console.log("Final response:", result.text);
  console.log("Steps taken:", result.steps.length);
  console.log("Total tokens:", result.usage.totalTokens);
 
  // You can inspect each step
  for (const step of result.steps) {
    if (step.toolCalls.length > 0) {
      for (const call of step.toolCalls) {
        console.log(`Called ${call.toolName} with:`, call.args);
      }
    }
  }
 
  return result.text;
}
 
// Example: "Find the 3 cheapest widgets and calculate the total with 8% tax"
// Step 1: Model calls searchProducts({ query: "widgets", limit: 3 })
// Step 2: Model calls calculate({ expression: "(9.99 + 14.99 + 19.99) * 1.08" })
// Step 3: Model responds with the formatted answer

Streaming Agent Responses

For real-time UX, stream the agent's reasoning and tool calls:

typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
 
async function streamAgent(userMessage: string) {
  const result = streamText({
    model: openai("gpt-4o"),
    system: "You are a helpful assistant with access to tools.",
    prompt: userMessage,
    tools: {
      searchProducts: databaseTool,
      calculate: calculatorTool,
    },
    maxSteps: 5,
    onStepFinish: (step) => {
      // Called after each reasoning + tool call step
      if (step.toolCalls.length > 0) {
        console.log("Tool calls in this step:", step.toolCalls.map((tc) => tc.toolName));
      }
    },
  });
 
  // Stream text chunks to the client
  for await (const chunk of result.textStream) {
    process.stdout.write(chunk);
  }
 
  console.log("\nDone. Total usage:", (await result.usage).totalTokens);
}

Error Handling in Tool Calls

Tools fail. APIs go down, databases time out, users provide invalid inputs. Robust error handling is non-negotiable:

typescript
import { tool } from "ai";
import { z } from "zod";
 
const robustDatabaseTool = tool({
  description: "Search the product database",
  parameters: z.object({
    query: z.string(),
    limit: z.number().default(10),
  }),
  execute: async ({ query, limit }) => {
    try {
      const results = await searchProducts({ query, limit });
 
      if (results.length === 0) {
        return {
          success: true,
          results: [],
          message: `No products found matching "${query}". Try a broader search term.`,
        };
      }
 
      return { success: true, results };
    } catch (error) {
      // Return error information to the model so it can reason about it
      // Do NOT throw -- that crashes the agent loop
      return {
        success: false,
        error: error instanceof Error ? error.message : "Unknown error",
        suggestion: "The database may be temporarily unavailable. Try again or ask the user to retry later.",
      };
    }
  },
});
 
// Wrapper for adding a timeout to any tool call
function withTimeout<T>(
  fn: () => Promise<T>,
  timeoutMs: number = 10000
): Promise<T> {
  return Promise.race([
    fn(),
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("Tool execution timed out")), timeoutMs)
    ),
  ]);
}
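To use the wrapper, pass the tool's work as a thunk. A minimal sketch follows; the `withTimeout` definition is repeated so the snippet runs standalone, and `slowLookup` is a hypothetical stand-in for a real API call:

```typescript
// withTimeout as defined above, repeated so this snippet is self-contained.
function withTimeout<T>(
  fn: () => Promise<T>,
  timeoutMs: number = 10000
): Promise<T> {
  return Promise.race([
    fn(),
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("Tool execution timed out")), timeoutMs)
    ),
  ]);
}

// Hypothetical tool body: resolves after ~50ms, like a fast API call.
const slowLookup = () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("ok"), 50));

async function demo() {
  // 50ms of work inside a 1s budget resolves normally.
  console.log(await withTimeout(slowLookup, 1000)); // "ok"

  // The same work inside a 10ms budget rejects with the timeout error,
  // which a tool's catch block can turn into structured error output.
  try {
    await withTimeout(slowLookup, 10);
  } catch (e) {
    console.log((e as Error).message); // "Tool execution timed out"
  }
}
demo();
```

Because `Promise.race` only ignores the loser, the underlying request keeps running after a timeout; pair this with an `AbortController` if the API supports cancellation.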

Agent Memory Patterns

For multi-turn conversations, agents need memory. Here are three patterns:

typescript
import OpenAI from "openai";
 
// Pattern 1: Full conversation history (simple, but context grows unbounded)
interface ConversationMemory {
  messages: Array<{ role: string; content: string }>;
}
 
// Pattern 2: Sliding window (keep last N messages)
function slidingWindowMemory(
  messages: Array<{ role: string; content: string }>,
  maxMessages: number = 20
): Array<{ role: string; content: string }> {
  if (messages.length <= maxMessages) return messages;
 
  // Always keep the system message
  const systemMessages = messages.filter((m) => m.role === "system");
  const otherMessages = messages.filter((m) => m.role !== "system");
 
  return [...systemMessages, ...otherMessages.slice(-maxMessages)];
}
 
// Pattern 3: Summary memory (compress old messages into a summary)
async function summaryMemory(
  messages: Array<{ role: string; content: string }>,
  maxMessages: number = 10
): Promise<Array<{ role: string; content: string }>> {
  if (messages.length <= maxMessages) return messages;
 
  const systemMessages = messages.filter((m) => m.role === "system");
  const otherMessages = messages.filter((m) => m.role !== "system");
 
  // Summarize older messages
  const oldMessages = otherMessages.slice(0, -maxMessages);
  const recentMessages = otherMessages.slice(-maxMessages);
 
  const openai = new OpenAI();
  const summaryResponse = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Summarize this conversation in 2-3 sentences, preserving key facts and decisions.",
      },
      {
        role: "user",
        content: oldMessages.map((m) => `${m.role}: ${m.content}`).join("\n"),
      },
    ],
    temperature: 0,
    max_tokens: 256,
  });
 
  const summary = summaryResponse.choices[0].message.content;
 
  return [
    ...systemMessages,
    {
      role: "system",
      content: `Previous conversation summary: ${summary}`,
    },
    ...recentMessages,
  ];
}

Best Practices

  1. Give tools descriptive names and descriptions -- The model uses the description to decide when to call a tool. Vague descriptions lead to wrong tool selections.
  2. Return structured data from tools -- Return JSON objects, not formatted strings. Let the model format the output for the user.
  3. Always set maxSteps/max iterations -- Without a cap, a confused agent can loop forever, burning through your API budget.
  4. Log every tool call -- Store the tool name, arguments, result, and latency. This is your primary debugging interface.
  5. Keep tools focused -- One tool should do one thing. A search_and_update_and_notify tool is too complex for the model to use reliably.
  6. Test tool definitions independently -- Before wiring a tool into an agent, verify it works correctly with unit tests.
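Practice 6 is cheap because a tool's `execute` function is just an async function -- you can call it directly, with no model in the loop. A minimal sketch, where `applyDiscount` is a hypothetical tool body returning structured results as in practice 2:

```typescript
// Discriminated result type so failures are data the model can reason about.
type ToolResult =
  | { success: true; total: number }
  | { success: false; error: string };

// Hypothetical tool body: applies a percentage discount to a price.
async function applyDiscount(args: {
  price: number;
  percent: number;
}): Promise<ToolResult> {
  if (args.percent < 0 || args.percent > 100) {
    return { success: false, error: "percent must be between 0 and 100" };
  }
  return {
    success: true,
    total: +(args.price * (1 - args.percent / 100)).toFixed(2),
  };
}

async function runChecks() {
  // Happy path: 25% off 100 is 75.
  const ok = await applyDiscount({ price: 100, percent: 25 });
  if (!ok.success || ok.total !== 75) throw new Error("happy path failed");

  // Invalid input comes back as structured data, not a thrown exception.
  const bad = await applyDiscount({ price: 100, percent: 150 });
  if (bad.success) throw new Error("validation should have failed");

  console.log("tool checks passed");
}
runChecks();
```

The same checks drop straight into your test runner of choice; only once they pass is the function worth wiring into `tool({ ... })`.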

Common Pitfalls

  • Too many tools: Models degrade in tool selection accuracy beyond 10-15 tools. Group related operations or use a tool router.
  • Missing error handling: If a tool throws an exception, the entire agent loop crashes. Always catch errors and return them as structured data.
  • No stopping condition: Without maxSteps or iteration limits, agents can enter infinite loops of tool calls that accomplish nothing.
  • Over-trusting the agent: Agents will confidently call tools with wrong arguments. Validate all tool inputs with Zod and add guardrails for destructive operations.
  • Using agents when a pipeline suffices: If the steps are always the same (fetch data, transform, generate), use a deterministic pipeline. Agents add value when the steps are dynamic.
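The guardrails point for destructive operations deserves a sketch. Plain checks are used here so the snippet stands alone; with Zod the same limits could live in a `.refine()` on the parameters schema. `deleteRows` and its arguments are hypothetical -- the idea is that the tool enforces its own limits instead of trusting the model to call it correctly:

```typescript
type DeleteResult =
  | { success: true; deleted: number }
  | { success: false; error: string };

// Hypothetical destructive tool body with two guardrails baked in.
async function deleteRows(args: {
  table: string;
  rowIds: string[];
  confirm: boolean;
}): Promise<DeleteResult> {
  // Guardrail 1: require an explicit confirmation flag for anything irreversible.
  if (!args.confirm) {
    return {
      success: false,
      error:
        "Deletion not confirmed. Ask the user to approve, then retry with confirm: true.",
    };
  }
  // Guardrail 2: hard-cap the blast radius regardless of what the model asks for.
  if (args.rowIds.length === 0 || args.rowIds.length > 100) {
    return {
      success: false,
      error: "Refusing: rowIds must contain between 1 and 100 ids.",
    };
  }
  // In production, run the actual DELETE here.
  return { success: true, deleted: args.rowIds.length };
}
```

Returning the refusal as structured data (rather than throwing) lets the agent relay the confirmation request to the user and retry, which fits the error-handling pattern shown earlier.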

What's Next

You have now completed the AI Automation Engineer Roadmap. You understand how LLMs work (Part 1), how to engineer context for reliable outputs (Part 2), how to build RAG pipelines for domain-specific knowledge (Part 3), how to choose and configure vector databases (Part 4), and now how to build autonomous agents with tool calling.

The next step is to build something real. Pick a workflow you do manually -- customer support triage, document processing, data pipeline monitoring -- and automate it with the patterns from this series. Start with a simple pipeline, add RAG for context, and graduate to agents only when the task requires dynamic decision-making.

FAQ

What is tool calling in AI agents?

Tool calling allows LLMs to invoke predefined functions with structured parameters. The model decides which tool to call and with what arguments, your code executes it, and the result is fed back to the model so it can continue reasoning or respond. The model never executes code directly -- it outputs a structured request that your application fulfills.

How does the Vercel AI SDK handle tool calling?

Vercel AI SDK provides a streamlined API for defining tools with Zod schemas, handling tool execution in agentic loops, and streaming results. It supports multi-step tool use via the maxSteps parameter and parallel tool calls across multiple LLM providers. The tool() function combines schema definition and execution logic in a single, type-safe interface.

What is the difference between function calling and tool calling?

They are functionally the same concept. OpenAI originally called it "function calling," then renamed it to "tool calling." Both refer to the LLM's ability to output structured calls to external functions. Anthropic uses "tool use" as their terminology. Regardless of the name, the pattern is identical: define available functions, let the model choose which to call, execute them, and return results.


Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
