AI Agent Tool Use Architecture: Function Calling, ReAct Loops & Structured Outputs
Large language models become dramatically more useful when they can take actions — reading files, querying databases, calling APIs. Tool use (also called function calling) is the mechanism that turns a chatbot into an agent.
The Core Idea#
Without tools, an LLM can only generate text. With tools, it can:
User: "What's the weather in Tokyo?"
Without tools → "I don't have real-time data..."
With tools → calls get_weather("Tokyo") → "It's 22°C and sunny in Tokyo."
The model doesn't execute the tool itself. It requests a tool call, the runtime executes it, and the result is fed back into the conversation.
Tool Definitions#
Every tool use system starts with tool definitions — structured descriptions the model uses to decide when and how to call a tool.
```json
{
  "name": "search_database",
  "description": "Search the product database by query string. Returns top 10 matches.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query"
      },
      "category": {
        "type": "string",
        "enum": ["electronics", "clothing", "books"],
        "description": "Optional category filter"
      }
    },
    "required": ["query"]
  }
}
```
Key principles for good tool definitions:
- Clear descriptions — the model reads these to decide when to use the tool
- Precise parameter types — enums, required fields, format hints
- Scoped responsibility — one tool does one thing well
- Example values — sample inputs in descriptions help the model generate correct arguments
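These principles can also be enforced at runtime, before a requested call ever executes. A minimal sketch (the `validate_input` helper is illustrative, not part of any SDK) that checks required fields and enum constraints from a tool definition like the one above:

```python
def validate_input(schema: dict, args: dict) -> list[str]:
    """Return a list of validation errors for a tool call's arguments."""
    errors = []
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            errors.append(f"unexpected field: {field}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field} must be one of {spec['enum']}")
    return errors
```

An empty list means the call can proceed; a non-empty list can be returned to the model as an error observation so it can correct its own arguments.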
The ReAct Loop#
ReAct (Reasoning + Acting) is the dominant pattern for agentic tool use. The model alternates between thinking and acting:
Step 1: THINK → "I need to find the user's order status. I'll search by email."
Step 2: ACT → call search_orders(email="user@example.com")
Step 3: OBSERVE → [Order #1234, shipped, tracking: XYZ789]
Step 4: THINK → "Found the order. Now I need tracking details."
Step 5: ACT → call get_tracking("XYZ789")
Step 6: OBSERVE → [In transit, arriving March 30]
Step 7: RESPOND → "Your order #1234 is in transit, arriving March 30."
On each iteration the model sees the full history of thoughts, actions, and observations, which lets it plan multi-step workflows dynamically.
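The control flow above can be sketched as a loop. This is a minimal skeleton, not any SDK's API: `call_model` stands in for an LLM call and is assumed to return a dict with an optional `tool` request.

```python
def react_loop(call_model, tools: dict, user_message: str, max_steps: int = 10):
    """Alternate model turns (THINK/ACT) and tool executions (OBSERVE)."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        turn = call_model(history)                 # THINK, possibly ACT
        history.append(turn)
        if turn.get("tool") is None:               # no tool requested: final answer
            return turn["content"]
        result = tools[turn["tool"]](**turn["args"])          # ACT
        history.append({"role": "tool", "content": result})   # OBSERVE
    return "step limit reached"
```

The `max_steps` cap matters in practice: without it, a confused model can loop on the same tool forever.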
Tool Selection Strategies#
When an agent has dozens of tools available, selection becomes critical:
Relevance Filtering#
```python
# Pre-filter tools based on the query before sending to the model
def is_relevant(tool, query):
    # Crude keyword overlap; swap in whatever relevance check fits your tools
    return any(word in tool["description"].lower() for word in query.lower().split())

relevant_tools = [t for t in all_tools if is_relevant(t, user_query)]
# Send only relevant tools to reduce confusion and token cost
```
Hierarchical Tool Organization#
Top-level tools:
- database_tools → search, insert, update, delete
- communication → send_email, send_slack, create_ticket
- file_operations → read_file, write_file, list_directory
The model first selects a category, then gets specific tools within it.
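The two-step selection can be sketched with a plain category map (the categories and tool names below mirror the example; the structure is illustrative):

```python
TOOL_CATEGORIES = {
    "database_tools": ["search", "insert", "update", "delete"],
    "communication": ["send_email", "send_slack", "create_ticket"],
    "file_operations": ["read_file", "write_file", "list_directory"],
}

def tools_for_category(category: str) -> list[str]:
    """Second step: expose only the tools inside the chosen category."""
    return TOOL_CATEGORIES.get(category, [])
```

The first model turn sees only the three category names; the second turn sees only the handful of tools inside the chosen category, keeping each prompt small.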
Tool Selection via Embeddings#
For large tool inventories (100+), embed tool descriptions and retrieve the top-k most similar tools based on the user query. This scales better than sending all definitions.
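A sketch of that retrieval step, assuming tool-description embeddings were precomputed by some embedding model (plain cosine similarity over lists here, no vector database):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_tools(query_vec, tool_vecs: dict, k: int = 5) -> list[str]:
    """Return names of the k tools whose description embeddings best match the query."""
    ranked = sorted(tool_vecs, key=lambda name: cosine(query_vec, tool_vecs[name]),
                    reverse=True)
    return ranked[:k]
```

Only the retrieved definitions go into the prompt; the other 95+ tools cost nothing.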
Claude Tool Use API#
Claude's tool use follows a structured message flow:
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }],
    messages=[{"role": "user", "content": "Weather in Paris?"}]
)
# Response contains a tool_use content block.
# Execute the tool, then send the result back.
```
Claude returns a tool_use block with the tool name and input. You execute it and send back a tool_result message. The model then generates its final response.
GPT Function Calling API#
OpenAI's approach uses a similar pattern with slightly different structure:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)
```
The model returns tool_calls in the response message. You execute them and append tool role messages with results.
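One wrinkle on the OpenAI side: function arguments arrive as a JSON string, not a parsed object. A sketch of the result handling (dict-shaped `tool_calls` for illustration; the real SDK returns objects with the same fields):

```python
import json

def build_tool_messages(tool_calls, tools: dict) -> list[dict]:
    """Execute each requested call and wrap outputs as 'tool' role messages."""
    messages = []
    for call in tool_calls:
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        output = tools[call["function"]["name"]](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # pairs the result with its request
            "content": str(output),
        })
    return messages
```

Because the arguments are model-generated text, `json.loads` can fail; production code should catch that and feed the parse error back as the tool result.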
Parallel Tool Calls#
Both Claude and GPT support parallel tool calls — requesting multiple tool executions in a single response:
User: "Compare weather in Tokyo and London"
Model response (single turn):
→ tool_call_1: get_weather("Tokyo")
→ tool_call_2: get_weather("London")
This is faster than sequential calls because both execute concurrently. The runtime collects all results and sends them back together.
When parallel calls help:
- Independent data fetches (weather in two cities)
- Gathering context from multiple sources simultaneously
- Batch operations where order doesn't matter
When to force sequential:
- Second call depends on first call's result
- Write operations that must happen in order
- Transactions requiring consistency
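On the runtime side, independent calls can genuinely run concurrently. A sketch using a thread pool (the call dicts and `tools` registry are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_parallel(calls: list[dict], tools: dict) -> list:
    """Run independent tool calls concurrently; return results in call order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(tools[c["name"]], **c["args"]) for c in calls]
        return [f.result() for f in futures]
```

Threads suit the typical case (I/O-bound API and database calls); results come back in the same order as the requests, so they can be paired with their call IDs straightforwardly.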
Structured Outputs#
Tool use naturally produces structured data, but you can also force structured outputs for the model's final response:
```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "analysis_result",
    "schema": {
      "type": "object",
      "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
        "key_topics": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["sentiment", "confidence", "key_topics"]
    }
  }
}
```
Structured outputs guarantee the response matches your schema — no regex parsing needed.
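Even with a schema guarantee, the reply still arrives as a JSON string to parse. A small sketch (field names taken from the schema above; the helper name is illustrative):

```python
import json

def parse_analysis(raw: str) -> dict:
    """Parse a structured-output reply and sanity-check it against the schema."""
    data = json.loads(raw)
    assert data["sentiment"] in ("positive", "negative", "neutral")
    assert isinstance(data["confidence"], (int, float))
    assert isinstance(data["key_topics"], list)
    return data
```

The assertions are belt-and-braces: they should never fire when the provider enforces the schema, but they catch drift if the schema and the downstream code fall out of sync.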
Error Recovery#
Tools fail. Networks time out. APIs return errors. Robust agents handle this gracefully:
Retry with Backoff#
```python
import time

def execute_tool_with_retry(tool_call, max_retries=3):
    for attempt in range(max_retries):
        try:
            return execute_tool(tool_call)
        except ToolError as e:
            if attempt == max_retries - 1:
                return {"error": str(e)}  # out of retries: surface the error to the model
            time.sleep(2 ** attempt)      # exponential backoff: 1s, 2s, 4s
```
Fallback Tools#
Primary: search_vector_db(query)
↓ fails
Fallback: search_keyword_db(query)
↓ fails
Final: return "I couldn't find relevant results"
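The same chain as code, a short sketch (the backend functions are hypothetical stand-ins for the vector and keyword searches above):

```python
def search_with_fallback(query: str, backends: list) -> str:
    """Try each search backend in order; fall through to an apology if all fail."""
    for backend in backends:
        try:
            return backend(query)
        except Exception:
            continue  # this backend failed; try the next one
    return "I couldn't find relevant results"
```

Ordering the list from most to least capable means the fallback only costs anything on the failure path.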
Error Context for the Model#
When a tool fails, return the error to the model. Good agents adapt:
Tool result: {"error": "Rate limited. Retry after 30 seconds."}
Model thinks: "I'll try a different approach — let me use the cached data tool instead."
Security Considerations#
Tool use introduces real-world side effects. Guard against the risks and build in the safeguards:
- Prompt injection — malicious inputs can trick the model into calling dangerous tools; treat tool results as untrusted
- Over-permissioning — grant each tool the minimum permissions it needs
- Confirmation gates — require human approval for destructive operations (delete, send email)
- Input validation — validate tool inputs before execution, not just after
- Rate limiting — cap tool calls per conversation to prevent runaway loops
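Two of these safeguards — a confirmation gate and a per-conversation call cap — fit in a few lines. A sketch (the tool names and the `approve` callback are illustrative):

```python
DESTRUCTIVE_TOOLS = {"delete_record", "send_email"}

def guard_tool_call(name: str, call_count: int, approve, max_calls: int = 20) -> bool:
    """Return True if the call may proceed, False if blocked by a safeguard."""
    if call_count >= max_calls:       # rate limit: stop runaway loops
        return False
    if name in DESTRUCTIVE_TOOLS:     # confirmation gate for real-world side effects
        return approve(name)          # e.g. prompt a human operator
    return True
```

A blocked call should be reported back to the model as a tool result ("this action requires approval"), so it can explain the situation to the user instead of silently stalling.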
Architecture Pattern: Tool Use Runtime#
```
┌─────────────┐     ┌──────────┐     ┌───────────┐
│    User     │────▶│  Agent   │────▶│  LLM API  │
│    Input    │     │  Runtime │◀────│  (Claude) │
└─────────────┘     │          │     └───────────┘
                    │  ┌───────┴──────┐
                    │  │ Tool Registry│
                    │  │ - search_db  │
                    │  │ - send_email │
                    │  │ - read_file  │
                    │  └───────┬──────┘
                    │          │
                    │  ┌───────▼──────┐
                    └──│ Tool Results │
                       └──────────────┘
```
The runtime orchestrates the loop: send messages to the LLM, parse tool calls, execute tools, feed results back, repeat until the model generates a final response.
Key Takeaways#
- Tool definitions are the interface — invest in clear descriptions and precise schemas
- ReAct loops let models reason about multi-step problems dynamically
- Parallel tool calls reduce latency for independent operations
- Structured outputs eliminate parsing headaches
- Error recovery is essential — tools fail, good agents adapt
- Security is non-negotiable when tools have real-world side effects