www.anthropic.com
Code execution with MCP: building more efficient AI agents - Anthropic
Excerpt
Since launching MCP in November 2024, adoption has been rapid: the community has built thousands of MCP servers, SDKs are available for all major programming languages, and the industry has adopted MCP as the de-facto standard for connecting agents to tools and data. Today, developers routinely build agents with access to hundreds or thousands of tools across dozens of MCP servers. However, as the number of connected tools grows, loading all tool definitions upfront and passing intermediate results through the context window slows agents down and increases costs.

…

## Excessive token consumption from tools makes agents less efficient

As MCP usage scales, two common patterns can increase agent cost and latency:

1. Tool definitions overload the context window;
2. Intermediate tool results consume additional tokens.

### 1. Tool definitions overload the context window

Most MCP clients load all tool definitions upfront directly into context, exposing them to the model using a direct tool-calling syntax. These tool definitions might look like:

…

These tool definitions occupy context window space, increasing response time and costs. When an agent is connected to thousands of tools, it may need to process hundreds of thousands of tokens before it even reads the request.

### 2. Intermediate tool results consume additional tokens

Most MCP clients allow models to call MCP tools directly. For example, you might ask your agent: "Download my meeting transcript from Google Drive and attach it to the Salesforce lead." The model will make calls like:

…

Every intermediate result must pass through the model. In this example, the full transcript flows through the model twice: once as the download tool's output, and again as the input to the attachment tool. For a 2-hour sales meeting, that could mean processing an additional 50,000 tokens. Even larger documents may exceed context window limits, breaking the workflow. With large documents or complex data structures, models are also more likely to make mistakes when copying data between tool calls.
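The token arithmetic in pattern 2 can be sketched numerically. The following is an illustrative model only: the ~4-characters-per-token heuristic, the transcript size, and the token counts are assumptions for the sketch, not measurements from MCP:

```python
# Hypothetical sketch: why passing an intermediate tool result through the
# model roughly doubles its token cost, versus a code-execution approach
# where only a short confirmation reaches the model.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (illustrative, not exact).
    return len(text) // 4

transcript = "word " * 10_000  # stand-in for a long meeting transcript

# Direct tool calling: the transcript enters context as the download tool's
# result, then is echoed back as an argument to the attachment tool.
direct_tokens = estimate_tokens(transcript)   # tool result flows in
direct_tokens += estimate_tokens(transcript)  # same data flows back out

# Code execution: the agent's code moves the data between tools directly,
# so the model only sees a short confirmation string.
code_exec_tokens = estimate_tokens("Attached transcript to lead.")

print(direct_tokens, code_exec_tokens)
```

Under this sketch, the direct-call path pays for the transcript twice, while the code-execution path pays a near-constant cost regardless of document size.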
Related Pain Points
Schema Overhead Consumes 16-50% of Context Window
Full tool schemas load into context on every request with no lazy loading, selective injection, or summarization. This causes context window exhaustion before meaningful work begins, with confirmed instances ranging from 45K tokens for a single tool to 1.17M tokens in production deployments.
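The mitigations named here (lazy loading, selective injection) can be sketched as follows. Everything below is a hypothetical illustration: the tool names, schemas, and the keyword-overlap heuristic are invented for the example and are not part of any MCP client:

```python
# Hypothetical sketch of selective tool-definition injection: keep all
# schemas indexed outside the context window, and inject only those that
# appear relevant to the current request.

TOOL_SCHEMAS = {
    "gdrive.getDocument": {
        "description": "Download a document from Google Drive",
        "params": {"documentId": "string"},
    },
    "salesforce.updateRecord": {
        "description": "Update a record in Salesforce",
        "params": {"objectType": "string", "recordId": "string", "data": "object"},
    },
    "slack.postMessage": {
        "description": "Post a message to a Slack channel",
        "params": {"channel": "string", "text": "string"},
    },
}

def select_tools(request: str, schemas: dict) -> dict:
    """Return only schemas whose description shares a word with the request.

    A real client would use embeddings or a search index; plain word overlap
    keeps the sketch self-contained.
    """
    words = set(request.lower().split())
    return {
        name: schema
        for name, schema in schemas.items()
        if words & set(schema["description"].lower().split())
    }

selected = select_tools("Download my meeting transcript from Google Drive", TOOL_SCHEMAS)
print(sorted(selected))
```

Only the matching schema would be injected into context; the rest stay indexed but unloaded, so per-request schema overhead grows with relevance rather than with the total number of connected tools.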
Context Bloat from Excessive MCP Search Results
MCP servers can flood conversations with excessive information from searches and operations, quickly exhausting token limits and making conversations unwieldy.
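One common guard against this kind of flooding is to cap how much of a tool result enters the conversation. The sketch below is an assumed approach, not a documented MCP feature; the 1,000-character budget and the truncation-marker format are arbitrary illustrative choices:

```python
# Hypothetical sketch: bound the size of a tool result before it enters
# context, replacing the overflow with a note about what was cut.

def truncate_result(result: str, max_chars: int = 1000) -> str:
    """Return the result unchanged if small, else a truncated view."""
    if len(result) <= max_chars:
        return result
    omitted = len(result) - max_chars
    return result[:max_chars] + f"\n[... {omitted} characters truncated ...]"

search_output = "row\n" * 5000  # stand-in for an oversized MCP search result
print(len(truncate_result(search_output)))
```

A fuller design might page through results on demand or summarize the overflow, but even a hard cap like this keeps a single search from exhausting the token budget.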