Releases · PrefectHQ/fastmcp - GitHub
## Code Mode

Standard MCP has two scaling problems. The entire tool catalog loads into context upfront — with a large server, that's tens of thousands of tokens before the LLM reads a single word of the user's request. And every tool call is a round-trip: the LLM calls a tool, the result flows back through the context window, the LLM reasons about it, calls another tool, and so on. Intermediate results that only exist to feed the next step still burn tokens every time.

…

## Under the Hood

Heavy imports are now lazy-loaded, meaningfully reducing startup time for servers that don't use every feature. `fastmcp run` and the dev inspector gain a `-m`/`--module` flag for module-style invocation, `MCPConfigTransport` now correctly persists sessions across tool calls, and `search_result_serializer` gives you a hook to customize how search results are serialized for markdown output. Eight new contributors, and the usual round of fixes.
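The release notes don't show how the lazy imports are wired up; one common way to defer heavy imports in Python is a module-level `__getattr__` (PEP 562). The sketch below is a generic illustration of that pattern, not FastMCP's actual code — `make_lazy_module` and its argument names are hypothetical:

```python
import importlib
import types


def make_lazy_module(name: str, attr_to_module: dict) -> types.ModuleType:
    """Build a module whose listed attributes import their backing module
    only on first access, keeping import-time cost near zero.

    Hypothetical helper illustrating PEP 562; not part of FastMCP.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr):
        if attr in attr_to_module:
            backing = importlib.import_module(attr_to_module[attr])
            value = getattr(backing, attr)
            setattr(mod, attr, value)  # cache: later lookups skip __getattr__
            return value
        raise AttributeError(attr)

    mod.__getattr__ = __getattr__  # PEP 562 fallback for missing attributes
    return mod


# "json" stands in for a genuinely heavy dependency; nothing is imported
# until lazy.dumps is first touched.
lazy = make_lazy_module("lazy_demo", {"dumps": "json"})
```

The first attribute access pays the import cost; every later access hits the cached attribute directly, so servers that never touch a feature never import its dependencies.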
## Related Pain Points
### Schema overhead consumes 16-50% of the context window

Full tool schemas load into context on every request with no lazy loading, selective injection, or summarization. This causes context window exhaustion before meaningful work begins, with confirmed instances ranging from 45K tokens for a single tool to 1.17M tokens in production deployments.
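To get a feel for where those numbers come from, the overhead can be roughed out from the serialized schema size. The sketch below uses the common ~4 characters-per-token heuristic (an approximation, not a real tokenizer), and the tool entry is a made-up example in the shape of an MCP-style tool schema:

```python
import json


def estimate_schema_tokens(tool_schemas: list, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a serialized tool catalog.

    Uses the ~4 chars/token rule of thumb; real tokenizers will differ.
    """
    text = json.dumps(tool_schemas, separators=(",", ":"))
    return round(len(text) / chars_per_token)


# Hypothetical tool definition, repeated to mimic a 50-tool catalog.
tools = [{
    "name": "search_issues",
    "description": "Search issues by free-text query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}] * 50

overhead = estimate_schema_tokens(tools)  # thousands of tokens before any request text
```

Even this deliberately small schema costs on the order of a few thousand tokens for 50 tools; production schemas with long descriptions and nested parameters scale far worse.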
### Inefficient round-trip tool calling with intermediate result token waste

Every tool call requires a round-trip cycle: LLM calls tool, result flows back through context, LLM reasons, calls next tool. Intermediate results that only feed the next step burn tokens repeatedly, reducing efficiency in multi-step workflows.
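The contrast between this round-trip cycle and the code-mode approach described above can be sketched as follows. Everything here is hypothetical: `call_tool` and the tool names are stand-ins for an MCP server, not the FastMCP API:

```python
def call_tool(name: str, **kwargs):
    """Fake tool backend standing in for an MCP server (hypothetical)."""
    if name == "list_orders":
        return [{"id": i, "total": 10 * i} for i in range(1, 101)]
    if name == "sum_totals":
        return sum(o["total"] for o in kwargs["orders"])
    raise KeyError(name)


# Round-trip style: the 100-order intermediate result flows back through
# the LLM's context, then is re-sent as input to the second tool call.
orders = call_tool("list_orders")               # large payload enters context
total = call_tool("sum_totals", orders=orders)  # and is paid for again here

# Code-mode style: the model writes a small script that runs where the
# tools live; only the final number re-enters the context window.
total_code_mode = sum(o["total"] for o in call_tool("list_orders"))

assert total == total_code_mode
```

The results are identical, but in the round-trip version the full order list is tokenized twice (once coming back, once going out), while in code mode the intermediate list never touches the context window at all.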