Your MCP Server Works. Your Agent Doesn't.
MCP exploded a year ago. Everyone rushed to build servers. The hype was real.
Most MCP servers disappoint.
Developers blame the protocol. On social media it feels like MCP is dying. But enterprise adoption tells a different story. Companies are deploying. Integrations are live. Results are still disappointing.
Why?
Developers treat MCP like a REST API wrapper. Block's team learned this building 60+ servers. Philipp Schmid said it plainly: MCP is a user interface for agents, not developers. Different users. Different design principles.
The protocol works fine. The servers don't.
Here's what goes wrong and how to fix it.
The wrong mental model
When most developers build an MCP server, they open their API docs and start mapping endpoints to tools.
GET /users becomes get_user(). POST /orders becomes create_order(). Clean, logical, familiar.
The problem is that API design is built for human developers.
| | Developers | Agents |
|---|---|---|
| Discovery | Cheap — read docs once | Expensive — schema loads on every request |
| Composability | Mix and match small endpoints | Multi-step tool calls, slow iteration |
| Flexibility | More options, more flexibility | Complexity leads to hallucination |
A developer reads the docs once and moves on. An LLM pays that cost on every single request. Every tool description loaded into the context window. Every intermediate result stored in conversation history.
You are not building an API. You are building a UI for a non-human user. Design it like one.
Mistake 1: Three tools where one should exist
Here is the most common pattern.
A developer builds an MCP server for order tracking. They expose:
- get_user_by_email(email)
- list_orders(user_id)
- get_order_status(order_id)
Logical. Composable. Exactly how a REST API should work.
Now the agent has to answer: "what's the status of this customer's latest order?"
Three tool calls. Three round-trips. Three sets of results sitting in the context window. The agent has to orchestrate the whole sequence — more tokens, more latency, more surface area for something to go wrong.
The fix is one tool: track_latest_order(email). It calls all three internally and returns a single clean answer: "Order #12345 shipped via FedEx, arriving Thursday."
Same outcome. One call.
Do the orchestration in your code. Not in the LLM's context window. When you find yourself building tools that agents will always call in sequence, merge them. The composability that makes APIs elegant makes MCP servers fragile.
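Here is a minimal sketch of that consolidation in plain Python. The in-memory dictionaries stand in for your real API client, and every name here (USERS, ORDERS, STATUS) is hypothetical:

```python
# Hypothetical in-memory stand-ins for the three underlying API calls.
USERS = {"ada@example.com": {"id": "u1"}}
ORDERS = {"u1": [{"id": "12345", "created_at": "2024-05-01"}]}
STATUS = {"12345": {"state": "shipped", "carrier": "FedEx", "eta": "Thursday"}}

def track_latest_order(email: str) -> str:
    """One tool call: resolve the user, pick the latest order, summarize status."""
    user = USERS.get(email)
    if user is None:
        return "No account found for that email. Try another address."
    orders = ORDERS.get(user["id"], [])
    if not orders:
        return "This customer has no orders yet."
    latest = max(orders, key=lambda o: o["created_at"])
    s = STATUS[latest["id"]]
    return f"Order #{latest['id']} {s['state']} via {s['carrier']}, arriving {s['eta']}."
```

The three lookups the agent used to orchestrate now happen inside one function body, and only the final sentence ever touches the context window.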
Mistake 2: Dumping raw API responses
The API returns a 47-field JSON object. So the tool returns a 47-field JSON object.
That object now lives in the context window for the rest of the conversation.
Every field the agent does not need is dead weight competing for attention. In long agentic workflows this compounds fast. You are not just wasting tokens on one call. You are degrading every subsequent call in the session.
Schmid's Gmail example makes this concrete. Before:
def messages_get(message_id, format) -> {"id", "snippet", "payload": {"headers", "body": {"data"}}}
The agent has to parse nested payload objects and decode base64 content just to read an email.
After:
def gmail_read(message_id) -> {"subject", "sender", "body", "attachments"}
Same data. Human-readable. No parsing overhead.
Return exactly what the agent needs to complete the next step. Nothing more. Curating output is not an optimization. It is the job.
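A sketch of that curation step in Python. The nested field paths here are illustrative, not the real Gmail API schema, and the function name echoes the example above:

```python
import base64

def gmail_read(raw: dict) -> dict:
    """Collapse a nested raw API payload into the four fields the agent
    actually needs. Field paths are hypothetical, not the real schema."""
    headers = {h["name"]: h["value"] for h in raw["payload"]["headers"]}
    body = base64.urlsafe_b64decode(raw["payload"]["body"]["data"]).decode()
    return {
        "subject": headers.get("Subject", ""),
        "sender": headers.get("From", ""),
        "body": body,
        "attachments": [p["filename"]
                        for p in raw["payload"].get("parts", [])
                        if p.get("filename")],
    }
```

The base64 decoding and header parsing happen once, in your code, instead of burning tokens on every conversation turn.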
Mistake 3: Generic tool names
Your MCP server does not run in isolation. It runs alongside others.
If GitHub and Jira both expose create_issue, the agent guesses which one to call. Sometimes it guesses right.
Block's pattern from 60+ servers: use service-prefixed, action-oriented names.
{service}_{action}_{resource}
So create_issue becomes linear_create_issue. send_message becomes slack_send_message. get_error_details becomes sentry_get_error_details.
The agent finds the right tool faster. Fewer wrong calls. Less wasted context.
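The convention is cheap to enforce at registration time. A sketch, with a hypothetical service registry:

```python
SERVICES = {"linear", "slack", "sentry"}  # hypothetical registry of your servers

def validate_tool_name(name: str) -> str:
    """Enforce the {service}_{action}_{resource} convention before a tool
    ships, so a generic name like 'create_issue' never reaches an agent."""
    service, _, rest = name.partition("_")
    if service not in SERVICES or not rest:
        raise ValueError(f"tool name '{name}' must start with a known service prefix")
    return name
```

Running this check in your server's registration path turns the naming rule from a style guideline into a hard gate.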
Mistake 4: Treating docstrings as formalities
Tool names and descriptions are not documentation. They are prompts.
The model reads your docstring to decide when to call the tool, what to pass in, and what to expect back. A vague docstring is a vague instruction. The model guesses.
Block put it directly: tool names, descriptions, and parameters are treated as prompts for the LLM. Getting them right matters as much as the tool logic itself.
Write them like instructions. Specify:
- when to use this tool, not just what it does
- the expected format of each argument
- what a successful response looks like
- what to try if it fails
That last one matters more than most people realize.
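Here is a sketch of a docstring written as an instruction rather than documentation. The tool name follows the prefix convention from Mistake 3; the issue store is a hypothetical stand-in for the Sentry API:

```python
# Hypothetical issue store standing in for the Sentry API.
ISSUES = {"5501234": "TypeError: cannot unpack None in checkout.py:42"}

def sentry_get_error_details(issue_id: str) -> str:
    """Fetch the stack trace summary for one Sentry issue.

    Use this when the user asks why a specific error happened, after you
    already have an issue ID from a search tool. Do not use it to browse.

    Args:
        issue_id: Numeric Sentry issue ID as a string, e.g. "5501234".

    Returns:
        A plain-text summary of the error type, message, and location.

    On failure:
        Returns "Issue not found..."; search by error message to recover
        a valid ID, then retry.
    """
    return ISSUES.get(
        issue_id,
        "Issue not found. Search by error message to recover a valid ID, then retry.",
    )
```

Every section of that docstring answers one of the four questions above, and the failure path is spelled out in the same place the model decides whether to call the tool.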
Mistake 5: Throwing exceptions instead of returning context
When a tool fails, most implementations throw an error and let it propagate.
The agent has no idea what went wrong or what to try next.
The fix: return a helpful string instead.
"User not found. Please try searching by email address instead."
That is not an error message. That is an instruction. The agent reads it as an observation and uses it to self-correct on the next turn.
Schmid's rule: do not throw a Python exception. Return context. The agent sees the error and recovers. Without it, the agent hallucinates a recovery path or stops entirely.
Your error handling is part of your UX.
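One way to apply this everywhere at once is a small decorator that converts exceptions into instructive strings. A sketch; the tool body and its data store are hypothetical:

```python
from functools import wraps

def safe_tool(fn):
    """Convert any exception inside a tool into a string the agent can
    read and act on, instead of a propagated traceback."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except KeyError as e:
            return f"Not found: {e.args[0]}. Try searching by email address instead."
        except Exception as e:
            return f"Tool failed: {e}. Check the arguments and try again."
    return wrapper

@safe_tool
def crm_get_user(user_id: str) -> str:
    users = {"u1": "Ada <ada@example.com>"}  # hypothetical store
    return users[user_id]
```

The happy path is untouched; the failure path becomes an observation the agent can use to self-correct on its next turn.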
Mistake 6: Exposing everything your system can do
More tools feels like more capability. It is not.
Every tool you expose adds to the description overhead that loads on every request. More tools means more decisions for the model. More decisions means higher chance of picking the wrong one.
Block and Schmid land on the same number independently: 5 to 15 tools per server. One server, one job.
If your server handles CRM data, support tickets, billing records, and internal notes, you have built four servers sharing a process. Split them. Scope them. Delete the tools nobody is calling.
LLMs are still weak at planning over many steps. Block's team flags this explicitly. Design your server to require less chaining, not more. Fewer well-scoped tools outperform many poorly-scoped ones every time.
The pattern underneath all of these mistakes
Every mistake here comes from the same place: designing for the system, not for the agent.
REST API principles: composability, flexibility, discoverability. These work for human developers. They do not work for AI agents.
MCP is a UI for a non-human user. Same product thinking, different user.
Merge the tool calls. Strip the responses. Name for discovery. Write docstrings like instructions. Return context on failure. Keep the surface area small.
Build it like the user is an agent. Because it is.
This post was inspired by Philipp Schmid's MCP best practices and Block's playbook for designing MCP servers. Both are worth reading in full.