MCP server design patterns: how to design a robust Model Context Protocol server
Tool naming, response format, error handling, idempotency and rate limiting — concrete patterns for an MCP server used in production by AI agents.
MCP server design patterns: how to design a Model Context Protocol server that an AI agent uses correctly
Model Context Protocol (MCP) has become, in less than a year, the lingua franca for giving an AI agent access to tools: databases, APIs, file systems, proprietary systems. The spec is simple and an implementation looks deceptively trivial, but the difference between an MCP server an agent uses well and one it uses poorly is enormous. This article presents the patterns we apply internally, distilled from the mistakes we made and from daily observation of how a language model interprets a tool’s schema.
TL;DR
- A tool’s name is not a label, it is an instruction. A bad name doubles the rate of incorrect calls.
- An MCP tool response must be written to be read by a language model, not by a programmer: structured, but with minimal narrative context.
- Idempotency is not optional. Sooner or later an agent will call the same tool twice by accident, and the second call must not cause any harm.
- Errors must be actionable. “Permission denied” is insufficient; “Permission denied: tool X requires scope Y, currently authenticated as Z” is correct.
- Rate limiting is not something to add later. An agent stuck in a loop can consume 10,000 invocations in minutes.
What “a good MCP server” means
An MCP server exposes a set of tools to an AI agent. Server quality is measured along three dimensions:
- correct call rate: the agent calls the right tool for what the user wants;
- response usage rate: the agent actually uses the returned information;
- error recovery rate: the agent realizes when something failed and knows what to do next.
These three rates are not measured in unit testing. They are measured by observing the agent’s conversation logs in production. The patterns described below are distilled from such observations, not from theoretical specifications.
Pattern 1: Tool naming — verb-object with context
The tool name is the first piece of information the language model sees, and the model selects a tool based on its name and its description. A good name reduces dependence on the description.
Anti-pattern. get_data, query, execute, run. The model cannot distinguish between two tools with generic names; it will pick one more or less arbitrarily, swayed by their order or by description length.
Pattern. Verb-object with concrete context. Good examples: search_invoices_by_supplier, get_server_status, create_dns_record, list_failed_backups_last_24h. The name is specific enough that the model does not need to read the description to know when it applies.
Practical recommendation: an MCP server should expose at most 15-20 tools. Beyond that, the model starts confusing similar tools. If you have 40 tools, either split them across multiple MCP servers or consolidate them into fewer, more general tools with parameters.
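The naming rules above can be checked mechanically before a server ships. The sketch below is a hypothetical lint check, not part of any MCP SDK: the verb list, the generic-name list and the 20-tool ceiling are assumptions to adapt to your own domain.

```python
import re

# Hypothetical vocabularies: adapt them to your own domain.
GENERIC_NAMES = {"get_data", "query", "execute", "run"}
ALLOWED_VERBS = {"search", "get", "list", "create", "update", "delete", "count"}
MAX_TOOLS = 20  # upper end of the 15-20 range recommended above

# snake_case with at least two parts: verb + object (digits allowed, e.g. last_24h)
NAME_RE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")


def lint_tool_names(names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every name passes."""
    problems = []
    if len(names) > MAX_TOOLS:
        problems.append(f"{len(names)} tools exposed, more than {MAX_TOOLS}: split the server or merge tools")
    for name in names:
        if name in GENERIC_NAMES or not NAME_RE.match(name):
            problems.append(f"{name}: generic name or not in verb_object form")
            continue
        verb = name.split("_", 1)[0]
        if verb not in ALLOWED_VERBS:
            problems.append(f"{name}: unknown verb '{verb}'")
    return problems
```

For example, lint_tool_names(["get_data", "search_invoices_by_supplier"]) flags only get_data.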
Pattern 2: Parameter schema — strict, but with defaults
An MCP tool has a JSON schema for parameters. For a language model, the schema is both documentation and constraint. Two rules:
Use enums where values are limited. priority: "low" | "medium" | "high" is far better than priority: string. The model is much less likely to invent a value outside the enum, and any value that slips through can be rejected by validation. If you leave a free string, “critical” or “urgent” or “p0” will appear: all valid for a human, all invalid for your code.
Explicit defaults. Any parameter with a reasonable default value should declare a default in the schema. The model can then omit the parameter, which is the correct behavior. Without defaults, the model invents values or the call fails silently.
Example of a good input schema for search_invoices:
```json
{
  "type": "object",
  "properties": {
    "supplier_name": { "type": "string", "description": "Supplier name. Accepts case-insensitive partial matches." },
    "date_from": { "type": "string", "format": "date", "description": "Minimum date (inclusive). ISO 8601 format: 2026-01-01." },
    "date_to": { "type": "string", "format": "date", "description": "Maximum date (inclusive). If omitted, the server uses today's date." },
    "status": { "type": "string", "enum": ["any", "paid", "pending", "overdue", "cancelled"], "default": "pending" },
    "limit": { "type": "integer", "minimum": 1, "maximum": 100, "default": 20 },
    "offset": { "type": "integer", "minimum": 0, "default": 0, "description": "Number of results to skip, for pagination." }
  }
}
```
Note that the description states not only what the parameter does but also how it behaves (case-insensitive partial matches). The model will use these details.
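One detail the schema alone does not handle: in JSON Schema, default is an annotation, and validators do not fill it in, so the server has to apply defaults itself. A dynamic default such as today's date for date_to cannot be expressed as a static value at all. Below is a minimal stdlib-only sketch, assuming arguments arrive as a Python dict. The trimmed PROPERTIES copy (which includes an offset parameter and an "any" status so that the offset=20 and status=any values in the Pattern 3 example are valid), normalize_args and InvalidParameter are all illustrative names; a real server would more likely run a full JSON Schema validator first.

```python
from datetime import date

# Trimmed copy of the search_invoices properties. JSON Schema "default" is only an
# annotation: validators do not fill it in, so the server applies defaults itself.
PROPERTIES = {
    "supplier_name": {"type": "string"},
    "date_from": {"type": "string"},
    "date_to": {"type": "string"},  # dynamic default (today), resolved below
    "status": {"type": "string",
               "enum": ["any", "paid", "pending", "overdue", "cancelled"],
               "default": "pending"},
    "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 20},
    "offset": {"type": "integer", "minimum": 0, "default": 0},
}


class InvalidParameter(ValueError):
    """Hypothetical error type; its message is written for the model to act on."""


def normalize_args(args: dict) -> dict:
    unknown = sorted(set(args) - set(PROPERTIES))
    if unknown:
        raise InvalidParameter(f"Unknown parameter(s) {unknown}. Allowed: {sorted(PROPERTIES)}.")
    out = {}
    for name, spec in PROPERTIES.items():
        if name in args:
            value = args[name]
        elif "default" in spec:
            value = spec["default"]  # apply the schema default explicitly
        else:
            continue  # optional, no static default: leave it out
        if "enum" in spec and value not in spec["enum"]:
            raise InvalidParameter(f"{name}={value!r} is not allowed. Use one of {spec['enum']}.")
        if spec["type"] == "integer":
            if not isinstance(value, int) or isinstance(value, bool):
                raise InvalidParameter(f"{name} must be an integer, got {value!r}.")
            # Missing bounds default to the value itself, so the comparison is a no-op.
            if value < spec.get("minimum", value) or value > spec.get("maximum", value):
                raise InvalidParameter(f"{name}={value} is outside the allowed range.")
        out[name] = value
    out.setdefault("date_to", date.today().isoformat())  # a default a static schema cannot express
    return out
```

With this sketch, normalize_args({"supplier_name": "ACME"}) fills in status="pending", limit=20, offset=0 and today's date_to, while status="urgent" raises InvalidParameter with a message listing the allowed values.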
Pattern 3: Response format — structure plus narrative
A language model reads the tool response and then uses it to reply to the user or decide the next step. The response must satisfy both purposes.
Anti-pattern. Return a huge JSON blob. The model will parse it, but it consumes many tokens, crowds out the rest of the context and may miss important information.
Opposite anti-pattern. Return only narrative text. The model cannot access the structure, cannot filter, cannot cite.
Pattern. Return a short summary, followed by structured data limited to 50-100 relevant rows, plus pagination metadata:
```text
Found 47 invoices matching "ACME". Showing 20 most recent.
Results:
- INV-2026-0451 | ACME Corp | 2026-04-28 | 12,450.00 EUR | pending
- INV-2026-0438 | ACME Industries | 2026-04-25 | 3,200.00 EUR | paid
- ...
Pagination: page 1 of 3. Use offset=20 for next page.
Filters applied: supplier_name="ACME", status=any, date_from=null.
```
The model reads the summary, accesses the data, and knows how to continue if needed.
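A sketch of a formatter that produces this layout. The row keys (id, supplier, date, amount, currency, status) and the function name are assumptions about how the invoice data is stored; filter values are rendered with json.dumps, so strings appear quoted and missing values appear as null.

```python
import json
import math


def format_invoice_results(rows: list[dict], total: int, filters: dict,
                           limit: int = 20, offset: int = 0) -> str:
    """Short summary, one compact line per row, then pagination and the filters applied."""
    supplier = filters.get("supplier_name", "")
    lines = [f'Found {total} invoices matching "{supplier}". Showing {len(rows)} most recent.',
             "Results:"]
    for r in rows:
        lines.append(f"- {r['id']} | {r['supplier']} | {r['date']} | "
                     f"{r['amount']:,.2f} {r['currency']} | {r['status']}")
    pages = max(1, math.ceil(total / limit))
    page = offset // limit + 1
    if offset + limit < total:
        lines.append(f"Pagination: page {page} of {pages}. Use offset={offset + limit} for next page.")
    else:
        lines.append(f"Pagination: page {page} of {pages}. No further pages.")
    applied = ", ".join(f"{key}={json.dumps(value)}" for key, value in filters.items())
    lines.append(f"Filters applied: {applied}.")
    return "\n".join(lines)
```

With total=47, limit=20 and offset=0 it emits the same pagination line as the example above: "Pagination: page 1 of 3. Use offset=20 for next page."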
Pattern 4: Idempotency — no surprises on retry
An AI agent will accidentally call the same tool twice. Causes: timeout followed by retry, ambiguity in plan, model “forgetting” it just performed the action. For read-only tools (search, get, list), idempotency is free. For write tools, it is mandatory.
Pattern. Add an optional idempotency_key parameter. If the agent sends a key and a call with the same key was executed in the last N minutes, return the previous result without re-executing the action. This is the standard pattern in Stripe's API, and it carries over 1:1 to MCP tools.
```python
create_dns_record(domain, type, value, idempotency_key="iris-plan-9472-step-3")
```
The second time the agent calls with the same key, it gets the same response without side effects.
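A minimal in-memory sketch of the idempotency_key pattern, assuming a single server process. The IdempotencyCache class and its ten-minute default window are hypothetical choices standing in for "the last N minutes"; the clock is injectable so expiry can be tested.

```python
import time
from typing import Any, Callable, Optional


class IdempotencyCache:
    """Remembers each result by idempotency_key for ttl_seconds ("the last N minutes")."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def run(self, key: Optional[str], action: Callable[[], Any]) -> Any:
        if key is None:
            return action()  # no key: ordinary call, no deduplication
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # same key within the window: replay, no side effects
        result = action()  # if this raises, nothing is stored and a retry re-executes
        self._store[key] = (now, result)
        return result
```

In production the store would live in a shared cache or database, keys would be scoped per session and per tool, a reused key with different parameters would be rejected (Stripe's API returns an error in that case), and two concurrent calls with the same key would need a lock so the action cannot run twice.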
For tools that cannot support idempotency (e.g., “send email”), make this explicit in the description: “NOT idempotent. Each call sends a new email.” The model then tends to use the tool with more care.
Pattern 5: Error handling — actionable errors
An error returned to an AI agent is not for a developer who will read a stack trace. It is for a language model that will decide what to do next.
Anti-pattern. {"error": "permission denied"}. The model does not know which permission is missing, does not know whether to retry, does not know what to communicate to the user.
Pattern. Structured error with code, human message, and recovery suggestion:
```json
{
  "error": {
    "code": "INSUFFICIENT_SCOPE",
    "message": "Tool 'create_dns_record' requires scope 'dns:write', but the current session has scopes ['dns:read', 'inventory:read'].",
    "recovery": "Ask the user to re-authenticate with the missing scope, or use 'list_dns_records' to inspect existing entries."
  }
}
```
The model will read recovery and either communicate to the user what needs to be done, or call another tool.
Error codes should be a small, stable enum: INSUFFICIENT_SCOPE, RATE_LIMITED, RESOURCE_NOT_FOUND, INVALID_PARAMETER, UPSTREAM_UNAVAILABLE, CONFLICT. A stable set lets the model, and the prompts you write around it, respond consistently to each code.
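A sketch of how the error shape above can be produced. It follows the MCP convention that tool execution errors are returned inside the tool result, with isError set to true, rather than as protocol-level errors, so the model actually sees them. ErrorCode mirrors the enum above; tool_error and missing_scope_error are hypothetical helper names.

```python
import json
from enum import Enum


class ErrorCode(str, Enum):
    """Small, stable set of codes the model can rely on."""
    INSUFFICIENT_SCOPE = "INSUFFICIENT_SCOPE"
    RATE_LIMITED = "RATE_LIMITED"
    RESOURCE_NOT_FOUND = "RESOURCE_NOT_FOUND"
    INVALID_PARAMETER = "INVALID_PARAMETER"
    UPSTREAM_UNAVAILABLE = "UPSTREAM_UNAVAILABLE"
    CONFLICT = "CONFLICT"


def tool_error(code: ErrorCode, message: str, recovery: str) -> dict:
    """Tool result carrying the structured error as text content, flagged with isError."""
    payload = {"error": {"code": code.value, "message": message, "recovery": recovery}}
    return {"content": [{"type": "text", "text": json.dumps(payload, indent=2)}], "isError": True}


def missing_scope_error(tool: str, required: str, granted: list[str], fallback_tool: str) -> dict:
    return tool_error(
        ErrorCode.INSUFFICIENT_SCOPE,
        f"Tool '{tool}' requires scope '{required}', but the current session has scopes {granted}.",
        f"Ask the user to re-authenticate with the missing scope, or use '{fallback_tool}' to inspect existing entries.",
    )
```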
Pattern 6: Rate limiting — protection against loops
An AI agent stuck in a loop can rapidly consume resources. We have seen this in production: an agent blocked by an ambiguous error called the same tool 200 times in 3 minutes, and each call hit an external database billed per query.
Pattern. Rate limiting per agent session, not per IP or per user. An agent session is the correct scope: an agent has a task, it has a budget of invocations for that task. Above the limit, respond with RATE_LIMITED error and a retry_after field.
Typical budget for a session: 50-100 invocations per tool per hour; for expensive tools (an external LLM call, a slow DB query), 10-20. A well-behaved model will see the error, understand that it must change approach, and stop retrying.
Bonus: include an optional quota_remaining field in each tool’s response. The model can use this information to prioritize its actions.
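A minimal sketch of a per-session, per-tool budget using a sliding window. SessionRateLimiter, its defaults (100 calls per hour) and the per_tool_limits override for expensive tools are assumptions, and the state lives in process memory, so a multi-instance deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class SessionRateLimiter:
    """Sliding-window invocation budget per (agent session, tool) pair."""

    def __init__(self, limit: int = 100, window_seconds: float = 3600.0,
                 per_tool_limits: Optional[dict[str, int]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit                     # default budget per tool per window
        self.per_tool = per_tool_limits or {}  # tighter budgets for expensive tools
        self.window = window_seconds
        self.clock = clock
        self._calls: dict[tuple[str, str], deque] = defaultdict(deque)

    def check(self, session_id: str, tool: str) -> dict:
        """Record the call if allowed; otherwise return a RATE_LIMITED payload."""
        now = self.clock()
        limit = self.per_tool.get(tool, self.limit)
        calls = self._calls[(session_id, tool)]
        while calls and now - calls[0] >= self.window:
            calls.popleft()                    # forget calls that left the window
        if len(calls) >= limit:
            retry_after = self.window - (now - calls[0])
            return {"allowed": False, "code": "RATE_LIMITED", "retry_after": round(retry_after, 1)}
        calls.append(now)
        return {"allowed": True, "quota_remaining": limit - len(calls)}
```

Keying on the session rather than the user or IP means one runaway task exhausts only its own budget, while other sessions of the same user keep working.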
Pattern 7: Versioning and backward compatibility
An MCP server that evolves will face the classic problem: clients (agents) using an old schema.
Pattern. Never change the semantics of an existing tool without giving it a new name. Adding a new optional parameter with a default to search_invoices is fine. If the change makes old parameters be interpreted differently, create search_invoices_v2 and keep search_invoices as a deprecated tool that returns a warning in its response.
Agents in production were built and tested against the current descriptions and behavior of your tools; a silent change in semantics breaks them without any visible error.
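One hypothetical way to keep the old tool running unchanged while steering agents toward the new one is a decorator that appends a warning to every response. deprecated_tool and the warning field name are assumptions for illustration, not part of the MCP specification.

```python
import functools
from typing import Callable


def deprecated_tool(replacement: str) -> Callable:
    """Wrap an old tool so every response carries a deprecation warning."""
    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> dict:
            result = dict(fn(*args, **kwargs))  # copy: do not mutate the tool's own dict
            result["warning"] = (f"{fn.__name__} is deprecated and keeps its old semantics. "
                                 f"Use {replacement} for new work.")
            return result
        return wrapper
    return decorator


@deprecated_tool(replacement="search_invoices_v2")
def search_invoices(supplier_name: str) -> dict:
    # Hypothetical body: the original behavior stays exactly as it was.
    return {"results": [], "supplier_name": supplier_name}
```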
Pattern 8: Telemetry and debugging
For each tool call, log: timestamp, tool name, parameters, duration, result (success / error code), agent session ID, plan identifier (if the agent works with propose-then-act).
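A sketch of a wrapper that emits one JSON line per call with those fields, assuming tools return the Pattern 5 error shape ({"error": {"code": ...}}) on failure. log_tool_call, the UNHANDLED_EXCEPTION label and the print sink are illustrative choices.

```python
import json
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def log_tool_call(tool: str, params: dict, session_id: str, plan_id: Optional[str],
                  call: Callable[[], dict], sink: Callable[[str], None] = print) -> dict:
    """Run one tool call and emit a single JSON log line, even if the call raises."""
    started = time.perf_counter()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "params": params,
        "session_id": session_id,
        "plan_id": plan_id,  # None when the agent does not use propose-then-act
    }
    try:
        result = call()
        err = result.get("error") if isinstance(result, dict) else None
        record["result"] = err["code"] if err else "success"
        return result
    except Exception:
        record["result"] = "UNHANDLED_EXCEPTION"  # hypothetical label, outside the error enum
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - started) * 1000, 1)
        sink(json.dumps(record))
```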
These logs are the only source of truth for improving the server. You will observe:
- Tools never called (a sign that the description is poor or the tool is useless)
- Tools called with wrong or meaningless parameters (a sign that the description is misleading)
- Tools that often fail with the same error code (a sign that preconditions are not explicit)
Minimal dashboard: call rate per tool, error rate per tool and code, p50/p95 duration per tool, sessions with more than 30 invocations (suspected loop).
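These dashboard numbers can be computed from the log records with a few lines of standard-library code. The sketch assumes each record is a dict with tool, result, duration_ms and session_id keys, as in the logging paragraph above, and uses the nearest-rank method for percentiles; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def dashboard(records: list[dict], loop_threshold: int = 30) -> dict:
    calls = Counter(r["tool"] for r in records)
    errors = Counter((r["tool"], r["result"]) for r in records if r["result"] != "success")
    durations: dict[str, list[float]] = defaultdict(list)
    for r in records:
        durations[r["tool"]].append(r["duration_ms"])
    per_session = Counter(r["session_id"] for r in records)
    return {
        "calls_per_tool": dict(calls),
        "error_rate": {tool: sum(n for (t, _), n in errors.items() if t == tool) / calls[tool]
                       for tool in calls},
        "errors_by_tool_and_code": dict(errors),
        "p50_p95_ms": {tool: (percentile(d, 50), percentile(d, 95)) for tool, d in durations.items()},
        # Sessions above the threshold are flagged as suspected loops.
        "suspected_loops": sorted(s for s, n in per_session.items() if n > loop_threshold),
    }
```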
Three mistakes we made and corrected
Mistake 1: the execute_query tool, with a SQL string as parameter. The model generated queries that were valid under its own assumptions about the schema and SQL dialect but failed on our database (subtle syntax differences). Solution: specialized tools (get_user_by_email, count_orders_in_range) with a strict schema.
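What a specialized tool can look like in practice, sketched with the standard-library sqlite3 module: the SQL text is fixed, the model only supplies two dates, and they are bound through ? placeholders. The orders table, its order_date column and the error payloads are assumptions for illustration.

```python
import sqlite3
from datetime import date


def count_orders_in_range(conn: sqlite3.Connection, date_from: str, date_to: str) -> dict:
    """Specialized tool: fixed SQL, typed inputs, actionable errors."""
    try:
        start, end = date.fromisoformat(date_from), date.fromisoformat(date_to)
    except ValueError:
        return {"error": {"code": "INVALID_PARAMETER",
                          "message": f"Dates must be ISO 8601 (YYYY-MM-DD), got {date_from!r} and {date_to!r}.",
                          "recovery": "Resend both dates in YYYY-MM-DD format."}}
    if start > end:
        return {"error": {"code": "INVALID_PARAMETER",
                          "message": f"date_from {date_from} is after date_to {date_to}.",
                          "recovery": "Swap the two dates."}}
    # The model never writes SQL: values are bound through ? placeholders.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE order_date BETWEEN ? AND ?",
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    return {"count": count, "date_from": start.isoformat(), "date_to": end.isoformat()}
```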
Mistake 2: the update_config tool that accepted any key in the config. The model updated fields it should not have touched. Solution: dedicated tools per critical field (update_dns_ttl, update_email_recipient) with validation on accepted values.
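A sketch of the dedicated-tool approach for one critical field. The config layout, the ALLOWED_TTLS values and the function body are hypothetical; in the tool's JSON schema the allowed TTLs would be declared as an enum.

```python
# Hypothetical allowed values (seconds); the tool schema would expose them as an enum.
ALLOWED_TTLS = (60, 300, 3600, 86400)


def update_dns_ttl(config: dict, record: str, ttl: int) -> dict:
    """Dedicated tool: it can change exactly one field, the TTL of an existing record."""
    records = config.get("dns", {})
    if record not in records:
        return {"error": {"code": "RESOURCE_NOT_FOUND",
                          "message": f"No DNS record named {record!r}.",
                          "recovery": f"Pick one of {sorted(records)}."}}
    if ttl not in ALLOWED_TTLS:
        return {"error": {"code": "INVALID_PARAMETER",
                          "message": f"ttl={ttl} is not allowed.",
                          "recovery": f"Use one of {list(ALLOWED_TTLS)}."}}
    old = records[record]["ttl"]
    records[record]["ttl"] = ttl  # no other key in the config is reachable from this tool
    return {"record": record, "old_ttl": old, "new_ttl": ttl}
```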
Mistake 3: descriptions that assumed the model knew our internal context. “Update the alert” — which alert? Solution: self-sufficient descriptions, with terms explained in context.
Conclusion
A good MCP server is not a thin wrapper over the internal API. It is a surface designed specifically for consumption by a language model, with naming, schema, response format and error handling adapted to how a model makes decisions. The investment in design pays off in the quality of the agent that uses it, and in quiet nights in which you are not combing through logs trying to figure out why an agent called the wrong tool 50 times.
Related articles
- The propose-then-act architecture for AI agents
- Pillar IRIS — the CAI Technology orchestrator agent
- Pillar Consulting — AI assessment and design
External sources
- Model Context Protocol — official specification — full reference for transport, schema and MCP semantics
- Anthropic Tool Use documentation — design guide for tool definition in Claude
- Stripe Idempotency — best practices — the canonical model of idempotency keys
- JSON Schema specification (2020-12) — reference for strict schemas with enums and defaults
- OWASP Top 10 for LLM Applications — risks and mitigations at the tools level
Next step
If your team is building or evaluating an MCP server for an AI agent and you would like a technical opinion on schema and response format, we offer a 30-minute consultation at no cost.