CAI Technology
iris · 13 min read

MCP server design patterns: how to design a robust Model Context Protocol

Tool naming, response format, error handling, idempotency and rate limiting — concrete patterns for an MCP server used in production by AI agents.

CAI Technology · Last reviewed: 4/30/2026

MCP server design patterns: how to design a Model Context Protocol that an AI agent uses correctly

Model Context Protocol (MCP) has become, in less than a year, the lingua franca for how an AI agent gets access to tools — databases, APIs, file systems, proprietary systems. The spec is simple, the implementation is deceptively trivial, but the difference between an MCP server an agent uses well and one it uses poorly is enormous. This article presents the patterns we apply internally, distilled from the mistakes we made and from daily observation of how a language model interprets a tool’s schema.

TL;DR

What “a good MCP server” means

An MCP server exposes a set of tools to an AI agent. Server quality is measured along three dimensions: correct call rate (the agent calls the right tool for what the user wants), response usage rate (the agent actually uses the returned information), and error recovery rate (the agent recognizes when something failed and knows what to do next).

These three rates cannot be measured with unit tests. They are measured by observing the agent’s conversation logs in production. The patterns below are distilled from such observations, not from theoretical specifications.

Pattern 1: Tool naming — verb-object with context

The tool name is the first piece of information the language model sees, and tool selection is driven by the name first and the description second. A good name reduces dependence on the description.

Anti-pattern. get_data, query, execute, run. The model cannot distinguish between two tools with generic names; it will pick effectively at random, swayed by listing order or description length.

Pattern. Verb-object with concrete context. Good examples: search_invoices_by_supplier, get_server_status, create_dns_record, list_failed_backups_last_24h. The name is specific enough that the model does not need to read the description to know when it applies.

Practical recommendation: an MCP server should expose at most 15-20 tools. Beyond that, the model starts confusing similar tools. If you have 40 tools, either split them across multiple MCP servers or consolidate them into fewer, more general tools with parameters.
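A simple way to enforce the naming and tool-count guidance is a lint check in CI. The sketch below is a minimal, framework-agnostic Python example; the verb allowlist, the list of banned generic names and the limit of 20 are assumptions derived from the guidance above, not rules from the MCP specification.

```python
import re

# Assumed conventions (adjust to your server): allowed leading verbs, banned generic names.
ALLOWED_VERBS = {"search", "get", "list", "create", "update", "delete", "count"}
GENERIC_NAMES = {"get_data", "query", "execute", "run"}
MAX_TOOLS = 20  # upper end of the 15-20 guideline

# snake_case with at least two segments: verb + object (+ optional context)
NAME_RE = re.compile(r"^[a-z]+(?:_[a-z0-9]+)+$")


def lint_tool_names(names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every name passes."""
    problems = []
    if len(names) > MAX_TOOLS:
        problems.append(f"{len(names)} tools exposed; consider splitting the server (max {MAX_TOOLS}).")
    for name in names:
        if name in GENERIC_NAMES:
            problems.append(f"'{name}' is too generic for the model to select reliably.")
        elif not NAME_RE.match(name):
            problems.append(f"'{name}' is not verb_object snake_case.")
        elif name.split("_", 1)[0] not in ALLOWED_VERBS:
            problems.append(f"'{name}' does not start with a known verb.")
    return problems


print(lint_tool_names(["search_invoices_by_supplier", "get_data", "fetchData"]))
```

A name can pass this check and still be ambiguous next to a sibling tool, so the lint complements, rather than replaces, reviewing which tool the agent actually picked in the logs.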

Pattern 2: Parameter schema — strict, but with defaults

An MCP tool has a JSON schema for parameters. For a language model, the schema is both documentation and constraint. Two rules:

Use enums where values are limited. priority: "low" | "medium" | "high" is far better than priority: string, because the model is much less likely to invent a value outside the enum. If you leave a free string, “critical”, “urgent” or “p0” will appear: all valid for a human, all invalid for your code.

Explicit defaults. Any parameter with a reasonable default value should declare it with default in the schema. The model will then omit the parameter, which is the correct behavior. Without defaults, the model tends to invent values, or the call fails silently.

Example of a good schema for search_invoices:

{
  "type": "object",
  "properties": {
    "supplier_name": { "type": "string", "description": "Supplier name. Accepts case-insensitive partial matches." },
    "date_from": { "type": "string", "format": "date", "description": "Minimum date (inclusive). ISO 8601 format: 2026-01-01." },
    "date_to": { "type": "string", "format": "date", "description": "Maximum date (inclusive). If omitted, the server uses today's date." },
    "status": { "type": "string", "enum": ["paid", "pending", "overdue", "cancelled", "any"], "default": "pending" },
    "limit": { "type": "integer", "minimum": 1, "maximum": 100, "default": 20 },
    "offset": { "type": "integer", "minimum": 0, "default": 0, "description": "Number of results to skip, for pagination." }
  }
}

Note that the description states not just what the parameter does, but how it behaves (case-insensitive partial matches). The model will use these details.
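If your server framework does not already resolve defaults, the step is small. This hedged, stdlib-only Python sketch fills declared defaults, rejects values outside an enum or integer range, and rejects unknown parameters with an INVALID_PARAMETER message. The helper name apply_schema and the simplified property dict are illustrative assumptions. A full JSON Schema validator covers far more cases, but JSON Schema treats default as an annotation, so a validator will not insert defaults for you.

```python
# Simplified properties of the search_invoices schema (illustrative subset).
SEARCH_INVOICES_PROPS = {
    "supplier_name": {"type": "string"},
    "date_from": {"type": "string", "format": "date"},
    "status": {"type": "string", "enum": ["paid", "pending", "overdue", "cancelled"], "default": "pending"},
    "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 20},
}


def apply_schema(args: dict, props: dict) -> dict:
    """Fill declared defaults and reject invalid values.

    Raises ValueError with an INVALID_PARAMETER message the agent can act on.
    """
    unknown = set(args) - set(props)
    if unknown:
        raise ValueError(f"INVALID_PARAMETER: unknown parameters {sorted(unknown)}; allowed: {sorted(props)}")
    resolved = dict(args)
    for name, spec in props.items():
        if name not in resolved:
            if "default" in spec:
                resolved[name] = spec["default"]  # the model omitted it, which is fine
            continue
        value = resolved[name]
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"INVALID_PARAMETER: {name}={value!r}; allowed values: {spec['enum']}")
        if spec.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly
            if not isinstance(value, int) or isinstance(value, bool):
                raise ValueError(f"INVALID_PARAMETER: {name} must be an integer, got {value!r}")
            low, high = spec.get("minimum"), spec.get("maximum")
            if (low is not None and value < low) or (high is not None and value > high):
                raise ValueError(f"INVALID_PARAMETER: {name}={value} is outside [{low}, {high}]")
    return resolved


print(apply_schema({"supplier_name": "ACME"}, SEARCH_INVOICES_PROPS))
# → {'supplier_name': 'ACME', 'status': 'pending', 'limit': 20}
```

The error messages name the allowed values, so when the model does send "urgent", it can correct itself on the next call.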

Pattern 3: Response format — structure plus narrative

A language model reads the tool response and then uses it to reply to the user or decide the next step. The response must satisfy both purposes.

Anti-pattern. Return a huge JSON blob. The model will parse it, but it consumes many tokens, crowds out earlier context and may miss important information.

Opposite anti-pattern. Return only narrative text. The model cannot access the structure, cannot filter, cannot cite.

Pattern. Return a short summary, followed by structured data with at most 50-100 relevant rows. Include pagination metadata:

Found 47 invoices matching "ACME". Showing 20 most recent.

Results:
- INV-2026-0451 | ACME Corp | 2026-04-28 | 12,450.00 EUR | pending
- INV-2026-0438 | ACME Industries | 2026-04-25 | 3,200.00 EUR | paid
- ...

Pagination: page 1 of 3. Use offset=20 for next page.
Filters applied: supplier_name="ACME", status=any, date_from=null.

The model reads the summary, accesses the data, and knows how to continue if needed.
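The summary-plus-rows format is straightforward to generate from a paged query result. A minimal Python sketch, assuming the caller has already fetched one page of invoice rows (dicts with the hypothetical keys id, supplier, date, amount, currency and status) plus the total match count. Filter values are rendered with json.dumps so that a missing filter shows up as null.

```python
import json
import math


def format_invoice_response(rows: list[dict], total: int, query: str,
                            filters: dict, limit: int = 20, offset: int = 0) -> str:
    """Short summary, at most `limit` rows, then pagination and filter metadata."""
    page_rows = rows[:limit]
    page = offset // limit + 1
    pages = max(1, math.ceil(total / limit))
    lines = [
        f'Found {total} invoices matching "{query}". Showing {len(page_rows)} most recent.',
        "",
        "Results:",
    ]
    for r in page_rows:
        lines.append(f"- {r['id']} | {r['supplier']} | {r['date']} | "
                     f"{r['amount']:,.2f} {r['currency']} | {r['status']}")
    lines.append("")
    if offset + limit < total:
        lines.append(f"Pagination: page {page} of {pages}. Use offset={offset + limit} for next page.")
    else:
        lines.append(f"Pagination: page {page} of {pages}. No further pages.")
    # json.dumps quotes strings and renders None as null, so the model sees exact values.
    applied = ", ".join(f"{k}={json.dumps(v)}" for k, v in filters.items())
    lines.append(f"Filters applied: {applied}.")
    return "\n".join(lines)


rows = [{"id": "INV-2026-0451", "supplier": "ACME Corp", "date": "2026-04-28",
         "amount": 12450.00, "currency": "EUR", "status": "pending"}]
print(format_invoice_response(rows, total=47, query="ACME",
                              filters={"supplier_name": "ACME", "status": "any", "date_from": None}))
```

Stating the next offset explicitly saves the model from computing it, and an explicit "No further pages" line tells it when to stop paging.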

Pattern 4: Idempotency — no surprises on retry

An AI agent will sometimes call the same tool twice by accident. Causes include a timeout followed by a retry, ambiguity in the plan, or the model “forgetting” it just performed the action. For read-only tools (search, get, list), idempotency comes for free. For write tools, it is mandatory.

Pattern. Add an optional idempotency_key parameter. If the agent sends it and a call with the same key was executed in the last N minutes, return the previous result without re-executing the action. This is the same pattern Stripe uses for idempotent API requests, and it transfers 1:1.

create_dns_record(domain, type, value, idempotency_key="iris-plan-9472-step-3")

The second time the agent calls with the same key, it gets the same response without side effects.
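Here is a minimal in-memory Python sketch of the idempotency_key pattern. It assumes a single server process. In production, the cache would live in a shared store with an atomic set-if-absent operation, so that two concurrent retries cannot both execute. Following Stripe's behavior, reusing a key with different parameters is rejected (here with a CONFLICT message) rather than silently replayed. The class and the create_dns_record stand-in are hypothetical.

```python
import json
import time
from typing import Any, Callable, Optional


class IdempotencyCache:
    """Replays the stored result of a write call that reuses its idempotency_key."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds  # the "last N minutes" window
        self.clock = clock
        # (tool, key) -> (timestamp, parameter fingerprint, result)
        self._store: dict = {}

    def run(self, tool: str, key: Optional[str], params: dict, action: Callable[[], Any]) -> Any:
        if key is None:
            return action()  # no key supplied: ordinary, non-deduplicated call
        now = self.clock()
        fingerprint = json.dumps(params, sort_keys=True)  # key order does not matter
        cached = self._store.get((tool, key))
        if cached is not None and now - cached[0] < self.ttl:
            if cached[1] != fingerprint:
                raise ValueError(f"CONFLICT: idempotency_key {key!r} was already used with different parameters")
            return cached[2]  # replay: no second side effect
        result = action()
        self._store[(tool, key)] = (now, fingerprint, result)
        return result


calls = []

def create_dns_record_stub() -> dict:  # hypothetical stand-in for the real write
    calls.append(1)
    return {"record_id": "rec-1", "status": "created"}

cache = IdempotencyCache()
params = {"domain": "example.com", "type": "A", "value": "192.0.2.10"}
first = cache.run("create_dns_record", "iris-plan-9472-step-3", params, create_dns_record_stub)
second = cache.run("create_dns_record", "iris-plan-9472-step-3", params, create_dns_record_stub)
print(first == second, len(calls))  # → True 1
```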

For tools that cannot support idempotency (e.g., “send email”), state this explicitly in the description: “NOT idempotent. Each call sends a new email.” The model then tends to use the tool more carefully.

Pattern 5: Error handling — actionable errors

An error returned to an AI agent is not for a developer who will read a stack trace. It is for a language model that will decide what to do next.

Anti-pattern. {"error": "permission denied"}. The model does not know which permission is missing, does not know whether to retry, does not know what to communicate to the user.

Pattern. Structured error with code, human message, and recovery suggestion:

{
  "error": {
    "code": "INSUFFICIENT_SCOPE",
    "message": "Tool 'create_dns_record' requires scope 'dns:write', but the current session has scopes ['dns:read', 'inventory:read'].",
    "recovery": "Ask the user to re-authenticate with the missing scope, or use 'list_dns_records' to inspect existing entries."
  }
}

The model will read recovery and either tell the user what needs to be done or call another tool.

Error codes should be a small, stable enum: INSUFFICIENT_SCOPE, RATE_LIMITED, RESOURCE_NOT_FOUND, INVALID_PARAMETER, UPSTREAM_UNAVAILABLE, CONFLICT. A stable set lets the model, and the prompts you write around it, handle each code differently and consistently.
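A small helper keeps error payloads consistent with the code enum. This Python sketch assumes the MCP convention of reporting tool execution failures inside the tool result (a content list plus isError: true) rather than as protocol-level errors; check the exact result shape against the spec version your SDK targets. check_scopes is a hypothetical example caller.

```python
import json
from enum import Enum
from typing import Optional


class ErrorCode(str, Enum):
    INSUFFICIENT_SCOPE = "INSUFFICIENT_SCOPE"
    RATE_LIMITED = "RATE_LIMITED"
    RESOURCE_NOT_FOUND = "RESOURCE_NOT_FOUND"
    INVALID_PARAMETER = "INVALID_PARAMETER"
    UPSTREAM_UNAVAILABLE = "UPSTREAM_UNAVAILABLE"
    CONFLICT = "CONFLICT"


def tool_error(code: ErrorCode, message: str, recovery: str, **extra) -> dict:
    """Wrap an actionable error (code, message, recovery) as a tool result."""
    payload = {"error": {"code": code.value, "message": message, "recovery": recovery, **extra}}
    return {"isError": True, "content": [{"type": "text", "text": json.dumps(payload)}]}


def check_scopes(tool: str, required: str, session_scopes: set,
                 alternative: Optional[str] = None) -> Optional[dict]:
    """Return None if the session may call `tool`, else an INSUFFICIENT_SCOPE result."""
    if required in session_scopes:
        return None
    recovery = "Ask the user to re-authenticate with the missing scope"
    if alternative:
        recovery += f", or use '{alternative}' instead."
    else:
        recovery += "."
    return tool_error(
        ErrorCode.INSUFFICIENT_SCOPE,
        f"Tool '{tool}' requires scope '{required}', but the current session has scopes {sorted(session_scopes)}.",
        recovery,
    )
```

Extra fields such as retry_after pass through **extra, so RATE_LIMITED errors can use the same helper.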

Pattern 6: Rate limiting — protection against loops

An AI agent stuck in a loop can rapidly consume resources. Seen in production: an agent that, blocked by an ambiguous error, called the same tool 200 times in 3 minutes, each call hitting an external database with cost-per-query.

Pattern. Rate limiting per agent session, not per IP or per user. An agent session is the correct scope: the agent has one task and a budget of invocations for that task. Once the limit is exceeded, respond with a RATE_LIMITED error and a retry_after field.

Typical budget for a session: 50-100 invocations per tool per hour; for expensive tools (external LLM calls, slow DB queries), 10-20. The model sees the error, understands it must change approach, and stops hammering the tool.

Bonus: include an optional quota_remaining field in each tool’s response. The model can use it to prioritize its remaining actions.
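A sliding-window limiter keyed by (agent session, tool) is enough to implement both the budget and the quota_remaining hint. This is an in-memory Python sketch with these assumptions: tool names and budgets are hypothetical, the window is one hour, and the clock is injectable for testing. A server running several instances would need shared state.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable


class SessionRateLimiter:
    """Per-(session, tool) call budget over a sliding time window."""

    def __init__(self, budgets: dict, default_budget: int = 60,
                 window_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.budgets = budgets                # per-tool overrides, e.g. 10-20 for expensive tools
        self.default_budget = default_budget  # within the 50-100 per tool per hour range
        self.window = window_seconds
        self.clock = clock
        self._calls = defaultdict(deque)      # (session_id, tool) -> timestamps of accepted calls

    def check(self, session_id: str, tool: str) -> tuple:
        """Record the call if allowed; return (allowed, metadata for the response)."""
        now = self.clock()
        calls = self._calls[(session_id, tool)]
        while calls and now - calls[0] >= self.window:
            calls.popleft()  # drop calls that fell out of the window
        budget = self.budgets.get(tool, self.default_budget)
        if len(calls) >= budget:
            # Seconds until the oldest call in the window expires and frees a slot.
            wait = self.window - (now - calls[0]) if calls else self.window
            return False, {"code": "RATE_LIMITED", "retry_after": math.ceil(wait)}
        calls.append(now)
        return True, {"quota_remaining": budget - len(calls)}


limiter = SessionRateLimiter({"run_slow_report": 15})  # hypothetical expensive tool
allowed, meta = limiter.check("session-42", "run_slow_report")
print(allowed, meta)  # → True {'quota_remaining': 14}
```

Rejected calls are not recorded, so an agent that keeps retrying does not push its own retry_after further out.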

Pattern 7: Versioning and backward compatibility

An MCP server that evolves will face the classic problem: clients (agents) using an old schema.

Pattern. Never change the semantics of an existing tool without giving it a new name. Adding a new optional parameter (with a default) to search_invoices is fine. If a change means existing parameters are interpreted differently, create search_invoices_v2 and keep search_invoices as a deprecated tool that returns a warning in its response.

Agents already in production rely on the behavior that the existing tool descriptions promise; their prompts, plans and tests were built around it. Silent changes break them.
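One way to keep a deprecated tool available is to register both versions and wrap the old handler so every response carries a warning. In this Python sketch, the handlers are stubs and the semantic change between versions (date_to becoming inclusive) is a hypothetical example. The registry shape is illustrative, not any particular SDK's API.

```python
from typing import Callable

Handler = Callable[[dict], dict]


def search_invoices_v1(args: dict) -> dict:
    # Hypothetical original semantics: date_to is exclusive.
    return {"results": [], "date_to_inclusive": False}


def search_invoices_v2(args: dict) -> dict:
    # Hypothetical new semantics: date_to is inclusive, hence the new name.
    return {"results": [], "date_to_inclusive": True}


def deprecated(handler: Handler, replacement: str) -> Handler:
    """Keep the old behavior unchanged, but tell the model to migrate."""
    def wrapper(args: dict) -> dict:
        result = handler(args)
        return {**result, "warning": f"This tool is deprecated; use '{replacement}' instead."}
    return wrapper


TOOLS = {
    "search_invoices": {
        "description": "DEPRECATED: use search_invoices_v2. Search invoices; date_to is exclusive.",
        "handler": deprecated(search_invoices_v1, "search_invoices_v2"),
    },
    "search_invoices_v2": {
        "description": "Search invoices by supplier and date range; date_to is inclusive.",
        "handler": search_invoices_v2,
    },
}


def call_tool(name: str, args: dict) -> dict:
    return TOOLS[name]["handler"](args)
```

The warning appears both in the description, which the model sees when choosing a tool, and in the response, which it sees after calling the old one.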

Pattern 8: Telemetry and debugging

For each tool call, log: timestamp, tool name, parameters, duration, result (success / error code), agent session ID, plan identifier (if the agent works with propose-then-act).

These logs are the only source of truth for improving the server. You will observe:

Minimal dashboard: call rate per tool, error rate per tool and code, p50/p95 duration per tool, sessions with more than 30 invocations (suspected loop).
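The log record and the minimal dashboard can be computed from the same data. This is a stdlib-only Python sketch: ToolCallLog mirrors the fields listed above, percentiles use the nearest-rank method, and the threshold of 30 invocations per session matches the loop heuristic. Field and function names are illustrative assumptions. "Calls" here is a raw count; divide by your reporting window to get a rate.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallLog:
    timestamp: float
    tool: str
    params: dict
    duration_ms: float
    error_code: Optional[str]       # None on success, else e.g. "RATE_LIMITED"
    session_id: str
    plan_id: Optional[str] = None   # set when the agent works propose-then-act


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(logs: list, loop_threshold: int = 30) -> dict:
    by_tool = defaultdict(list)
    per_session = Counter()
    for entry in logs:
        by_tool[entry.tool].append(entry)
        per_session[entry.session_id] += 1
    tools = {}
    for tool, entries in by_tool.items():
        durations = [e.duration_ms for e in entries]
        errors = Counter(e.error_code for e in entries if e.error_code)
        tools[tool] = {
            "calls": len(entries),
            "error_rate": sum(errors.values()) / len(entries),
            "errors_by_code": dict(errors),
            "p50_ms": percentile(durations, 50),
            "p95_ms": percentile(durations, 95),
        }
    # Sessions above the threshold are suspected loops worth inspecting by hand.
    suspected_loops = sorted(s for s, n in per_session.items() if n > loop_threshold)
    return {"tools": tools, "suspected_loops": suspected_loops}
```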

Three mistakes we made and corrected

Mistake 1: the execute_query tool, which took a raw SQL string as its parameter. The model generated queries that were valid against its mental model of the database but failed on our actual database (subtle syntax differences). Solution: specialized tools (get_user_by_email, count_orders_in_range) with a strict schema.

Mistake 2: the update_config tool that accepted any key in the config. The model updated fields it should not have touched. Solution: dedicated tools per critical field (update_dns_ttl, update_email_recipient) with validation on accepted values.

Mistake 3: descriptions that assumed the model knew our internal context. “Update the alert” — which alert? Solution: self-sufficient descriptions, with terms explained in context.
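The fixes for mistakes 1 and 2 share a shape: the logic is fixed on the server, and the model supplies only validated values. This Python sketch uses the standard sqlite3 module. The orders table, the in-memory DNS record store and the allowed TTL values are hypothetical stand-ins chosen for illustration.

```python
import sqlite3
from datetime import date


def _invalid(message: str, recovery: str) -> dict:
    return {"error": {"code": "INVALID_PARAMETER", "message": message, "recovery": recovery}}


def count_orders_in_range(conn: sqlite3.Connection, date_from: str, date_to: str) -> dict:
    """Fixed, parameterized SQL: the model supplies values, never query text."""
    try:
        start, end = date.fromisoformat(date_from), date.fromisoformat(date_to)
    except (TypeError, ValueError):
        return _invalid("date_from and date_to must be ISO 8601 dates (YYYY-MM-DD).",
                        "Retry with dates such as 2026-01-01.")
    if start > end:
        return _invalid("date_from is after date_to.", "Swap the two dates and retry.")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE order_date BETWEEN ? AND ?",  # inclusive range
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    return {"count": count, "date_from": start.isoformat(), "date_to": end.isoformat()}


ALLOWED_DNS_TTLS = (300, 3600, 86400)  # hypothetical policy: 5 min, 1 h, 1 day


def update_dns_ttl(records: dict, record_id: str, ttl: int) -> dict:
    """Touches exactly one field, and only with an accepted value."""
    if ttl not in ALLOWED_DNS_TTLS:
        return _invalid(f"ttl={ttl} is not allowed.", f"Use one of {list(ALLOWED_DNS_TTLS)}.")
    if record_id not in records:
        return {"error": {"code": "RESOURCE_NOT_FOUND", "message": f"No DNS record {record_id!r}.",
                          "recovery": "Call list_dns_records to find valid record IDs."}}
    records[record_id]["ttl"] = ttl
    return {"record_id": record_id, "ttl": ttl}
```

Dates are stored as ISO 8601 text, so BETWEEN compares them correctly as strings; a real schema might use a native date type instead.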

Conclusion

A good MCP server is not a thin wrapper over the internal API. It is a surface designed specifically for consumption by a language model, with naming, schema, response format and error handling adapted to how a model decides. The investment in design pays off in the quality of the agent that uses it, and in quiet nights not spent digging through logs to figure out why an agent called the wrong tool 50 times.


Next step

If your team is building or evaluating an MCP server for an AI agent and you would like a technical opinion on schema and response format, we offer a 30-minute consultation at no cost.


Free AI-readiness audit for companies with 50+ employees. We reply within 24 hours.