Designing your first MCP server: a practical guide

The Model Context Protocol (MCP) gets described as "USB-C for AI." That's a fine elevator pitch, but it hides the part that actually matters: an MCP server is a contract between your system and an autonomous caller that will do unexpected things. Design it like an API for a careful junior engineer who never sleeps and occasionally hallucinates.

Here's how I approach a first MCP server for a real product.

Start with the surface, not the model

The temptation is to wire the model up to everything and see what happens. Resist it. The first design decision is what an agent can see versus what it can do.

Resources are read-only views: a catalog, a set of documents, an order. Stable, paginated, cheap to fetch.
Tools are verbs: look up an order, draft a refund, search inventory. Each one is an action with consequences.

Keep the tool list short and boring. Five well-named tools beat twenty clever ones, because every tool is a thing the agent can get wrong.

If you can express something as a resource instead of a tool, do it. Reads are forgiving; actions are not.

Make every tool schema explicit

An agent picks tools based on their descriptions and schemas. Vague schemas produce vague calls. Be specific about types, required fields, and what the tool does not do.

{
  name: "lookup_order",
  description: "Fetch a single order by id for the authenticated merchant. Read-only.",
  inputSchema: {
    type: "object",
    properties: {
      orderId: { type: "string", description: "The merchant-scoped order id." },
    },
    required: ["orderId"],
    additionalProperties: false,
  },
}

Two details earn their keep here: additionalProperties: false stops the agent from inventing fields, and the description says read-only so the model doesn't reach for it when it needs to change something.

Name tools the way you'd name them for a teammate

lookup_order is better than getOrder and far better than order_tool_v2. The name is part of the prompt. Write it for a reader, because that's exactly what it is.

Build the guardrails before the features

Before a single tool does real work, every call should pass through three things:

Auth scoping — the agent acts on behalf of one tenant, and the server enforces it. Never trust an id in the arguments.
Rate limiting — autonomous callers loop. Assume they will.
Audit logging — every tool call, structured, with inputs and outcome.

For anything destructive, return a draft instead of committing.

async function draftRefund({ orderId, amount }: RefundInput, ctx: Context) {
  assertScope(ctx, orderId);          // never trust the argument alone
  const order = await orders.find(orderId, ctx.tenantId);
  return {
    kind: "draft",
    summary: `Refund ${formatMoney(amount)} on order ${order.number}`,
    requiresApproval: true,
  };
}

A draft that a human approves is slower than a direct write, and that is the entire point. The agent proposes; a person disposes.

Prove it with evals

You would not ship an API without tests. An MCP server is worse, because the caller is non-deterministic. Write a small eval harness that runs representative agent transcripts against the server on every change and checks:

tools were called with valid arguments,
scoping held across tenants,
responses stayed within a token budget.

When a regression shows up, it shows up in CI — not in a customer's support queue. Run them with a single command:

npm run evals

Evals are not optional for production MCP work. Without them, every prompt change is a silent risk to every tool.

Return errors the agent can act on

When a tool fails, the agent reads the error and decides what to do next. A stack trace is useless to it; a clear, structured message is gold. Treat error responses as part of the interface, not an afterthought.

Compare these two failures:

// Unhelpful: the agent has no idea what to do next.
throw new Error("400");
 
// Helpful: the agent can correct itself.
return {
  isError: true,
  message: "No order found with that id. Confirm the order id, which looks like 'ord_' followed by 12 characters.",
};

The second one tells the agent exactly how to recover — the right format, the likely mistake, and an implicit instruction to try again with a corrected argument. The difference between those two messages is the difference between an agent that gets stuck and one that self-corrects.

The same principle applies to validation. If a required field is missing, say which field and what it expects. If a value is out of range, say the range. Every error is a chance to teach the caller how to use your server correctly, and a well-designed server spends that chance every time.

Plan for change from day one

Your tools will change. You'll rename one, tighten a schema, deprecate another. Because the caller is a model that was prompted against a specific shape of your server, changes that you'd consider routine in a normal API can quietly break agent behaviour in ways no test catches.

A few habits keep this manageable:

Version the server, not just individual tools. Expose a version string so clients can detect what they're talking to.
Deprecate before you delete. Keep an old tool working, mark it deprecated in its description, and watch the audit log until traffic to it drops to zero.
Treat descriptions as a public contract. Changing a tool's description changes how the model behaves. Review those edits with the same care you'd review a schema change.

The audit log you built for safety doubles as your migration dashboard. It tells you which tools are actually used, by whom, and how often — exactly what you need before you change anything.

None of this is exotic. It's ordinary API discipline, applied to a caller that happens to be non-deterministic. The teams that struggle with MCP are usually the ones who skipped it, betting that the model would be forgiving. It isn't.

A sensible first milestone

For a first server, I aim for this in a couple of weeks:

three to five resources covering the core read paths,
two or three tools, at most one of which writes (and it writes drafts),
auth scoping, rate limiting, and audit logging on every call,
an eval suite wired into CI.

That's a server you can hand to a customer's agent without losing sleep. Everything after that — more tools, richer resources, streaming — is incremental, because the hard part (the contract and the guardrails) is already in place.

If you're building something agent-facing and want a second pair of eyes on the design, that's exactly the kind of work I do. The keyboard shortcut for getting in touch is, regrettably, still Cmd + C on the contact page.

Share

← All posts

Building something in this space?

If this is the kind of work you need done, let's talk. Or get new posts in your inbox.

Book a call