Why MCP needs a trust layer
The Model Context Protocol turned out to be the right primitive at the right moment. By giving agents a uniform way to discover and call tools, MCP collapsed what would otherwise have been a thousand bespoke integrations into a single contract. The ecosystem responded the way ecosystems do when a sharp primitive shows up: thousands of MCP servers, every major model provider supporting it, and a growing list of frameworks treating it as the default agent-tool interface.
It also, deliberately, left a door open.
The MCP spec defines how an agent talks to a tool. It is intentionally agnostic about three questions that matter enormously the moment you put any of this in production:
- Who is the agent on the other end of this call, really?
- On whose authority is it acting, and within what constraints?
- When the action is done, what evidence exists that it happened the way both sides expected?
For local-only or trusted-network use cases, you can wave these questions away. The agent runs on your laptop, the tool runs on your server, and you've already authenticated yourself as the human in the loop. Nothing breaks.
For everything else (multi-tenant agent platforms, enterprise SaaS exposing MCP servers, agents that spawn sub-agents, agents acting across organizational boundaries), the questions are not optional. And the existing answers are bad.
The answers we have today
Today, when an MCP server wants to know who's calling, the answer is "an API key." API keys identify the integration, not the invocation. A long-lived token sitting in an environment variable says nothing about which agent, running which prompt, on whose behalf, made any specific call. If that key is exfiltrated by a prompt injection through an earlier tool return, every downstream call looks identical to a legitimate one.
When the MCP server wants to know what the agent is allowed to do, the answer is "whatever the API key is scoped to." That's coarse-grained at best, and it ignores the question of delegated authority entirely. Imagine a workflow where the user authorizes Agent A; A spawns sub-agent B for a specific subtask; B calls Tool C. If C is asked to charge the user's credit card, what evidence does it have that the user actually consented to this charge, by this agent, in this context? Today, none.
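The delegation question is concrete enough to sketch. The snippet below is a toy illustration, not anything from an actual spec: every name and field is hypothetical, and an HMAC over a shared secret stands in for the asymmetric signatures a real protocol would use. It shows the shape of what Tool C would need: a chain of signed delegation tokens it can walk from sub-agent B back to the human principal.

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: dict) -> str:
    # HMAC stands in for an asymmetric signature in this sketch.
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def delegate(signer_key: bytes, principal: str, delegate_to: str, scope: list) -> dict:
    body = {"principal": principal, "delegate": delegate_to, "scope": scope}
    return {**body, "sig": sign(signer_key, body)}

def verify_chain(chain: list, keys: dict) -> bool:
    for i, tok in enumerate(chain):
        body = {k: tok[k] for k in ("principal", "delegate", "scope")}
        # Each link must be signed by the party named as its principal...
        if not hmac.compare_digest(tok["sig"], sign(keys[tok["principal"]], body)):
            return False
        # ...and each delegate must be the principal of the next link.
        if i + 1 < len(chain) and chain[i + 1]["principal"] != tok["delegate"]:
            return False
    return True

keys = {"user": b"user-secret", "agent-a": b"a-secret"}
chain = [
    delegate(keys["user"], "user", "agent-a", ["charge:create"]),
    delegate(keys["agent-a"], "agent-a", "agent-b", ["charge:create"]),
]
print(verify_chain(chain, keys))  # -> True: B's authority traces back to the user
```

Change any field of any link (say, widen agent-b's scope after the fact) and verification fails, because the delegator's signature no longer matches. That property, not the specific encoding, is the point.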
And when something goes wrong (the agent did something it shouldn't have, or claims it did something it didn't), there is no signed evidence to fall back on. Your audit trail is whatever logs you happened to keep, and your dispute resolution is whatever your lawyers can argue.
Why this is finally an acute problem
For a while, you could get away with this. Agents were demos. The blast radius was small.
That changed in 2026. We are now in a world where every Fortune 500 has multiple agent pilots in flight, where regulated industries are being asked to deploy AI into customer-facing workflows, and where a single agent invocation can cascade through five or six tool calls touching real money, real data, and real promises to real customers.
We talk to a lot of these teams. The pattern is consistent: the agent prototype works fine. The model is capable enough. The workflow is valuable. The pilot dies in security review.
Security teams are not asking for anything unreasonable. They want to know which agent did the thing, on whose authority, with what scope, and have evidence that holds up. They're being asked to accept agents into the same trust boundary as employees, with none of the identity infrastructure that makes an employee's actions traceable. So they say no.
What a trust layer looks like
The shape of the answer is increasingly clear, and it's not a model problem; it's a protocol problem. The pieces:
- Verifiable agent identity. Every agent invocation carries a signed credential proving who deployed it, what model is behind it, and what code is running. Receivers can verify this without phoning home.
- Provable delegation chains. When a user authorizes an agent, that authorization is a signed token. When the agent delegates to a sub-agent, the chain extends. The receiving service can verify the entire chain back to the human principal.
- Negotiated, scoped capabilities. A request isn't a blank check. It's a typed assertion of "I want to do X, with these constraints, for Y reason." The service grants or refuses, and the refusal is itself signed.
- Signed action receipts. Every meaningful action produces a tamper-evident record: what was promised, what was done, by whom, with a hash of the result. Receipts chain together. Audit becomes verifiable.
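The receipt idea in the last bullet can be sketched with nothing but hash chaining. This is a toy illustration with assumed field names, not a real receipt format; a production receipt would also carry a signature over the hash. But even a bare SHA-256 chain makes tampering with any earlier entry detectable:

```python
import hashlib
import json

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_receipt(prev_hash: str, actor: str, action: str, result: bytes) -> dict:
    body = {
        "prev": prev_hash,                                  # link to the previous receipt
        "actor": actor,
        "action": action,
        "result_hash": hashlib.sha256(result).hexdigest(),  # commit to the result payload
    }
    return {**body, "hash": _digest(body)}

def verify_log(receipts: list, genesis: str = "genesis") -> bool:
    prev = genesis
    for r in receipts:
        body = {k: r[k] for k in ("prev", "actor", "action", "result_hash")}
        if r["prev"] != prev or r["hash"] != _digest(body):
            return False  # broken link or tampered receipt
        prev = r["hash"]
    return True

log = [make_receipt("genesis", "agent-b", "charge:create", b'{"amount": 42}')]
log.append(make_receipt(log[-1]["hash"], "agent-b", "charge:confirm", b'{"ok": true}'))
print(verify_log(log))  # -> True
```

Editing any field of any earlier receipt breaks its own hash and every link after it, so "the logs you happened to keep" become a structure a third party can check.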
None of this is novel cryptography. The primitives have existed for decades: DIDs, verifiable credentials, capability tokens, transparency logs. What's missing is a single coherent protocol that puts them together for the agent use case, composes cleanly with MCP, A2A, and OAuth, and ships with SDKs that make adoption a one-liner.
That's what we're building. It's called Handshake. The v0.1 spec is available to design partners now; the full draft will be public when we're confident the shape is right.
If you're on a team that's hit the wall described above (agent pilot stuck in security review, no good answer to "who did what"), come talk to us. The protocol will be much better for having you in the room early.