
AI agent boundaries that hold under pressure

A guide to setting AI agent boundaries: connector cards, scope ledgers, child receipts, and decision stubs that stop permission drift.

Studies of Jacky Turner and the Reverend Charles Hope's Gardener, landscape painting by James Ward (1800).
Rogier Muller · May 1, 2026 · 6 min read

The fastest way to keep a coding agent safe is to write down what it may touch before it runs, not after it surprises you. AI agent boundaries are the written limits on what an agent can do: allowed paths, forbidden paths, connector scopes, and a named owner for each one. Most of the painful incidents I have watched come from the same root: a boundary that lived in someone's head until the agent found the gap on a Friday afternoon.

Here is the thing teams miss. Whether you wire an agent into Cursor (Anysphere's AI code editor), Claude Code (Anthropic's coding agent), or Codex CLI (OpenAI's coding agent), the model will happily work right up to the edge of whatever you forgot to forbid. A written boundary is the only kind the agent can actually read. Everything below is about making those limits explicit, owned, and checkable in review.

Give every connector a named owner

Adding another connector almost never fixes a governance problem. Giving the existing one an owner usually does. The connector that widens its blast radius with no name attached is the one that turns a rollback into an investigation.

Connectors wired through MCP (the Model Context Protocol) often default to broad access, because broad access is what makes a demo feel quick. The principle of least privilege says the opposite: every capability stays denied until someone owns the grant. So write one card per connector and keep it where reviewers will see it.

## MCP connector: github-prod

- Allowed actions: read PRs, post review comments
- Forbidden actions: merge, delete branches, edit secrets
- Owner: @priya
- Rollback: revoke token in 1Password, rotate within 1h

Incidents shrink when operators already know what "off" looks like. The card is boring on purpose, and that is the point.
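If the cards live in the repo as plain text, a small check can refuse any action a card does not grant. The following is a minimal sketch, assuming the exact card layout shown above; `ConnectorCard`, `parse_card`, and `is_permitted` are hypothetical names, not part of MCP or any client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCard:
    """One card per MCP connector (hypothetical schema mirroring the card above)."""
    name: str
    owner: str
    allowed: frozenset
    forbidden: frozenset
    rollback: str


REQUIRED_FIELDS = ("allowed actions", "forbidden actions", "owner", "rollback")


def _as_set(value: str) -> frozenset:
    # "read PRs, post review comments" -> {"read PRs", "post review comments"}
    return frozenset(item.strip() for item in value.split(",") if item.strip())


def parse_card(text: str) -> ConnectorCard:
    """Parse a '## MCP connector: <name>' heading plus '- Key: value' lines.

    Raises ValueError when the name or any required field is missing or blank,
    so an ownerless connector fails loudly instead of running quietly.
    """
    name = ""
    fields: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## MCP connector:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("- ") and ":" in line:
            key, value = line[2:].split(":", 1)
            fields[key.strip().lower()] = value.strip()
    missing = [key for key in REQUIRED_FIELDS if not fields.get(key)]
    if not name or missing:
        raise ValueError(f"incomplete card {name or '<unnamed>'}: missing {missing}")
    return ConnectorCard(
        name=name,
        owner=fields["owner"],
        allowed=_as_set(fields["allowed actions"]),
        forbidden=_as_set(fields["forbidden actions"]),
        rollback=fields["rollback"],
    )


def is_permitted(card: ConnectorCard, action: str) -> bool:
    """Least privilege: deny unless explicitly allowed and not explicitly forbidden."""
    return action in card.allowed and action not in card.forbidden


CARD_TEXT = """\
## MCP connector: github-prod

- Allowed actions: read PRs, post review comments
- Forbidden actions: merge, delete branches, edit secrets
- Owner: @priya
- Rollback: revoke token in 1Password, rotate within 1h
"""

card = parse_card(CARD_TEXT)
print(is_permitted(card, "post review comments"))  # True
print(is_permitted(card, "merge"))  # False
```

Anything missing from the allowed list is denied by default. The forbidden list works as a second, explicit check, so a reviewer can see what "off" means without reading code.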

Make the boundary travel between agents

Chained agents play a quiet game of telephone with your diff. A child agent does real work, returns a tidy summary, and the paths it actually touched fall out of the report. The parent merges a description, not the change.

Fix it by making every child agent return a receipt: paths touched, commands run, and the tests that prove the regression guards held. Now the boundary moves with the work instead of dissolving at each handoff, and the parent reviews evidence rather than a vibe.
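To make a receipt checkable, give it a fixed shape and have the parent compare it with the diff before merging. The sketch below makes a few assumptions. The `ChildReceipt` fields and the `receipt_problems` helper are hypothetical. The example paths are made up. Paths are matched with Python's `fnmatch.fnmatchcase`, whose `*` also matches `/`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ChildReceipt:
    """What a child agent returns alongside its summary (hypothetical schema)."""
    agent: str
    paths_touched: tuple
    commands_run: tuple
    tests_passed: tuple


def receipt_problems(receipt: ChildReceipt, allowed_globs: list, diff_paths: list) -> list:
    """List every way the receipt disagrees with the boundary or with the diff."""
    problems = []
    # 1. Every reported path must sit inside the child's allowed globs.
    for path in receipt.paths_touched:
        if not any(fnmatchcase(path, glob) for glob in allowed_globs):
            problems.append(f"{receipt.agent} touched {path} outside its allowed paths")
    # 2. Every path in the diff must appear in the receipt; this catches the
    #    paths that fall out of the report during a handoff.
    for path in sorted(set(diff_paths) - set(receipt.paths_touched)):
        problems.append(f"{path} is in the diff but missing from {receipt.agent}'s receipt")
    # 3. The receipt must name at least one passing regression test.
    if not receipt.tests_passed:
        problems.append(f"{receipt.agent} reported no passing regression tests")
    return problems


receipt = ChildReceipt(
    agent="child-1",
    paths_touched=("src/billing/invoice.py",),
    commands_run=("pytest tests/billing -q",),
    tests_passed=("tests/billing/test_invoice.py",),
)
print(receipt_problems(receipt, ["src/billing/*"], ["src/billing/invoice.py", "infra/main.tf"]))
# ["infra/main.tf is in the diff but missing from child-1's receipt"]
```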

The same idea closes review-queue theater, where CI is green but a reviewer still asks "why this approach?" and finds no answer on file. A three-line decision stub in the PR template handles it: constraints considered, alternatives rejected, verification proof. That gives the line between agent judgement and team judgement a paper trail.
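CI can enforce the stub's presence even though only a human can judge its content. Here is a minimal sketch, assuming the PR template uses the three labels below as `Label: text` lines. The labels and the `missing_stub_fields` helper are illustrative, so adjust them to match your own template.

```python
import re

# Hypothetical labels for the three-line decision stub in the PR template.
STUB_FIELDS = ("Constraints considered", "Alternatives rejected", "Verification proof")


def missing_stub_fields(pr_body: str) -> list:
    """Return the stub labels that are absent from the PR body or left blank."""
    missing = []
    for label in STUB_FIELDS:
        # 'Label: some text' at the start of a line, optionally as a '-' or '*'
        # bullet; the label must be followed by non-whitespace on the same line.
        pattern = rf"^[ \t]*(?:[-*][ \t]*)?{re.escape(label)}:[ \t]*\S"
        if not re.search(pattern, pr_body, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(label)
    return missing


pr_body = """\
- Constraints considered: must not touch billing schemas
- Alternatives rejected:
- Verification proof: pytest tests/webhooks -q
"""
print(missing_stub_fields(pr_body))  # ['Alternatives rejected']
```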

Keep Cursor scopes out of the fog

.mdc rule files sound precise until two reviewers read the same line and argue about what it meant. Worse, those rules end up competing with chat memory, and the agent splits its attention between them.

A scope ledger clears the fog: five lines in the parent chat, agreed before the run, covering goal, allowed paths, forbidden paths, verification command, and merge owner. Review then checks the ledger against the diff instead of relitigating the prompt. The same boundary files carry across tools, so one delegation snapshot can speak to all three tools at once.

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

Cursor reads .mdc rules, Claude Code follows CLAUDE.md precedence, and Codex carries verification notes in AGENTS.md. The file names differ, but the discipline stays the same.
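The ledger always has the same five-line shape, so the step of checking it against the diff can be mechanical. This sketch assumes the ledger has been copied into a structure like the hypothetical `ScopeLedger` below, and that the changed paths come from something like `git diff --name-only`. Forbidden globs deliberately win over allowed ones. The globs, goal, and command shown are made-up examples.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScopeLedger:
    """The five ledger lines agreed before the run (hypothetical structure)."""
    goal: str
    allowed_paths: tuple
    forbidden_paths: tuple
    verification_command: str
    merge_owner: str


def check_diff(ledger: ScopeLedger, diff_paths: list) -> list:
    """Return one violation per changed path that breaks the ledger."""
    violations = []
    for path in diff_paths:
        # Forbidden is checked first, so a path matching both lists is rejected.
        if any(fnmatchcase(path, glob) for glob in ledger.forbidden_paths):
            violations.append(f"forbidden: {path}")
        elif not any(fnmatchcase(path, glob) for glob in ledger.allowed_paths):
            violations.append(f"undeclared: {path}")
    return violations


ledger = ScopeLedger(
    goal="Add retry to webhook sender",
    allowed_paths=("src/webhooks/*", "tests/webhooks/*"),
    forbidden_paths=("src/webhooks/secrets*", "infra/*"),
    verification_command="pytest tests/webhooks -q",
    merge_owner="@priya",
)
print(check_diff(ledger, ["src/webhooks/sender.py", "infra/main.tf", "README.md"]))
# ['forbidden: infra/main.tf', 'undeclared: README.md']
```

An empty result means the diff stayed inside the agreed scope. The reviewer can then spend time on the verification command's output instead of on the prompt.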

Check the boundary in review, not in your head

A real boundary turns review from interrogation into inspection. You stop asking the agent to defend itself and start matching what it claims against what the diff shows. Four gates cover most of it.

| Gate | Question |
|---|---|
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove the regression guards held? |
| Receipt match | Does the PR body list scopes plus a verification transcript? |
| Rules precedence | Which .mdc, SKILL.md, or CLAUDE.md file governed behavior? |

Hand a reviewer this checklist and the questions answer themselves:

  • Forked agent work lists parent and child responsibilities.
  • Red-folder paths received explicit human acknowledgement.
  • Scopes in the PR body match folders in the diff.
  • MCP connectors mentioned (if any) list owners.
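Two of those checklist items are mechanical enough to pre-compute for the reviewer. This sketch depends on several inputs that are assumptions: the red-folder globs, the `acknowledged_by` field, and the connector-owner mapping are hypothetical, and your own tooling would have to supply them. The other two items stay with the human reviewer.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical red folders; every team draws its own.
RED_FOLDERS = ("infra/*", "migrations/*", ".github/workflows/*")


def review_findings(
    diff_paths: list,
    acknowledged_by: Optional[str],
    connectors_mentioned: list,
    connector_owners: dict,
) -> list:
    """Flag unacknowledged red-folder changes and ownerless connectors."""
    findings = []
    red = sorted(p for p in diff_paths if any(fnmatchcase(p, g) for g in RED_FOLDERS))
    # Red-folder paths need a named human acknowledgement before merge.
    if red and not acknowledged_by:
        findings.append(f"red-folder paths without human acknowledgement: {red}")
    # Every connector the PR mentions must map to a non-empty owner.
    for name in connectors_mentioned:
        if not connector_owners.get(name):
            findings.append(f"connector {name} lists no owner")
    return findings


print(review_findings(["infra/main.tf", "src/app.py"], None, ["github-prod"], {"github-prod": "@priya"}))
# ["red-folder paths without human acknowledgement: ['infra/main.tf']"]
```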

If onboarding a new teammate feels noisy, that noise is the sound of implicit boundaries getting discovered one at a time. A boundary the agent cannot read does not exist. None of this replaces architecture judgement either: agents accelerate execution, not ownership. For the threat list itself, keep the OWASP Top 10 for LLM Applications and the NIST AI Risk Management Framework on the shelf.

Common questions

  • How do you set AI agent boundaries that hold?

    Write them where the agent reads: connector cards with allowed and forbidden actions, scope ledgers with allowed and forbidden paths, and a named owner for every grant. A boundary that lives in someone's head gets discovered by the agent at the worst possible time. Putting it in the files the tool already loads is what makes it hold.

  • Who should own an MCP connector?

    One named person, written on the connector card alongside the allowed actions, forbidden actions, and rollback step. Ownerless connectors are how blast radius widens quietly. When nobody owns the grant, nobody notices the drift, and rollback becomes an investigation instead of a single step you can run in an hour.

  • What happens when agent boundaries stay implicit?

    The agent guesses, and guessing scales poorly. Implicit boundaries show up as onboarding noise, permission drift nobody signed off on, and child summaries that quietly omit the paths they touched. The fix is deliberately dull: cards, ledgers, receipts, and an owner whose name sits next to every grant in writing.

  • Where should the boundary files actually live?

    In the repo, next to the code, in the format each tool reads: Cursor reads .mdc rules, Claude Code reads CLAUDE.md, and Codex reads AGENTS.md. Keeping the limits in version control means they get reviewed like any other change, and the agent loads them on every run instead of relying on memory.

Start with one card

Pick the single MCP connector nobody currently owns and write its card before the next automated run. If your connector list is already longer than your owner list, contact us and we will map the boundaries with your team in one session, or read more in agentic coding governance.
