Back to Research

Agentic coding governance that holds in review

Agentic coding governance as an operating guide: connector ownership, scope ledgers, decision stubs, and review receipts for MCP-connected engineering teams.

Hunters Near Ruins, landscape painting by Jan Baptist Weenix (1648).
Rogier MullerMay 5, 20266 min read

Agentic coding governance is the small set of written contracts that lets a reviewer defend an agent's merge without replaying the chat. It covers scope ledgers, child receipts, decision stubs, and precedence rules between your rule files. The whole point is one test: can someone unfamiliar trace why a change is safe from the repo alone, not from a conversation that scrolled away?

The merge train clogs for a boring reason. CI is green, and nobody in the queue can say why the change is safe. When that happens with agents like Claude (Anysphere's AI code editor) or Claude Code (Anthropic's coding agent), a silent PR body is not neutral. It is scheduling debt you pay back later, by hand, during PR archaeology.

Write down where the judgment lives

In agent-heavy repos the bottleneck moved. It used to be typing speed. Now it is traceability. Another prompt template will not fix that, because the missing thing is ownership, not phrasing.

Green CI is a good measure of safety right up until it becomes the target. Once agents optimize for passing checks, "green" stops meaning "understood." That gap shows up the moment someone asks why the agent touched a file and the only answer is buried in chat history.

So governance is less about controlling the model and more about leaving a trail. Short artifacts, written before the run, beat long debates after it.

Add four small contracts that survive a crunch week

Each common failure has a fix small enough to keep under deadline pressure. Here are the four we see earn their place most often.

Chained agents return summaries that quietly skip the paths a child agent owned, and the parent merges on faith. A child receipt fixes it: every child returns the paths it touched, the commands it ran, and the tests that prove its regression guards. Now there is something concrete to check.

Reviewers still ask "why this approach?" even when CI is green, and no written answer exists. A decision stub in the PR template forces three lines: constraints considered, rejected alternatives, and verification proof. Debate moves from vibes to explicit tradeoffs.

Rule files compete with chat memory, so .mdc rules in Claude read precise but mean different things to different reviewers. A scope ledger in the parent chat carries five lines: goal, allowed paths, forbidden paths, verification command, merge owner. Review then shifts from arguing about prompts to checking the ledger against the diff.

On shared machines, approving bash commands in Claude Code becomes muscle memory. A supremacy clause at the top of CLAUDE.md states which hooks win, which folders need human eyes, and where temporary overrides live, so a session cannot invent policy mid-run.

Here is a delegation snapshot you can drop in and adapt:

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Claude: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

These four contracts are the working core of agentic coding governance, and they pay off first at the handoff between agents.

Give the reviewer questions with file-backed answers

Governance is real when a reviewer can ask four blunt questions and get answers from files, not from memory. If any answer requires "let me find the chat," the trail is broken.

Gate Question
Rules precedence Which .mdc, SKILL.md, or CLAUDE.md governed behavior?
Connector truth Which MCP servers fired, and were they expected?
Reviewer path Can someone unfamiliar trace intent without chat replay?
Risk routing Were red folders touched, and who approved?

Pair those questions with a short receipt the author fills in before requesting review:

  • Red-folder paths received explicit human acknowledgement.
  • Scopes in the PR body match folders in the diff.
  • Primary-doc links were smoke-checked after publishing edits.
  • MCP connectors mentioned (if any) list owners.

Keep the hard calls with humans

Some decisions never go on autopilot. Threat models, customer promises, and blast-radius calls stay with people. Think of agents as the relief crew that does the digging while the humans standing outside the trench still own the blueprint.

That line is what makes the rest of the governance trustworthy. The receipts prove the routine work; the humans own the parts a receipt can never capture.

Common questions

  • What is agentic coding governance?

    Agentic coding governance is the set of written contracts that lets a reviewer defend an agent's merge without replaying the chat: scope ledgers, child receipts, decision stubs, and precedence rules like a CLAUDE.md supremacy clause. The test is simple. Can someone unfamiliar trace intent from the repo alone, with no access to the original conversation?

  • What should a PR carry before an agent merge is approved?

    Three forced lines from the decision stub: constraints considered, rejected alternatives, and verification proof. The scope receipt adds the rest, scopes in the PR body matching folders in the diff, red-folder paths with explicit human acknowledgement, and any MCP connectors listing their owners. That is usually enough for a clean approval.

  • What is a scope ledger and when do you need one?

    A scope ledger is a five-line note in the parent chat: goal, allowed paths, forbidden paths, verification command, merge owner. You need it once rule files start competing with chat memory, because review then shifts from debating prompts to checking the ledger against the diff. It takes a minute to write and saves an afternoon of guessing.

  • Does governance slow down agent-driven delivery?

    No, the receipts are short by design: five ledger lines, three stub lines, one receipt block per child. The real drag is the alternative, PR archaeology, where reviewers reconstruct intent from chat logs after the fact. Traceability done before the run is cheaper than judgment recovered after it.

  • Which rule file wins when Claude, Claude Code, and Codex disagree?

    Whichever one your precedence rule names first, which is why you write that rule down. State it in CLAUDE.md for Claude Code, in .mdc scope for Claude, and in AGENTS.md for Codex, then make each cite its source before expanding scope. Without a declared order, the agents quietly invent one.

Docs to keep open

A few primary sources worth a tab while you set this up:

Next step

Pick one live repo and add the decision stub to its PR template this week, since it is the cheapest fix with the fastest feedback. If you want help installing the rest without stalling the roadmap, start at the contact page or see how we teach it in our training.

Related training topics

Related research

Continue through the research archive

Ready to start?

Transform how your team builds software.

Get in touch