Does this slow my team down?

No, it usually speeds review up. The receipt moves a few minutes of work from the reviewer, who would otherwise reconstruct intent from a chat log, back to the agent run, where the context is fresh. A short merge with a clear receipt costs less total time than a fast merge that triggers questions a day later.

Do I need different rules for Claude, Claude Code, and Codex?

The handoff receipt stays the same across all three, so reviewers learn one format. The tool files differ because each tool reads a different one: Claude uses .mdc rules, Claude Code uses CLAUDE.md, and Codex uses AGENTS.md. Write the shared receipt first, then anchor each tool's rules to it.

What is the single most important field on the receipt?

Residual risk, the line where the agent flags what it is unsure about. Intent and evidence make a change explainable, but residual risk is what keeps a human owning the uncertain part instead of trusting a green test suite. A receipt that never names a risk is usually hiding one.

How does this connect to MCP and tool access?

The authority gate is where MCP boundaries show up in review. When an agent can reach external tools or data through connectors, the receipt should record which instruction file governed the session and which tools it was allowed to use. That makes an over-broad connector scope visible at review time instead of after an incident. OWASP's Top 10 for LLM Applications is a good map of what to watch.

Is this overkill for a one-line change?

Keep the receipt mandatory even for small diffs, because small agent changes are exactly the ones that get waved through. The receipt can be three lines for a trivial fix. The habit matters more than the length, and a tiny change with a clear intent line still beats a tiny change nobody questioned.

Agentic Coding Breaks At The Handoff

Most teams do not lose control when an agent writes bad code. They lose it ten minutes later, in review, when nobody can explain why the change was shaped that way. A handoff is the moment an agent stops working and a human takes responsibility for the result, and it is the part of agentic coding teams most often skip. Whether you run Claude Code (Anthropic's coding agent), Claude (Anysphere's AI code editor), or Codex CLI (OpenAI's coding agent), the diff is rarely the problem. The missing explanation is.

The diff looks fine. The tests are green. The summary says all the right things. Then a reviewer asks a boring question: "What problem were we solving before the agent touched six files?" That question is where the workflow either holds or quietly falls apart.

Why the handoff, not the prompt, is where things break

You can write a perfect prompt and still ship a change no one can defend. The agent did good work, but the reasoning lived in a chat window that the reviewer never sees. So review turns into archaeology: scrolling a transcript to reconstruct intent.

Tool-specific rules help, but they do not close this gap on their own. Claude reads .mdc rules. Claude Code reads a CLAUDE.md. Codex reads an AGENTS.md. Those files make agents behave better, and they are worth setting up. They still do not guarantee that the next person can explain the change.

The thing agents create that traditional commits do not is a decision trail. When that trail is written down, review is fast. When it lives only in the operator's head, every merge depends on one person being awake to narrate it.

Write a handoff receipt your reviewers will actually read

A good handoff is small. It says what changed, why it changed, what was checked, and what still needs a human's eyes. That is the whole job.

Each tool has a natural place to anchor this. For Claude, start with scope: an .mdc rule should name the folders an agent may touch and the owner who reviews them. For Claude Code, start with authority: CLAUDE.md should separate "allowed to inspect" from "allowed to edit", which matters once hooks or MCP tools are in play. For Codex, start with evidence: AGENTS.md should require command output and touched paths, not a confident summary. Different tools, same team need.

Drop this template at the bottom of any agent-authored PR:

## Agent Handoff Receipt

- Intent:
- Files touched:
- Commands run:
- Evidence:
- Reviewer attention:
- Follow-up risk:

It looks almost too plain, and that is the point. Fancy process does not survive a busy review queue. A six-line receipt does.

Review agent changes with five questions

Once you have the receipt, the review itself gets short. You are checking five things, and each one maps to a question a reviewer can ask out loud.

Gate	Reviewer question
Scope	Did the changed files match the stated task?
Intent	Can someone explain the solution without replaying the chat?
Evidence	Are the tests or checks named, not merely implied?
Authority	Did the agent use only the tools and data it was allowed to use?
Residual risk	Is the uncertain part visible enough for a human to own?

This is where governance stops being a slide and becomes a habit. You are not adding ceremony. You are stopping a fast merge from turning into slow maintenance three weeks later.

Roll this out on one workflow first

Pick one workflow where AI already touches real code. Do not start with the whole company. Write the handoff receipt for that workflow, then tighten the tool rules around it: Claude rules for scope and ownership, Claude Code instructions for tool boundaries, Codex instructions for verification and final reporting.

Then run the same task twice, once with the old workflow and once with the receipt. The useful signal is not how many lines the agent wrote. It is whether a reviewer can defend the merge without asking the operator to narrate the session. If the receipt feels annoying, shorten it. If reviewers still ask what happened, it is not specific enough yet.

We map this to the Review step in our methodology for one reason: evidence beats narration when merges touch shared code.

Common questions

Does this slow my team down? No, it usually speeds review up. The receipt moves a few minutes of work from the reviewer, who would otherwise reconstruct intent from a chat log, back to the agent run, where the context is fresh. A short merge with a clear receipt costs less total time than a fast merge that triggers questions a day later.
Do I need different rules for Claude, Claude Code, and Codex? The handoff receipt stays the same across all three, so reviewers learn one format. The tool files differ because each tool reads a different one: Claude uses .mdc rules, Claude Code uses CLAUDE.md, and Codex uses AGENTS.md. Write the shared receipt first, then anchor each tool's rules to it.
What is the single most important field on the receipt? Residual risk, the line where the agent flags what it is unsure about. Intent and evidence make a change explainable, but residual risk is what keeps a human owning the uncertain part instead of trusting a green test suite. A receipt that never names a risk is usually hiding one.
How does this connect to MCP and tool access? The authority gate is where MCP boundaries show up in review. When an agent can reach external tools or data through connectors, the receipt should record which instruction file governed the session and which tools it was allowed to use. That makes an over-broad connector scope visible at review time instead of after an incident. OWASP's Top 10 for LLM Applications is a good map of what to watch.
Is this overkill for a one-line change? Keep the receipt mandatory even for small diffs, because small agent changes are exactly the ones that get waved through. The receipt can be three lines for a trivial fix. The habit matters more than the length, and a tiny change with a clear intent line still beats a tiny change nobody questioned.

Start with one receipt

Add a single handoff receipt to the workflow your team already uses, then read Agentic coding governance for the wider picture. The goal is not a perfect agent. It is an agent contribution any reviewer can defend.

Agentic Coding Breaks At The Handoff

Why the handoff, not the prompt, is where things break

Write a handoff receipt your reviewers will actually read

Review agent changes with five questions

Roll this out on one workflow first

Common questions

Start with one receipt

Sources worth keeping open

Related training topics

Related research

Best practices for agentic coding in real environments

Codex workspace agents need repo rules

AI coding agents workflow guardrails for browser control

Continue through the research archive

Echo Routes Open-Weight Models for Less

Claude Code 2.1.139 team conventions

Ready to start?

Why the handoff, not the prompt, is where things break

Write a handoff receipt your reviewers will actually read

Review agent changes with five questions

Roll this out on one workflow first

Common questions

Start with one receipt

Sources worth keeping open

Related training topics

Claude Code for engineering teams

Claude Code best practices training for engineering teams

Claude Code review training for engineering teams

Shared agent workflows for code review risk reduction

Related research

Best practices for agentic coding in real environments

Codex workspace agents need repo rules

AI coding agents workflow guardrails for browser control

Continue through the research archive

Echo Routes Open-Weight Models for Less

Claude Code 2.1.139 team conventions

Ready to start?