Agentic Coding Breaks At The Handoff
Most teams do not lose control when an agent writes bad code. They lose it when nobody can explain the change ten minutes later. The handoff is the interface.

Most teams do not lose control when an agent writes bad code. They lose it ten minutes later, in review, when nobody can explain why the change was shaped that way.
The diff looks fine. The tests are green. The summary says all the right things. Then a reviewer asks a boring question: "What problem were we solving before the agent touched six files?"
That question is where the workflow either holds or falls apart.
The real failure mode
Counter-thesis: The next productivity gain will not come from a better prompt. It will come from making handoffs reviewable.
The wrong path: We believed tool-specific rules would be enough. Cursor got .mdc rule files. Claude Code got a CLAUDE.md. Codex got an AGENTS.md. The instructions improved, but the handoff still got muddy.
Diagnosis: Agentic coding creates a new engineering artifact: the decision trail. If that trail is missing, review turns into archaeology.
Thesis: The handoff is the interface.
What a good handoff contains
A useful handoff is small. It says what changed, why it changed, what was checked, and what still deserves human attention.
For Cursor, the handoff starts with scope. An .mdc rule should name the folders an agent may touch and the owner who reviews them.
For Claude Code, the handoff starts with authority. CLAUDE.md should separate "allowed to inspect" from "allowed to edit", especially when hooks or MCP tools are available.
For Codex, the handoff starts with evidence. AGENTS.md should require command output, touched paths, and the verification that proves the result is not just a convincing story.
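Scope and authority are easier to enforce when a script can read them as well as the agent. The sketch below is hypothetical: the `AgentPolicy` class, the folder names, and the team names are illustrative assumptions, not a format that any of these tools reads. It mirrors one rule file's intent so that a pre-merge check can route each touched path to its owner and separate "may inspect" from "may edit".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class AgentPolicy:
    """Hypothetical policy mirroring one rule file's scope and authority."""

    owners: dict[str, str]  # folder -> team that reviews changes there (scope)
    inspect: set[str] = field(default_factory=set)  # folders the agent may read
    edit: set[str] = field(default_factory=set)  # folders the agent may change

    @staticmethod
    def _under(path: str, folder: str) -> bool:
        # Component-wise containment, so "src/billing2" is not inside "src/billing".
        p = PurePosixPath(path)
        return p == PurePosixPath(folder) or PurePosixPath(folder) in p.parents

    def owner_for(self, path: str) -> str | None:
        # The most specific (longest) matching folder decides the reviewer.
        matches = [f for f in self.owners if self._under(path, f)]
        return self.owners[max(matches, key=len)] if matches else None

    def may_edit(self, path: str) -> bool:
        return any(self._under(path, f) for f in self.edit)

    def may_inspect(self, path: str) -> bool:
        # Edit rights imply inspect rights.
        return self.may_edit(path) or any(self._under(path, f) for f in self.inspect)


policy = AgentPolicy(
    owners={"src": "platform-team", "src/billing": "payments-team"},
    inspect={"docs"},
    edit={"src/billing"},
)
print(policy.owner_for("src/billing/invoice.py"))  # payments-team
print(policy.may_edit("src/auth/login.py"))  # False
print(policy.may_inspect("docs/adr/0007.md"))  # True
```

Keeping the edit set narrower than the inspect set is the point: the agent can read widely to understand a change, while the paths it may rewrite stay small enough for one named owner to review.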
The tools are different. The team need is the same.
```markdown
## Agent Handoff Receipt
- Intent:
- Files touched:
- Commands run:
- Evidence:
- Reviewer attention:
- Follow-up risk:
```
This looks almost too plain. That is the point. Fancy process does not survive a busy review queue.
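Because the receipt is plain text, a CI step can refuse a PR whose receipt has blank fields. A minimal sketch, assuming the receipt is pasted into the PR description using the `- Field:` lines from the template; the parsing rule and the `missing_receipt_fields` name are hypothetical.

```python
import re

# The six fields from the Agent Handoff Receipt template.
RECEIPT_FIELDS = [
    "Intent",
    "Files touched",
    "Commands run",
    "Evidence",
    "Reviewer attention",
    "Follow-up risk",
]


def missing_receipt_fields(pr_body: str) -> list[str]:
    """Return receipt fields that are absent or left blank in a PR description."""
    missing = []
    for name in RECEIPT_FIELDS:
        # Match a line such as "- Intent: fix rounding in invoice totals".
        # [ \t]* (not \s*) stops the match from running onto the next line.
        match = re.search(rf"^\s*-\s*{re.escape(name)}:[ \t]*(.*)$", pr_body, re.MULTILINE)
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing


body = """## Agent Handoff Receipt
- Intent: fix rounding in invoice totals
- Files touched: src/billing/invoice.py
- Commands run: pytest tests/billing
- Evidence:
- Reviewer attention: currency edge cases
- Follow-up risk: none known
"""
print(missing_receipt_fields(body))  # ['Evidence']
```

The check only proves a field is filled, not that it is true; that judgment stays with the reviewer.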
The review checklist
| Gate | Reviewer question |
|---|---|
| Scope | Did the changed files match the stated task? |
| Intent | Can someone explain the solution without replaying the chat? |
| Evidence | Are the tests or checks named, not merely implied? |
| Authority | Did the agent use only the tools and data it was allowed to use? |
| Residual risk | Is the uncertain part visible enough for a human to own? |
This is where agentic coding governance becomes practical. You are not slowing the team down with ceremony. You are preventing a faster merge from becoming slower maintenance.
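The five gates can also be recorded per PR, so that a skipped gate is visible instead of implied. A sketch under stated assumptions: the `GateResult` enum and `blocking_gates` helper are hypothetical, and the gate names are taken from the table.

```python
from enum import Enum


class GateResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNCHECKED = "unchecked"


# Gate names from the review checklist table.
GATES = ("Scope", "Intent", "Evidence", "Authority", "Residual risk")


def blocking_gates(results: dict[str, GateResult]) -> list[str]:
    """Return gates that block merge: failed, or never explicitly checked."""
    return [
        gate
        for gate in GATES
        if results.get(gate, GateResult.UNCHECKED) is not GateResult.PASS
    ]


review = {
    "Scope": GateResult.PASS,
    "Intent": GateResult.PASS,
    "Evidence": GateResult.FAIL,
    "Authority": GateResult.PASS,
}
print(blocking_gates(review))  # ['Evidence', 'Residual risk']
```

Treating an unchecked gate the same as a failed one is the deliberate design choice here: a gate nobody answered should not pass by default.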
Where teams should start
Pick one workflow where AI already touches real code. Do not start with the whole company.
Write the handoff receipt first. Then update the tool-specific rules around it:
- Cursor rules define scope and ownership.
- Claude Code instructions define tool boundaries and review expectations.
- Codex instructions define verification and final reporting.
After that, run the same task twice: once with the old workflow, once with the receipt. The useful metric is not how many lines the agent wrote. It is whether a reviewer can defend the merge without asking the original operator to narrate the session.
Named fix: Review receipt. Make the receipt mandatory for any agent-authored PR, even if the diff is small.
Named fix: Authority line. Every agent session states which instruction file governed it: .mdc, CLAUDE.md, AGENTS.md, or a local skill file.
Named fix: Evidence block. The final summary includes the commands that were actually run and the checks that failed or were skipped.
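The authority line and evidence block are easiest to trust when they are generated from commands that actually ran, rather than typed from memory. A minimal sketch, assuming the session's checks are plain commands; the `run_check`, `skipped`, and `evidence_block` helpers, the output format, and the `npm run e2e` example are all hypothetical. A check that was deliberately not run is recorded as skipped, so it still appears in the summary.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckRecord:
    command: str
    status: str  # "passed", "failed", or "skipped"
    detail: str = ""


def run_check(command: list[str]) -> CheckRecord:
    """Run one check and record its real exit status instead of a claimed one."""
    proc = subprocess.run(command, capture_output=True, text=True)
    status = "passed" if proc.returncode == 0 else "failed"
    # Keep only the last line of output so the summary stays short.
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return CheckRecord(shlex.join(command), status, lines[-1] if lines else "")


def skipped(command: str, reason: str) -> CheckRecord:
    """Record a check that was intentionally not run, with the reason."""
    return CheckRecord(command, "skipped", reason)


def evidence_block(authority_file: str, records: list[CheckRecord]) -> str:
    """Render the authority line and evidence block for the final summary."""
    out = [f"Authority: {authority_file}", "Evidence:"]
    for r in records:
        suffix = f" ({r.detail})" if r.detail else ""
        out.append(f"- [{r.status}] {r.command}{suffix}")
    return "\n".join(out)


# Stand-in checks: the current Python interpreter plays the role of a test runner.
records = [
    run_check([sys.executable, "-c", "print('3 passed')"]),
    run_check([sys.executable, "-c", "import sys; sys.exit('lint error')"]),
    skipped("npm run e2e", "no browser in sandbox"),
]
block = evidence_block("AGENTS.md", records)
print(block)
```

The exact command strings depend on the local interpreter path, so the printed block varies between machines; the statuses do not.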
We map this discipline to the Review stage of our methodology, because evidence beats narration when merges touch shared code.
Synthesis: Agents make code cheaper to produce. They make unexplained code more expensive to own.
Sources worth keeping open
- Claude Code - getting started
- OpenAI Developers - Codex quickstart
- OWASP - Top 10 for Large Language Model Applications
- NIST - AI Risk Management Framework
- Google Search Central - helpful, people-first content
- Google Search Central - generative AI content guidance
What to do next
Start with agentic coding governance. Add one handoff receipt to the workflow your team already uses. If the receipt feels annoying, shorten it. If reviewers still ask what happened, it is not specific enough.
The goal is not to make every agent perfect. The goal is to make every agent contribution legible.
Best ways to use this research
- Best for: engineering teams comparing Cursor, Claude Code, and Codex operating habits under delivery pressure.
- Best first artifact: turn one of the named fixes into a shared checklist, repo rule, handoff receipt, or policy table before the next automated run.
- Best comparison angle: compare the workflow across review evidence, connector scope, and handoff friction before adding another agent run; keep the path that leaves the shortest auditable trail.