Pick the Review Workflow, Not the Bot

A Claude Code-first guide to AI code review guardrails, MCP boundaries, and a pasteable review receipt.

Landscape near Swansea, South Wales, landscape painting by James Ward (1805).
Rogier Muller, June 21, 2026, 8 min read

The best AI code review setup is not a single product. It is a governed workflow that makes every coding agent follow the same rules, use the same evidence, and leave the same review receipt.

AI code review is the use of a language model or coding agent to inspect changes for correctness, security, maintainability, and fit with team conventions. For Claude Code users, the practical move is to put the workflow into Claude Code (Anthropic’s coding agent) while keeping the rules portable enough for Cursor (Anysphere’s AI code editor), Codex (OpenAI’s coding agent), and whatever your team tries next.

Choose the review boundary before the tool

Start by deciding what your code review AI is allowed to judge. A good first boundary is: the agent may identify risks, ask for missing tests, check conventions, and summarize tradeoffs, but it may not approve, merge, or rewrite large parts of the PR without a human owner.

This matters because most AI code review tools feel impressive on clean diffs and shaky on ambiguous product intent. A model can spot an unhandled error path, but it cannot reliably know whether the team chose a weird tradeoff on purpose unless that context is written down.

The useful lesson from cross-agent brain projects, as of June 2026, is portability. Teams want one operating model that works across Claude Code, Cursor, Codex, and OpenCode-style workflows. The trap is turning that into one giant shared memory file that every agent reads forever. Shared standards are good. Shared stale context is not.

A clean boundary for a review agent looks like this:

The agent checks the diff, the local tests, the repository rules, and the linked issue. It writes a receipt. A human reviewer decides whether the finding blocks the merge.
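As a minimal sketch, the boundary above can be written down as an explicit policy rather than left in a prompt. The action names and the `is_permitted` helper are hypothetical, chosen only to mirror the "may" and "may not" lists in this section.

```python
from enum import Enum


class Action(Enum):
    IDENTIFY_RISK = "identify_risk"
    REQUEST_TESTS = "request_tests"
    CHECK_CONVENTIONS = "check_conventions"
    SUMMARIZE_TRADEOFFS = "summarize_tradeoffs"
    APPROVE = "approve"
    MERGE = "merge"
    LARGE_REWRITE = "large_rewrite"


# Actions the agent may take on its own (the "may" list from the boundary).
AGENT_ALLOWED = frozenset({
    Action.IDENTIFY_RISK,
    Action.REQUEST_TESTS,
    Action.CHECK_CONVENTIONS,
    Action.SUMMARIZE_TRADEOFFS,
})


def is_permitted(action: Action, human_owner_signed_off: bool = False) -> bool:
    """Return True if the action falls inside the review boundary."""
    if action in AGENT_ALLOWED:
        return True
    # Approving, merging, and large rewrites all require a human owner.
    return human_owner_signed_off
```

Keeping the allowed set small and named makes it easy to see, in review, when someone widens the boundary.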

Put team rules where agents actually read them

For Claude Code, put durable review rules in CLAUDE.md. Keep them short, scoped, and boring. “Run tests before claiming safety” is durable. “Today we are refactoring billing retries” is task context, not memory.

If your repo also serves agents that read AGENTS.md, keep the same policy there in the language those agents expect. Use nested files when a subdirectory has different rules. A payment service, a mobile app, and a documentation site should not inherit the same review checklist just because they live in one monorepo.
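To audit which rule files a reviewer would expect to apply to a changed file, a small resolver can walk from the repository root down to the file's directory. This is an illustrative helper under the assumption that more local files refine more general ones; it does not describe how Claude Code or any other agent actually discovers its rule files, and `applicable_rule_files` is a hypothetical name.

```python
from pathlib import Path

RULE_FILENAMES = ("CLAUDE.md", "AGENTS.md")


def applicable_rule_files(changed_file: Path, repo_root: Path) -> list[Path]:
    """List rule files from the repo root down to the changed file's directory."""
    repo_root = repo_root.resolve()
    directory = changed_file.resolve().parent
    if directory != repo_root and repo_root not in directory.parents:
        raise ValueError(f"{changed_file} is not inside {repo_root}")

    # Walk upward from the file's directory to the root, then reverse so
    # the most general rules come first and the most local rules come last.
    chain = [directory]
    while chain[-1] != repo_root:
        chain.append(chain[-1].parent)
    chain.reverse()

    found = []
    for folder in chain:
        for name in RULE_FILENAMES:
            candidate = folder / name
            if candidate.is_file():
                found.append(candidate)
    return found
```

Listing the resolved files in the receipt's "Repository rules applied" section gives a reviewer a quick way to spot a subdirectory that is silently inheriting the wrong checklist.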

A useful CLAUDE.md review rule is concrete:

Before posting an AI review, inspect the diff, list the files touched, run or name the relevant tests, and separate blocking findings from suggestions. Do not mark a review complete without a receipt.

That rule is small enough to survive. It is also specific enough that an engineering manager can audit whether the workflow happened.

For a broader operating model, keep your team’s standards under AI coding governance, not inside one developer’s prompt history. If you want the receipt pattern in more depth, see Code Review Agents Need Receipts.

Give agents tools, not blanket trust

MCP, the Model Context Protocol, is the integration layer that lets agents reach systems such as GitHub, Slack, issue trackers, document stores, databases, and private knowledge bases. Treat each MCP server like production access, not like a convenience plugin.

For review work, most teams need read access to the PR, issue, code search, CI status, and maybe a design doc. They usually do not need write access to production data, broad Slack history, or every private repo. Narrow access makes the agent less magical, but much easier to trust.

Use Claude Code hooks, slash commands, and skills as workflow rails. A /review-pr command can tell the agent to gather context, run the checklist, and write the receipt. A hook or CI check can stop the workflow from posting a final review if required fields are missing.

The trap is giving the agent every tool because one rare review might need it. That creates a bigger security and privacy surface for a tiny productivity gain. In agentic coding, the default should be least privilege with an explicit escalation path.
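Least privilege with an explicit escalation path can be sketched as an allowlist check. The scope strings below are hypothetical labels for illustration; real MCP servers expose their own tool names and permission models, so treat this as a shape for the policy, not an MCP API.

```python
from dataclasses import dataclass

# Hypothetical scope names; map these to whatever your MCP servers expose.
REVIEW_DEFAULT_SCOPES = frozenset({
    "pr:read",
    "issues:read",
    "code_search:read",
    "ci:read",
    "docs:read",
})


@dataclass(frozen=True)
class AccessDecision:
    allowed: bool
    reason: str


def check_access(scope: str, escalations: frozenset = frozenset()) -> AccessDecision:
    """Allow default read scopes; anything else needs an explicit escalation."""
    if scope in REVIEW_DEFAULT_SCOPES:
        return AccessDecision(True, "default review scope")
    if scope in escalations:
        return AccessDecision(True, "explicitly escalated for this review")
    return AccessDecision(False, "not in review allowlist; request escalation")
```

The point of the `escalations` argument is that widening access is a visible, per-review decision instead of a permanent default.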

Make every AI review leave a receipt

For teams asking what the best AI tool for code review is, the better test is whether the tool can produce a useful receipt every time. The receipt is the difference between “the bot said it looks good” and “we can see what it checked, what it skipped, and why it thinks this is safe.”

Paste this into CLAUDE.md, your PR template, or a Claude Code skill for review work. It is intentionally plain. You can make it stricter later.

## AI review receipt

Use this receipt before posting an AI-assisted PR review.

## Scope checked
- PR or branch:
- Files inspected:
- Related issue, spec, or ticket:
- Areas intentionally not reviewed:

## Repository rules applied
- Relevant CLAUDE.md / AGENTS.md rules:
- Architecture constraints:
- Security or privacy constraints:

## Verification
- Tests run:
- Tests not run, with reason:
- Build, typecheck, lint, or CI status:
- Manual checks performed:

## Findings
- Blocking findings:
  - [ ] Finding:
      Evidence:
      Suggested fix:
- Non-blocking suggestions:
  - [ ] Suggestion:
      Why it helps:

## Risk call
- Risk level: low / medium / high
- Main uncertainty:
- Human reviewer needed for:

## Final reviewer note
Do not approve or merge based only on this receipt.
A human owner must decide whether the remaining risk is acceptable.
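One way a hook or CI check could enforce the receipt is a small validator that refuses blank required fields. This is a sketch: which fields count as required is an assumption made here, and `missing_fields` is a hypothetical helper, not part of Claude Code.

```python
import re

# Assumed set of fields that must not be left blank; adjust to your policy.
REQUIRED_FIELDS = (
    "PR or branch",
    "Files inspected",
    "Tests run",
    "Build, typecheck, lint, or CI status",
    "Risk level",
    "Main uncertainty",
    "Human reviewer needed for",
)

# Matches template lines such as "- Tests run: <value>".
FIELD_LINE = re.compile(r"^\s*-\s*(?P<label>[^:]+):[ \t]*(?P<value>.*)$")


def missing_fields(receipt: str) -> list[str]:
    """Return required labels that are absent or have no value."""
    values = {}
    for line in receipt.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            values[match.group("label").strip()] = match.group("value").strip()

    missing = []
    for label in REQUIRED_FIELDS:
        value = values.get(label, "")
        # The unedited template placeholder does not count as a risk call.
        if not value or value == "low / medium / high":
            missing.append(label)
    return missing
```

A CI step could call `missing_fields` on the PR comment body and fail when the returned list is non-empty, so an incomplete receipt never gets posted as a finished review.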

The review receipt also makes LLM code review easier to compare across tools. If Claude Code, Cursor, and Codex all produce receipts against the same checklist, you can evaluate the workflow instead of arguing from vibes.

Train the team like the workflow is production code

Do not roll this out as “everyone use AI for reviews now.” Run a short AI coding workshop with one real PR, one review receipt, and one discussion about what the agent missed. That gives engineers a shared standard before habits drift.

Good engineering team training should cover three moves. Write durable rules in CLAUDE.md or AGENTS.md. Keep MCP permissions narrow. Require a receipt before a review is treated as useful input.

The limitation is real: AI review still struggles with intent, product nuance, and hidden coupling. It is best at catching local issues, summarizing risk, and making reviewers faster. It is worst when teams use it to replace ownership.

A simple adoption plan is to start with advisory reviews on low-risk PRs for two weeks. Compare receipts against human findings. Then decide which checks become required and which stay optional.
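Comparing receipts against human findings can be as simple as set arithmetic, assuming someone first normalizes each finding to a shared identifier (for example a file plus an issue type). That normalization step, and the names below, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    confirmed: int   # flagged by both the agent and the human reviewer
    agent_only: int  # flagged only by the agent
    human_only: int  # flagged only by the human reviewer

    @property
    def precision(self) -> float:
        """Share of agent findings that the human reviewer also raised."""
        flagged = self.confirmed + self.agent_only
        return self.confirmed / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Share of human findings that the agent also raised."""
        raised = self.confirmed + self.human_only
        return self.confirmed / raised if raised else 0.0


def compare(agent_findings: set, human_findings: set) -> Comparison:
    """Compare normalized finding IDs from one receipt and one human review."""
    return Comparison(
        confirmed=len(agent_findings & human_findings),
        agent_only=len(agent_findings - human_findings),
        human_only=len(human_findings - agent_findings),
    )
```

Agent-only findings are not automatically false positives, since the human may have missed a real issue. Triage them by hand before using the numbers to decide which checks become required.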

Common questions

  • What is the best AI tool for code review for an engineering team?

    The best tool is the one that follows your review workflow reliably, not the one with the flashiest demo. Use one receipt template across Claude Code, Cursor, Codex, and other code review tools, then compare false positives, missed issues, setup friction, and whether engineers actually trust the output.

  • Should Claude Code approve pull requests by itself?

    No, Claude Code should not be the final approver for production changes. A safe starting rule is that the agent may review, summarize, and suggest fixes, but a human owner must approve and merge after reading the receipt and checking any unresolved risk.

  • Do we need both CLAUDE.md and AGENTS.md?

    Use both only if your team runs agents that read both files. Keep the rules aligned, keep each file short, and prefer nested files for local conventions; one root document usually breaks down once a monorepo has different services, languages, or risk levels.

  • Where should MCP access stop for review agents?

    MCP access should stop at the systems needed to review the change. For most PRs, that means source code, CI, issues, and approved docs; write access, production databases, broad chat history, and unrelated repositories should require a separate, explicit permission path.

  • Can AI review replace human review?

    No, AI review can reduce reviewer load, but it does not replace human judgment. Treat it as a first-pass reviewer that creates a structured receipt; humans still own architecture tradeoffs, product intent, security acceptance, and the final merge decision.

Start with one PR

Pick one ordinary pull request this week and require the receipt before the AI review counts. If the receipt is useful, move the rule into CLAUDE.md and teach the team from that real example.

One methodology lens

One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
