
Review Rules for AI Coding Agents

A practical Claude Code convention for aligning teams on AI-assisted code review, MCP boundaries, and review guardrails.

Rogier Muller · June 29, 2026 · 9 min read

Align the team by agreeing on where AI assistants may act, what evidence they must leave, and how reviewers check the work. Do not start by choosing a side; start with a shared review workflow, a small tool boundary, and a reversible pilot.

An AI code review workflow for teams is a written convention that tells humans and coding agents what may change, what must be verified, and what reviewers should reject. For Claude Code, Anthropic's coding agent, that convention usually lives in CLAUDE.md, a review checklist, and a few guardrails around MCP, hooks, slash commands, and team skills.

Write the team rule before the prompt

Start with repository rules, not personal preferences. A split team usually has two reasonable fears: one side worries about lost developer productivity, and the other worries about silent regressions, data leaks, and unreadable changes.

Put the durable rules in CLAUDE.md. Keep task-specific intent in the prompt. That separation matters because CLAUDE.md is always-on context, while a prompt is one request on one day.

For example, in a payments service, the team rule might say: “AI may refactor validation code, but may not change payment authorization behavior without a design note and an integration test.” That gives Claude Code room to help without turning every PR into a philosophical argument.

The trap is writing a giant policy nobody reads. A useful CLAUDE.md should fit the repo, name the risky paths, and tell the agent when to stop.

# CLAUDE.md

## Team rules for AI-assisted changes

- Prefer small PRs: one behavior change or one refactor, not both.
- Do not change authentication, authorization, billing, migrations, or data deletion paths without an explicit human plan in the PR description.
- When modifying tests, explain whether the test catches an existing bug, documents intended behavior, or updates changed behavior.
- Use MCP tools only for the task named in the prompt. Do not browse unrelated issues, tickets, docs, or customer records.
- Before final response, list files changed, tests run, and risks still needing human review.
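
The "risky paths" rule above can also be checked mechanically, for example in CI or a reviewer script. The following is a minimal sketch under stated assumptions: the glob patterns, the `## Human plan` heading, and the function names are all hypothetical and not part of Claude Code or any existing tool.

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns; replace with the paths your repo treats as risky.
RISKY_PATTERNS = [
    "src/auth/*",
    "src/billing/*",
    "migrations/*",
    "*/data_deletion/*",
]


def risky_files(changed_files):
    """Return the changed files that match any risky pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in RISKY_PATTERNS)
    ]


def needs_human_plan(changed_files, pr_description):
    """True when risky paths changed and the PR description has no plan section.

    Assumes the team marks its plan with a '## Human plan' heading (hypothetical).
    """
    has_plan = "## human plan" in pr_description.lower()
    return bool(risky_files(changed_files)) and not has_plan
```

A reviewer script could call `needs_human_plan(changed, description)` and fail the check when it returns `True`. The point of the sketch is that the rule in CLAUDE.md and the check in CI name the same paths.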

For broader engineering team AI adoption, this belongs in the same family as code ownership, incident review, and release gates. It is governance, but the lightweight kind that engineers will actually use. For a fuller training map, see the related training topic.

Put MCP behind a testable boundary

MCP is a protocol that lets coding agents connect to external tools and context such as GitHub, Slack, databases, file systems, design files, and internal docs. That makes it powerful. It also makes it the place where “the agent changed code” quietly becomes “the agent read or touched company systems.”

Treat each MCP server like an integration, not a magic tunnel. Define what it may read, what it may write, and what traces it should leave.

As of June 2026, Ocarina is a useful example of the direction teams are moving: automating and testing MCP server behavior from YAML without putting a large language model in the loop. The important lesson is not that every team needs that exact project. The lesson is that tool boundaries should be testable without asking a model to behave nicely.

A practical Claude Code pattern is to keep MCP permission notes beside the repo rule:

## MCP permissions

Allowed for routine coding tasks:
- GitHub: read issues, read PRs, comment on the current PR only
- Docs: read engineering docs under /platform and /api

Requires human approval in the prompt:
- Creating or editing issues
- Reading customer tickets
- Running database queries
- Posting to shared Slack channels

Never allowed from an agent session:
- Production writes
- Secrets access
- Customer data export

The trap is giving every agent every integration because setup is easier. Broad access feels convenient for a week, then becomes impossible to audit.
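
One way to make the permission notes above testable without a model is to encode them as data and classify each proposed tool call with a plain function. This is a sketch only: it is not Ocarina's YAML format or any MCP SDK API, and the server names, action names, and `decide` function are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical (server, action) pairs mirroring the permission notes.
ALLOWED = {
    ("github", "read_issue"),
    ("github", "read_pr"),
    ("github", "comment_current_pr"),
    ("docs", "read_platform"),
    ("docs", "read_api"),
}
NEEDS_APPROVAL = {
    ("github", "create_issue"),
    ("github", "edit_issue"),
    ("tickets", "read_customer_ticket"),
    ("database", "run_query"),
    ("slack", "post_shared_channel"),
}


def decide(server, action, approved_in_prompt=False):
    """Classify a tool call; anything not listed is denied by default."""
    key = (server, action)
    if key in ALLOWED:
        return Decision.ALLOW
    if key in NEEDS_APPROVAL:
        return Decision.ALLOW if approved_in_prompt else Decision.NEEDS_APPROVAL
    # Production writes, secrets access, and data export fall through to here.
    return Decision.DENY
```

Denying by default means a newly added integration starts with no access until someone lists it, which keeps the audit trail in version control rather than in setup habits.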

Copy this review convention

Use one copyable artifact first. Then decide who owns it, where it lives, and how reviewers enforce it.

# AI-assisted code review checklist

Paste this into `.github/pull_request_template.md`, `docs/ai-review.md`, or the repo's `CLAUDE.md`.

## Author checklist

- [ ] I used an AI assistant for this PR.
- [ ] The PR description says what the assistant did: draft, refactor, test generation, debugging, review, or docs.
- [ ] The change is small enough for a reviewer to understand without trusting the assistant.
- [ ] I reviewed every changed file myself.
- [ ] I checked generated tests for false confidence, weak assertions, and deleted coverage.
- [ ] I ran the relevant test command and pasted the command below.
- [ ] I listed any files, APIs, or behaviors that need extra human attention.

## Agent boundary checklist

- [ ] The assistant did not change auth, billing, migrations, data deletion, or security-sensitive paths without an explicit plan.
- [ ] MCP tools were limited to the task in the prompt.
- [ ] No secrets, customer data, private tickets, or unrelated docs were exposed to the agent.
- [ ] Any hook, slash command, or skill used by the agent is checked into the repo or named in the PR.

## Reviewer checklist

- [ ] Review the diff as human-authored code. Do not assume the assistant is right.
- [ ] Ask for a smaller PR if behavior changes and refactors are mixed.
- [ ] Verify that new or changed tests fail without the change and pass with it, for the bug or behavior being claimed.
- [ ] Check edge cases the assistant is likely to miss: permissions, time zones, retries, concurrency, and rollback.
- [ ] Require a follow-up issue for any risk the author cannot close in this PR.

## Required PR note

AI assistance used: yes/no
Assistant and surface: Claude Code / Claude / Codex / other
Tests run:
Human attention requested:

Adoption should be boring. One senior engineer proposes the first version. The service owners review it like any other repo convention. The final copy lives in CLAUDE.md for agent behavior and in the PR template for human review.

The enforcement rule is simple: reviewers may block a PR when the checklist is missing, vague, or contradicted by the diff. That is the minimum guardrail that keeps AI coding training for teams from becoming a slide deck nobody uses.
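
The "missing or vague" part of that rule can be partly automated. The sketch below assumes the PR description keeps the four "Required PR note" lines in a `Field: value` form; the function name and the choice to treat untouched template text as unfilled are illustrative assumptions.

```python
import re

REQUIRED_FIELDS = [
    "AI assistance used",
    "Assistant and surface",
    "Tests run",
    "Human attention requested",
]

# Template text left unedited counts as not filled in.
PLACEHOLDERS = {"yes/no", "Claude Code / Claude / Codex / other"}


def missing_pr_note_fields(pr_body):
    """Return required fields that are absent, empty, or still placeholders."""
    missing = []
    for field in REQUIRED_FIELDS:
        # [ \t]* avoids swallowing the newline and capturing the next line.
        pattern = rf"^{re.escape(field)}:[ \t]*(.*)$"
        match = re.search(pattern, pr_body, flags=re.MULTILINE)
        value = match.group(1).strip() if match else ""
        if not value or value in PLACEHOLDERS:
            missing.append(field)
    return missing
```

A check like this only catches missing disclosure. Whether the note is contradicted by the diff still needs a human reviewer.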

If your team already has a review standard, fold this into it rather than adding another ceremony. A useful companion pattern is the scoped convention in “A Safer Agent Review Convention.”

Train the workflow, not just the tool

Claude Code, Claude, Cursor (Anysphere's AI code editor), and OpenAI Codex all expose different surfaces, but the team habits should rhyme. Teach the shared workflow first: plan, constrain, generate, verify, review, and record.

A good workshop exercise is to give everyone the same bug in the same repo. Half the group uses Claude Code to draft a fix. Half reviews without seeing the prompt. Then the team compares what evidence actually helped: tests, notes, smaller diffs, or agent transcripts.

This is where team skills help. A Claude skill can package the team’s debugging workflow, test commands, review checklist, and examples. A slash command can start a repeatable “prepare PR” flow. A hook can stop obvious mistakes, such as committing secrets or modifying locked paths.
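
As an illustration of the hook idea, here is a minimal sketch of a pre-write check. It assumes the hook receives a JSON payload on stdin with a `tool_input` object holding `file_path` plus `content` or `new_string`, and that exit code 2 signals a block. Verify those details against the current Claude Code hooks documentation before relying on them. The locked paths are hypothetical, and the two secret patterns are only a small sample of what a real scanner covers.

```python
import json
import re
import sys

# Hypothetical locked directories; adjust to the repo.
LOCKED_DIRS = ("migrations/", "infra/prod/")

# Two widely recognized secret shapes; a real scanner covers far more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def review_tool_call(payload):
    """Return (exit_code, message) for one proposed file write or edit."""
    tool_input = payload.get("tool_input", {})
    path = tool_input.get("file_path", "")
    text = tool_input.get("content", "") or tool_input.get("new_string", "")
    # Match the directory at any depth so absolute paths are caught too.
    # This is deliberately conservative and may over-match nested folders.
    if any(f"/{locked}" in f"/{path}" for locked in LOCKED_DIRS):
        return 2, f"Blocked: {path} is a locked path; ask a human first."
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        return 2, f"Blocked: the change to {path} looks like it contains a secret."
    return 0, ""


def main():
    """Entry point for hook use: read the payload, report, and exit."""
    code, message = review_tool_call(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)

# Call main() at the bottom of the script when wiring this up as a hook.
```

Keeping the decision in a pure function (`review_tool_call`) means the team can unit test the guardrail like any other code, independent of the agent that triggers it.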

The trap is measuring adoption by who used the assistant the most. Better signals are smaller PRs, clearer reviewer notes, fewer avoidable regressions, and faster onboarding to repo rules.

Keep the tradeoffs visible

AI-assisted review does not remove human accountability. It changes where humans spend attention.

The upside is real: coding agents are good at finding nearby context, drafting tests, explaining unfamiliar code, and doing repetitive refactors. The cost is also real: agents can overfit to existing patterns, miss product intent, invent confidence, or use tools more broadly than the task requires.

Make the tradeoff visible in the PR. Ask the author to name what the assistant did and what still needs human judgment. Reviewers should be kinder about process mistakes during the pilot and stricter about risky code paths.

The best convention is not the strictest one. It is the one your team can remember on a Friday afternoon.

Common questions

  • What should an AI code review workflow for teams include?

    It should include repo rules, tool boundaries, author disclosure, test evidence, and reviewer rejection criteria. The copyable checklist above covers those five parts in one artifact, which is enough for a first 2–4 week pilot before you add more process.

  • Our engineering team is split on using AI code assistants. How do we align them?

    Align them by running a reversible pilot with shared rules, not by forcing consensus on the tools. Pick one repo, one checklist, one MCP boundary, and one review rule; then judge the workflow by PR quality, test clarity, and reviewer confidence.

  • Should reviewers treat AI-written code differently?

    Reviewers should treat AI-written code as human-submitted code with extra attention to evidence. The useful difference is disclosure: when the PR says which assistant was used, which commands ran, and which files need attention, reviewers can spend less time guessing and more time checking risk.

  • Where should Claude Code team conventions live?

    Put durable repo behavior in CLAUDE.md, human process in the PR template, reusable workflows in skills, and risky automation in hooks. Avoid hiding the convention in chat history; the rule should be versioned, reviewable, and close to the code it governs.

  • How strict should MCP permissions be at the start?

    Start narrower than feels necessary, then widen access after the team sees real use cases. A safe first boundary is read access to current-repo context and the current PR's discussion, with comments limited to that PR and human approval for tickets, customer data, databases, issue creation, or shared-channel posting.


Start with one repo

Pick a real service, add the checklist, narrow the MCP boundary, and run the convention for two weeks. Keep what reviewers actually used, delete what they ignored, and only then roll it out wider.

One methodology lens

One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
