Make Coding Agents Follow a Workflow

A practical convention for governing coding agents with Claude Code rules, MCP boundaries, and review guardrails.

The Meteor of 1860, landscape painting by Frederic Edwin Church (1860).
Rogier Muller · June 23, 2026 · 9 min read

AI coding agents write code autonomously by turning a goal into a loop: inspect the repo, plan changes, edit files, run tools, read results, and repeat until they can hand you a diff. Claude Code guardrails, MCP limits, and review checks keep that loop explicit through repository rules, tool permissions, checkpoints, and clear handoffs.

Agentic coding governance is the set of conventions that tells coding agents what they may do, when they must stop, and what evidence humans need before merging. For Claude Code, Anthropic's coding agent, that usually means a small mix of CLAUDE.md, skills, hooks, MCP boundaries, slash commands, and code review guardrails.

Good AI coding training for engineering teams should teach this workflow before it teaches prompts. The win is not “let the agent do everything.” The win is “make the agent’s autonomy boring enough to review.”

Write down the agent loop you actually trust

Start with the workflow, not the tool. A coding agent should have a named path from request to pull request: read context, form a plan, make a narrow change, run checks, summarize risk, and ask for review.

This is why state-machine thinking is useful. A small Show HN project framed coding-agent workflows as enforceable states on OpenAI Codex, OpenAI's coding agent; the broader lesson applies across Claude Code, Cursor (Anysphere's AI code editor), and other AI IDE setups. Autonomy is easier to govern when each step has an allowed next step.

The trap is writing vague rules like “be careful” or “follow best practices.” Agents cannot review vibes. Give them transitions: “after editing database code, run migrations check,” or “after touching auth, stop for human approval.”
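One way to make transitions concrete is a small transition table. The sketch below is illustrative only: the state names and the allowed moves are assumptions chosen for this example, not something Claude Code or Codex enforces on its own.

```python
from enum import Enum


class State(Enum):
    INTAKE = "intake"
    PLAN = "plan"
    EDIT = "edit"
    CHECK = "check"
    HANDOFF = "handoff"
    HUMAN_APPROVAL = "human_approval"


# Hypothetical transition table: each state lists the states it may move to.
ALLOWED = {
    State.INTAKE: {State.PLAN},
    State.PLAN: {State.EDIT, State.HUMAN_APPROVAL},
    State.HUMAN_APPROVAL: {State.EDIT, State.INTAKE},
    State.EDIT: {State.CHECK, State.HUMAN_APPROVAL},
    State.CHECK: {State.EDIT, State.HANDOFF},
    State.HANDOFF: set(),
}


def first_invalid_jump(path):
    """Return the first (from, to) pair that breaks the table, or None."""
    for current, nxt in zip(path, path[1:]):
        if nxt not in ALLOWED[current]:
            return (current, nxt)
    return None


if __name__ == "__main__":
    skipped_check = [State.INTAKE, State.PLAN, State.EDIT, State.HANDOFF]
    print(first_invalid_jump(skipped_check))
```

A run that jumps from editing straight to handoff is flagged at the edit-to-handoff step, which is the kind of invalid jump a reviewer otherwise has to notice by eye.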

For a deeper pass on boundaries, see How Autonomous Coding Agents Ship Safely. For the broader training lane, keep the related training topic handy.

Put durable rules where the agent will read them

In Claude Code, use CLAUDE.md for durable project memory: architecture constraints, local commands, review expectations, and “do not touch” areas. Keep task-specific instructions in the prompt, not in the repo memory.

A practical pattern is one short root CLAUDE.md, then narrower files near the code they govern. For example, services/billing/CLAUDE.md can say that refund logic needs an integration test and a finance reviewer, while the root file only names the global test command.
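To see which rule files govern a given path under this layout, a team could use a small helper like the one below. It is a sketch for your own tooling (for example, a review script that lists the applicable files), under the assumption that rules apply from the repository root down to the file's directory. It does not claim to reproduce how Claude Code itself loads memory files, and the function name is hypothetical.

```python
from pathlib import Path


def rule_files_for(repo_root: Path, target: Path, name: str = "CLAUDE.md") -> list[Path]:
    """List rule files from the repo root down to the target's directory.

    Broadest rules come first, so narrower files can refine them.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    directory = target if target.is_dir() else target.parent
    # Raises ValueError if the target lies outside the repository.
    relative = directory.relative_to(repo_root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / name for d in directories if (d / name).is_file()]
```

Called with a file inside services/billing, this returns the root CLAUDE.md followed by services/billing/CLAUDE.md when both exist, which makes it easy to check that the narrow file is the one carrying the refund rules.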

The trap is turning CLAUDE.md into a policy landfill. If a human would not read it before editing, the agent probably will not use it well either. Keep it short, local, and written like engineering notes.

Treat MCP as a permission boundary

MCP, the Model Context Protocol, is an integration layer that lets agents connect to tools and data sources such as GitHub, issue trackers, databases, design files, and internal docs. In an agentic coding setup, MCP is not just convenience. It is part of your security and workflow boundary.

Give the agent the smallest tool surface that completes the job. Reading GitHub issues is usually low risk. Writing to production databases, changing cloud settings, or posting broad Slack announcements should require human approval or be unavailable.

The trap is connecting every useful system because the demo looks great. Each MCP server expands what an agent can see or do. Treat that like granting credentials to a new teammate who never gets tired and may misunderstand the task.
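The boundary can be written down as a deny-by-default policy table. The following is a minimal sketch, not an MCP or Claude Code configuration format: the tool names are invented for illustration, and real MCP servers expose their own tool identifiers.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical tool names; real MCP servers define their own identifiers.
POLICY = {
    "github.read_issue": Decision.ALLOW,
    "docs.search": Decision.ALLOW,
    "github.create_pull_request": Decision.NEEDS_APPROVAL,
    "slack.post_message": Decision.NEEDS_APPROVAL,
    "database.write": Decision.NEEDS_APPROVAL,
    "database.drop_table": Decision.DENY,
    "cloud.update_iam_policy": Decision.DENY,
}


def decide(tool_name: str) -> Decision:
    """Look up a tool; anything not listed is denied by default."""
    return POLICY.get(tool_name, Decision.DENY)
```

The design choice that matters is the default: a newly connected server adds no capability until someone lists its tools explicitly.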

Review the agent’s evidence, not its confidence

Your review rule should ask for artifacts. A useful agent handoff says what changed, what commands ran, what failed, what was not tested, and which files deserve extra attention.

This is where developer productivity and governance can work together. The agent can do the repetitive work: run tests, collect logs, update docs, and prepare a clean summary. The human keeps the judgment: whether the change is acceptable for the product and the risk.

The trap is accepting a polished summary without checking the diff. AI coding agents are good at sounding complete. Make the review checklist concrete enough that “looks good” is never the only signal.
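A handoff is easier to check when it is structured. The sketch below assumes a hypothetical `Handoff` record whose fields mirror the evidence listed above; `None` means the agent did not report the item, while an empty list means it reported that there was nothing to list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handoff:
    """Evidence an agent attaches to a change. None means 'not reported'."""
    what_changed: Optional[str] = None
    commands_run: Optional[list[str]] = None
    failures: Optional[list[str]] = None
    not_tested: Optional[list[str]] = None
    files_to_review: Optional[list[str]] = None

    def missing_evidence(self) -> list[str]:
        """Name every field the agent left unreported."""
        missing = [name for name, value in vars(self).items() if value is None]
        # A blank summary or an empty command list is not evidence either.
        if self.what_changed is not None and not self.what_changed.strip():
            missing.append("what_changed")
        if self.commands_run is not None and not self.commands_run:
            missing.append("commands_run")
        return missing
```

A handoff that only says "Looks good" comes back with four missing fields, which gives the reviewer something specific to send back.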

Paste this workflow convention

Use this as a starter convention in CLAUDE.md, AGENTS.md, or your team engineering handbook. Then trim it to match your repo.

# Coding Agent Workflow Convention

## Purpose
Use this convention when Claude Code or another coding agent changes this repository.
The agent may work autonomously only inside the states and permissions below.

## Allowed workflow states
- Intake: restate the task, identify likely files, and ask if the request is ambiguous.
- Plan: propose a short plan before editing when the change touches more than 3 files, auth, billing, data deletion, migrations, or public APIs.
- Edit: make the smallest coherent change. Do not mix refactors with behavior changes unless requested.
- Check: run the narrowest useful command first, then broader checks if the narrow check passes.
- Handoff: summarize the diff, commands run, failures, untested areas, and review risks.

## Stop and ask before
- Changing authentication, authorization, billing, encryption, or data retention behavior.
- Creating or modifying database migrations.
- Adding a new dependency or changing build tooling.
- Editing generated files unless the generator command is known and run.
- Using MCP tools that write to external systems.

## Claude Code project memory
- Prefer local repository rules over general advice.
- Follow nested CLAUDE.md files when working inside a scoped directory.
- Keep comments and docs accurate, but do not rewrite unrelated prose.
- If commands fail, report the exact command and the relevant error lines.

## MCP permissions
- Allowed without approval: read GitHub issues, read repository docs, read design specs.
- Approval required: create pull requests, post external comments, update tickets, write to databases.
- Not allowed: production secrets, destructive database actions, cloud permission changes.

## Review checklist
- [ ] The diff matches the requested scope.
- [ ] The agent listed every command it ran.
- [ ] Failing or skipped checks are explained.
- [ ] Security-sensitive areas were human-reviewed.
- [ ] Generated files were produced by a documented command.
- [ ] The PR description names remaining risks or says “No known remaining risks.”

Adoption should be lightweight. One engineer proposes the convention in a normal pull request, the code owners for platform and security review it, and the final version lives beside the code it governs.

The enforcement rule is simple: reviewers can send back any agent-authored PR that does not include the handoff evidence. You do not need a perfect policy on day one. You need a rule the team is willing to apply every time.
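If the review checklist is pasted into the PR description as a Markdown task list, a short script can flag descriptions that skip it. This is a sketch under that assumption; the function names are hypothetical, and wiring it into CI or a bot is left to the team.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
TASK_ITEM = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(pr_body: str) -> list[str]:
    """Return the text of every task-list item that is still unchecked."""
    items = []
    for line in pr_body.splitlines():
        match = TASK_ITEM.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def has_checklist(pr_body: str) -> bool:
    """True if the description contains at least one task-list item."""
    return any(TASK_ITEM.match(line) for line in pr_body.splitlines())


def ready_for_review(pr_body: str) -> bool:
    """A PR is ready only if a checklist exists and every item is checked."""
    return has_checklist(pr_body) and not unchecked_items(pr_body)
```

A script like this only confirms that the boxes are present and ticked. Whether each claim is true remains the reviewer's judgment.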

Train the team on one workflow at a time

Do not roll out ten agent rules, five MCP servers, and a new review process in the same week. Pick one workflow where the team already knows the right answer, such as documentation updates, small bug fixes, or test generation.

In an AI coding workshop, have engineers run the same task twice: once with no repo convention, once with the convention above. The difference is usually visible in the diff itself: the second run tends to produce fewer surprise edits and a clearer review note.

The trap is measuring only speed. Fast changes that create review debt are not developer productivity. Track whether the agent stayed in scope, whether checks were reproducible, and whether the reviewer could understand the change within a few minutes.
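"Stayed in scope" can be measured roughly by comparing the files in the diff against the paths named in the plan. The sketch below assumes the plan is expressed as glob patterns, which is an illustrative choice; note that with `fnmatchcase` a `*` also matches path separators, so `services/billing/*` covers nested files too.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], planned_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the planned glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in planned_patterns)
    ]


if __name__ == "__main__":
    changed = [
        "services/billing/refunds.py",
        "tests/billing/test_refunds.py",
        "README.md",
    ]
    print(out_of_scope(changed, ["services/billing/*", "tests/billing/*"]))
```

Counting out-of-scope files per agent-authored PR over a few weeks gives a simple trend to discuss in a retro, alongside review time.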

Common questions

  • How do AI coding agents write code autonomously?

    They work by looping through context gathering, planning, editing, tool use, and result inspection until they can produce a diff or ask for help. The important caveat is that “autonomous” does not mean “unbounded”; a team should define allowed tools, stop points, and review evidence before trusting the loop.

  • Should we model coding-agent work as a state machine?

    Yes, if the team needs repeatable behavior across agents, repos, or risk levels. A state machine can be as small as five states: intake, plan, edit, check, and handoff. The value is not formalism; it is making invalid jumps, like editing production config before review, easy to spot.

  • Where should rules live: CLAUDE.md or AGENTS.md?

    Use the file your tool reliably reads, and mirror the convention into a second file only when multiple agents work in the repo. For Claude Code, CLAUDE.md is the native project-memory artifact; AGENTS.md can be useful as a cross-tool convention file when Codex, Claude, or other coding agents share the repository.

  • How strict should MCP permissions be for coding agents?

    Start strict and loosen only after you see safe repeated use. Read-only MCP access to issues, docs, and repository metadata is a good first step; write access to tickets, pull requests, databases, or production systems should require approval. The boundary should match the blast radius of a mistaken action.

  • Will these guardrails slow the team down?

    They may slow the first few tasks, but they usually reduce review time once the pattern sticks. The useful metric is not raw agent speed; it is cycle time from request to trusted merge. A two-minute handoff checklist is cheap if it prevents a twenty-minute archaeology session in review.

Start with one enforced handoff

Add the checklist to one repo, require it on agent-authored PRs, and adjust after ten reviews. That is enough to turn agentic coding from a clever demo into a team workflow.

One methodology lens

In our methodology, this maps most directly to the Plan step: delegate first-pass decomposition and dependency mapping to the agent, review the sequencing and assumptions yourself, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
