AI coding agents workflow guardrails for browser control
Workflow guardrails for AI coding agents with browser control: child receipts, decision stubs, scope ledgers, and a supremacy clause reviewers can audit.

Give your AI coding agents workflow guardrails before you give them browser control. A workflow guardrail is a receipt rule you write into the repo, like a child receipt block or a decision stub, that keeps a risky run explainable after it finishes. Claude Code, Anthropic's coding agent, plus Codex and Claude all ship changes faster than a reviewer can reconstruct what happened, so the explanation has to live somewhere other than the chat window.
The reason matters. Once an agent can drive a browser through MCP, a call that looked harmless can pull credentials into the transcript, and the retro apology is always the same. Receipts beat raw autonomy. The fix is not to slow the agent down, it is to make every run leave a trail a stranger can read.
Write a child receipt block for delegated work
Delegation stacks fall apart when each child agent hands back a summary instead of evidence. That is the telephone game, except credentials are on the line. The parent ends up green-lighting a diff it cannot explain.
A child receipt block fixes this. Every child returns three things: the paths it touched, the commands it ran, and the tests that prove the regression guards held. The parent reads the receipt, not a vibe. If a child cannot produce one, the work does not merge.
This is the same receipts-first idea from our note on guardrails for recursive agents, applied one level up the delegation chain.
Force a decision stub into the PR template
CI is green and a reviewer still asks "why this approach?" with no written answer anywhere. The review queue turns into theater: people click approve because the tests pass, not because they understood the choice.
A decision stub moves the debate onto the record. Make the PR template require three lines:
- Constraints considered
- Rejected alternatives
- Verification proof
Now the reviewer checks explicit tradeoffs instead of guessing at intent. That is the part a person can actually verify.
Keep a scope ledger so Claude knows its bounds
Claude, Anysphere's AI code editor, lets you write rules in .mdc files, and the language sounds precise right up until reviewers argue about what it meant. Rules compete with chat memory, and the scope quietly fogs over.
A scope ledger pins it down. Carry a five-line note in the parent chat:
goal: ship the rate-limit fix on the public API
allowed paths: src/api/**, tests/api/**
forbidden paths: src/auth/**, infra/**
verification command: pnpm test --filter api
merge owner: @your-handle
Claude's agent docs describe the machinery for rules and scoping. The ledger decides who owns the call when the machinery and the chat disagree.
State a CLAUDE.md supremacy clause for risky runs
On a shared laptop, Claude Code bash approvals become muscle memory, and permission creeps wider every session. The cure is written precedence. Put a supremacy clause at the top of CLAUDE.md that says which hooks win, which folders require human eyes, and where temporary overrides live.
Here is a delegation boundary snapshot you can drop into a rule file and adapt:
---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
- "**/*"
alwaysApply: false
---
- Claude: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.
Claude Code's getting started guide covers the setup the clause governs. Once precedence is written down, a session stops inventing policy in the middle of a run.
Check the receipts before you merge
Tooling is load-bearing language. If the repo cannot say what is allowed and what is forbidden, the agent cannot either, and it will guess. Guessing scales badly.
Run a risky PR past these four gates:
| Gate | Question |
|---|---|
| Reviewer path | Can someone unfamiliar trace intent without replaying the chat? |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove the regression guards held? |
| Receipt match | Does the PR body list scopes plus a verification transcript? |
Then walk a short strip before approval:
- Primary-doc links were smoke-checked after publishing edits.
- MCP connectors named in the PR list their owners.
- Verification command output is pasted or linked.
- Forked agent work lists parent and child responsibilities.
Common questions
-
Why does browser control need guardrails in agents like Claude Code and Codex?
Browser control needs guardrails because an MCP call can look harmless until credentials land in the transcript. The pattern here is receipts over autonomy: a child receipt block, a decision stub in the PR template, a scope ledger, and a
CLAUDE.mdsupremacy clause that settles precedence before the run starts. Each one keeps a risky run explainable after the fact. -
What is a scope ledger in Claude?
A scope ledger is a five-line note in the parent chat: goal, allowed paths, forbidden paths, verification command, and merge owner. It fixes the fog where
.mdclanguage sounds precise until reviewers argue about what it meant. Review then shifts to a clean job: checking the ledger against the actual diff. -
How do teams stop Claude Code permission creep?
Put a supremacy clause at the top of
CLAUDE.md. State which hooks win, which folders require human eyes, and where temporary overrides live. On shared laptops, bash approvals turn into muscle memory, and written precedence is the thing that stops a session from inventing policy mid-run. -
What proof should a PR carry before merge?
A PR should carry a decision stub with three lines: constraints considered, rejected alternatives, and verification proof. The review strip adds the receipts on top: verification command output pasted or linked, MCP connectors listed with owners, and forked work that names both parent and child responsibilities.
-
Where does our methodology fit?
Our methodology is the forcing function. Test proves the behavior is correct, and Review proves the team can explain why the change exists. Receipts are what make the Review step possible, because a reviewer cannot vouch for a diff they had to reconstruct from chat.
Where to go next
Pick one fix and turn it into a shared checklist or repo rule before your next automated run. Our training drills this receipts-first workflow against your own repos and connectors, and the agentic coding governance topic collects the rest of the pattern.
Further reading
Related training topics
Related research

Best practices for agentic coding in real environments
An operating guide to best practices for agentic coding in real environments: rule-file precedence, scope ledgers, replay receipts, connector cards.

Codex workspace agents need repo rules
Codex workspace agents and Claude cloud agents need repo rules: scoped boundary files, connector cards, and replay receipts reviewers can check.

Agentic coding governance that holds in review
Agentic coding governance as an operating guide: connector ownership, scope ledgers, decision stubs, and review receipts for MCP-connected engineering teams.
Continue through the research archive
Newer research
Claude Code 2.1.140: team conventions
Claude Code 2.1.140 team conventions: a skill index for precedence, a hook budget, a CLAUDE TOC, and red-folder approvals reviewers can trace.
Earlier research
Agentic coding governance that holds in review
Agentic coding governance as an operating guide: connector ownership, scope ledgers, decision stubs, and review receipts for MCP-connected engineering teams.