
Recursive agents, guardrails that hold

Practical agentic coding governance for team training, MCP boundaries, and reviewable artifacts across tools.

Rogier Muller · May 10, 2026 · 5 min read

The situation

The new signal is not just that an agent can do more. It is that one task can now be split into a parent agent plus recursive sub-agents, with the parent delegating research, implementation, and verification. That changes how teams think about agentic coding governance: the risk is no longer only bad code, but also bad delegation, hidden tool use, and context drift that never gets reviewed.

For engineering teams running an AI coding workshop or standardizing AI coding training, the useful question is simple: what needs to be written down so multi-agent work stays reviewable? Usually the answer is not a bigger prompt. It is a small set of durable artifacts: scoped rules, memory files, skills, connector boundaries, and a verification loop people can inspect.

The official signal from Anthropic is that /orchestrate can recursively spawn agents through the Claude SDK, and that the team used it to research internal skills and improve cold-start performance. Treat that as an early pattern, not a finished operating model. The lesson is broader than one product: once agents can delegate, teams need clear guardrails for what each agent may read, change, and call.

If you are comparing Cursor, Claude Code, and Codex, the shared standard is straightforward: keep instructions close to the repo, keep connectors narrow, and make every agent-authored change land in a reviewable artifact. That is the practical center of AI coding governance.

Walkthrough

  1. Start with one team rule that defines the boundary.

    Write down what the parent agent may delegate, what it may not touch, and what must be verified by a human. In Cursor, that usually means a scoped .cursor/rules/*.mdc file plus AGENTS.md for repo conventions. In Claude Code, the anchor is CLAUDE.md. In Codex, use AGENTS.md and, when needed, a temporary AGENTS.override.md for short-lived exceptions.

```
---
description: Agent delegation and review boundary for this repo
globs:
  - "**/*"
alwaysApply: true
---
- Prefer small delegated tasks over broad autonomous edits.
- Do not change auth, billing, or deployment files without explicit review.
- Every agent run must end with a verification step and a summary of changed files.
```
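For the AGENTS.md side of the same boundary, a minimal sketch could look like the following; the section names are illustrative, not a fixed schema:

```markdown
# AGENTS.md

## Delegation boundary
- Parent agents may delegate research, drafting, and verification tasks.
- No agent edits auth, billing, or deployment files without explicit human review.

## Required output
- End every run with: changed files, rationale, and the verification performed.
```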
  2. Give each tool one concrete artifact to own.

    Cursor: use a small .mdc rule for repo-specific behavior, and keep broader conventions in AGENTS.md. If you are using background agents or subagents, make the handoff explicit: what the child agent may inspect, and what summary it must return.

    Claude Code: keep durable project memory in CLAUDE.md, then add a skill when the task is repeatable but not always-on. Use hooks for deterministic checks such as formatting, permission boundaries, or logging. If the task depends on external systems, review the MCP connector scope before enabling it.

    Codex: use AGENTS.md for instruction discovery, then run a verification loop in the CLI so the model’s output is checked against tests, lint, or a dry run. If a task needs temporary policy changes, prefer an override file over editing the permanent repo rules.
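As one concrete shape for the deterministic checks mentioned for Claude Code, a hooks fragment in .claude/settings.json might look like this; treat the exact schema as an assumption to verify against the current Claude Code documentation, and note that scripts/block-protected-paths.sh is a hypothetical script name:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "scripts/block-protected-paths.sh" }
        ]
      }
    ]
  }
}
```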

  3. Make delegation narrow enough to audit.

    Recursive orchestration works best when the child agent has one job: research one file tree, draft one patch, or validate one failure mode. Broad “fix the repo” prompts are where teams lose traceability. A good test is whether a reviewer can answer three questions from the final diff: what changed, why it changed, and what verified it.
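The three-question test can be sketched as a tiny audit over an agent's run summary; the field names here are hypothetical, not part of any tool's output format:

```python
# Sketch of the three-question audit: does a run summary say what changed,
# why it changed, and what verified it? Field names are illustrative.
REQUIRED_FIELDS = ("changed_files", "rationale", "verification")

def audit_summary(summary: dict) -> list[str]:
    """Return the fields that are missing or empty; an empty list means auditable."""
    return sorted(
        f"missing or empty: {field}"
        for field in REQUIRED_FIELDS
        if not summary.get(field)
    )

good_run = {
    "changed_files": ["src/parser.py"],
    "rationale": "Fix off-by-one in the token window",
    "verification": "pytest tests/test_parser.py passed",
}
problems = audit_summary(good_run)  # empty: all three questions answered
```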

  4. Put MCP behind a permission review.

    MCP is useful when the task needs GitHub, Slack, docs, Jira, or a database. It is also where governance gets real. Review connector scope before rollout, and treat least privilege as the default. If a skill or subagent can reach external systems, the evaluation should include connector behavior, not just model output.
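Least privilege as the default can be sketched as a simple allowlist check before any connector call goes out; the agent and tool names are illustrative:

```python
# Default-deny connector check: a child agent may only call tools listed in
# its declared scope. Names are illustrative, not a real MCP registry.
AGENT_SCOPES = {
    "research-agent": {"github.read", "docs.read"},
    "patch-agent": {"github.read", "github.open_pr"},
}

def allowed(agent: str, tool_call: str) -> bool:
    """Unknown agents and unlisted tools are rejected."""
    return tool_call in AGENT_SCOPES.get(agent, set())
```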

  5. Keep the verification loop visible.

    The most reliable pattern across tools is still: delegate, inspect, verify, then merge. For Claude Code, that may mean a review checklist plus hooks. For Codex, it may mean a CLI run that ends in tests or a sandboxed check. For Cursor, it may mean a background-agent PR policy that requires a human-readable summary and a scoped diff.
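The verify step of that loop can be sketched as a small gate script; the commands are stand-ins for your repo's real test and lint invocations:

```python
# Sketch of the verify gate: run the repo's checks, report which failed,
# and only merge agent-authored changes when the list is empty.
import subprocess
import sys

def run_checks(checks: list[list[str]]) -> list[str]:
    """Run each command; return the commands that exited non-zero."""
    failures = []
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures

# Stand-in commands; in real use these would be pytest, lint, or a dry run.
passing = run_checks([[sys.executable, "-c", "print('ok')"]])
failing = run_checks([[sys.executable, "-c", "raise SystemExit(1)"]])
```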

A practical team exercise is to take one bloated instruction file and split it into a small rule tree. That usually surfaces the real governance gaps faster than a slide deck. It also fits a useful methodology habit: in the Review step, check the artifact, not the agent’s confidence.
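The splitting exercise can be sketched in a few lines; the "## " heading convention is an assumption about how the bloated instruction file is organized:

```python
# Sketch of splitting one bloated instruction file into a rule tree,
# keyed by its '## Topic' headings. Heading convention is an assumption.
def split_rules(text: str) -> dict[str, str]:
    """Map each '## Topic' heading to its body text."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}

bloated = """## Delegation
Prefer small delegated tasks.
## Review
Every run ends with a verification step.
"""
tree = split_rules(bloated)  # one entry per topic, ready to write out as files
```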

Tradeoffs and limits

Recursive agents can reduce token use and speed up cold starts, but they also increase coordination overhead. If the parent agent delegates poorly, you get fragmented context and summaries that omit the reason a choice was made. If the child agent has too much access, you get hidden side effects.

Skills help with repeatable work, but they are not a substitute for repo rules. Memory files help with continuity, but they can drift if they grow into a junk drawer. Hooks are deterministic, but they can become brittle if they try to enforce policy that should live in review.

MCP boundaries deserve special caution. The more external systems an agent can touch, the more your evaluation needs to include permissions, failure modes, and rollback behavior. That is true whether the surface is an IDE, a terminal CLI, or a browser-connected workflow.

The main limit of the current signal is that it is still an early product note. The performance claims are interesting, but teams should verify them against official docs, changelogs, and their own repo metrics before adopting the pattern broadly.
