
Local First, CI Second

Shared CI runners slow agent loops and hide state. Keep first-pass verification local, then use CI for final checks.

Rogier Muller · April 22, 2026 · 5 min read

Agentic coding works best when the loop is short, local, and easy to inspect. Shared CI runners push in the opposite direction. They add queue time, hide machine state, and turn verification into a remote step.

That does not make CI obsolete. It means the runner is often the wrong place for the first pass. If an agent is making small code changes, checking them, and iterating quickly, a shared runner is usually too slow and too opaque.

The main issue is feedback quality. Agents do better when they can see the filesystem, rerun the same commands, and keep the environment stable across edits. A remote runner resets that context every time. If the task depends on local state, cached dependencies, or a specific setup path, the agent spends time rediscovering the environment instead of improving the code.

Why shared runners break the loop

Latency is the first problem. Even a short queue can break a tight edit-test cycle. A human can switch tasks while waiting; an agent simply stalls, so every minute in the queue is pure overhead in its loop.

State loss is the second problem. A runner is usually disposable. That is useful for isolation, but bad for iterative work. If the agent needs to inspect a failing test, patch a file, rerun the same command, and compare output, each step feels like a fresh start.

Observability is the third problem. CI logs show output, but not the full working context: local files, uncommitted changes, environment drift, or the exact command sequence that led to the failure. For agentic work, that missing context matters.

A better pattern

Use the local machine or a dedicated workspace for the first verification loop. Keep the loop close to the code. Then hand off to CI when the change is ready for broader validation.

A workable pattern looks like this:

  • Agent edits code in a local checkout or isolated workspace.
  • Agent runs the smallest useful test or build command locally.
  • Agent fixes failures immediately while the context is still warm.
  • Human reviews the diff and the local output.
  • CI runs after the change is stable, as a final gate.
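The local half of that pattern can be sketched as a small script. This is a minimal sketch, not a prescribed tool: `CHECK_CMD` is a placeholder (here defaulting to `true`), and in a real repo it would point at the project's smallest useful test or build command.

```shell
# Sketch of the local-first loop: run one narrow check, report the
# result, and stop before CI. CHECK_CMD is a placeholder; point it at
# your project's smallest useful test or build command.
CHECK_CMD="${CHECK_CMD:-true}"   # 'true' stands in for a real check here

if $CHECK_CMD; then
  echo "local check passed; ready for human review, then CI"
else
  echo "local check failed; fix while the context is warm" >&2
fi
```

The human review step and the CI gate stay outside the script on purpose: the script only owns the fast part of the loop.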

This is a division of labor. Local execution is for fast iteration. CI is for shared verification.

What to use instead of runner-first workflows

The replacement does not need to be fancy. It needs to be predictable.

A good setup usually has three pieces:

  1. A reproducible local environment.
  2. A narrow command set for the agent to run.
  3. A clear rule for when to escalate to CI.

Reproducibility matters more than raw isolation. If the agent can recreate the same dependencies and commands every time, it can reason about failures. That may be a dev container, a pinned package manager, or a scripted bootstrap. The exact tool matters less than the consistency.
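As one hedged example of a scripted bootstrap, the snippet below records pinned toolchain versions in a single file and echoes them on every run. The file name and the `node=20.11.0` pin are invented for illustration; a real version would verify the installed tools against the pins instead of just printing them.

```shell
# Sketch of a reproducibility-first bootstrap: the pinned toolchain
# lives in one committed file, and every run reads it back so drift
# is visible. File name and pin are illustrative.
set -eu

PIN_FILE=".toolchain-pin"            # hypothetical pin file
echo "node=20.11.0" > "$PIN_FILE"    # normally committed; written here for the demo

# Read each "tool=version" pin and report it.
while IFS='=' read -r tool version; do
  echo "bootstrap: expecting $tool $version"
done < "$PIN_FILE"

echo "bootstrap: pins verified"
```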

The command set should be small. If the agent has ten ways to validate a change, it will waste time choosing. Give it the one or two checks that actually prove the change. For example: one focused test file, one lint pass, one build command.
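A narrow command set can be as simple as one wrapper with two subcommands, so the agent never has to choose among ten validation paths. The wrapper below only prints placeholders; the commented `pytest` and `ruff` examples show where real checks would go and are assumptions, not part of any particular setup.

```shell
# Sketch of a narrow command set: one entrypoint, two checks, nothing
# else for the agent to pick between. The bodies are placeholders.
check() {
  case "$1" in
    test) echo "running focused test (placeholder)" ;;  # e.g. pytest tests/test_core.py
    lint) echo "running lint pass (placeholder)" ;;     # e.g. ruff check src/
    *)    echo "usage: check {test|lint}" >&2; return 1 ;;
  esac
}

check test
check lint
```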

Escalation rules should be explicit. A useful rule is: local checks first for every edit, CI only after the diff is coherent. Another is: if the task touches deployment, permissions, or cross-package behavior, run CI earlier. The point is to avoid using the runner as the default place where the agent learns what is broken.
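An explicit escalation rule can be encoded directly rather than left as team lore. The sketch below hardcodes a changed-file list for the demo (a real version would read `git diff --name-only`), and the path patterns for "sensitive" areas are illustrative.

```shell
# Sketch of an explicit escalation rule: run CI early only when the
# diff touches deployment or cross-package paths. Patterns illustrative.
CHANGED_FILES="src/app.ts deploy/helm/values.yaml"   # normally: git diff --name-only

escalate=no
for f in $CHANGED_FILES; do
  case "$f" in
    deploy/*|*/permissions/*|packages/*) escalate=yes ;;
  esac
done

if [ "$escalate" = yes ]; then
  echo "rule: run CI early (sensitive paths changed)"
else
  echo "rule: local checks only until the diff is coherent"
fi
```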

Tradeoffs and limits

Local-first verification is not always enough. It can miss integration failures, platform-specific issues, and problems that only appear in a clean environment. Teams with strict compliance or security constraints may also need more isolation than a developer laptop can provide.

There is also a maintenance cost. A stable local workspace takes discipline. If setup scripts rot, the agent will inherit that mess. If the local environment diverges from production, the feedback loop becomes misleading.

So the recommendation is not “never use runners.” It is “do not make runners the center of the agent loop.” Use them for final checks, cross-environment validation, and release gates. Do not use them as the first place where every small edit has to prove itself.

A simple implementation plan

Start with one repository and one task type. Pick a workflow where agents already spend time waiting on verification.

Then do this:

  • Measure the current edit-to-feedback time.
  • Move the first test run to a local or dedicated workspace.
  • Keep the same command, but run it before CI.
  • Record whether the agent needs fewer retries.
  • Compare the failure modes after a week.
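Measuring edit-to-feedback time needs nothing beyond a timestamp around the check command. A minimal sketch, with `sleep 1` standing in for the real check and a hypothetical `feedback-times.log` file for the week-long comparison:

```shell
# Sketch of measuring edit-to-feedback time: wrap the check command,
# log seconds elapsed, and compare the log across a week. CHECK_CMD
# and the log file name are placeholders.
CHECK_CMD="${CHECK_CMD:-sleep 1}"   # stands in for a real test command

start=$(date +%s)
$CHECK_CMD
end=$(date +%s)
echo "edit-to-feedback: $((end - start))s" >> feedback-times.log
tail -n 1 feedback-times.log
```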

If the local loop is faster and the results are still trustworthy, expand it. If it is faster but flaky, fix the environment before widening use.

That last part is the real test. Speed without trust is not useful. Agentic teams need both.

A small methodology note

This is a Build problem more than a policy problem. The useful step is to change the loop, not just the rule. Our methodology notes on Build are a good reminder to make the smallest working version first, then tighten it with evidence.

Bottom line

Shared runners are good at isolation and final verification. They are poor at fast agent iteration. If your team wants agents to move quickly without losing control, keep the first loop local, keep the commands narrow, and reserve CI for the checks that actually need it.
