Agentic SDLC: A Field Manual for Building Software with AI Agents

Coding — the day-to-day with an agent

Most of what's been written about coding with an agent is about the tools: which IDE, which model, which keystroke. Useful, but it skips the harder question: what habits separate the engineers who get a 2× speedup from agents from those who get 0.5×? Same tool, same model, dramatically different results. The variance isn't about typing speed; it's about how you split work, what you delegate, when you intervene, and how you read what came back.

This is the longest chapter in the foundation half of the series. It's also the most practical. Treat it as a field guide.

What you'll take away from this chapter

The four modes of working with a coding agent and when each one is right
How to size a task before delegating — the "one diff, one purpose" rule
The intervention checklist: when to interrupt, when to let it run, when never to step in
How to read a diff in a way that catches agent-specific failure modes
The five habits that separate top-quartile users from the rest

The four modes

You're never just "using an agent." You're in one of four distinct modes, and the failure modes are different in each.

Four modes on two axes: how much autonomy the agent has, and how clear the task is. Misreading which quadrant you're in is the most common source of wasted time.

Drive

You're at the keyboard, exploring or prototyping. The agent provides inline suggestions, but you decide what to write next. This is the "copilot" experience. Good for: working in unfamiliar territory, prototyping a UI to figure out what it should look like, learning a new library. Bad for: well-specced tasks where the agent could just do it.

Pair

You and the agent take turns. The agent drafts; you review and edit; you ask for the next piece; repeat. This is the dominant mode for day-to-day feature work in 2026. Pair mode produces the best results when the task is clear but the design isn't yet locked.

Explore

You give the agent an open-ended investigation task and let it run. "Why is the staging deploy failing intermittently?" "What's the smallest change to add OAuth here?" The agent does the archaeology; you read the report. Good for bug triage, codebase exploration, design research. Bad when you secretly know the answer and want code instead of a report.

Delegate

You hand over a fully specified task and let the agent complete it end to end. You don't watch the typing; you review the diff at the end. This is where the 2× productivity stories come from, and where the 0.5× horror stories come from too. Delegate mode requires real spec discipline (Ch. 02). With a sloppy spec, you'll review three rounds of wrong work before getting something usable.

Sizing tasks: the "one diff, one purpose" rule

The single most common mistake new agentic developers make is delegating too much in one shot: "Add OAuth, refactor the auth middleware, migrate the user table, and write tests for all of it." The agent will try, and it will produce a 3,000-line diff. You'll spend two hours reviewing it and find issues; the agent will fix some and break others; and you'll end up doing the work yourself.

One diff, one purpose. A delegated task should produce a diff small enough that the purpose is obvious from the diff alone. If the purpose needs explaining in the PR description beyond a single sentence, the task was too big.

In practice, aim for 50–400 lines of diff; above that, split the task. The agent itself can help: give it the big task and ask it to plan the decomposition before doing any work. Plan mode (Ch. 01) was designed for exactly this.
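
The size guideline above is easy to check mechanically. The sketch below is one possible way to do it, assuming the diff is available as unified-diff text (for example, the output of `git diff`). The function names and the decision to enforce only the upper bound are illustrative choices, not part of any particular tool.

```python
import re

# Matches a unified-diff hunk header such as "@@ -12,7 +12,9 @@".
# A missing count (e.g. "@@ -3 +3 @@") means one line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def changed_lines(unified_diff: str) -> int:
    """Count added plus removed lines inside the hunks of a unified diff."""
    changed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in unified_diff.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next hunk header.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue
        if line.startswith("+"):
            new_left -= 1
            changed += 1
        elif line.startswith("-"):
            old_left -= 1
            changed += 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both old and new versions.
            old_left -= 1
            new_left -= 1
    return changed


def size_verdict(unified_diff: str, max_lines: int = 400) -> str:
    """Return a short verdict; 400 is the upper bound suggested in the text."""
    count = changed_lines(unified_diff)
    if count > max_lines:
        return f"split: {count} changed lines exceeds {max_lines}"
    return f"ok: {count} changed lines"
```

Counting from the hunk headers, rather than from every line that starts with `+` or `-`, keeps file headers such as `--- a/app.py` out of the total. A check like this catches only size; whether the diff has a single purpose is still a judgment call.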

Split by seam, not by file

Bad split: "First, modify all the type definitions. Then, modify all the handlers. Then, the tests." Each step touches dozens of files; none of them ships value on its own.

Good split: "First, add the new endpoint end-to-end with a stub implementation. Then, wire up the real database query. Then, add caching. Then, add metrics." Each step touches a few files; each one is independently reviewable and could ship.

Seam-based splits are how human engineers naturally work. Agents will happily split by file if you let them; insist on splitting by seam and the work goes more smoothly.
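
To make the first seam concrete, here is a minimal sketch of what "the new endpoint end-to-end with a stub implementation" could look like. Everything in it (`OrderStore`, `StubOrderStore`, `handle_get_order_total`) is a hypothetical name invented for illustration, and the handler returns a plain dict rather than using any real web framework.

```python
from dataclasses import dataclass
from typing import Protocol


class OrderStore(Protocol):
    """The seam: later steps swap the implementation behind this interface."""

    def get_total(self, order_id: str) -> int: ...


@dataclass
class StubOrderStore:
    """Step 1: a fixed answer, so the endpoint works end to end."""

    total_cents: int = 0

    def get_total(self, order_id: str) -> int:
        return self.total_cents


def handle_get_order_total(order_id: str, store: OrderStore) -> dict:
    """Hypothetical endpoint handler; it does not change in later steps."""
    if not order_id:
        return {"status": 400, "error": "order_id is required"}
    return {
        "status": 200,
        "order_id": order_id,
        "total_cents": store.get_total(order_id),
    }
```

Under this layout, the "real database query" step would add a second class that satisfies `OrderStore`, and the caching step would wrap it. Each of those diffs touches a few files and leaves the handler alone, which is what makes them independently reviewable.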

The intervention checklist

The hardest skill in pair and delegate modes is knowing when to interrupt. Interrupt too often and you lose the speedup; let too much drift accumulate and you'll be deep in wrong work. The checklist below codifies what experienced agentic engineers do almost reflexively.

Interrupt immediately	Wait and see	Never interrupt for
Agent modifies files outside the task's scope	First step looks weird but plausible	Style preferences (note them, fix later)
Agent invents an API that doesn't exist	Agent is in repetitive cleanup mode	Small implementation details (read the diff)
Agent's plan misunderstands the goal	Agent is running tests and reacting	"I would have done it differently"
Agent is about to do something irreversible	Agent acknowledged uncertainty	
You feel the impulse to look something up

The last "interrupt" row deserves attention. When you feel yourself reaching to look something up — Stack Overflow, the docs, the wiki — that's a signal you're losing context. Pause and reorient. Powering through is how you end up an hour deep in something the agent has already drifted on.
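
The first "interrupt immediately" trigger, files modified outside the task's scope, is one of the few that can be checked mechanically. The sketch below assumes you can get the list of changed paths (for example from `git diff --name-only`) and that the task's scope can be expressed as a set of directories. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed files that are not under any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Compare whole path components, so "src/authz" is not
        # mistaken for something inside "src/auth".
        if not any(path == root or root in path.parents for root in allowed):
            flagged.append(name)
    return flagged


# Example: a task scoped to the auth module and its tests.
changed = ["src/auth/login.py", "src/billing/invoice.py", "tests/auth/test_login.py"]
flagged = out_of_scope(changed, ["src/auth", "tests/auth"])
```

Here `flagged` contains only `src/billing/invoice.py`. A non-empty result is a prompt to look, not proof of a problem: some tasks legitimately need a change outside the directories you expected.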

Reading agent-produced diffs

Agent diffs read differently from human diffs. Three patterns to watch for, in order of frequency:

The "looks plausible but" pattern

The agent writes code that pattern-matches against examples in the codebase, but uses the pattern in a context where it doesn't apply. The code looks like the rest of the file but doesn't quite do what it should. Hunt for these in the parts of the diff that match the surrounding style most strongly — that's where the agent is most likely to have pattern-matched without thinking.

The "passes tests but does the wrong thing" pattern

The agent writes tests that pass; the tests pass because they test what the code does, not what it should do. Symptom: a test asserting on the implementation rather than the behavior. We'll go deeper on this in Ch. 05.
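
A small, contrived illustration of the symptom follows. `apply_discount` is a hypothetical function with a deliberate bug, and the two checks are written as plain functions so the contrast is easy to run by hand.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test, with a deliberate bug:
    the requirement is to take `percent` off the price, but this
    returns `percent` of the price instead."""
    return price_cents * percent // 100


def mirror_check() -> None:
    # Asserts on the implementation: the expected value is recomputed
    # with the same formula the code uses, so the bug is invisible.
    price, percent = 2000, 25
    assert apply_discount(price, percent) == price * percent // 100


def behavior_check() -> None:
    # Asserts on the behavior: the expected value comes from the
    # requirement (25% off 2000 is 1500), not from the code.
    assert apply_discount(2000, 25) == 1500
```

`mirror_check()` passes against the buggy code; `behavior_check()` raises `AssertionError`. When reviewing agent-written tests, an expected value that is derived from the code under test instead of from the task statement is the thing to look for.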

The "thoroughly correct, totally wrong scope" pattern

The agent did beautiful work — clean code, good tests, clear naming. The work is also five times the size the task needed because the agent over-interpreted the scope. Symptom: a feature task with a refactor inside it; a refactor task with a feature inside it. Send it back with a tighter scope, not "almost there" approval.

What good diffs look like: surprisingly mundane. Small, focused changes that match the task statement. Tests that assert behavior, not implementation. Comments that explain non-obvious choices. New files in conventional locations. If reviewing a diff feels boring, it's probably good.

The five habits

The difference between top-quartile and bottom-quartile users of agents condenses, across maybe a hundred teams I've watched, into five habits.

1 · Read the trajectory, not just the diff

The diff tells you what the agent did. The trajectory tells you how it decided. Spending two minutes scanning the trajectory often reveals the agent went down a wrong path early and recovered — meaning the diff might be correct but the design choice was made under wrong assumptions. Worth a quick look before approving.

2 · Maintain a project prompt that grows with the codebase

Most agent harnesses let you set a project-specific system prompt or instructions file. Use it. Add to it every time you find yourself correcting the agent on the same thing twice. After a few weeks the file knows your team's conventions; the agent stops repeating the same mistakes. The project prompt is shared infrastructure, not personal config — own it as a team.

The file typically grows to include sections like: code style, things to avoid, codebase conventions, and "where to look" pointers by module. Short bullets, no prose. The discipline is to edit it over time, not just append.
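
As a rough illustration, a project prompt with those four sections might look like the text embedded below. The section contents, the file layout, and the 15-word cutoff are all invented for this sketch; real harnesses differ in where the file lives and how it is formatted. The small helper flags bullets that have drifted into prose.

```python
# A hypothetical project prompt; every entry here is made up for illustration.
EXAMPLE_PROMPT = """\
Code style
- Type hints on all public functions
- No wildcard imports

Things to avoid
- Do not edit generated files under build/

Codebase conventions
- One handler per file in handlers/

Where to look
- auth: src/auth/README
"""

MAX_BULLET_WORDS = 15  # arbitrary cutoff for "short bullets, no prose"


def long_bullets(prompt_text: str, max_words: int = MAX_BULLET_WORDS) -> list[str]:
    """Return bullet lines whose text exceeds max_words words."""
    return [
        line
        for line in prompt_text.splitlines()
        if line.startswith("- ") and len(line[2:].split()) > max_words
    ]
```

Running `long_bullets` over the file during review is one cheap way to notice when an entry needs editing down rather than another entry appended after it.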

3 · Time-box delegation

When you delegate, set a budget: "if this isn't working in 20 minutes of agent time, stop and we'll talk." Without a budget, an agent will sometimes loop on a difficult problem for an hour. With a budget, you intervene early on the hard ones and keep flow on the easy ones.
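
If your agent can be started from a command line, the budget can be enforced rather than just stated. The sketch below uses Python's standard `subprocess.run` with its `timeout` parameter. The command you pass in is whatever launches your agent; no specific agent CLI is assumed here, and `run_with_budget` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_budget(command: list[str], budget_seconds: float) -> str:
    """Run a command, stopping it if it exceeds the time budget.

    Returns "finished", "failed", or "budget exceeded".
    """
    try:
        # subprocess.run kills the child process when the timeout expires
        # and then raises TimeoutExpired.
        result = subprocess.run(command, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        return "budget exceeded"
    return "finished" if result.returncode == 0 else "failed"
```

For the 20-minute budget in the example above, the call would be `run_with_budget(agent_command, 20 * 60)`, where `agent_command` is your own launch command. A "budget exceeded" result is the cue to stop and talk, not to rerun with a larger number.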

4 · Keep the senior reviewer's voice on

Imagine a senior engineer reviewing the agent's diff over your shoulder. What would they push back on? Naming, scope, missing edge cases, tests that don't test, comments that don't help. Channel that voice when reviewing. It's the single best filter against shipping mediocre agent output.

5 · Capture good prompts

When a prompt works well — produces a clean diff, the right scope, sensible decisions — save it. Build up a small personal or team library. The patterns that work for you often keep working with small variations. The patterns that fail teach you what to add to the project prompt.

The corollary nobody states

If your agentic sessions don't have moments where the agent catches something you would have missed, you might be reviewing too defensively. If your sessions never have moments where you catch something the agent missed, you might be reviewing too loosely. Aim for sessions where both happen.

This is the practical test for whether the collaboration is healthy. Either side dominating is a smell. The work is collaborative when both parties contribute something the other couldn't.

Practice — before you read the next chapter

If you're new to agentic coding

Pick one task you'd normally do in 30–60 minutes. Use the agent in pair mode for the whole thing — no copy-paste from the chat, no "let me just do this part myself." Note three moments: when you wanted to intervene but didn't, when you wished you had intervened earlier, and when the agent did something better than you would have.

If you're already coding with an agent regularly

For one full day, before each delegated task, write down (a) which of the four modes you're entering, (b) what you expect the diff to look like, and (c) your stop-budget. At end of day, compare expectations to outcomes. The deltas reveal where your intuitions about agent capabilities need updating.

If you lead a team

Pull the project prompt for one of your team's active codebases (if you have one — if not, that's the first finding). When was each entry added, and why? Are there recurring corrections still missing? Treat the file like a small product the team maintains together.

Takeaways

Four modes: drive, pair, explore, delegate. Misreading the mode is the most common source of wasted time.
One diff, one purpose. Split by seam, not by file.
The intervention checklist matters more than raw skill. Knowing when to interrupt is most of the craft.
Read trajectories habitually, not just diffs.
Five habits separate top-quartile users: read trajectories, maintain a project prompt, time-box delegation, keep the senior reviewer's voice on, capture good prompts.
Healthy collaboration shows up as both parties catching things the other missed. If only one side is contributing, something is off.

Next chapter: Testing — why agents will happily write tests that pass and don't test anything, and how to set up a workflow that catches it.
