Most of what's been written about coding with an agent is about the tools — which IDE, which model, which keystroke. Useful, but it skips the harder question: what are the habits that separate the engineers who get 2× from agents from the ones who get 0.5×? Same tool, same model, dramatically different results. The variance isn't about typing speed; it's about how you split work, what you delegate, when you intervene, and how you read what came back.
This is the longest chapter in the foundation half of the series. It's also the most practical. Treat it as a field guide.
You're never just "using an agent." You're in one of four distinct modes, and the failure modes are different in each.
You're at the keyboard, exploring or prototyping. The agent provides inline suggestions, but you decide what to write next. This is the "copilot" experience. Good for: working in unfamiliar territory, prototyping a UI to figure out what it should look like, learning a new library. Bad for: well-specced tasks where the agent could just do it.
You and the agent take turns. The agent drafts; you review and edit; you ask for the next piece; repeat. This is the dominant mode for day-to-day feature work in 2026. Pair mode produces the best results when the task is clear but the design isn't yet locked.
You give the agent an open-ended investigation task and let it run. "Why is the staging deploy failing intermittently?" "What's the smallest change to add OAuth here?" The agent does the archaeology; you read the report. Good for bug triage, codebase exploration, design research. Bad when you secretly know the answer and want code instead of a report.
You hand over a fully-specified task and let the agent complete it end-to-end. You don't watch the typing; you review the diff at the end. This is where the 2× productivity stories come from — and where the 0.5× horror stories come from too. Delegate mode requires real spec discipline (Ch. 02). With a sloppy spec, you'll review three rounds of wrong work before getting something usable.
The single most common mistake new agentic developers make is delegating too much in one shot. "Add OAuth, refactor the auth middleware, migrate the user table, and write tests for all of it." The agent will try. It will produce a 3,000-line diff. You'll spend two hours reviewing it, find issues, the agent will fix some and break others, and you'll end up doing it yourself.
One diff, one purpose. A delegated task should produce a diff small enough that the purpose is obvious from the diff alone. If the purpose needs explaining in the PR description beyond a single sentence, the task was too big.
Practically, that means 50–400 lines of diff; if it's more than that, split the task. The agent itself will help you split: give it the big task and ask it to plan the decomposition before doing the work. Plan mode (Ch. 01) was designed for exactly this.
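If you want a mechanical check on the 50–400 guideline, one minimal sketch sums added and deleted lines from `git diff --numstat`. The function names and the 400-line default are illustrative assumptions. The parser assumes numstat's tab-separated `added deleted path` format, where binary files report `-` instead of counts.

```python
def diff_size(numstat: str) -> int:
    """Total changed lines (added + deleted) in `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        # Each line is "added<TAB>deleted<TAB>path"; maxsplit keeps odd paths intact.
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary file: numstat has no line counts
        total += int(added) + int(deleted)
    return total


def needs_split(numstat: str, limit: int = 400) -> bool:
    """True when the diff is past the rough upper bound for one delegated task."""
    return diff_size(numstat) > limit
```

To check a working tree, you could pass it `subprocess.run(["git", "diff", "--numstat"], capture_output=True, text=True).stdout`. Treat a `True` as a prompt to ask whether the diff still has one purpose, not as a hard gate.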
Bad split: "First, modify all the type definitions. Then, modify all the handlers. Then, the tests." Each step touches dozens of files; none of them ships value on its own.
Good split: "First, add the new endpoint end-to-end with a stub implementation. Then, wire up the real database query. Then, add caching. Then, add metrics." Each step touches a few files; each one is independently reviewable and could ship.
Splitting by seam, along boundaries where each step can be reviewed and shipped on its own, is how human engineers naturally work. Agents will happily split by file if you let them; insist on seam-based splits and the work goes more smoothly.
The hardest skill in pair and delegate modes is knowing when to interrupt. Interrupt too often and you lose the speedup; let too much drift accumulate and you'll be deep in wrong work. The checklist below codifies what experienced agentic engineers do almost reflexively.
| Interrupt immediately | Wait and see | Never interrupt for |
|---|---|---|
| Agent modifies files outside the task's scope | First step looks weird but plausible | Style preferences (note them, fix later) |
| Agent invents an API that doesn't exist | Agent is in repetitive cleanup mode | Small implementation details (read the diff) |
| Agent's plan misunderstands the goal | Agent is running tests and reacting | "I would have done it differently" |
| Agent is about to do something irreversible | Agent acknowledged uncertainty | |
| You feel the impulse to look something up | | |
The last "interrupt" row deserves attention. When you feel yourself reaching to look something up (Stack Overflow, the docs, the wiki), that's a signal you're losing context. Pause and reorient. Powering through is how you end up an hour deep in work that drifted off course well before you noticed.
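The first "interrupt immediately" row is the easiest one to check mechanically. A minimal sketch, assuming you state the task's scope as glob patterns and get the changed paths from something like `git diff --name-only`: flag any path that matches none of the patterns. The function name and the pattern convention are hypothetical.

```python
from fnmatch import fnmatch


def out_of_scope(changed: list[str], allowed: list[str]) -> list[str]:
    """Return changed paths that match none of the task's allowed glob patterns.

    Note: fnmatch's "*" also matches "/", so "src/auth/*" covers subdirectories.
    """
    return [path for path in changed if not any(fnmatch(path, pat) for pat in allowed)]
```

For example, `out_of_scope(["src/auth/middleware.py", "src/billing/invoice.py"], ["src/auth/*"])` returns `["src/billing/invoice.py"]`, which is the moment to stop the agent and ask why billing code changed.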
Agent diffs read differently from human diffs. Three patterns to watch for, in order of frequency:
The agent writes code that pattern-matches against examples in the codebase, but uses the pattern in a context where it doesn't apply. The code looks like the rest of the file but doesn't quite do what it should. Hunt for these in the parts of the diff that match the surrounding style most strongly — that's where the agent is most likely to have pattern-matched without thinking.
The agent writes tests that pass; the tests pass because they test what the code does, not what it should do. Symptom: a test asserting on the implementation rather than the behavior. We'll go deeper on this in Ch. 05.
The agent did beautiful work — clean code, good tests, clear naming. The work is also five times the size the task needed because the agent over-interpreted the scope. Symptom: a feature task with a refactor inside it; a refactor task with a feature inside it. Send it back with a tighter scope, not "almost there" approval.
What good diffs look like: surprisingly mundane. Small, focused changes that match the task statement. Tests that assert behavior, not implementation. Comments that explain non-obvious choices. New files in conventional locations. If reviewing a diff feels boring, it's probably good.
Across maybe a hundred teams I've watched, the difference between top-quartile and bottom-quartile agent users comes down to five habits.
The diff tells you what the agent did. The trajectory, the record of the steps it took to get there, tells you how it decided. Two minutes scanning the trajectory often reveals that the agent went down a wrong path early and recovered, which means the diff might be correct while the design choice was made under wrong assumptions. It's worth a quick look before approving.
Most agent harnesses let you set a project-specific system prompt or instructions file. Use it. Add to it every time you find yourself correcting the agent on the same thing twice. After a few weeks the file knows your team's conventions; the agent stops repeating the same mistakes. The project prompt is shared infrastructure, not personal config — own it as a team.
The file typically grows to include sections like: code style, things to avoid, codebase conventions, and "where to look" pointers by module. Short bullets, no prose. The discipline is to edit it over time, not just append.
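Because the file is meant to stay short bullets that get edited rather than appended to, a tiny lint can help keep it honest. A sketch, assuming the file is markdown with `#` headings for sections and `- ` bullets; the word limit and the specific checks are assumptions, not anything a particular harness requires.

```python
from collections import Counter


def lint_project_prompt(text: str, max_words: int = 25) -> list[str]:
    """Flag prose paragraphs, overlong bullets, and duplicate bullets."""
    warnings: list[str] = []
    bullets: list[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("- "):
            body = stripped[2:].strip()
            bullets.append(body.lower())  # compare case-insensitively
            words = len(body.split())
            if words > max_words:
                warnings.append(f"line {lineno}: long bullet ({words} words)")
        elif stripped and not stripped.startswith("#"):
            # Anything that is neither a heading nor a bullet is prose creeping in.
            warnings.append(f"line {lineno}: prose outside a bullet")
    # The same correction added twice is a sign the file is only being appended to.
    for body, count in Counter(bullets).items():
        if count > 1:
            warnings.append(f"duplicate bullet x{count}: {body!r}")
    return warnings
```

Running it in CI, or just before a team review of the file, turns "edit it over time" into something you can see slipping.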
When you delegate, set a budget: "if this isn't working in 20 minutes of agent time, stop and we'll talk." Without a budget, an agent will sometimes loop on a difficult problem for an hour. With a budget, you intervene early on the hard ones and keep flow on the easy ones.
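If your harness has no built-in limit, a wall-clock timeout around the agent process is a crude stand-in for the budget. A sketch using Python's `subprocess.run` with `timeout`, which kills the child process when time runs out. Wall-clock time is an assumption here; your tooling may measure "agent time" differently.

```python
import subprocess
from typing import Optional


def run_with_budget(cmd: list[str], budget_minutes: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it exceeded the budget.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        result = subprocess.run(cmd, timeout=budget_minutes * 60, check=False)
    except subprocess.TimeoutExpired:
        return None  # over budget: time to stop and talk
    return result.returncode
```

For example, `run_with_budget(["my-agent", "run", "task.md"], 20)`, where `my-agent` is a hypothetical stand-in for whatever CLI your harness provides. A `None` result is the cue to step in.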
Imagine a senior engineer reviewing the agent's diff over your shoulder. What would they push back on? Naming, scope, missing edge cases, tests that don't test, comments that don't help. Channel that voice when reviewing. It's the single best filter against shipping mediocre agent output.
When a prompt works well — produces a clean diff, the right scope, sensible decisions — save it. Build up a small personal or team library. The patterns that work for you often keep working with small variations. The patterns that fail teach you what to add to the project prompt.
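The library doesn't need special tooling; a shared file works. As one possible shape, here is a JSON-backed sketch with tags for lookup and a notes field for what made each prompt work. The file layout and field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SavedPrompt:
    name: str
    text: str
    tags: list[str] = field(default_factory=list)
    notes: str = ""  # why it worked, or what it needed added


def load(library: Path) -> list[SavedPrompt]:
    """Read all saved prompts; a missing file means an empty library."""
    if not library.exists():
        return []
    return [SavedPrompt(**entry) for entry in json.loads(library.read_text())]


def save(library: Path, prompt: SavedPrompt) -> None:
    """Add a prompt, replacing any existing entry with the same name."""
    entries = [p for p in load(library) if p.name != prompt.name]
    entries.append(prompt)
    library.write_text(json.dumps([asdict(p) for p in entries], indent=2))


def by_tag(library: Path, tag: str) -> list[SavedPrompt]:
    """Find prompts carrying a given tag, e.g. "delegate" or "research"."""
    return [p for p in load(library) if tag in p.tags]
```

Replacing by name rather than appending keeps the library edited instead of ever-growing, the same discipline the project prompt needs.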
If your agentic sessions don't have moments where the agent catches something you would have missed, you might be reviewing too defensively. If your sessions never have moments where you catch something the agent missed, you might be reviewing too loosely. Aim for sessions where both happen.
This is the practical test for whether the collaboration is healthy. Either side dominating is a smell. The work is collaborative when both parties contribute something the other couldn't.
Pick one task you'd normally do in 30–60 minutes. Use the agent in pair mode for the whole thing — no copy-paste from the chat, no "let me just do this part myself." Note three moments: when you wanted to intervene but didn't, when you wished you had intervened earlier, and when the agent did something better than you would have.
For one full day, before each delegated task, write down (a) which of the four modes you're entering, (b) what you expect the diff to look like, and (c) your stop-budget. At end of day, compare expectations to outcomes. The deltas reveal where your intuitions about agent capabilities need updating.
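If you'd rather log the exercise than keep it on paper, a sketch like this records the three answers per task and flags the biggest deltas at the end of the day. It reduces (b) to an expected line count and treats a diff more than twice the guess as worth a look; both simplifications, and the field names, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskLog:
    task: str
    mode: str              # (a) which of the four modes, e.g. "pair" or "delegate"
    expected_lines: int    # (b) reduced here to a guess at the diff size
    budget_min: float      # (c) your stop-budget in minutes
    actual_lines: int = 0  # filled in at end of day
    actual_min: float = 0.0


def end_of_day_report(logs: list[TaskLog], size_factor: float = 2.0) -> list[str]:
    """One line per task whose outcome diverged from the expectation."""
    report = []
    for log in logs:
        notes = []
        if log.actual_min > log.budget_min:
            notes.append(f"over budget ({log.actual_min:g} > {log.budget_min:g} min)")
        if log.actual_lines > size_factor * log.expected_lines:
            notes.append(f"diff {log.actual_lines} lines vs ~{log.expected_lines} expected")
        if notes:
            report.append(f"[{log.mode}] {log.task}: " + "; ".join(notes))
    return report
```

Tasks that never show up in the report are the ones where your intuition was already calibrated; the ones that do are the deltas worth thinking about.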
Pull the project prompt for one of your team's active codebases (if you have one — if not, that's the first finding). When was each entry added, and why? Are there recurring corrections still missing? Treat the file like a small product the team maintains together.
Next chapter: Testing — why agents will happily write tests that pass and don't test anything, and how to set up a workflow that catches it.