Eighteen months ago, "AI in development" meant autocomplete with attitude. Today an agent on your team can read a ticket, plan a change, run the test suite, and open a pull request worth merging. That isn't a faster autocomplete; it's a different category of teammate — and the way engineering organisations build software is bending around it. This series is for the people deciding how that bend should happen.
This first chapter is the one you read to decide whether the rest of the series is worth your time. There is no selling. There is a clear-eyed look at what has actually shifted, what hasn't, and what the next twelve chapters will and will not cover. Both "keep reading" and "this is not for me yet" are reasonable conclusions.
An assistant waits for you to ask. A copilot suggests while you work. An agent takes a goal and works on it without you in the loop on every step. The agent has tools, a sense of when it's stuck, and the ability to stop and ask. Between 2024 and 2026, the third category went from "demo" to "Tuesday." That is the shift. Everything else in this series follows from it.
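The agent definition above can be sketched as a small control loop. This is only an illustration under stated assumptions: `run_agent`, `plan_step`, the tool names, and the "same observation repeated" stuck heuristic are all hypothetical, and a real agent would put a model call where the scripted planner sits.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Outcome:
    status: str                      # "done", "asked_human", or "out_of_budget"
    log: list[str] = field(default_factory=list)

def run_agent(
    goal: str,
    plan_step: Callable[[str, list[str]], tuple[str, str]],  # hypothetical planner
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
    max_repeats: int = 2,
) -> Outcome:
    """Work toward a goal with tools; stop and ask when progress stalls."""
    log: list[str] = []
    previous = None
    repeats = 0
    for _ in range(max_steps):
        tool_name, arg = plan_step(goal, log)
        if tool_name == "finish":
            return Outcome("done", log)
        observation = tools[tool_name](arg)
        log.append(f"{tool_name}({arg}) -> {observation}")
        # Assumed stuck heuristic: the same observation keeps coming back.
        repeats = repeats + 1 if observation == previous else 0
        previous = observation
        if repeats >= max_repeats:
            return Outcome("asked_human", log)   # stop and ask
    return Outcome("out_of_budget", log)

# Scripted planner standing in for a model: search, run tests, finish.
def scripted_plan(goal: str, log: list[str]) -> tuple[str, str]:
    steps = [("grep", "currency"), ("run_tests", ""), ("finish", "")]
    return steps[min(len(log), len(steps) - 1)]

tools = {"grep": lambda q: "checkout.py:42", "run_tests": lambda _: "passed"}
ok = run_agent("fix currency bug", scripted_plan, tools)

# A planner that never makes progress triggers the stop-and-ask path.
stuck = run_agent("fix currency bug", lambda g, l: ("grep", "x"),
                  {"grep": lambda q: "no match"})
```

The point of the sketch is the shape, not the heuristic: the human is absent from individual steps but present at the two exits, when the work is done and when the agent decides it is stuck.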
It's tempting to attribute the agentic turn to bigger models. That's part of it, but only a third of the story. Three independent lines crossed thresholds around the same time, and missing any one would have kept us in the demo era.
A 200K-token context window is useless if attention degrades after 30K. The breakthrough wasn't window size; it was effective context, meaning models that can find a relevant function in chapter 4 of your codebase while fixing a bug in chapter 9. Once that worked, agents could hold a project in their head the way a junior engineer can after a week of onboarding.
Function calling existed in 2023, but it was brittle — mangled arguments, runaway retries, no good story for failure. By late 2025 tool use became boring infrastructure: structured outputs, MCP for tool discovery, deterministic retry loops. Agents stopped fumbling at the edges and started doing real work.
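As a rough illustration of what "boring" tool use looks like, the sketch below validates arguments before a call and retries a bounded number of times before failing loudly. All names here (`call_tool`, `validate_args`, `ToolCallFailed`, `flaky_read_file`) are hypothetical and not taken from any particular framework or from MCP itself.

```python
from typing import Any, Callable

class ToolCallFailed(Exception):
    """Raised when a tool call is rejected or abandoned after bounded retries."""

def validate_args(args: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the arguments are well-formed."""
    problems = []
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems

def call_tool(
    tool: Callable[..., Any],
    args: dict[str, Any],
    schema: dict[str, type],
    max_attempts: int = 3,
) -> Any:
    # Reject mangled arguments up front instead of letting the tool misbehave.
    problems = validate_args(args, schema)
    if problems:
        raise ToolCallFailed("; ".join(problems))
    last_error = None
    # Fixed attempt budget: no runaway retries.
    for _ in range(max_attempts):
        try:
            return tool(**args)
        except RuntimeError as exc:
            last_error = exc
    raise ToolCallFailed(f"gave up after {max_attempts} attempts: {last_error}")

# Hypothetical tool that fails twice, then succeeds.
calls = {"n": 0}
def flaky_read_file(path: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return f"contents of {path}"

result = call_tool(flaky_read_file, {"path": "checkout.py"}, {"path": str})
```

The three failure modes named above each get an explicit answer: mangled arguments are rejected before the call, retries are capped, and exhaustion raises a distinct error the agent can report instead of looping.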
An agent doing thirty minutes of work might emit two million tokens. At 2023 prices that was twenty dollars per task — economical only for the highest-value problems. At 2026 prices the same work is well under a dollar. That is the unglamorous economic line that turned a curiosity into a tool reached for daily.
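The arithmetic behind those figures is worth making explicit. The sketch below backs out the blended price per million tokens implied by the numbers above; it assumes a single flat rate for all tokens, which is a simplification, and the function names are illustrative.

```python
TOKENS_PER_TASK = 2_000_000  # from the text: roughly thirty minutes of agent work

def implied_price_per_million(task_cost_usd: float,
                              tokens: int = TOKENS_PER_TASK) -> float:
    """Blended USD price per million tokens implied by a per-task cost."""
    return task_cost_usd / (tokens / 1_000_000)

def task_cost(price_per_million_usd: float,
              tokens: int = TOKENS_PER_TASK) -> float:
    """Per-task cost in USD at a given blended price per million tokens."""
    return price_per_million_usd * tokens / 1_000_000

# Twenty dollars per task implies about $10 per million tokens.
price_2023 = implied_price_per_million(20.0)

# "Well under a dollar" per task implies a blended price below $0.50 per million.
ceiling_2026 = implied_price_per_million(1.0)
```

Read this way, the claim is a drop of at least twentyfold in blended price for the same two-million-token task.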
The reason this matters for decision-making: if you're betting on durability, bet on all three holding. Effective context will keep improving, since that's where most research investment goes. Tool reliability is now a commodity. Cost will keep dropping, though more slowly. The conditions that produced the turn are the conditions that will sustain it.
One of the most useful and least-shared pieces of knowledge about working with agents in 2026 is which parts of the practice are genuinely solid and which are still being figured out. Hype tends to flatten this: everything is either "the future is here" or "snake oil." Reality has texture. Throughout this series, every chapter will tell you which bucket the techniques it covers fall into. We will not pretend "emerging" is "now."
| Maturity | What it means for your bets | Examples |
|---|---|---|
| Now | Production-ready. Adopt with normal change-management. | Agent-in-the-loop coding, test generation, documentation, code review augmentation. |
| Near-term | Works but needs care. Pilot before standardising. | Autonomous bug-fix loops, cross-file refactors, spec-to-code for greenfield features, agents in CI. |
| Emerging | Exciting; not bet-the-business yet. Watch closely. | Fully autonomous PR from ticket, multi-agent orchestration for large features, long-horizon maintenance. |
If your organisation is staffing a multi-quarter initiative on the Emerging row, you are taking on research risk, not engineering risk. Worth doing — sometimes — but worth naming.
If you take only one thing from this chapter, take this: a coding agent is a junior engineer who has read your entire codebase, never forgets a syntax rule, can type at a thousand words per minute, and has the judgment of someone who started yesterday. All four are true at once.
This single model predicts an enormous amount of what you will see:
Where the model breaks: motivation and growth. The agent doesn't get better from feedback the way a junior does. It doesn't carry yesterday's lesson into today's task unless you build the carrying mechanism yourself. This is covered in Ch. 08 on maintenance, and it has organisational implications worth thinking about early.
The leverage point. If you have an engineering culture that knows how to manage juniors well — clear specs, scoped tasks, attentive code review, written conventions — that culture transfers directly to managing agents. If you don't, you'll find that out very quickly.
A few things shift in ways worth naming upfront:
The bottleneck moves up the value chain. When typing isn't the constraint, the constraint becomes knowing what to build. Teams strong on execution but weak on judgment will feel this most. Their existing constraint masked a deeper one; agents reveal it.
Code review becomes the high-leverage activity. Senior engineers spend less time at the keyboard and more time evaluating diffs that were generated faster than humans could write them. The quality bar for reviewers gets higher; the bar for typing speed becomes irrelevant.
Written knowledge becomes load-bearing. The tacit knowledge that lived in heads now has to live in the repo, because agents can't acquire it any other way. Teams that hate documentation are at a real disadvantage. Teams that have it can compound on what they've built.
The cost of a bad decision compounds faster. Agents will help you build the wrong thing as quickly as the right thing. The product-and-strategy work that decides direction is now more important relative to execution, not less.
To make this concrete, imagine a bug ticket lands at 10 a.m.: customers in the EU are seeing prices in USD on the checkout page, started after Tuesday's deploy.
In an old-school workflow the engineer opens the codebase, greps for "currency" or "checkout," forms a hypothesis, tests it, repeats. Twenty to ninety minutes depending on the codebase and luck.
In an agentic workflow the engineer hands the agent the ticket and asks it to investigate. The agent reads the recent commits, finds the refactor that changed which field the checkout page reads currency from, traces the impact, proposes the fix, runs the existing tests, and stops before opening a PR. The engineer reads the proposal, notices the cart page has similar logic, asks the agent to check there too. It does. The fix lands in roughly fifteen minutes total.
What just happened: the agent did the boring archaeology — reading commits, tracing data flow, finding the regression. The engineer did the interesting part — recognising that if checkout has this bug, cart probably does too. The agent's strength (thoroughness, speed) and the engineer's (judgment, pattern recognition) composed cleanly.
If that exchange feels familiar, you're already doing agentic SDLC. The rest of this series is about doing it deliberately, at scale, across the whole lifecycle — not just bug fixes.
Will cover: the practice of building software when an agent is part of the team. Concrete patterns, honest failure modes, the unglamorous infrastructure (testing, CI, observability) that agents need around them to be safe at scale, and the organisational decisions that determine whether the practice compounds or quietly degrades.
Will not cover:
This series assumes a working engineering background. More specifically:
What you do not need: ML expertise. You will not train a model in this series. You will not fine-tune one. You will not read a paper. This is engineering practice, not research.
Pick a coding agent (Claude Code, Cursor, Aider — choice doesn't matter much yet). Give it one of these tasks on a personal project: add a CLI flag to a script, write a test suite for an untested file, or refactor a function for readability. Note what surprised you and what was disappointing.
Find a real task — something you'd normally do yourself in an hour. Try to drive it entirely through the agent: don't touch the keyboard except to type instructions and review diffs. Notice where you wanted to take over and ask why.
List the three product or technical decisions on your team's roadmap where "we can't move faster because we don't have engineering capacity" is the stated blocker. Then ask: how many of those would still be blocked if engineering capacity tripled tomorrow? The answer reveals whether your bottleneck is execution or direction.
The point of the exercises in this series isn't to teach you a technique. It's to make sure you've put your hands on the thing before reading the next abstract concept about it. Chapters get more abstract; the exercises stay concrete. Don't skip them.
Next chapter: Foundations, which covers the vocabulary. These are the terms that recur through every subsequent chapter, organised into five families. The decisions you make about agentic SDLC are made in this language, so it is worth knowing well.