Agentic SDLC: A Field Manual for Building Software with AI Agents

Why agentic SDLC, and why now

Eighteen months ago, "AI in development" meant autocomplete with attitude. Today an agent on your team can read a ticket, plan a change, run the test suite, and open a pull request worth merging. That isn't a faster autocomplete; it's a different category of teammate — and the way engineering organisations build software is bending around it. This series is for the people deciding how that bend should happen.

This first chapter is the one you read to decide whether the rest of the series is worth your time. There is no selling. There is a clear-eyed look at what has actually shifted, what hasn't, and what the next twelve chapters will and will not cover. Both "keep reading" and "this is not for me yet" are reasonable conclusions.

What you'll take away from this chapter

The distinction between an assistant, a copilot, and an agent — and why it matters for org-level decisions, not just tool selection
The three convergent shifts in 2025–2026 that made the agentic SDLC viable (it wasn't only better models)
An honest maturity map: what's production-ready today, what's near-term, what's still emerging
The mental model that predicts most agent behavior on the first try
Who this series is for, and who it isn't

The shift, in one paragraph

An assistant waits for you to ask. A copilot suggests while you work. An agent takes a goal and works on it without you in the loop on every step. The agent has tools, a sense of when it's stuck, and the ability to stop and ask. From 2024 into 2026, the third category went from "demo" to "Tuesday." That is the shift. Everything else in this series follows from it.
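As a rough structural sketch of that third category, the loop below takes a goal, calls tools, and stops to ask when it is stuck or out of budget. Everything named here is hypothetical: `run_agent`, the `Step` type, and the toy `toy_policy` function stand in for a real model call and real tools, and no particular agent framework is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str          # "tool", "ask", or "done"
    name: str = ""     # tool name when kind == "tool"
    detail: str = ""   # tool argument, question for the human, or final summary


def run_agent(
    goal: str,
    plan_next_step: Callable[[str, list], Step],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> tuple[str, str, list]:
    """Work toward a goal step by step, handing back to a human when stuck."""
    history: list = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        if step.kind == "done":
            return ("done", step.detail, history)
        if step.kind == "ask":
            return ("needs_human", step.detail, history)
        if step.name not in tools:
            return ("needs_human", f"unknown tool: {step.name}", history)
        result = tools[step.name](step.detail)
        history.append((step.name, result))
    # Running out of budget counts as being stuck, not as success.
    return ("needs_human", "step budget exhausted", history)


# Toy policy standing in for a model: run the tests once, then decide.
def toy_policy(goal: str, history: list) -> Step:
    if not history:
        return Step("tool", "run_tests")
    _, last_result = history[-1]
    if last_result == "pass":
        return Step("done", detail="tests pass")
    return Step("ask", detail="tests fail; how should I proceed?")


status, detail, history = run_agent(
    "fix bug", toy_policy, {"run_tests": lambda _arg: "pass"}
)
```

The point of the sketch is the shape, not the policy: an assistant or copilot has no loop of its own, while an agent owns the loop and decides when to hand control back.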

Why now — three lines crossed at once

It's tempting to attribute the agentic turn to bigger models. That's part of it, but only a third of the story. Three independent lines crossed thresholds around the same time, and missing any one would have kept us in the demo era.

Effective context

A 200K-token context window is useless if attention degrades after 30K. The breakthrough wasn't window size; it was effective context: models that can find a relevant function in one corner of your codebase while fixing a bug in another. Once that worked, agents could hold a project in their head the way a junior engineer can after a week of onboarding.

Tool reliability

Function calling existed in 2023, but it was brittle — mangled arguments, runaway retries, no good story for failure. By late 2025 tool use became boring infrastructure: structured outputs, MCP for tool discovery, deterministic retry loops. Agents stopped fumbling at the edges and started doing real work.
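One way to picture "boring infrastructure" is a tool call that validates its arguments and retries a fixed number of times, feeding the error back each time. The sketch below is an assumption-laden illustration, not any vendor's API: `call_tool_with_retry`, `fake_model`, and `read_line` are hypothetical names, and validation is reduced to "is it a JSON object with the required keys".

```python
import json
from typing import Callable, Optional


def call_tool_with_retry(
    request_args: Callable[[int, Optional[str]], str],
    tool: Callable[..., str],
    required_keys: set,
    max_attempts: int = 3,
) -> str:
    """Ask for JSON arguments, validate them, and retry a bounded number of times.

    request_args(attempt, error) returns a JSON string; `error` describes what
    was wrong with the previous attempt (None on the first one).
    """
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = request_args(attempt, error)
        try:
            args = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(args, dict):
            error = "arguments must be a JSON object"
            continue
        missing = sorted(set(required_keys) - set(args))
        if missing:
            error = f"missing keys: {missing}"
            continue
        return tool(**args)
    # A fixed budget means failure is explicit rather than a runaway loop.
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")


# Hypothetical stand-in for a model: mangled arguments first, valid ones second.
replies = iter(['{"path": "app.py"', '{"path": "app.py", "line": 3}'])


def fake_model(attempt: int, error: Optional[str]) -> str:
    return next(replies)


def read_line(path: str, line: int) -> str:
    return f"{path}:{line}"


result = call_tool_with_retry(fake_model, read_line, {"path", "line"})
```

The three failure modes named above map onto the three parts of the loop: validation catches mangled arguments, the attempt budget prevents runaway retries, and the final exception gives failure a clear story.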

Cost economics

An agent doing thirty minutes of work might consume two million tokens. At 2023 prices that was twenty dollars per task, economical only for the highest-value problems. At 2026 prices the same work is well under a dollar. That is the unglamorous economic line that turned a curiosity into a tool reached for daily.
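The arithmetic behind those figures can be sketched in a few lines. The only inputs taken from the text are two million tokens, twenty dollars, and one dollar; the single blended per-token rate is a simplifying assumption (real pricing usually distinguishes input, output, and cached tokens), and the function names are hypothetical.

```python
def task_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one task at a single blended rate (a simplification)."""
    return tokens / 1_000_000 * usd_per_million_tokens


def implied_rate(tokens: int, task_cost_usd: float) -> float:
    """Blended USD per million tokens implied by a known task cost."""
    return task_cost_usd / (tokens / 1_000_000)


TOKENS_PER_TASK = 2_000_000

# Twenty dollars for two million tokens implies ten dollars per million.
rate_2023 = implied_rate(TOKENS_PER_TASK, 20.0)

# For the same task to cost under a dollar, the blended rate must be
# below fifty cents per million tokens.
rate_for_one_dollar = implied_rate(TOKENS_PER_TASK, 1.0)

# Ratio between the two rates: the size of the price drop the text describes.
drop_factor = rate_2023 / rate_for_one_dollar
```

Read this way, "well under a dollar" corresponds to a blended price drop of more than twentyfold for the same amount of work.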

None of the three lines crosses the viability threshold alone. The agentic turn happened when all three did, around the same time.

The reason this matters for decision-making: if you're betting on durability, bet on all three holding. Effective context will keep improving, because that's where most research investment goes. Tool reliability is now a commodity. Cost will keep dropping, though more slowly. The conditions that produced the turn are the conditions that will sustain it.

The maturity map — what's safe to bet on today

One of the most useful and least-shared pieces of knowledge about working with agents in 2026 is which parts of the practice are genuinely solid and which are still being figured out. Hype tends to flatten this: everything is either "the future is here" or "snake oil." Reality has texture. Throughout this series, every chapter will tell you which bucket the techniques it covers fall into. We will not pretend "emerging" is "now."

Maturity	What it means for your bets	Examples
Now	Production-ready. Adopt with normal change-management.	Agent-in-the-loop coding, test generation, documentation, code review augmentation.
Near-term	Works but needs care. Pilot before standardising.	Autonomous bug-fix loops, cross-file refactors, spec-to-code for greenfield features, agents in CI.
Emerging	Exciting; not bet-the-business yet. Watch closely.	Fully autonomous PR from ticket, multi-agent orchestration for large features, long-horizon maintenance.

If your organisation is staffing a multi-quarter initiative on the Emerging row, you are taking on research risk, not engineering risk. Worth doing — sometimes — but worth naming.

The mental model that predicts most agent behavior

If you take only one thing from this chapter, take this: a coding agent is a junior engineer who has read your entire codebase, never forgets a syntax rule, can type at a thousand words per minute, and has the judgment of someone who started yesterday. All four are true at once.

This single model predicts an enormous amount of what you will see:

The agent will write code that compiles and passes tests but does the wrong thing — because juniors do that.
The agent will make a "small change" that ripples across the codebase — because juniors do that.
The agent will confidently propose a solution that experienced engineers know is the wrong approach — because juniors do that.
The agent will catch bugs you missed because it actually read every file it touched — because a thorough junior with photographic memory would.

Where the model breaks: motivation and growth. The agent doesn't get better from feedback the way a junior does. It doesn't carry yesterday's lesson into today's task unless you build the carrying mechanism yourself. This is covered in Ch. 08 on maintenance, and it has organisational implications worth thinking about early.
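A carrying mechanism can be as small as a file of reviewed lessons that is prepended to every new task. The sketch below assumes a hypothetical `LESSONS.md` in the repo and hypothetical helpers `record_lesson` and `build_prompt`; how lessons get reviewed, pruned, and scoped is left out, and it is not a description of how any specific agent handles memory.

```python
from pathlib import Path


def record_lesson(path: Path, lesson: str) -> None:
    """Append one reviewed lesson as a bullet, skipping exact duplicates."""
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    line = f"- {lesson.strip()}"
    if line not in existing:
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")


def build_prompt(path: Path, task: str) -> str:
    """Put yesterday's lessons in front of today's task, if there are any."""
    lessons = path.read_text(encoding="utf-8").strip() if path.exists() else ""
    if not lessons:
        return f"Task: {task}"
    return f"Lessons from earlier work:\n{lessons}\n\nTask: {task}"
```

A typical use would be to call `record_lesson(Path("LESSONS.md"), "...")` after a code review surfaces a recurring mistake, and to build each new task prompt through `build_prompt` so the lesson travels with the repo rather than with a person.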

The leverage point. If you have an engineering culture that knows how to manage juniors well — clear specs, scoped tasks, attentive code review, written conventions — that culture transfers directly to managing agents. If you don't, you'll find that out very quickly.

What changes for leadership, specifically

A few things shift in ways worth naming upfront:

The bottleneck moves up the value chain. When typing isn't the constraint, the constraint becomes knowing what to build. Teams strong on execution but weak on judgment will feel this most. Their existing constraint masked a deeper one; agents reveal it.

Code review becomes the high-leverage activity. Senior engineers spend less time at the keyboard and more time evaluating diffs that were generated faster than humans could write them. The quality bar for reviewers gets higher; the bar for typing speed becomes irrelevant.

Written knowledge becomes load-bearing. The tacit knowledge that lived in heads now has to live in the repo, because agents can't acquire it any other way. Teams that hate documentation are at a real disadvantage. Teams that have it can compound on what they've built.

The cost of a bad decision compounds faster. Agents will help you build the wrong thing as quickly as the right thing. The product-and-strategy work that decides direction is now more important relative to execution, not less.

An honest taste — what an agentic workflow looks like

To make this concrete, imagine a bug ticket lands at 10 a.m.: customers in the EU are seeing prices in USD on the checkout page, started after Tuesday's deploy.

In an old-school workflow the engineer opens the codebase, greps for "currency" or "checkout," forms a hypothesis, tests it, repeats. Twenty to ninety minutes depending on the codebase and luck.

In an agentic workflow the engineer hands the agent the ticket and asks it to investigate. The agent reads the recent commits, finds the refactor that changed which field the checkout page reads currency from, traces the impact, proposes the fix, runs the existing tests, and stops before opening a PR. The engineer reads the proposal, notices the cart page has similar logic, asks the agent to check there too. It does. The fix lands in roughly fifteen minutes total.

What just happened: the agent did the boring archaeology — reading commits, tracing data flow, finding the regression. The engineer did the interesting part — recognising that if checkout has this bug, cart probably does too. The agent's strength (thoroughness, speed) and the engineer's (judgment, pattern recognition) composed cleanly.

If that exchange feels familiar, you're already doing agentic SDLC. The rest of this series is about doing it deliberately, at scale, across the whole lifecycle — not just bug fixes.

What this series will and will not cover

Will cover: the practice of building software when an agent is part of the team. Concrete patterns, honest failure modes, the unglamorous infrastructure (testing, CI, observability) that agents need around them to be safe at scale, and the organisational decisions that determine whether the practice compounds or quietly degrades.

Will not cover:

Vendor reviews. Tools are mentioned to illustrate concepts; nothing is ranked.
Model internals. Models are treated as black boxes you call.
Prompt engineering basics. That is the prerequisite series — start there if "system prompt", "few-shot", "chain of thought", and "tool use" aren't already familiar.
Predictions about AGI. We are here to ship software.

Honest prerequisites

This series assumes a working engineering background. More specifically:

You can read code in at least one language. Examples lean toward Python and TypeScript; the patterns transfer.
You've worked with an LLM beyond toy prompts. If you've never written a system prompt or wired up function calling, start with the Prompt Engineering tutorial.
You've done code review. Not just opened PRs — actually reviewed someone else's work and pushed back on a design choice. Agentic SDLC is mostly code review, with you as the senior.

What you do not need: ML expertise. You will not train a model in this series. You will not fine-tune one. You will not read a paper. This is engineering practice, not research.

Practice — before you read the next chapter

If you're new to agents

Pick a coding agent (Claude Code, Cursor, Aider; the choice doesn't matter much yet). Give it one of these tasks on a personal project: add a CLI flag to a script, write a test suite for an untested file, or refactor a function for readability. Note what surprised you and what disappointed you.

If you've used agents casually

Find a real task — something you'd normally do yourself in an hour. Try to drive it entirely through the agent: don't touch the keyboard except to type instructions and review diffs. Notice where you wanted to take over and ask why.

If you lead engineering

List the three product or technical decisions on your team's roadmap where "we can't move faster because we don't have engineering capacity" is the stated blocker. Then ask: how many of those would still be blocked if engineering capacity tripled tomorrow? The answer reveals whether your bottleneck is execution or direction.

The point of the exercises in this series isn't to teach you a technique. It's to make sure you've put your hands on the thing before reading the next abstract concept about it. Chapters get more abstract; the exercises stay concrete. Don't skip them.

Takeaways

The agentic turn is real, and it happened because three things crossed thresholds together: effective context, tool reliability, and cost. Bet on the durability of all three.
Assistant, copilot, agent — they are different categories. Agents take goals; the others take inputs. The rest of this series is about goal-taking systems.
Mental model: junior with photographic memory. Strong on thoroughness and speed, weak on judgment. Compose accordingly.
The maturity map — Now, Near-term, Emerging — should inform where you take engineering risk and where you take research risk. They are different things.
For leadership: the bottleneck moves to direction, review becomes high-leverage, written knowledge becomes load-bearing, bad decisions compound faster.

Next chapter: Foundations — the vocabulary. The terms that recur through every subsequent chapter, organised into five families. The decisions you make about agentic SDLC are made in this language; worth knowing it well.
