On this tutorial

Agentic SDLC: A Field Manual for Building Software with AI Agents

Foundations

Phases

Synthesis

Capstone

Capstone — a feature, end to end

Foundations — the cross-cutting concepts

Every field has its own vocabulary, and you cannot read the literature — or argue productively at a staff meeting — until you have it. Agentic SDLC is unusual in that the vocabulary is still being settled. People use the same word for three different things, and three different words for the same thing. This chapter pins down the terms that recur through every chapter after this one, framed as the concepts you need to make decisions, not the implementation details of how they work.

Read it once now and come back as a reference. The exercises aren't optional even if you think you already know all the words.

What you'll take away from this chapter

The eighteen terms that recur throughout the series, grouped into five families
The three architectural patterns that organise most agentic systems — and the choice each one represents
The difference between an agent, a sub-agent, and a tool call, and why confusing them breaks designs
What MCP is, what it isn't, and why it became connective tissue in the 2026 ecosystem
How to read trajectories — the closest thing this field has to a stack trace, and the only artifact that lets you debug "why did the agent decide that"

The eighteen terms, in five families

Eighteen terms in five families. Most agent system descriptions touch terms from at least three.

Family 1 — Structure

Agent

An LLM with access to tools, running in a loop until a goal is met or it's interrupted. That's it. Everything else in this series sits on top of this primitive. The decision you make about "an agent" is what tools it gets and what it's allowed to stop and decide on its own.

Sub-agent

An agent invoked as a tool by another agent. The parent treats the sub-agent's response the same as any other tool response — it doesn't see the sub-agent's internal reasoning. Sub-agents are how you compose specialists: a "code reviewer" sub-agent, a "test runner" sub-agent. The decision behind a sub-agent is whether the role is genuinely distinct enough to need its own system prompt and toolset, or whether you're just running the parent twice for show.

Orchestrator

An agent whose primary job is to coordinate other agents, not to do object-level work itself. Orchestrators plan, delegate, and synthesize. They typically have a small toolset and a long context window. Orchestrators are powerful when they manage genuinely independent work and expensive when they manage work that should have been one agent.

Tool

A function the agent can call. From the agent's perspective, every external action — reading a file, running a test, querying a database, asking the user — is a tool call. The decision behind any tool is its description: that text is the agent's only window into when the tool should be used. Tool descriptions are product copy, not engineering documentation. Written carelessly, they degrade every agent that touches them.

Family 2 — Control

ReAct loop

The dominant control pattern: Reason, Act, observe, repeat. The agent thinks about what to do next, calls a tool, reads the result, thinks again. Most production agents are recognisable ReAct underneath. When you're choosing an agent harness, you're mostly choosing the quality of its ReAct loop.

Plan mode

A variation where the agent first produces a complete plan, gets human approval, then executes. Reduces drift on long tasks. Slower; safer. The decision behind plan mode is whether the task is uncertain enough that a wrong commitment is more expensive than the approval delay.

Human-in-the-loop (HITL)

Any place where the agent stops and asks for human approval. The interesting design question is which actions require HITL — too many and the agent is useless; too few and you're surprised by what shipped. This is one of the most consequential choices in agentic SDLC and we'll come back to it in Ch. 06.

Handoff

Passing control from one agent (or human) to another, with enough context for the receiver to continue. Surprisingly hard. The trajectory of an agent is often megabytes; the handoff has to compress that without losing what matters. Most multi-agent designs fail at handoffs first.

Family 3 — Context

Context window

The total number of tokens the model can consider at once. In 2026 this is typically 200K–1M tokens depending on the model. Effective context — the part the model can actually use well — is always smaller than the nominal window. Optimise for effective, not nominal.

System prompt

The instructions sent before any user message, defining the agent's role, constraints, and tool-use policy. In agentic SDLC, the system prompt does heavy lifting: it sets identity, rules of engagement, and the conventions of your codebase. Treat it as a configuration file owned by the team, not as a chat message owned by an individual.

MCP (Model Context Protocol)

An open protocol introduced by Anthropic in late 2024 that standardises how agents discover and call tools. Before MCP, every agent framework had its own tool format; integrating tool X with framework Y meant writing glue. After MCP, you write a tool server once and any MCP-aware agent can use it.

MCP isn't:

a model — it's a protocol that models speak;
magic — a bad MCP tool is still a bad tool;
required — agents work without it, but you'll want it once you have more than three tools to wire up.

Trajectory

The full sequence of thoughts, tool calls, and observations an agent produced while solving a task. The closest thing in this field to a stack trace. When an agent does something wrong, you read its trajectory. When you evaluate an agent, you compare its trajectory against a known-good one (the "golden trace"). If your tooling doesn't surface trajectories, you cannot debug at scale.

Reading trajectories is a learned skill. Like stack traces or git logs, it feels overwhelming at first and becomes second nature. Make a habit of reading them even on tasks that succeeded — it's how you build intuition for when they're about to fail.

Family 4 — Safety

Guardrails

Rules that constrain what the agent can do, enforced outside the model. Examples: "never modify files outside /src," "never call this API without explicit user confirmation," "abort if cost exceeds $5." Guardrails are not things you tell the model in the system prompt — those are wishes. Guardrails are enforced by the harness. If it's only in the prompt, it's not a guardrail; it's an aspiration.

Sandboxing

Running the agent's actions in an isolated environment so the blast radius of a mistake is contained. In 2026, each agent task typically runs in an ephemeral container with read access to the codebase and write access to nothing real until you approve a diff. Sandboxing is what lets you trust an agent enough to give it interesting tools.

Permission scope

The set of resources the agent is allowed to touch. A well-scoped agent has the minimum access needed for the task. The decision here is the same principle of least authority you'd apply to any service account; the difference is that agents have a habit of finding creative ways to use whatever they can reach, so the bar is higher.

Family 5 — Quality

Eval

A test for the agent itself, not for the code it writes. You give the agent a task with a known-good outcome and measure how often it produces that outcome. Evals are how you know your prompt changes actually helped, instead of just changing the failure modes. Without evals, you cannot tell whether you're getting better or worse over time.

Regression suite

A set of evals you run on every change to the agent's prompt, tools, or model version. Same role as a unit test suite, scaled up. Agents have soft behaviors that drift; the regression suite catches the drift.

Golden trace

A trajectory you've manually reviewed and certified as the way the agent should have handled this task. Useful for both training and evaluation. When a new trajectory diverges from the golden one, the divergence point is usually where the bug or improvement is.

The three architectural patterns

Almost every agentic system in 2026 is one of three shapes. You'll see all three in later chapters; recognising the pattern at a glance is half the battle when evaluating systems other teams have built.

Pattern	Shape	Choose when
Single ReAct agent	One agent, one loop, one toolset	The task is bounded, the work is sequential, and a single context window can hold what's needed
Plan-and-execute	Planner agent → reviewer → executor agent	Wrong commitments are expensive; you want a checkpoint before action
Orchestrator + workers	Coordinator decomposes; sub-agents handle pieces	Work genuinely parallelises and specialists outperform generalists

The honest default in 2026: single ReAct. Most multi-agent designs in the wild would be stronger as one well-prompted agent. Reach for the more elaborate patterns only when you can name the specific weakness of the simple version.

The minimum agent, conceptually

To anchor the vocabulary, picture the smallest agent that actually works. It has four moving parts:

A system prompt that defines its role and rules.
A list of tool descriptions it can call.
A loop that calls the model, executes any tool the model invokes, feeds the result back, and repeats.
A stopping condition — task complete, error, budget exceeded, or human interrupt.

Forty lines of code at most. Everything else in this series is decoration on this core. When you evaluate a complex agent system, ask which of those four parts is doing the real work. The answer is usually the system prompt and the tool descriptions; the harness around them is commodity.

A note on terminology drift

Some of these terms will be renamed by 2027. "Sub-agent" might become "specialist;" "orchestrator" might split into "planner" and "router." The underlying concepts will persist. Learn the concepts, not the labels. When you read a vendor's documentation, mentally translate their jargon into this chapter's vocabulary; the comparison is then like-for-like.

Practice — before you read the next chapter

If you're new to this

Pick any agent product you've read about recently — a blog post, a launch announcement, a tweet. Re-read it with this chapter's vocabulary in mind. Identify which architectural pattern it uses, which tools it appears to expose, and where its guardrails live. Notice which questions the marketing copy doesn't answer; those are usually the load-bearing ones.

If you've built agents before

Take an agent you've already built. List its tools and rate their descriptions on a 1–5 scale (would a stranger know when to call them?). Note where guardrails are enforced versus merely requested in the prompt. You'll usually find at least one place where what you thought was a guardrail is actually a wish.

If you lead a team

Inventory your team's current agent vocabulary. Are people using the same words for the same things? When someone on your team says "agent," does everyone picture roughly the same thing? Misaligned vocabulary slows decisions; the fix is cheap if you catch it early.

Takeaways

Five families of vocabulary — structure, control, context, safety, quality. Most agent descriptions live across three or more.
An agent is just an LLM in a loop with tools. Everything else is decoration on that core idea.
Tool descriptions are product copy, not documentation. Treat them as critical UX for the model.
Guardrails are enforced, not wished for. If it's only in the system prompt, it's not a guardrail.
Trajectories are the closest thing to stack traces. If your tooling doesn't show them, you can't debug at scale.
Default to single ReAct. Reach for multi-agent patterns only when you can name what the single version can't do.

Next chapter: Requirements — writing specs an agent can actually execute against. The first phase of the SDLC, and the one that surprises people most. The specs you write for humans aren't the specs an agent needs.

Discussion

Why agentic SDLC, and why now Requirements — writing specs an agent can execute