Every field has its own vocabulary, and you cannot read the literature — or argue productively at a staff meeting — until you have it. Agentic SDLC is unusual in that the vocabulary is still being settled. People use the same word for three different things, and three different words for the same thing. This chapter pins down the terms that recur through every chapter after this one, framed as the concepts you need to make decisions, not the implementation details of how they work.
Read it once now and come back as a reference. The exercises aren't optional even if you think you already know all the words.
An LLM with access to tools, running in a loop until a goal is met or it's interrupted. That's it. Everything else in this series sits on top of this primitive. The decision you make about "an agent" is what tools it gets and what it's allowed to stop and decide on its own.
An agent invoked as a tool by another agent. The parent treats the sub-agent's response the same as any other tool response — it doesn't see the sub-agent's internal reasoning. Sub-agents are how you compose specialists: a "code reviewer" sub-agent, a "test runner" sub-agent. The decision behind a sub-agent is whether the role is genuinely distinct enough to need its own system prompt and toolset, or whether you're just running the parent twice for show.
An agent whose primary job is to coordinate other agents, not to do object-level work itself. Orchestrators plan, delegate, and synthesize. They typically have a small toolset and a long context window. Orchestrators are powerful when they manage genuinely independent work and expensive when they manage work that should have been one agent.
A function the agent can call. From the agent's perspective, every external action — reading a file, running a test, querying a database, asking the user — is a tool call. The decision behind any tool is its description: that text is the agent's only window into when the tool should be used. Tool descriptions are product copy, not engineering documentation. Written carelessly, they degrade every agent that touches them.
The dominant control pattern: Reason, Act, observe, repeat. The agent thinks about what to do next, calls a tool, reads the result, thinks again. Most production agents are recognisable ReAct underneath. When you're choosing an agent harness, you're mostly choosing the quality of its ReAct loop.
A variation where the agent first produces a complete plan, gets human approval, then executes. Reduces drift on long tasks. Slower; safer. The decision behind plan mode is whether the task is uncertain enough that a wrong commitment is more expensive than the approval delay.
Any place where the agent stops and asks for human approval. The interesting design question is which actions require HITL — too many and the agent is useless; too few and you're surprised by what shipped. This is one of the most consequential choices in agentic SDLC and we'll come back to it in Ch. 06.
Passing control from one agent (or human) to another, with enough context for the receiver to continue. Surprisingly hard. The trajectory of an agent is often megabytes; the handoff has to compress that without losing what matters. Most multi-agent designs fail at handoffs first.
The total number of tokens the model can consider at once. In 2026 this is typically 200K–1M tokens depending on the model. Effective context — the part the model can actually use well — is always smaller than the nominal window. Optimise for effective, not nominal.
The instructions sent before any user message, defining the agent's role, constraints, and tool-use policy. In agentic SDLC, the system prompt does heavy lifting: it sets identity, rules of engagement, and the conventions of your codebase. Treat it as a configuration file owned by the team, not as a chat message owned by an individual.
An open protocol introduced by Anthropic in late 2024 that standardises how agents discover and call tools. Before MCP, every agent framework had its own tool format; integrating tool X with framework Y meant writing glue. After MCP, you write a tool server once and any MCP-aware agent can use it.
MCP isn't:
The full sequence of thoughts, tool calls, and observations an agent produced while solving a task. The closest thing in this field to a stack trace. When an agent does something wrong, you read its trajectory. When you evaluate an agent, you compare its trajectory against a known-good one (the "golden trace"). If your tooling doesn't surface trajectories, you cannot debug at scale.
Reading trajectories is a learned skill. Like stack traces or git logs, it feels overwhelming at first and becomes second nature. Make a habit of reading them even on tasks that succeeded — it's how you build intuition for when they're about to fail.
Rules that constrain what the agent can do, enforced outside the model. Examples: "never modify files outside /src," "never call this API without explicit user confirmation," "abort if cost exceeds $5." Guardrails are not things you tell the model in the system prompt — those are wishes. Guardrails are enforced by the harness. If it's only in the prompt, it's not a guardrail; it's an aspiration.
Running the agent's actions in an isolated environment so the blast radius of a mistake is contained. In 2026, each agent task typically runs in an ephemeral container with read access to the codebase and write access to nothing real until you approve a diff. Sandboxing is what lets you trust an agent enough to give it interesting tools.
The set of resources the agent is allowed to touch. A well-scoped agent has the minimum access needed for the task. The decision here is the same principle of least authority you'd apply to any service account; the difference is that agents have a habit of finding creative ways to use whatever they can reach, so the bar is higher.
A test for the agent itself, not for the code it writes. You give the agent a task with a known-good outcome and measure how often it produces that outcome. Evals are how you know your prompt changes actually helped, instead of just changing the failure modes. Without evals, you cannot tell whether you're getting better or worse over time.
A set of evals you run on every change to the agent's prompt, tools, or model version. Same role as a unit test suite, scaled up. Agents have soft behaviors that drift; the regression suite catches the drift.
A trajectory you've manually reviewed and certified as the way the agent should have handled this task. Useful for both training and evaluation. When a new trajectory diverges from the golden one, the divergence point is usually where the bug or improvement is.
Almost every agentic system in 2026 is one of three shapes. You'll see all three in later chapters; recognising the pattern at a glance is half the battle when evaluating systems other teams have built.
| Pattern | Shape | Choose when |
|---|---|---|
| Single ReAct agent | One agent, one loop, one toolset | The task is bounded, the work is sequential, and a single context window can hold what's needed |
| Plan-and-execute | Planner agent → reviewer → executor agent | Wrong commitments are expensive; you want a checkpoint before action |
| Orchestrator + workers | Coordinator decomposes; sub-agents handle pieces | Work genuinely parallelises and specialists outperform generalists |
The honest default in 2026: single ReAct. Most multi-agent designs in the wild would be stronger as one well-prompted agent. Reach for the more elaborate patterns only when you can name the specific weakness of the simple version.
To anchor the vocabulary, picture the smallest agent that actually works. It has four moving parts:
Forty lines of code at most. Everything else in this series is decoration on this core. When you evaluate a complex agent system, ask which of those four parts is doing the real work. The answer is usually the system prompt and the tool descriptions; the harness around them is commodity.
Some of these terms will be renamed by 2027. "Sub-agent" might become "specialist;" "orchestrator" might split into "planner" and "router." The underlying concepts will persist. Learn the concepts, not the labels. When you read a vendor's documentation, mentally translate their jargon into this chapter's vocabulary; the comparison is then like-for-like.
Pick any agent product you've read about recently — a blog post, a launch announcement, a tweet. Re-read it with this chapter's vocabulary in mind. Identify which architectural pattern it uses, which tools it appears to expose, and where its guardrails live. Notice which questions the marketing copy doesn't answer; those are usually the load-bearing ones.
Take an agent you've already built. List its tools and rate their descriptions on a 1–5 scale (would a stranger know when to call them?). Note where guardrails are enforced versus merely requested in the prompt. You'll usually find at least one place where what you thought was a guardrail is actually a wish.
Inventory your team's current agent vocabulary. Are people using the same words for the same things? When someone on your team says "agent," does everyone picture roughly the same thing? Misaligned vocabulary slows decisions; the fix is cheap if you catch it early.
Next chapter: Requirements — writing specs an agent can actually execute against. The first phase of the SDLC, and the one that surprises people most. The specs you write for humans aren't the specs an agent needs.
Sign in to join the discussion and post comments.
Sign in