After enough agentic systems, patterns emerge. Some are genuine — the same shape works again because it matches the grain of how agents process problems. Some are pattern-shaped delusions — they feel like they should work because of how they look on the whiteboard, and they fall apart in production. And some are diagnostic — they fail, but their failures are useful, because they fail at the boundary of what current agents can do.
This chapter catalogs the patterns I've seen recur most: six that work, three that look like they should and don't, and a couple that fail in instructive ways.
An agent whose job is to investigate, report, and stop. No actions. Just reading, reasoning, and writing a summary. Used for bug triage, codebase exploration, vendor evaluation, security audits, on-call diagnosis.
Why it works: investigation is exactly the task where agents have the most native advantage — read fast, summarise, follow imports without losing track. Removing the action step removes most of the risk. The output is a report a human acts on. The constraint to read-only tools is what makes the pattern reliable: the agent can't accidentally make things worse, and the human reviewing the report gets to apply judgment without first untangling what the agent already did.
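The read-only constraint can be enforced in the harness rather than requested in the prompt. The sketch below is a minimal illustration, not any particular framework's API: the `Tool` type, the `mutates` flag, and the tool names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    mutates: bool  # True if the tool can change files, state, or remote systems


def read_only(tools: list[Tool]) -> list[Tool]:
    """Keep only the tools that cannot change anything."""
    return [t for t in tools if not t.mutates]


# Hypothetical tool set; the lambdas stand in for real implementations.
ALL_TOOLS = [
    Tool("read_file", lambda path: f"<contents of {path}>", mutates=False),
    Tool("search_code", lambda query: f"<matches for {query}>", mutates=False),
    Tool("write_file", lambda path: "written", mutates=True),
    Tool("run_shell", lambda cmd: "executed", mutates=True),
]

# The investigator never sees the mutating tools at all.
investigator_tools = read_only(ALL_TOOLS)
print([t.name for t in investigator_tools])
```

Filtering the tool list before the agent starts means the guarantee does not depend on the agent following instructions.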
An agent that reads a proposed change and gives feedback. PR review is the canonical example; design doc review, RFC review, and SQL-migration review are variants. The output is comments, not code.
Why it works: review is a bounded task with clear inputs (the change) and clear outputs (the comments). The agent's tendency to over-explain becomes useful — it surfaces concerns a tired human reviewer might skip. The trick that makes this not annoying: tier the comments. Critical issues at the top, suggestions in the middle, nits at the bottom. PR authors learn to scan top-first.
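Tiering is easy to apply as a post-processing step on whatever the reviewer produces. A minimal sketch, assuming the agent's comments have already been parsed into records with a `tier` field; the `Comment` type and the three tier names are illustrative choices, not a fixed schema.

```python
from dataclasses import dataclass

# Lower number sorts first: critical at the top, nits at the bottom.
TIER_ORDER = {"critical": 0, "suggestion": 1, "nit": 2}


@dataclass
class Comment:
    tier: str
    path: str
    text: str


def render_review(comments: list[Comment]) -> str:
    """Group comments by tier, most severe first, keeping original order within a tier."""
    ordered = sorted(comments, key=lambda c: TIER_ORDER[c.tier])  # sorted() is stable
    lines: list[str] = []
    current = None
    for c in ordered:
        if c.tier != current:
            current = c.tier
            lines.append(f"== {current.upper()} ==")
        lines.append(f"{c.path}: {c.text}")
    return "\n".join(lines)


review = render_review([
    Comment("nit", "a.py", "rename x"),
    Comment("critical", "b.py", "query built by string concatenation"),
    Comment("suggestion", "a.py", "extract helper"),
])
print(review)
```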
An agent that takes a failing signal (test, lint, build, CI red) and proposes a fix in a draft PR. You saw this in Ch. 06 as fix-bot.
Why it works: the input is structured (an error message, a stack trace), the success condition is observable (the failure stops), and the human still approves the change before it merges. All three properties stack to make the pattern robust. The key constraint: the fixer opens draft PRs; it never pushes directly.
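One way to picture the draft-only constraint is as a small state machine in which the agent can create a PR but only a human can move it forward. This is a sketch under assumed names (`PullRequest`, `open_fix_pr`, `approve`, `merge` are hypothetical); a real setup would rely on the code host's own draft and branch-protection features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    branch: str
    title: str
    draft: bool = True                 # the fixer can only ever create drafts
    approved_by: Optional[str] = None  # set by a human reviewer, never by the agent


def open_fix_pr(failing_signal: str, branch: str) -> PullRequest:
    """The only write action the fixer gets: open a draft PR for a failing signal."""
    return PullRequest(branch=branch, title=f"Fix: {failing_signal}")


def approve(pr: PullRequest, reviewer: str) -> PullRequest:
    """Performed by a human: record approval and mark the PR ready."""
    pr.approved_by = reviewer
    pr.draft = False
    return pr


def merge(pr: PullRequest) -> str:
    """Refuse to merge anything a human has not approved."""
    if pr.draft or pr.approved_by is None:
        raise PermissionError("refusing to merge: no human approval")
    return f"merged {pr.branch}"


pr = open_fix_pr("test_login fails", "fix/test-login")
```

Calling `merge(pr)` at this point raises `PermissionError`; it succeeds only after `approve(pr, reviewer)` has run.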
An agent that organises existing material: writes the README, drafts the changelog from commits, maintains the architecture diagram, generates the API docs. The source material exists; the agent's job is to compress and structure it.
Why it works: no novel generation is needed, and the agent's hallucination surface is small because everything it writes can be traced back to existing artifacts. A librarian agent that runs nightly to update generated docs is one of the highest-ROI uses of agents on teams that don't have a dedicated docs effort.
An agent that runs ahead of a planned task: reads the relevant code, surfaces what's there, identifies surprises. The output is consumed by a human (or another agent) before the actual work starts.
The scout doesn't write code. It writes context. The engineer walks in tomorrow with the map already drawn. Particularly valuable before any task expected to take more than a day; the scout's pre-read changes the time estimate and surfaces unknowns when they're still cheap.
Two agents with deliberately different system prompts: one optimistic ("propose a solution"), one critical ("find what's wrong with this proposal"). They take turns; the human reads the conversation and makes the final call.
Why it works: agents have a known weakness in self-critique. They tend to commit to their first plausible approach. The critical agent breaks that — it has no investment in the proposer's plan because it's a different session. The pattern surfaces objections the proposer would have steamrolled.
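The turn-taking itself is a few lines of harness code. The sketch below assumes a `model` callable that takes a system prompt and the transcript so far and returns a reply; that signature, and the `fake_model` stand-in, are inventions for the example rather than a real client API.

```python
from typing import Callable

# Assumed signature: (system_prompt, transcript_so_far) -> reply
Model = Callable[[str, list[str]], str]

PROPOSER_PROMPT = "Propose a solution."
CRITIC_PROMPT = "Find what's wrong with this proposal."


def dual_prompt_loop(question: str, model: Model, turns: int = 4) -> list[str]:
    """Alternate proposer and critic turns; return the transcript for a human to read."""
    transcript = [f"QUESTION: {question}"]
    for turn in range(turns):
        if turn % 2 == 0:
            role, prompt = "PROPOSER", PROPOSER_PROMPT
        else:
            role, prompt = "CRITIC", CRITIC_PROMPT
        # Each call carries only its own system prompt, so the critic
        # has no stake in defending the proposer's earlier answers.
        reply = model(prompt, transcript)
        transcript.append(f"{role}: {reply}")
    return transcript


def fake_model(system_prompt: str, transcript: list[str]) -> str:
    """Stand-in for a real model call, so the loop can be run offline."""
    return f"reply to {len(transcript)} prior lines under '{system_prompt}'"


conversation = dual_prompt_loop("Should we shard the queue?", fake_model, turns=4)
```

The loop returns a transcript and nothing else: the final call stays with the human reading it.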
"One agent to rule them all." A single agent with a huge system prompt, access to every tool, expected to handle anything from "fix this bug" to "design the next feature."
What goes wrong: tool descriptions blur together; the agent gets confused about which tool to use; context windows fill up with irrelevant instructions; quality drops on every individual task. The "general-purpose" agent is mediocre at everything.
Fix: split into role-specific agents. A reviewer with reviewer tools, an investigator with read-only tools, a fixer with fix tools. Each has a focused system prompt. Per-agent quality goes up dramatically.
Three or four agents in series, each passing summary context to the next. Looks tidy on the architecture diagram. In practice each handoff loses information, compounding by step three into a game of telephone. By the time the last agent acts, the original intent is barely recognisable.
Fix: either make the handoff lossless (pass the full prior trajectory, not a summary), or make the chain shorter, or make the steps independent (parallel instead of serial). Long serial chains are almost always a sign of design failure.
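The lossless option can be sketched as a handoff record that carries the original request verbatim plus every prior step's full output, rather than a rolling summary. `Handoff` and `run_step` are hypothetical names, and the lambda stands in for a real agent call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Handoff:
    original_request: str   # carried verbatim through every step
    trajectory: tuple = ()  # (step_name, full_output) pairs, never summarised


def run_step(name: str, handoff: Handoff, act: Callable[[Handoff], str]) -> Handoff:
    """Run one agent step and append its full output to the trajectory."""
    output = act(handoff)
    return Handoff(handoff.original_request, handoff.trajectory + ((name, output),))


h = Handoff("Rename userId in source files only, not fixtures")
for step_name in ("scout", "investigator", "fixer"):
    h = run_step(
        step_name,
        h,
        lambda prev, n=step_name: f"{n} saw {len(prev.trajectory)} prior steps",
    )

print(h.original_request)
print(len(h.trajectory))
```

The trade-off is context size: a full trajectory grows with every step, which is one more reason to keep the chain short.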
An orchestrator agent whose job is to figure out which sub-agent to spawn for a request. Sounds elegant. In practice the coordinator's classification step is the weakest link. It misroutes requests, splits work that shouldn't be split, and adds a layer of indirection that obscures debugging.
Fix: make routing explicit. The user (or the harness) picks which agent to invoke. The orchestrator only orchestrates within a known workflow with predictable branches.
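Explicit routing can be as plain as a lookup table keyed by a name the user or harness supplies. In this sketch the agent functions are stubs and the registry keys are arbitrary; the point is only that no model call sits between the request and the choice of agent.

```python
from typing import Callable


# Stub agents; real ones would call a model with a role-specific system prompt.
def run_reviewer(request: str) -> str:
    return f"[reviewer] comments on: {request}"


def run_investigator(request: str) -> str:
    return f"[investigator] report on: {request}"


def run_fixer(request: str) -> str:
    return f"[fixer] draft PR for: {request}"


AGENTS: dict[str, Callable[[str], str]] = {
    "review": run_reviewer,
    "investigate": run_investigator,
    "fix": run_fixer,
}


def invoke(agent_name: str, request: str) -> str:
    """Dispatch to the agent the caller named; never guess on the caller's behalf."""
    if agent_name not in AGENTS:
        raise ValueError(f"unknown agent {agent_name!r}; choose one of {sorted(AGENTS)}")
    return AGENTS[agent_name](request)


print(invoke("investigate", "why is checkout slow?"))
```

An unknown name fails loudly, so a misroute shows up as an error at the call site instead of a plausible answer from the wrong agent.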
The temptation when an agent isn't doing well at a task is to add more agents. A planner. A validator. A second-opinion agent. Often this makes things worse. Each agent adds coordination overhead, more places for errors to compound, and more cost. A strong single-agent baseline usually beats the multi-agent "improvement".
The exceptions — where multiple agents really do help — share two properties:
The dual-prompt loop satisfies both. The investigator-then-fixer workflow can satisfy both. Most everything else does not.
The "single competent agent" baseline. Before designing a multi-agent system, get the strongest single agent you can build. Most problems that look like they need orchestration just need better prompts. Architectural complexity is often a smell that the simple version wasn't tried hard enough.
Give an agent a task that requires it to operate beyond its capabilities. Sometimes it says "I can't do that." Often it generates plausible-looking output that doesn't actually do the task. The failure isn't obvious; you only discover it when you try to use the output.
Examples: tasks requiring true numerical precision (financial code that processes amounts to the cent and proves it). Tasks requiring deep concurrency reasoning (correct lockless data structures). Tasks requiring rigorous formal verification.
What this teaches: there are domains where the current generation of agents looks competent and is not. The mitigations are domain-specific testing (you'd already test financial code carefully; now you test it twice) and knowing which subdomains of your own work fall into this bucket.
Sometimes a task that looks straightforward to a human goes badly because the agent doesn't share the implicit constraints. "Rename this variable everywhere it's used." The agent renames it in code, but also in a fixture file where the string happens to be the same, and in a comment that mentions an unrelated concept. Both are technically renames; both are wrong.
What this teaches: "everywhere" is doing more work than it looks. The fix is to specify the bounds: "rename in TypeScript and JavaScript source files only, not in test fixtures, comments, or documentation." Future agents will get better at this; today's still need the explicit scoping.
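Part of that scoping can be done before the agent runs, by handing it only the files that are in bounds. A minimal sketch: the suffix set and the excluded directory names are assumptions about a typical repository layout, and the filter works at file level only. It does nothing about comments inside an in-scope source file, which still need the instruction in the prompt.

```python
from pathlib import PurePosixPath

# Assumed scope: TypeScript and JavaScript source files only.
SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}
# Assumed directory names for fixtures and documentation.
EXCLUDED_DIRS = {"fixtures", "__fixtures__", "docs"}


def in_rename_scope(path: str) -> bool:
    """True if a repo-relative path is a source file outside the excluded directories."""
    p = PurePosixPath(path)
    if p.suffix not in SOURCE_SUFFIXES:
        return False
    return not any(part in EXCLUDED_DIRS for part in p.parts)


candidates = [
    "src/app/user.ts",
    "src/app/user.test.ts",
    "test/fixtures/user.ts",
    "docs/renaming.js",
    "README.md",
]
in_scope = [c for c in candidates if in_rename_scope(c)]
print(in_scope)
```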
The patterns aren't mutually exclusive. A typical mature agentic workflow combines them. A bug-fixing pipeline might use the scout to map the affected modules, the investigator to identify the root cause, the dual-prompt loop to evaluate fix candidates, and the fixer to implement the chosen approach in a draft PR. Four patterns; one workflow; humans at every transition.
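The bug-fixing pipeline above can be sketched as a list of stages with an approval gate after each one. The stage functions here are string-appending stubs and `approve` is a hypothetical callback standing in for a human decision; only the control flow is the point.

```python
from typing import Callable

Stage = tuple[str, Callable[[str], str]]


def run_pipeline(
    bug_report: str,
    stages: list[Stage],
    approve: Callable[[str, str], bool],
) -> tuple[str, str]:
    """Run stages in order; stop at the first transition a human does not approve.

    Returns (stage the run ended at, or "done", and the last artifact produced).
    """
    artifact = bug_report
    for name, stage in stages:
        artifact = stage(artifact)
        if not approve(name, artifact):
            return name, artifact
    return "done", artifact


# Stub stages that only annotate the artifact they receive.
STAGES: list[Stage] = [
    ("scout", lambda a: a + " -> module map"),
    ("investigator", lambda a: a + " -> root cause"),
    ("dual-prompt loop", lambda a: a + " -> chosen fix"),
    ("fixer", lambda a: a + " -> draft PR"),
]

full_run = run_pipeline("bug", STAGES, lambda name, artifact: True)
halted = run_pipeline("bug", STAGES, lambda name, artifact: name != "investigator")
print(full_run)
print(halted)
```

Because the gate sits in the harness, skipping a human checkpoint requires changing code, not persuading an agent.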
The key design move is choosing roles that compose. A reviewer that hands off to a fixer is a natural composition. An investigator that hands off to a librarian (write up what we found in the docs) is another. The agents stay specialised; the workflow stays comprehensible.
Pick one pattern from the "patterns that work" section and set it up for your own work. The investigator is the easiest start: no new tools to wire up beyond read access, just a different system prompt. Use it for one week on the next three open-ended questions you face. Notice what changes.
Map your current agent usage to the patterns in this chapter. Are any of your workflows actually anti-patterns in disguise? If you have a "general purpose" agent, can you split it into two or three role-specific ones?
Try the dual-prompt loop on a real design decision your team is facing. Write the two system prompts deliberately. Run a few turns. Compare the result to what you would have written alone. Most engineers find the critic surfaces at least one objection they had quietly suppressed.
Next chapter: Team — code review with non-human PRs, onboarding new engineers into an agentic team, and how seniority gets redefined when typing isn't the bottleneck.