The risk in writing a tooling chapter is that it ages badly. Named products change pricing, features, and ownership. New entrants displace incumbents. Today's "we just shipped this" is next year's "what happened to them." So this chapter takes a different approach: rather than rank products, it categorises them, describes the shape of each category, and tells you what to look for. The category framework should still make sense in three years.
This is the chapter to come back to when you're evaluating a new tool. Read it once for the lay of the land, then use it as a checklist when you're being pitched.
Each of the eight categories gets one paragraph, focused on the decision you're making when you pick.
1 · IDE / editor agents. Where most engineers meet agents. The decision is fit with your team's existing editor preferences, plus the quality of the diff (small, focused, correct), context handling, and latency. What doesn't matter as much as vendors claim: which model it's "powered by." The harness matters more than the model name.
2 · CLI agents. For engineers who live in tmux. The decision is composability with shell pipelines and the ability to script the agent. Strong case if your team already scripts the dev environment; weak case otherwise.
3 · Background agents. CI bots, schedulers, librarians. The decision is permission scoping (Ch. 06), draft-by-default behavior, and audit trail. A few well-designed background agents are valuable; ten of them is usually a sign someone is automating problems that should be solved differently.
4 · Models & APIs. Honest take in 2026: the top three or four providers are all good enough for most agentic SDLC use cases. The differences that show up in vendor benchmarks rarely show up in your workflow. Pick on price, latency, ergonomics, and ecosystem (SDK quality, tool-use support, structured outputs). Switch if you find a real workflow difference.
5 · MCP servers. The connective tissue of the ecosystem. The decision for most teams is whether to write your own for internal tools. The answer is almost always yes — usually a few hours of work, and the result is reusable across every agent. The investment compounds.
6 · Sandboxes. Isolated execution. The decision is startup speed for interactive workflows, network controls, and cost. If you're running agents on developer laptops only, your sandbox is the developer's container. If you're running them in CI or production, you'll want something more deliberate.
7 · Eval & observability. Younger category. Many teams build their own minimal version (logging trajectories to object storage, querying with a notebook) before adopting a tool. That's reasonable. The category will mature.
8 · Governance & safety. Important for regulated industries, larger organisations, and any team where agents touch production. For smaller teams, the answer is often "we'll handle it with existing tools." Either way, be deliberate before adopting a dedicated product.
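The home-grown starting point mentioned under category 7 (log trajectories, query them from a notebook) can be sketched in a few lines. This is a minimal illustration, not a recommended schema: the field names (`run_id`, `step`, `tool`, `ok`, `latency_ms`) and the function names are hypothetical, and a local JSON Lines file stands in for object storage.

```python
import json
from collections import Counter
from pathlib import Path


def log_step(log_path: Path, run_id: str, step: int, tool: str,
             ok: bool, latency_ms: float) -> None:
    """Append one agent step to a JSON Lines file (hypothetical schema)."""
    record = {
        "run_id": run_id,
        "step": step,
        "tool": tool,
        "ok": ok,
        "latency_ms": latency_ms,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_steps(log_path: Path) -> list[dict]:
    """Read every logged step back into memory, skipping blank lines."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def failure_counts_by_tool(steps: list[dict]) -> dict[str, int]:
    """Count failed steps per tool: the kind of question a notebook would ask."""
    return dict(Counter(s["tool"] for s in steps if not s["ok"]))
```

In a notebook, `failure_counts_by_tool(load_steps(Path("trajectories.jsonl")))` answers "which tool calls fail most often?" without any dedicated product. Once questions like that outgrow a flat file, you have a concrete list of requirements to take to a vendor.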
When evaluating any agentic tool, ask in this order:
For a 5–20 engineer team doing serious agentic SDLC in 2026, the minimum stack:
That's the floor. Many strong teams run on exactly this for a year before adding anything else.
The "what would we lose" test. For any tool you're using or considering, ask "what would we lose if we stopped using this tomorrow?" If the answer is "convenience and a few hours of work," the tool is replaceable. If the answer is "we'd have to rebuild our entire workflow," you have lock-in worth being aware of. The honest answers, written down, change purchasing decisions.
Before signing up for any paid tool in this space:
The pitch is that the platform will be a full team member: taking tickets, writing code, reviewing PRs, deploying. The demo is impressive. In production, the failure modes are systemic: an autonomous agent making the wrong product call at step three means everything after step three is wrong, and the audit trail is hard to reconstruct. Teams that adopt these platforms usually end up with a constrained subset of the original promise, with humans inserted at each step.
What to do instead: adopt the constrained subset directly. Use the tool as a reviewer, fixer, or investigator. Skip the "autonomous engineer" framing.
"Our AI just understands what you want." It usually doesn't. The prompts are inside the platform, hidden from you, tuned for the average case. Your team isn't average; your codebase isn't average; the hidden prompts are leaving value on the table.
What to do instead: choose tools that expose their prompts and let you customise them. The project-prompt patterns from Ch. 04 only work when the tool lets you write them.
The model API bill is the cost most teams notice. There are usually two costs they don't:
Map your current stack to the eight categories. Note which categories you're using and which you're not. Don't add anything yet — just know where you are.
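One way to do the mapping is a tiny script you can keep in the repo and rerun at each renewal cycle. The sketch below is illustrative only: the category names come from this chapter, while the tool names in the example (`editor-agent-x` and so on) are placeholders, not real products.

```python
CATEGORIES = [
    "IDE / editor agents",
    "CLI agents",
    "Background agents",
    "Models & APIs",
    "MCP servers",
    "Sandboxes",
    "Eval & observability",
    "Governance & safety",
]


def map_stack(stack: dict[str, str]) -> dict[str, list[str]]:
    """Group tool names under the category each one is assigned to."""
    unknown = sorted({c for c in stack.values() if c not in CATEGORIES})
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")
    mapping: dict[str, list[str]] = {c: [] for c in CATEGORIES}
    for tool, category in stack.items():
        mapping[category].append(tool)
    return mapping


def uncovered(mapping: dict[str, list[str]]) -> list[str]:
    """Return the categories with no tool assigned, in chapter order."""
    return [c for c, tools in mapping.items() if not tools]


# Placeholder stack: replace with the tools your team actually uses.
example_stack = {
    "editor-agent-x": "IDE / editor agents",
    "provider-y-api": "Models & APIs",
    "internal-wiki-mcp": "MCP servers",
}

if __name__ == "__main__":
    mapping = map_stack(example_stack)
    for category, tools in mapping.items():
        print(f"{category}: {', '.join(tools) or '(none)'}")
    print("Not using:", ", ".join(uncovered(mapping)))
```

An empty category is not a gap to fill. The output is a map of where you are, which is all this step asks for.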
For each tool you currently pay for, run the "what would we lose" test. Honest answers, written down. Decide which renewals to question at the next cycle.
Assign someone (might be you) to own "tooling situational awareness": knowing what's good, what's emerging, and what's worth evaluating. Done well, the role takes maybe two hours a month, and it pays for itself many times over.
Next chapter: Capstone — a real-shaped feature built end-to-end with an agent in the loop. Every prior chapter's lessons composed into one workflow, with the time and cost accounting that tells you what to expect.