Agentic SDLC: A Field Manual for Building Software with AI Agents

Maintenance — the long tail

Most of what's written about agentic SDLC is about the exciting part: building features faster, fixing bugs in minutes, shipping more. The unglamorous truth is that, by the usual rule of thumb, around 70% of software engineering effort goes into maintenance, and the maintenance question is where the practice is either proven or quietly broken. Six months in, half the files in your codebase have been touched by an agent. The agent that wrote them is gone; more precisely, it's a fresh session every time, with no memory of what it did last March. Maintenance is now a problem of continuity without memory, and the practices that solve it look meaningfully different from human-only maintenance.

This chapter is about the long tail: months and years after the first agent-assisted commit, when the romance is gone and you just need the system to keep working.

What you'll take away from this chapter

Why "the agent will remember" is the most expensive assumption you can make
The three external memory systems that pay for themselves: the project prompt, the annotated changelog, and the decision log
How to spot and reverse drift — when agent-made changes accumulate that don't match your conventions
The bug-fixing workflow when no one on the team wrote the code originally
The "archaeology mode" for codebases with no living memory

The continuity problem

A human engineer who joined six months ago has internalised a lot. They know that the billing module is more careful than it looks because of an incident in 2024. They know not to refactor the legacy auth module because the rewrite was attempted twice and failed. They know which colleague wrote which module, and who to ask. This tacit knowledge is the connective tissue of a codebase.

An agent has none of it. A fresh session has read what you put in front of it and nothing else. The agent that fixed yesterday's bug doesn't know about today's bug; the agent fixing today's bug doesn't know what yesterday's said about the same module. Every session is day one.

The maintenance question is: how do you compensate? You can't keep the agent in the office over coffee. You have to write down everything that would otherwise live in tacit memory, and you have to write it in a place the agent will actually read.

The work isn't more, it's relocated. What used to live in heads now lives in the repo, where agents (and new humans) can find it.

The three external memory systems

The project prompt

You met this in Ch. 04. It's load-bearing for maintenance. Treat it as a living document — not a write-once setup file. Every recurring correction goes into it. Every "the agent didn't know X about our codebase" moment becomes a line in the file.

A project prompt that survives a year typically has these sections:

Conventions — naming, file layout, formatting, library choices
What to avoid — patterns this codebase has tried and rejected
Domain notes — business rules that aren't obvious from code (regulatory constraints, vendor quirks)
Where to look — module-by-module index of what's where and what to read first
Things in motion — known refactors in progress, modules being deprecated, areas where the patterns are deliberately inconsistent because a migration is underway
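One way to keep the file from silently losing a section is a small check that runs in CI. The sketch below assumes the project prompt is a Markdown file with one "## " heading per section; the heading format and the example prompt text are illustrative assumptions, not something this manual prescribes.

```python
import re

# Section names come from the list above. The "## <name>" heading
# convention is an assumption about how the prompt file is laid out.
REQUIRED_SECTIONS = [
    "Conventions",
    "What to avoid",
    "Domain notes",
    "Where to look",
    "Things in motion",
]


def missing_sections(prompt_text: str) -> list[str]:
    """Return the required sections that have no '## <name>' heading."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^##\s+(.+?)\s*$", prompt_text, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# Hypothetical prompt content, for illustration only.
example = """\
# Project prompt

## Conventions
Use snake_case for modules.

## What to avoid
Do not query the database outside the repositories package.

## Where to look
billing/ : start with billing/README.md
"""

print(missing_sections(example))
```

A check like this only proves the headings exist. Whether the content under them is current is still a review question.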

The annotated changelog

Not the generated-from-commits changelog. A human-curated record of why things changed — at the level of weeks or releases, not commits. The agent reads this when investigating "when did this start happening" or "is this related to that recent refactor."

Each entry covers a small batch of related changes with one or two sentences of context. "Switched checkout to read currency from user.preferences.currency instead of session.currency — fixed EU-pricing bug, see decision log 2026-05-12." Six months from now, when an agent investigates an odd currency behavior, the trail is immediate.
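If the changelog is kept in a structured form, the "when did this start happening" lookup becomes a simple filter. This is a minimal sketch: the `ChangelogEntry` fields, the `entries_mentioning` helper, and the entry dates are all hypothetical, and a plain text file searched with grep would serve the same purpose.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangelogEntry:
    when: date
    summary: str  # one or two sentences of context
    touches: list[str] = field(default_factory=list)   # affected modules
    see_also: list[str] = field(default_factory=list)  # e.g. decision log dates


def entries_mentioning(entries, term):
    """Entries whose summary or touched modules mention `term`, newest first."""
    needle = term.lower()
    hits = [
        e for e in entries
        if needle in e.summary.lower()
        or any(needle in t.lower() for t in e.touches)
    ]
    return sorted(hits, key=lambda e: e.when, reverse=True)


log = [
    ChangelogEntry(
        when=date(2026, 5, 14),  # hypothetical entry date
        summary=("Switched checkout to read currency from "
                 "user.preferences.currency instead of session.currency; "
                 "fixed EU-pricing bug."),
        touches=["checkout"],
        see_also=["decision log 2026-05-12"],
    ),
    ChangelogEntry(
        when=date(2026, 6, 2),  # hypothetical entry
        summary="Moved invoice PDF rendering to a background job.",
        touches=["billing"],
    ),
]

for entry in entries_mentioning(log, "currency"):
    print(entry.when, entry.summary, entry.see_also)
```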

The decision log

Architecture Decision Records, but lightweight. Five-to-ten-line entries explaining a non-obvious choice. Not every decision; only the ones that future-you (or a future agent) will want to know about.

Three short sections cover most of what's needed: context (what triggered the decision), decision (what was chosen), and why not (what alternative was rejected and why). Five minutes to write; years of value in a long-lived codebase.

The "would I want to know this in two years" filter. Use it to decide what gets a decision log entry. Trivia about typography in the marketing site: no. The fact that you can't lazy-load the dashboard chart library because of a known incompatibility with their SSR mode: yes.
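To make the three-section shape concrete, here is a sketch that renders one entry as plain text. The `Decision` class, the date, and the wording of the context are hypothetical; only the context / decision / why not structure and the chart-library example come from this chapter.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    date: str
    title: str
    context: str   # what triggered the decision
    decision: str  # what was chosen
    why_not: str   # which alternative was rejected, and why

    def render(self) -> str:
        """Format the entry as a short plain-text block."""
        return "\n".join([
            f"{self.date}: {self.title}",
            "",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Why not: {self.why_not}",
        ])


entry = Decision(
    date="2026-03-03",  # hypothetical date
    title="Dashboard chart library stays eagerly loaded",
    context="Lazy-loading the chart library was proposed to speed up the dashboard.",
    decision="Keep loading the chart library eagerly.",
    why_not="Lazy-loading hits a known incompatibility with the library's SSR mode.",
)

print(entry.render())
```

Whether entries live as objects, Markdown files, or lines in a single text file matters less than keeping all three sections present in each one.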

Drift, and how to catch it

Drift is when agent-made changes accumulate that don't match the conventions you set out: not flagrantly, but a degree at a time. Naming conventions slip. New files appear in odd places. The "we always use repositories for DB access" rule has six exceptions now, all introduced in the last quarter. None of them was flagged at PR review because each one was small.

The defenses, in order of effectiveness:

Linters and codemods that enforce conventions. If the convention can be a lint rule, make it one. Agents respect lints they see fail.
Periodic codebase audits. Once a quarter, ask an agent to audit one area against your project prompt. "Does this module follow our conventions? Where does it deviate?" The audit produces a list of small cleanup tickets.
Renaming as a one-shot tax. When you spot drift, fix it in one PR across the codebase. Agents are good at this exact kind of mechanical sweep.
Project prompt updates. If drift is happening, the prompt isn't strong enough. Add specifics.
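As an illustration of turning a convention into a lint, the sketch below checks the "repositories for DB access" rule for a Python codebase. It assumes that DB access shows up as an import of a database driver and that the allowed code lives under a `repositories/` directory. The module list and directory name are placeholders to adapt, and a real project would more likely express this as a rule in its existing linter.

```python
import ast

# Assumptions: "DB access" means importing one of these modules, and only
# files under repositories/ are allowed to do so.
DB_MODULES = {"sqlite3", "psycopg2", "sqlalchemy"}
ALLOWED_PREFIX = "repositories/"


def db_import_violations(path: str, source: str) -> list[tuple[str, int, str]]:
    """Return (path, line, module) for each DB import outside the allowed package."""
    if path.startswith(ALLOWED_PREFIX):
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare on the top-level package, so sqlalchemy.orm also counts.
            if name.split(".")[0] in DB_MODULES:
                violations.append((path, node.lineno, name))
    return violations


print(db_import_violations("checkout/cart.py", "import os\nimport sqlite3\n"))
print(db_import_violations("repositories/orders.py", "import sqlite3\n"))
```

Run over every file in CI, a non-empty result fails the build, which puts the exception in front of the agent and the reviewer at the moment it is introduced.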

Drift is reversible if caught early. Codebases that don't catch drift end up with the same problem human-only codebases get — death by a thousand small inconsistencies — just faster.

Bug-fixing in code no one wrote

You're handed a bug in a module no one on the team wrote. The original author was an agent in a session six months ago. Git blame points to a commit by your CI bot. The PR description is two sentences. What now?

The pattern that works has four steps:

Read the decision log and changelog for context. Five-minute scan. Was this module part of a known refactor? Is there a decision log entry that mentions it? Often yes — and you save yourself wandering.
Have a fresh agent session "explain" the module. Ask the agent to read the file and describe what it does, what it depends on, what depends on it, and any non-obvious choices. The agent is good at this and starts from the same blank slate you do.
Verify the non-obvious bits with a human. The agent's summary may flag things that look weird but were intentional. Check before assuming.
Fix, with a decision log entry if you uncovered anything. If the investigation surfaced a non-obvious choice that turned out to be intentional, document it. The next person shouldn't have to re-investigate.
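The first two steps can be combined into a single prompt for the fresh session. This is a rough sketch, assuming you have already pulled the relevant log excerpts by hand or by search; the function name, prompt wording, and file path are illustrative, not a recommended template.

```python
# The four questions mirror the "explain" step above.
EXPLAIN_QUESTIONS = [
    "What does this module do?",
    "What does it depend on?",
    "What depends on it?",
    "Which choices look non-obvious?",
]


def build_explain_prompt(module_path: str, source: str, log_excerpts: list[str]) -> str:
    """Assemble one prompt: prior context first, then the file, then the questions."""
    context = "\n".join(f"- {e}" for e in log_excerpts) or "(none found)"
    questions = "\n".join(f"{i}. {q}" for i, q in enumerate(EXPLAIN_QUESTIONS, 1))
    return (
        f"Context from the decision log and changelog:\n{context}\n\n"
        f"File: {module_path}\n{source}\n\n"
        f"Answer these questions, and flag anything you are unsure about:\n{questions}\n"
    )


prompt = build_explain_prompt(
    "checkout/currency.py",  # hypothetical path
    "def resolve_currency(user):\n    return user.preferences.currency\n",
    ["Switched checkout to read currency from user.preferences.currency."],
)
print(prompt)
```

The answers still need the human verification step. The prompt only makes sure the session starts with whatever memory the repo already holds.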

Total time on a typical bug: around forty minutes. Without the external memory systems, the same investigation would take hours, and the chance of accidentally re-breaking something else is much higher.

The archaeology workflow

Sometimes you inherit a codebase you've never seen, written largely by agents over months, with weak documentation and no decision log. Maintenance starts with archaeology: building the missing memory before you can safely change anything.

The workflow:

Generate a high-level map. Have the agent walk the directory tree and produce a one-paragraph description of each major module.
Identify the load-bearing files. These are files imported in many places or containing core domain logic. The agent can spot them by analysing the import graph.
Annotate the load-bearing files first. Add file headers explaining what each does. This is the foundation of the project prompt you don't yet have.
Build a starter project prompt from observed conventions. Have the agent scan for repeated patterns and propose conventions that match what's already in the codebase. Edit and adopt.
Find the suspicious modules. Modules with no tests, modules imported but rarely modified, modules with unusual patterns. These are where bugs hide. Document them with "watch out" comments.
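The import-graph step is also easy to check yourself. The sketch below counts, for a Python project, how many other modules import each module (its fan-in). It assumes you have already read the files into a dict keyed by dotted module name; the module names and sources shown are made up. It only follows absolute `import x.y` and `from x.y import z` statements, so relative imports and `from x import y` where `y` is itself a module are missed.

```python
import ast
from collections import Counter


def imported_modules(source: str) -> set[str]:
    """Absolute module names imported by one source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


def fan_in(sources: dict[str, str]) -> Counter:
    """Count, for each project module, how many other project modules import it."""
    counts = Counter()
    for name, source in sources.items():
        for target in imported_modules(source):
            if target in sources and target != name:
                counts[target] += 1
    return counts


# Hypothetical project: dotted module name -> file contents.
sources = {
    "billing.money": "from decimal import Decimal\n",
    "billing.invoice": "from billing.money import Money\n",
    "checkout.cart": "import billing.money\nimport billing.invoice\n",
    "reports.monthly": "from billing.money import Money\n",
}

for module, count in fan_in(sources).most_common():
    print(module, count)
```

High fan-in is a proxy for "load-bearing", not a definition of it. Core domain logic with few importers still needs a human to point it out.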

This is a week of work for a medium codebase, more for a large one. It feels like overhead until the first time you have to fix something — at which point the maintenance debt you would have paid is gone.

The compression-versus-comment cycle

A specific maintenance pattern worth naming: agents tend to write code that's slightly more verbose than it needs to be. Helper functions for things that could be inlined. Comments explaining the obvious. None of this is a bug, but cumulatively it makes the code wordier than a senior engineer would write.

The pattern: periodic compression passes on modules the agent has touched several times. Ask another agent session to "tighten this — same behavior, fewer lines, no comments that just describe what the code does." Review the diff. Usually 10–25% smaller, often more readable, no behavioral change.
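Two cheap checks can accompany the diff review for a Python module: how much the file shrank, and whether the public function signatures survived. The sketch below is an assumption-laden aid, not a proof of equivalence. It compares only top-level function names and positional parameter names, and the before/after snippets are invented for illustration. Your test suite remains the real guard against behavioral change.

```python
import ast


def public_api(source: str) -> set[tuple[str, tuple[str, ...]]]:
    """Names and positional parameter names of public top-level functions."""
    api = set()
    for node in ast.parse(source).body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            api.add((node.name, tuple(a.arg for a in node.args.args)))
    return api


def code_lines(source: str) -> int:
    """Count non-blank lines (comments included)."""
    return sum(1 for line in source.splitlines() if line.strip())


def shrinkage(before: str, after: str) -> float:
    """Fractional reduction in non-blank lines; assumes `before` is not empty."""
    return 1 - code_lines(after) / code_lines(before)


# Invented example of a module before and after a compression pass.
before = '''\
def total(prices):
    # Start the running total at zero
    result = 0
    # Loop over every price
    for p in prices:
        result = result + p
    # Return the total
    return result
'''

after = '''\
def total(prices):
    return sum(prices)
'''

print("API unchanged:", public_api(before) == public_api(after))
print(f"Shrinkage: {shrinkage(before, after):.0%}")
```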

The reverse is also worth doing occasionally — when the code is too dense and missing the comments that explain why. A pass to add those comments, based on the decision log and changelog, pays for itself the next time anyone reads the file.

The team conversation worth having

Maintenance practices fail not because individuals don't care but because the team doesn't have a shared agreement about who owns the external memory. Three questions worth answering explicitly:

Who owns the project prompt? If everyone owns it, nobody does. Usually the tech lead, often with co-owners.
What's the bar for a decision log entry? If everyone has a different bar, you'll end up with either nothing or noise.
When do we do the periodic audit and the compression pass? If it's "when we have time," it never happens. Pick a cadence.

The answers don't have to be elaborate. They have to be agreed.

Practice — before you read the next chapter

If you're new to this

If you don't have a project prompt for any project, create one for your largest active project. Spend an hour. Use the section list from "The project prompt" above. You'll be surprised how much tacit knowledge you write down without effort.

If you have a project prompt but no decision log

Look back at the last three significant non-obvious decisions you made on the project. Write decision-log entries for them, retroactively. Five minutes each. Then commit to writing one prospectively, the next time you make a decision that future-you will want to remember.

If you lead a team

Audit one module that's been touched by agents at least five times. Read git log for the last six months on it. Has anything drifted from the conventions in your project prompt? If yes, plan a corrective sweep. If no, you've earned the right to be smug for one day.

Takeaways

The agent has no tacit memory. Maintenance is the art of relocating tacit knowledge from heads into the repo.
Three memory systems that earn their keep: the project prompt (living conventions), the annotated changelog (why things changed), the decision log (why specific choices were made).
Drift is reversible if caught. Lints catch most of it; quarterly audits catch the rest.
Bug-fixing in agent-written code starts with archaeology: read the logs, have a fresh agent summarise, verify the non-obvious bits with a human.
Periodic compression passes keep agent-written code from accreting verbosity.
The team conversation: who owns the project prompt, what's the bar for a decision log entry, when do we audit. Cheap to agree; expensive to skip.

Next chapter: Patterns — the recurring shapes of agentic systems that work, look-like-work, and fail interestingly.
