Project: Build a Code Review Bot Using Prompt Chains

In this project you will design a code review bot built entirely from prompts — a multi-pass pipeline that reviews a pull request the way a senior engineer would: style first, then correctness, then security, then a friendly final summary. The deliverable is a folder of prompt files you can wire into any AI tool or CI pipeline.

1. Introduction

AI is genuinely good at code review, but only when you stop asking it to "review my code" and start asking it to do one specific kind of review at a time. A single prompt that asks for "style, bugs, security, and suggestions" tends to produce a vague essay that misses the real issues. A pipeline of focused prompts can catch issues that human reviewers regularly miss.

We will build the bot for a small Python service, but the chain works for any language. The output is a Markdown review you could literally paste as a comment on a pull request.

2. The Concept Explained

Good human reviewers don't read code linearly — they pass over the same diff several times with different lenses. We will replicate that with five passes:

Context pass — understand what the change is trying to do.
Style pass — naming, readability, idioms.
Bug pass — correctness, edge cases, off-by-ones.
Security pass — injection, secrets, untrusted input.
Summary pass — combine into a kind, prioritised review.

The first four passes read the same diff through different lenses; the fifth composes their output into a single human-readable review.
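The chain described above can be sketched as a short Python loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `call_model` is a hypothetical function you supply (it takes a prompt string and returns the model's reply), and the shortened templates and the `{description}`, `{diff}`, and `{findings}` token names are made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical, shortened templates. In practice each entry would hold
# one of the full prompts from section 4.
TEMPLATES: Dict[str, str] = {
    "context": "Summarise this PR.\nDescription:\n{description}\nDiff:\n{diff}",
    "style": "STYLE-ONLY pass.\nDiff:\n{diff}",
    "bugs": "CORRECTNESS pass.\nDiff:\n{diff}",
    "security": "SECURITY pass.\nDiff:\n{diff}",
}
SUMMARY_TEMPLATE = "Write the final review from these inputs:\n{findings}"


def run_review(description: str, diff: str,
               call_model: Callable[[str], str]) -> str:
    """Run four focused passes over the diff, then one summary pass."""
    outputs: Dict[str, str] = {}
    for name, template in TEMPLATES.items():
        # Every pass sees the same diff but a different instruction.
        prompt = template.format(description=description, diff=diff)
        outputs[name] = call_model(prompt)

    # The summary pass sees only the earlier outputs, not the raw diff.
    findings = "\n\n".join(f"## {name}\n{text}" for name, text in outputs.items())
    return call_model(SUMMARY_TEMPLATE.format(findings=findings))
```

Passing `call_model` in as a parameter keeps the chain independent of any particular AI tool, which matches the goal of prompts you can wire into any pipeline.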

3. The Problem Without a Chain

Single-shot review

Review this code and tell me what's wrong:
{paste diff}

The model writes a soft mixture of style nits and vague concerns. It misses the actual race condition. It misses the SQL string concatenation. It complains about variable names. The signal-to-noise ratio is terrible because the model has no separate lens for each kind of issue.

4. The Solution: A Five-Pass Pipeline

Pass 1 — Context

You are a senior engineer reading a pull request before reviewing it.

In ≤120 words, summarise:
- what this PR is trying to do (from the description and the diff)
- which files / functions changed
- which behaviours are new vs. modified vs. removed
- any architectural concern that is obvious at a glance

Don't suggest fixes yet. This is the "understand the change" step.

PR description:
"""
{paste PR title + description}
"""

Diff:
"""
{paste the unified diff}
"""

Pass 2 — Style

You are a senior engineer doing a STYLE-ONLY pass. Ignore bugs
and security for now.

Look for:
- unclear names (variables, functions, params)
- functions doing more than one thing
- duplicated logic
- comments that explain "what" instead of "why"
- inconsistency with the rest of the file's idioms

For each finding return:
- file + line number(s)
- one-line description
- severity: nit / suggestion / blocker
- the smallest possible fix (≤3 lines of code)

If there are no style issues worth raising, say so explicitly.

Diff:
"""
{paste diff}
"""

Pass 3 — Bugs

You are a senior engineer doing a CORRECTNESS pass. Ignore style.

Walk through the diff line by line. For each function changed,
imagine at least 3 inputs that might break it, chosen from: empty,
very large, malformed, or boundary values. Reason about what would
happen.

Look specifically for:
- off-by-one errors
- null / None / empty handling
- concurrency / race conditions
- error paths that swallow exceptions
- mismatches between function signature and call sites

For each finding return:
- file + line number(s)
- the failure scenario in one sentence
- severity: nit / suggestion / blocker
- a suggested fix (≤5 lines)

Think step by step before writing your findings.

Diff:
"""
{paste diff}
"""

The "think step by step" line asks for chain-of-thought reasoning, which tends to help most on tasks like bug-finding, where the model has to trace a failure scenario rather than match a pattern.
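To see what "imagine inputs that might break it" looks like in practice, here is a small worked example. The function `last_n_buggy` is hypothetical and written only for illustration; it is not taken from any real PR. Probing it with an empty list, a boundary value, and an oversized value exposes one real bug.

```python
def last_n_buggy(items, n):
    """Intended behaviour: return the last n items."""
    return items[-n:]


def last_n(items, n):
    """Same intent, with the boundary case handled explicitly."""
    if n <= 0:
        return []
    return items[-n:]


# Three probe inputs: empty list, boundary value n=0, n larger than the list.
probes = [([], 2), ([1, 2, 3], 0), ([1, 2, 3], 10)]
for items, n in probes:
    print(items, n, last_n_buggy(items, n), last_n(items, n))
```

The boundary probe is the one that fails: with `n=0` the slice `items[-0:]` is the same as `items[0:]`, so the buggy version returns the whole list instead of an empty one. This is the kind of failure scenario the bug pass is asked to describe in one sentence.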

Pass 4 — Security

You are a security-focused engineer doing a SECURITY pass.

Look for:
- SQL / command / template injection
- secrets or API keys committed to the diff
- input from external sources used unsanitised
- weak crypto, hardcoded passwords, predictable tokens
- improper auth / permission checks
- logging of sensitive data

For each finding return:
- file + line number(s)
- the threat in one sentence
- severity: nit / suggestion / blocker
- a concrete fix or mitigation

If you find no security issues, say so. Do not invent severity to
look thorough.

Diff:
"""
{paste diff}
"""
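Passes 2 to 4 each ask for a severity tag on every finding, so a script can check that the model actually supplied one before the findings move on to the summary pass. The sketch below assumes the model echoes the label as `severity: <tag>`, which the prompts encourage but do not guarantee; the sample text is invented.

```python
import re
from collections import Counter

SEVERITIES = ("nit", "suggestion", "blocker")
SEVERITY_RE = re.compile(r"severity:\s*(\w+)", re.IGNORECASE)


def count_severities(pass_output: str) -> Counter:
    """Count severity tags; anything outside the three allowed tags
    is counted under 'unknown'."""
    counts = Counter()
    for tag in SEVERITY_RE.findall(pass_output):
        tag = tag.lower()
        counts[tag if tag in SEVERITIES else "unknown"] += 1
    return counts


# Invented sample output from a style pass.
sample = """\
users.py:12 - name `d` is unclear. severity: nit
users.py:40 - function does two things. Severity: suggestion
users.py:77 - duplicated lookup logic. severity: critical
"""
print(count_severities(sample))
```

A non-zero `unknown` count, or a finding count that does not match the number of severity tags, is a reasonable signal to re-run the pass.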

Pass 5 — Final summary

You are now writing the final review the author will read.

Inputs:
- Context summary (pass 1)
- Style findings (pass 2)
- Bug findings (pass 3)
- Security findings (pass 4)

Produce a Markdown review with this structure:

### Summary
1-paragraph appreciation of what this PR does well.

### Blockers (must fix before merge)
- ...

### Suggestions (worth considering)
- ...

### Nits (optional)
- ...

### Tests I'd add
3 specific test cases the author should add before merging.

Tone: respectful, specific, no condescension. Reference line numbers.
Do not invent findings — only carry forward what passes 2–4 produced.
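Because the prompt above fixes the structure of the final review, a few lines of Python can confirm the model followed it before the review is posted. This is a minimal sketch; the function name `missing_headings` is hypothetical, and the check only looks at heading text, not at the quality of what sits under each heading.

```python
REQUIRED_HEADINGS = [
    "### Summary",
    "### Blockers",
    "### Suggestions",
    "### Nits",
    "### Tests I'd add",
]


def missing_headings(review: str) -> list:
    """Return the required headings that do not start any line of the review."""
    lines = [line.strip() for line in review.splitlines()]
    return [heading for heading in REQUIRED_HEADINGS
            if not any(line.startswith(heading) for line in lines)]
```

An empty list means all five sections are present; anything else names the sections to ask the model for again.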

Sample blocker entry: "users.py:84 builds the SQL query with an f-string that includes request.args['email']. This is exploitable via classic SQL injection. Use a parameterised query with your driver's placeholder syntax (for example ? in sqlite3 or %s in psycopg2)."
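The difference this blocker describes can be reproduced in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users` table and both lookup functions are hypothetical stand-ins for the code in the sample entry. Placeholder syntax depends on the driver: `sqlite3` uses `?`, while some other drivers use a different marker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("a@example.com", "Ann"), ("b@example.com", "Bob")],
)


def find_user_unsafe(email):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT name FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(email):
    # Parameterised: the driver passes the value separately from the SQL.
    return conn.execute(
        "SELECT name FROM users WHERE email = ?", (email,)
    ).fetchall()


payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # matches every row in the table
print(find_user_safe(payload))    # matches nothing
```

The unsafe version turns the payload into `WHERE email = '' OR '1'='1'`, which is true for every row. The safe version treats the whole payload as a literal email address.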

5. Step-by-Step Breakdown

Always do a context pass first. A reviewer who does not understand the intent of a change writes either nit-only reviews or panicked ones.
One lens per pass. Style, bugs, and security each require different mental models. Mixing them dilutes all three.
Add "think step by step" to the bug pass. It is arguably the highest-leverage line in the whole chain, because finding bugs requires reasoning, not pattern-matching.
Force severity tags. Without nit / suggestion / blocker tags, every finding looks equally urgent. Authors then ignore everything equally.
Compose the final review. The author should never see five raw passes — only the curated summary with appreciation, blockers, suggestions, nits, and recommended tests.
Wire it into the workflow. The output is plain Markdown — paste as a PR comment, or pipe it into a GitHub Action so the bot reviews every PR automatically.
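For the last step, posting the review automatically, one option is the GitHub REST API's issue-comments endpoint, which also accepts comments on pull requests. The sketch below only builds the request so it can be inspected without network access. The function name and its arguments are hypothetical, and it assumes you have a token with permission to comment on the repository.

```python
import json
import urllib.request


def build_comment_request(owner, repo, pr_number, review_markdown, token):
    """Build (but do not send) a request that posts the review as a PR comment."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": review_markdown}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Sending it is one more call: urllib.request.urlopen(request)
```

Inside a CI job, the owner, repository, PR number, and token would typically come from the job's environment rather than being hard-coded.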

Note: A prompt-driven review bot is an assistant, not a replacement. It misses subtle architectural issues a human reviewer would catch. Use it to clear the obvious 70% of feedback so human reviewers can focus on the hard 30%.

6. Practice Exercises

Exercise 1

Take a real PR (your own or an open-source one). Run the five-pass chain and read the final review. Compare it to the human review that landed on that PR. Note what the bot caught, what it missed, and what it over-flagged.

Exercise 2

Add a sixth pass — a test-quality pass — that critiques the tests in the diff: are they meaningful, do they assert behaviour or just calls, is there a missing edge case? Compare reviews with and without this extra pass.

Exercise 3

Customise the style-pass rules to match your team's actual style guide: max function length, preferred patterns, banned APIs. Save the customised prompts in a review-bot/ folder in your repo so anyone on the team can reuse them.
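Once the prompts live in a folder, loading and filling them takes only a few lines. This is a minimal sketch under two assumptions that are not part of the exercise: the files are named with a numeric prefix such as `1_context.txt` and `2_style.txt`, and the placeholders inside them have been renamed to simple tokens such as `{diff}` and `{description}`.

```python
import re
from pathlib import Path


def load_prompts(folder):
    """Return {pass_name: template_text}, ordered by filename."""
    prompts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # "2_style.txt" -> "style" (the numeric prefix only fixes the order)
        name = path.stem.split("_", 1)[-1]
        prompts[name] = path.read_text(encoding="utf-8")
    return prompts


def render_prompt(template, values):
    """Fill {name} tokens in one pass; unknown tokens are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, template)
```

Substituting with a regular expression rather than `str.format` means stray braces in a prompt or a diff do not raise errors, and a diff that happens to contain `{description}` is not substituted a second time.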

7. Key Takeaways

Great reviews come from multiple focused passes, not one giant prompt.
Style, correctness, and security each need their own lens, severity tags, and instructions.
"Think step by step" is the single most useful phrase for the bug pass.
The author should only ever see the curated final review — never the raw passes.
An AI review bot is a force multiplier for human reviewers, not a replacement for them.
