Graph RAG — when graphs beat vectors

Everything in this series so far has retrieved by similarity: find the chunks that mean something like the question. But a whole class of real questions isn't about similarity at all; it's about relationships. "Who reports to the person who approved this budget?" "What downstream services depend on the database we're about to migrate?" "How is this customer connected to that fraud case?" No amount of semantic similarity answers these, because the answer isn't in any single chunk. It's in the connections between chunks, traversed across several hops. This is where a knowledge graph beats a vector index. It's also where teams most often build elaborate machinery they didn't need, so this chapter is as much about when not to build a graph as about how to build one.

What you'll take away from this chapter

The specific question shapes vector search structurally cannot answer
What a knowledge graph is, in plain terms — entities and typed relationships
How graph RAG is built, and why the building is the expensive, lossy part
How graph and vector retrieval combine rather than compete
The honest test for whether your problem justifies a graph at all

The questions vectors can't answer

Vector search retrieves a neighbourhood — chunks near the query in meaning. That's perfect for "what is our refund policy" and useless for three shapes of question:

Multi-hop. "Who manages the engineer who wrote the auth module?" requires hopping: auth module → its author → that person's manager. Each hop is a relationship, and no single chunk contains the whole chain.
Relationship. "What depends on this service?" is a question about edges, not content. The answer is a set of connections, which a similarity search has no way to follow.
Aggregation across connections. "What are the main themes across everything this author published?" needs to gather and summarise over a connected set, not retrieve the single most-similar passage.

For these, you need a structure that stores the relationships explicitly and lets you traverse them — which is exactly what a knowledge graph is.

Vector search gives you a fuzzy neighbourhood of similar things. A graph lets you walk explicit, typed relationships hop by hop — which is the only way to answer "the manager of the author of this module."

What a knowledge graph is

Strip away the mystique and a knowledge graph is just entities connected by typed relationships, usually stored as triples: (subject, relationship, object). "(charge_card, calls, gateway.send)", "(Aanya, manages, Ravi)", "(OrderService, depends_on, PaymentDB)". Each triple is one edge. Collect enough of them and you have a graph you can traverse: start at a node, follow edges of the type you care about, arrive at the answer. The query language is traversal — "from this node, follow depends_on edges outward" — not similarity.
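
To make "the query language is traversal" concrete, here is a minimal sketch that stores triples in two adjacency maps and walks one relation type in either direction. Three triples come from the examples above; the `CheckoutAPI` and `ReportingJob` triples, and the helper names `index` and `reachable`, are hypothetical additions so there is a chain to walk.

```python
from collections import defaultdict

# Triples from the text, plus two hypothetical ones (marked) to form a chain.
TRIPLES = [
    ("charge_card", "calls", "gateway.send"),
    ("Aanya", "manages", "Ravi"),
    ("OrderService", "depends_on", "PaymentDB"),
    ("CheckoutAPI", "depends_on", "OrderService"),   # hypothetical
    ("ReportingJob", "depends_on", "PaymentDB"),     # hypothetical
]

def index(triples):
    """Build forward (subject -> object) and inverse (object -> subject) adjacency."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for s, r, o in triples:
        out_edges[(s, r)].append(o)
        in_edges[(o, r)].append(s)
    return out_edges, in_edges

def reachable(adjacency, start, relation):
    """Every node reachable from `start` by repeatedly following `relation` edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get((node, relation), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

out_edges, in_edges = index(TRIPLES)
# What does CheckoutAPI depend on, directly or indirectly? Follow edges forward.
upstream = reachable(out_edges, "CheckoutAPI", "depends_on")
# What breaks if PaymentDB changes? Follow the same edges backward.
downstream = reachable(in_edges, "PaymentDB", "depends_on")
print(sorted(upstream))
print(sorted(downstream))
```

The inverse map is the part that is easy to forget: "what depends on this service?" walks `depends_on` edges against their direction, so a subject-to-object index alone cannot answer it without scanning every edge.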

How graph RAG is built — and why it's the hard part

Here's the catch that the demos gloss over: you usually don't have a graph. Your data is unstructured documents. To get a graph, you must extract entities and relationships from that text — typically by asking an LLM to read each chunk and emit triples. That extraction step is the expensive, lossy heart of graph RAG.

# Extract (subject, relation, object) triples from text with an LLM,
# then traverse the resulting graph to answer a relationship question.
import json
from collections import defaultdict

EXTRACT = """Extract factual relationships from the text as a JSON array of
[subject, relation, object] triples. Use consistent entity names. Only
relationships explicitly stated. JSON array only.

Text: {text}"""

def extract_triples(text, llm):
    raw = llm.complete(EXTRACT.format(text=text)).strip()
    try:
        triples = json.loads(raw)
        return [tuple(t) for t in triples if len(t) == 3]
    except (json.JSONDecodeError, TypeError):
        return []                              # extraction is fallible — never crash

def build_graph(chunks, llm):
    edges = defaultdict(list)                  # subject -> [(relation, object)]
    for chunk in chunks:
        for s, r, o in extract_triples(chunk, llm):
            edges[s].append((r, o))
    return edges

def traverse(edges, start, relation, hops=2):
    """Follow `relation` edges outward from `start`, up to `hops` deep."""
    frontier, seen, results = [start], {start}, []
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for r, obj in edges.get(node, []):
                if r == relation and obj not in seen:
                    seen.add(obj)
                    results.append((node, r, obj))
                    nxt.append(obj)
        frontier = nxt
    return results

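class StubLLM:
    """Offline stand-in so the example runs; a real client returns model output."""
    def complete(self, prompt):
        return json.dumps([["Ravi", "wrote", "auth module"],
                           ["Aanya", "manages", "Ravi"],
                           ["Aanya", "manages", "Sara"]])

def sources(edges, relation, target):
    """Inverse lookup: subjects with a `relation` edge pointing at `target`."""
    return [s for s, outs in edges.items() if (relation, target) in outs]

text = "Ravi wrote the auth module. Aanya manages Ravi. Aanya manages Sara."
edges = build_graph([text], StubLLM())
# Who manages the author of the auth module? Edges point subject -> object, so both
# hops run against the arrow: auth module <- wrote <- author <- manages <- manager.
authors = sources(edges, "wrote", "auth module")
print([m for a in authors for m in sources(edges, "manages", a)])    # ['Aanya']
# Forward traversal answers the opposite question: whom does Aanya manage?
print(traverse(edges, "Aanya", "manages", hops=1))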

The code is simple; the difficulty is everything around it. Entity names drift ("Aanya", "A. Krishnamurthy", "the manager" — are they one node or three?). Relationships extracted from different chunks contradict each other. Extraction over a large corpus means an LLM call per chunk, which is slow and costly. And the graph goes stale exactly like any other index when the source changes. The traversal is the easy 10%; building and maintaining a clean graph is the hard 90%.
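
The name-drift problem can be illustrated with a minimal entity-resolution pass: map each surface form to one canonical node name before the triples go into the graph. This is a sketch under loud assumptions. The alias table is hypothetical and hand-written, and it treats "A. Krishnamurthy" as the same person as "Aanya" purely for illustration. Real systems usually combine curated aliases with fuzzy or model-assisted matching plus human review.

```python
import re

# Hypothetical alias table; in practice curated, or proposed by a model and reviewed.
ALIASES = {
    "a. krishnamurthy": "Aanya",
    "aanya krishnamurthy": "Aanya",
}

def canonical(name, aliases=ALIASES):
    """Map a surface form to one canonical node name; unknown names pass through."""
    key = re.sub(r"\s+", " ", name).strip().lower()
    return aliases.get(key, name.strip())

def resolve(triples, aliases=ALIASES):
    """Canonicalise subjects and objects, then drop exact duplicates, keeping order."""
    seen, merged = set(), []
    for s, r, o in triples:
        t = (canonical(s, aliases), r, canonical(o, aliases))
        if t not in seen:
            seen.add(t)
            merged.append(t)
    return merged

raw = [
    ("Aanya", "manages", "Ravi"),
    ("A. Krishnamurthy", "manages", "Sara"),
    ("Aanya  Krishnamurthy", "manages", "Ravi"),   # same fact, different spelling
]
resolved = resolve(raw)
print(resolved)
```

A lookup table like this handles spelling variants only. A contextual reference such as "the manager" cannot be resolved by string matching at all; it needs the surrounding document, which is part of why this step is harder than it looks.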

Graph and vector are partners, not rivals

The strongest systems don't choose. They use vector search to find the entry points — the entities most relevant to the query — then graph traversal to follow the relationships from there. Vector retrieval answers "which nodes are relevant"; the graph answers "what's connected to them." A common pattern also pre-computes summaries of densely-connected regions ("communities") so that broad questions — "what are the major themes here?" — can be answered from region summaries rather than by traversing millions of edges live. Local, specific questions use traversal; global, thematic questions use the summaries. The vector index from Chapter 05 and the graph sit side by side.
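
The entry-point-then-traverse pattern can be sketched in a few lines. Everything here is a stand-in: `embed` is a toy bag-of-words vector rather than a real embedding model, the node descriptions and edges are hypothetical, and the relation path is hard-coded where a real system would derive it from the question.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical node descriptions: this is what the vector side searches over.
NODES = {
    "auth module": "auth module handling login tokens and sessions",
    "billing module": "billing module producing invoices and refunds",
    "Ravi": "Ravi backend engineer",
    "Aanya": "Aanya engineering manager",
}
# Hypothetical edges, stored in the direction the question needs to walk.
EDGES = {
    "auth module": [("written_by", "Ravi")],
    "Ravi": [("managed_by", "Aanya")],
}

def entry_points(query, k=1):
    """Vector step: the k nodes whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(NODES, key=lambda n: cosine(q, embed(NODES[n])), reverse=True)
    return ranked[:k]

def follow(node, path):
    """Graph step: follow the relation names in `path` in order; None if the chain breaks."""
    for relation in path:
        targets = [o for r, o in EDGES.get(node, []) if r == relation]
        if not targets:
            return None
        node = targets[0]
    return node

start = entry_points("who manages the author of the auth module")[0]
answer = follow(start, ["written_by", "managed_by"])
print(start, "->", answer)
```

Similarity only has to land on the right starting node; it never has to know who manages whom. The relationship itself comes entirely from the edges.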

Question shape	Best tool	Why
"What is our refund policy?"	Vector	Answer lives in one similar chunk.
"Who manages the author of X?"	Graph	Multi-hop relationship traversal.
"What depends on this service?"	Graph	A question about edges, not content.
"Summarise themes across this author's work"	Graph + summaries	Aggregation over a connected set.
"Explain how OAuth works"	Vector	Conceptual; similarity is enough.

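The pre-computed region summaries mentioned above can be sketched roughly as follows. This uses plain connected components as a crude stand-in for a proper community-detection algorithm, and `summarise` is a placeholder that joins facts as text where a real pipeline would call an LLM and cache the result. Both function names are hypothetical.

```python
from collections import defaultdict

def communities(triples):
    """Group nodes into connected components, ignoring edge direction and type."""
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    seen, groups = set(), []
    for node in neighbours:
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(neighbours[n] - group)
        seen |= group
        groups.append(group)
    return groups

def summarise(group, triples):
    """Placeholder summary: the facts whose subject is in the group, joined as text."""
    facts = [f"{s} {r} {o}" for s, r, o in triples if s in group]
    return "; ".join(sorted(facts))

TRIPLES = [
    ("Aanya", "manages", "Ravi"),
    ("Ravi", "wrote", "auth module"),
    ("OrderService", "depends_on", "PaymentDB"),
]
groups = communities(TRIPLES)
summaries = [summarise(g, TRIPLES) for g in groups]   # computed once, reused per query
print(summaries)
```

A broad question is then answered from `summaries` without touching the edges at query time. The cost moves to build time, and the summaries go stale along with the graph.
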
My take. Graph RAG is the most over-reached-for technique in this entire series. It's intellectually attractive and genuinely powerful for the narrow band of relationship and multi-hop questions — and a heavyweight, costly, staleness-prone mistake for the conceptual questions that make up most real traffic. Before building a graph, count how many of your actual queries are truly multi-hop relationship questions. If it's a handful, decomposition (Chapter 08) plus good vector retrieval probably handles them more cheaply than a graph you have to extract, clean, and keep fresh. Build the graph only when relationship questions are the core of what your users ask, not an occasional case.

When this fails

Building a graph for conceptual questions. If your users mostly ask "how does X work," a graph is expensive overhead that vector search already handles better. Match the tool to the question shape.
Entity resolution neglected. If "Aanya", "A. Krishnamurthy", and "the manager" become three separate nodes, traversal breaks — the chain never connects. Entity resolution (merging names that refer to the same thing) is the make-or-break step most prototypes skip.
Trusting lossy extraction. LLM triple extraction misses relationships and invents others. A graph built on unverified extraction is confidently wrong in ways that are hard to spot. Sample and verify extractions; measure against known relationships.
Letting the graph go stale. When source documents change, the extracted graph is out of date — and a wrong relationship is worse than a missing chunk. Re-extract on change, and accept that this is costly.
Graph-only, no vector. Using the graph for everything throws away vector search's strength on the conceptual majority of queries. Use vectors to find entry points and answer similarity questions; reserve the graph for relationships.
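
The "sample and verify" advice in the list above can be turned into a number. A minimal sketch, assuming you have hand-labelled the true triples for a small sample of chunks: compare the extractor's output against that gold set and report precision and recall. The `extracted` list is made up for illustration, including one invented edge and one missed edge.

```python
def normalise(triple):
    """Compare triples case-insensitively, ignoring stray whitespace."""
    return tuple(part.strip().lower() for part in triple)

def score_extraction(extracted, gold):
    """Precision: share of extracted triples that are correct (low = invented edges).
    Recall: share of known triples that were found (low = missed edges)."""
    ext = {normalise(t) for t in extracted}
    ref = {normalise(t) for t in gold}
    hits = ext & ref
    precision = len(hits) / len(ext) if ext else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return precision, recall, sorted(ext - ref), sorted(ref - ext)

# Hand-verified relationships for one sampled chunk.
gold = [
    ("Ravi", "wrote", "auth module"),
    ("Aanya", "manages", "Ravi"),
    ("Aanya", "manages", "Sara"),
]
# Hypothetical extractor output: one invented edge, one missed edge.
extracted = [
    ("Ravi", "wrote", "auth module"),
    ("aanya", "manages", "Ravi"),
    ("Sara", "manages", "Ravi"),
]
precision, recall, invented, missed = score_extraction(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("invented:", invented)
print("missed:", missed)
```

Exact matching after normalisation is strict: the same fact phrased with a different relation name counts as both an invented and a missed edge. That strictness is useful, because a traversal looking for one relation name will not find the other either.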

Practice — before you read the next chapter

Classify your queries

Take fifty real queries and sort them: conceptual/similarity, versus genuine multi-hop or relationship questions. The proportion in the second bucket is your honest justification (or lack of one) for building a graph. For most products it's small — and that's the most important finding this chapter can give you.

Extract a tiny graph by hand

Take three related documents and extract the (subject, relation, object) triples yourself. Notice how much judgement entity naming takes, and how often two documents describe the same relationship slightly differently. That friction, multiplied by your whole corpus and automated with an imperfect LLM, is the real cost of graph RAG.

Answer one multi-hop question both ways

Take a genuine multi-hop question and try answering it with query decomposition plus vector retrieval (Chapter 08), then imagine answering it with a graph traversal. For two hops, decomposition often wins on cost and simplicity. Knowing where it stops being enough tells you exactly when a graph earns its keep.

Takeaways

Vector search retrieves by similarity, so a single lookup cannot answer multi-hop, relationship, or cross-connection-aggregation questions. Those need explicit traversal, which is what a graph provides.
A knowledge graph is entities joined by typed relationships (triples), queried by traversal rather than similarity.
The hard part is building it: LLM entity/relationship extraction is slow, lossy, and demands serious entity resolution. Traversal is the easy 10%.
Combine the two — vectors to find entry points and answer conceptual questions, the graph to follow relationships; pre-computed community summaries for broad thematic questions.
Graph RAG is widely over-adopted. Build it only when relationship and multi-hop questions are the core of your traffic, not an occasional case — otherwise decomposition plus vectors is cheaper and fresher.

Next chapter: Production — latency, cost, freshness, caching. The specialised verticals are done; now we harden the system for the real world. Where the milliseconds and the dollars actually go, how to cut both with caching, and how to keep an index fresh without rebuilding it every night.
