Everything in this series so far has retrieved by similarity — find the chunks that mean something like the question. But a whole class of real questions isn't about similarity at all; it's about relationships. "Who reports to the person who approved this budget?" "What downstream services depend on the database we're about to migrate?" "How is this customer connected to that fraud case?" No amount of semantic similarity answers these, because the answer isn't in any single chunk — it's in the connections between chunks, traversed across several hops. This is where a knowledge graph beats a vector index. It's also where teams most often build elaborate machinery they didn't need, so this chapter is as much about when not to build one as about how.
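Vector search retrieves a neighbourhood: chunks near the query in meaning. That's perfect for "what is our refund policy". It is useless for three shapes of question. The first is multi-hop chains, where the answer to one hop is the input to the next ("who reports to the person who approved this budget?"). The second is dependency and impact questions, which ask what sits upstream or downstream of a node ("what depends on this database?"). The third is connection questions, which ask how two entities are linked at all ("how is this customer connected to that fraud case?").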
For these, you need a structure that stores the relationships explicitly and lets you traverse them — which is exactly what a knowledge graph is.
Strip away the mystique and a knowledge graph is just entities connected by typed relationships, usually stored as triples: (subject, relationship, object). "(charge_card, calls, gateway.send)", "(Aanya, manages, Ravi)", "(OrderService, depends_on, PaymentDB)". Each triple is one edge. Collect enough of them and you have a graph you can traverse: start at a node, follow edges of the type you care about, arrive at the answer. The query language is traversal — "from this node, follow depends_on edges outward" — not similarity.
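Before any LLM enters the picture, the data structure itself is small. The sketch below makes a few assumptions to stay self-contained. The triples are hand-written, and CheckoutAPI and ReportingJob are hypothetical services added for illustration. It indexes the triples in both directions and then walks one edge type breadth-first. Walking `depends_on` forwards answers "what does this need?". Walking it backwards answers "what breaks if this moves?".

```python
from collections import defaultdict, deque

# Hand-written triples; CheckoutAPI and ReportingJob are hypothetical examples.
TRIPLES = [
    ("OrderService", "depends_on", "PaymentDB"),
    ("CheckoutAPI", "depends_on", "OrderService"),
    ("ReportingJob", "depends_on", "PaymentDB"),
    ("Aanya", "manages", "Ravi"),
]

def index(triples):
    """Build forward and reverse adjacency maps: node -> relation -> [nodes]."""
    fwd = defaultdict(lambda: defaultdict(list))
    rev = defaultdict(lambda: defaultdict(list))
    for s, r, o in triples:
        fwd[s][r].append(o)
        rev[o][r].append(s)
    return fwd, rev

def walk(adj, start, relation):
    """Breadth-first: every node reachable from `start` via `relation` edges."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node][relation]:
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order

fwd, rev = index(TRIPLES)
print(walk(fwd, "CheckoutAPI", "depends_on"))  # ['OrderService', 'PaymentDB']
print(walk(rev, "PaymentDB", "depends_on"))    # ['OrderService', 'ReportingJob', 'CheckoutAPI']
```

Keeping a reverse index is a deliberate choice. Impact questions almost always run against the direction the triple was written in.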
Here's the catch that the demos gloss over: you usually don't have a graph. Your data is unstructured documents. To get a graph, you must extract entities and relationships from that text — typically by asking an LLM to read each chunk and emit triples. That extraction step is the expensive, lossy heart of graph RAG.
```python
# Extract (subject, relation, object) triples from text with an LLM,
# then traverse the resulting graph to answer a relationship question.
# `llm` is assumed to be any client exposing .complete(prompt) -> str.
import json
from collections import defaultdict

EXTRACT = """Extract factual relationships from the text as a JSON array of
[subject, relation, object] triples. Use consistent entity names. Only
relationships explicitly stated. JSON array only.
Text: {text}"""

def extract_triples(text, llm):
    raw = llm.complete(EXTRACT.format(text=text)).strip()
    try:
        triples = json.loads(raw)
        return [tuple(t) for t in triples
                if isinstance(t, list) and len(t) == 3
                and all(isinstance(x, str) for x in t)]
    except (json.JSONDecodeError, TypeError):
        return []  # extraction is fallible: never crash

def build_graph(chunks, llm):
    edges = defaultdict(list)  # node -> [(relation, neighbour)]
    for chunk in chunks:
        for s, r, o in extract_triples(chunk, llm):
            edges[s].append((r, o))
            edges[o].append(("~" + r, s))  # inverse edge, so we can walk backwards
    return edges

def traverse(edges, start, relation, hops=2):
    """Follow `relation` edges outward from `start`, up to `hops` deep."""
    frontier, seen, results = [start], {start}, []
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for r, obj in edges.get(node, []):
                if r == relation and obj not in seen:
                    seen.add(obj)
                    results.append((node, r, obj))
                    nxt.append(obj)
        frontier = nxt
    return results

text = "Ravi wrote the auth module. Aanya manages Ravi. Aanya manages Sara."
edges = build_graph([text], llm)
# Who manages the author of the auth module? Two hops, both walked backwards:
# auth module <-wrote- Ravi <-manages- Aanya. This assumes the LLM emitted the
# relations "wrote" and "manages" and named the node "auth module"; check its output.
authors = [o for _, _, o in traverse(edges, "auth module", "~wrote", hops=1)]
managers = [o for a in authors for _, _, o in traverse(edges, a, "~manages", hops=1)]
print(managers or "no path: check entity and relation names")
```
The code is simple; the difficulty is everything around it. Entity names drift ("Aanya", "A. Krishnamurthy", "the manager" — are they one node or three?). Relationships extracted from different chunks contradict each other. Extraction over a large corpus means an LLM call per chunk, which is slow and costly. And the graph goes stale exactly like any other index when the source changes. The traversal is the easy 10%; building and maintaining a clean graph is the hard 90%.
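Two of those problems, name drift and staleness, share a mitigation worth sketching:

- **Name drift:** canonicalise entity names through an alias table before inserting edges.
- **Staleness:** record which chunk each edge came from, so that re-ingesting a changed chunk replaces exactly its edges.

The alias table, the chunk ids and the extra name "Meera" are hypothetical. In practice aliases come from a directory, a human review pass or another LLM call. This sketch does nothing about contradictions beyond keeping each claim tied to its source.

```python
from collections import defaultdict

# Hypothetical alias table mapping surface forms to one canonical node name.
ALIASES = {"a. krishnamurthy": "Aanya", "aanya k": "Aanya"}

def canonical(name):
    """Collapse whitespace, then resolve known aliases (matched case-insensitively)."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned.lower(), cleaned)

class Graph:
    """Edges tagged with the id of the chunk they were extracted from."""

    def __init__(self):
        self.edges = defaultdict(set)     # subject -> {(relation, object, chunk_id)}
        self.by_chunk = defaultdict(set)  # chunk_id -> {(subject, relation, object)}

    def upsert_chunk(self, chunk_id, triples):
        """Replace every edge previously extracted from `chunk_id`."""
        for s, r, o in self.by_chunk.pop(chunk_id, set()):
            self.edges[s].discard((r, o, chunk_id))
        for s, r, o in triples:
            s, o = canonical(s), canonical(o)
            self.edges[s].add((r, o, chunk_id))
            self.by_chunk[chunk_id].add((s, r, o))

    def neighbours(self, node, relation):
        return sorted({o for r, o, _ in self.edges[canonical(node)] if r == relation})

g = Graph()
g.upsert_chunk("doc1#0", [("A. Krishnamurthy", "manages", "Ravi")])
g.upsert_chunk("doc2#3", [("Aanya", "manages", "Sara")])
print(g.neighbours("Aanya", "manages"))  # ['Ravi', 'Sara']: one node, not two
g.upsert_chunk("doc1#0", [("Aanya", "manages", "Meera")])  # doc1 was edited
print(g.neighbours("Aanya", "manages"))  # ['Meera', 'Sara']
```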
The strongest systems don't choose between the two. They use vector search to find the entry points (the entities most relevant to the query), then graph traversal to follow the relationships from there. Vector retrieval answers "which nodes are relevant"; the graph answers "what's connected to them." A common pattern also pre-computes summaries of densely-connected regions ("communities") so that broad questions like "what are the major themes here?" can be answered from region summaries rather than by traversing millions of edges live. Local, specific questions use traversal; global, thematic questions use the summaries. The vector index from Chapter 05 and the graph sit side by side.
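The hybrid pattern fits in a few lines. The sketch scores node descriptions against the query to pick entry points, then expands a fixed number of hops along edges. It rests on three stand-ins, all assumptions:

- **Similarity:** a bag-of-words cosine takes the place of the real embedding index.
- **Communities:** connected components are a crude substitute for community detection (pipelines typically run an algorithm such as Leiden, then summarise each region with an LLM).
- **Data:** the node descriptions and edges are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical node descriptions (what a vector index would embed) and edges.
NODES = {
    "OrderService": "service that creates and tracks customer orders",
    "PaymentDB": "database storing card payments and refunds",
    "CheckoutAPI": "public api for the checkout flow",
    "HRPortal": "internal portal for staff leave requests",
}
EDGES = [("CheckoutAPI", "OrderService"), ("OrderService", "PaymentDB")]

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def undirected_adjacency():
    adj = defaultdict(set)
    for a, b in EDGES:  # expansion ignores edge direction
        adj[a].add(b)
        adj[b].add(a)
    return adj

def hybrid_retrieve(query, k=1, hops=1):
    """Similarity picks the entry nodes; the graph adds what is connected to them."""
    q = vec(query)
    entry = sorted(NODES, key=lambda n: cosine(q, vec(NODES[n])), reverse=True)[:k]
    adj, found, frontier = undirected_adjacency(), set(entry), set(entry)
    for _ in range(hops):
        frontier = {m for n in frontier for m in adj[n]} - found
        found |= frontier
    return entry, sorted(found)

def communities():
    """Connected components: a crude stand-in for real community detection."""
    adj, seen, groups = undirected_adjacency(), set(), []
    for start in NODES:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

print(hybrid_retrieve("where are refunds stored"))  # (['PaymentDB'], ['OrderService', 'PaymentDB'])
print(communities())  # each group would get one pre-computed summary
```

Local questions go through `hybrid_retrieve`. Global questions would be answered from the per-group summaries, which are computed offline.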
| Question shape | Best tool | Why |
|---|---|---|
| "What is our refund policy?" | Vector | Answer lives in one similar chunk. |
| "Who manages the author of X?" | Graph | Multi-hop relationship traversal. |
| "What depends on this service?" | Graph | A question about edges, not content. |
| "Summarise themes across this author's work" | Graph + summaries | Aggregation over a connected set. |
| "Explain how OAuth works" | Vector | Conceptual; similarity is enough. |
My take. Graph RAG is the most over-reached-for technique in this entire series. It's intellectually attractive and genuinely powerful for the narrow band of relationship and multi-hop questions — and a heavyweight, costly, staleness-prone mistake for the conceptual questions that make up most real traffic. Before building a graph, count how many of your actual queries are truly multi-hop relationship questions. If it's a handful, decomposition (Chapter 08) plus good vector retrieval probably handles them more cheaply than a graph you have to extract, clean, and keep fresh. Build the graph only when relationship questions are the core of what your users ask, not an occasional case.
Take fifty real queries and sort them: conceptual/similarity, versus genuine multi-hop or relationship questions. The proportion in the second bucket is your honest justification (or lack of one) for building a graph. For most products it's small — and that's the most important finding this chapter can give you.
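If you want a first pass before reading the fifty by hand, a crude keyword heuristic can pre-sort them. This is a sketch under an obvious assumption: the cue phrases are made up and will misfire, so treat them only as a way to order your manual review. The number that matters is the proportion after you have corrected the labels yourself, which is what `overrides` is for.

```python
# Hypothetical cue phrases that often signal a relationship or multi-hop question.
RELATIONSHIP_CUES = ("depends on", "connected to", "reports to", "who manages",
                     "upstream", "downstream", "related to", "path between")

def bucket(query):
    q = query.lower()
    return "relationship" if any(cue in q for cue in RELATIONSHIP_CUES) else "similarity"

def relationship_share(queries, overrides=None):
    """Fraction of queries in the relationship bucket, after manual overrides."""
    overrides = overrides or {}  # query -> label you assigned by hand
    labels = [overrides.get(q, bucket(q)) for q in queries]
    return labels.count("relationship") / len(labels) if labels else 0.0

queries = [
    "What is our refund policy?",
    "What depends on this service?",
    "Explain how OAuth works",
    "Who manages the author of the auth module?",
]
print(relationship_share(queries))  # 0.5
```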
Take three related documents and extract the (subject, relation, object) triples yourself. Notice how much judgement entity naming takes, and how often two documents describe the same relationship slightly differently. That friction, multiplied by your whole corpus and automated with an imperfect LLM, is the real cost of graph RAG.
Take a genuine multi-hop question and try answering it with query decomposition plus vector retrieval (Chapter 08), then imagine answering it with a graph traversal. For two hops, decomposition often wins on cost and simplicity. Knowing where it stops being enough tells you exactly when a graph earns its keep.
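For comparison, the decomposition route for a two-hop question is just a loop: answer the first sub-question, substitute its answer into the next, and repeat. A minimal sketch assumes a hypothetical `answer(question)` callable that wraps vector retrieval plus an LLM reader. The stub below fakes it with a lookup table so the control flow runs on its own.

```python
def chain(templates, answer):
    """Answer sub-questions in order, feeding each answer into the next template."""
    previous, trace = None, []
    for template in templates:
        question = template.format(prev=previous)
        previous = answer(question)
        trace.append((question, previous))
        if previous is None:  # a hop failed: stop rather than guess
            break
    return previous, trace

# Stub standing in for retrieve-then-read; a real one would call your RAG pipeline.
FAKE = {
    "Who wrote the auth module?": "Ravi",
    "Who manages Ravi?": "Aanya",
}

final, trace = chain(["Who wrote the auth module?", "Who manages {prev}?"], FAKE.get)
print(final)  # Aanya
```

Stopping on a failed hop is a design choice. It is easier to debug than letting later hops answer a malformed question.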
Next chapter: Production — latency, cost, freshness, caching. The specialised verticals are done; now we harden the system for the real world. Where the milliseconds and the dollars actually go, how to cut both with caching, and how to keep an index fresh without rebuilding it every night.