Advanced: LangChain Prompt Templates and Chains

LangChain is one of the most widely used frameworks for stringing prompts, models, retrievers, and tools into reliable pipelines. Used well, it removes a lot of glue code. Used badly, it adds a confusing layer on top of an already-confusing problem. This tutorial covers what LangChain actually gives you, the parts worth learning, and when plain Python is the better tool.

1. Introduction

If you have built more than a handful of prompts, you have probably written the same code several times: a function that loads a template, fills in variables, calls a model, parses the output, retries on failure, and logs everything. LangChain's pitch is that this is plumbing — and it can be replaced with a small, composable vocabulary of prompt templates, chains, and runnables that snap together with a pipe operator.

This tutorial gives you a clear-eyed tour: the core primitives, the composition pattern, an end-to-end RAG pipeline, and an honest discussion of when LangChain is the right tool and when it adds more weight than it saves.

2. The Concept Explained

LangChain's modern API ("LangChain Expression Language" or LCEL) is built around three primitives:

PromptTemplate / ChatPromptTemplate. A parameterised prompt with named slots, just like Topic 14 but with a standard interface.
Runnable. Any component — a template, a model, a retriever, a parser, a tool — that exposes the same invoke / stream / batch methods.
Composition with |. Two runnables piped together produce a new runnable. The output of the first becomes the input of the second. Chains are just compositions of runnables.

The pipe operator is the centre of the API. Once you internalise that everything-is-a-runnable and the pipe means "send the output here", LangChain becomes mostly autocompleted Lego.
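
To make the pipe less magical, here is a toy sketch of the idea in plain Python. It is not LangChain's implementation; the `Step` class and the three example steps are hypothetical names invented for illustration. It only shows how overloading `|` lets two objects with an `invoke` method combine into a third object with the same interface.

```python
from typing import Any, Callable, List


class Step:
    """Toy stand-in for a runnable: wraps a function, exposes invoke/batch and |."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Run the same step over several inputs, one after another.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three hypothetical steps standing in for template, model, and parser.
fill_template = Step(lambda topic: f"Explain {topic} in one sentence.")
fake_model = Step(lambda prompt: f"  echo: {prompt}  ")
strip_text = Step(str.strip)

toy_chain = fill_template | fake_model | strip_text
toy_result = toy_chain.invoke("LCEL")
toy_batch = toy_chain.batch(["prompts", "chains"])
```

Because the composed object is itself a `Step`, it can be piped again or batched without any extra code, which is the property the real runnables rely on.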

Figure: A LangChain RAG pipeline expressed as a single LCEL chain. Each | sends the output of one runnable into the next.

3. The Problem Without a Framework

Hand-rolled glue

import json
import openai

# vector_store, SYSTEM_PROMPT, and MODEL are assumed to be defined elsewhere.
def answer(question):
    docs   = vector_store.search(question, k=4)
    ctx    = "\n\n".join(d.text for d in docs)
    # Manual slot filling: a misspelled placeholder is left in the prompt unnoticed.
    prompt = SYSTEM_PROMPT.replace("{ctx}", ctx)\
                          .replace("{q}", question)
    raw    = openai.chat.completions.create(
                model=MODEL, messages=[{"role":"user","content":prompt}]
             ).choices[0].message.content
    try:
        return json.loads(raw)
    except Exception:
        # silently swallow, return None
        return None

Works fine on day one. By day fifty you have written four versions of this function across the codebase, each with subtly different retry, logging, and parsing behaviour. Nothing streams. Switching models means editing five files.
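
One of the divergences described above is parsing: each copy of the function decides for itself what to do with a malformed reply. As a minimal sketch, assuming you stay in plain Python, a single shared helper can make that failure explicit instead of returning None. The names `ModelOutputError` and `parse_json_reply` are hypothetical, chosen for this example.

```python
import json


class ModelOutputError(ValueError):
    """Raised when the model reply is not the JSON object we asked for."""


def parse_json_reply(raw: str) -> dict:
    """Parse a model reply as a JSON object, failing loudly with context."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Keep a short excerpt of the reply so the log shows what went wrong.
        raise ModelOutputError(f"Reply was not valid JSON: {raw[:80]!r}") from exc
    if not isinstance(data, dict):
        raise ModelOutputError(f"Expected a JSON object, got {type(data).__name__}")
    return data


parsed = parse_json_reply('{"answer": "yes", "citations": []}')
```

Callers can then catch `ModelOutputError` in one place and decide whether to retry, rather than discovering a stray None several calls later.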

4. The Solution: LCEL Composition

Same pipeline, LCEL style

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableParallel

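# vector_store is assumed to be an existing LangChain vector store.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
# "gpt-x-class" is a placeholder; substitute a real model name.
model     = ChatOpenAI(model="gpt-x-class", temperature=0)
parser    = JsonOutputParser()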

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Answer ONLY from the context. Reply as JSON: "
     "{{\"answer\": str, \"citations\": [str]}}.\n\n"
     "Context:\n{context}"),
    ("user", "{question}"),
])

def format_docs(docs):
    return "\n\n".join(f"[{d.metadata['id']}] {d.page_content}"
                       for d in docs)

chain = (
    RunnableParallel(
        context  = retriever | format_docs,
        question = lambda x: x,
    )
    | prompt
    | model
    | parser
)

result = chain.invoke("What is our wholesale return policy?")

The whole pipeline is one composable object. .invoke, .stream, .batch, and async variants come for free. Swapping models or retrievers is a one-line change. Logging, tracing, and retries are configurable in one place.

5. Step-by-Step Breakdown

Start with the prompt template. Define your ChatPromptTemplate first. The slot names you choose flow through the entire chain.
Pick the model and parser. Decide on the output shape (free text, JSON, Pydantic model) before composing. The parser converts the model's raw text into something your downstream code can use directly, and raises an error when the output does not match the expected shape.
Compose with |. Read pipelines left-to-right like a Unix pipe. If a step needs multiple inputs (RAG needs both context and question), use RunnableParallel to produce a dict.
Use callbacks and tracing. Wire in LangSmith or your own callback handler from the start. Black-box pipelines are nightmares to debug; traced pipelines almost debug themselves.
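Stream where it matters. User-facing chains benefit from .stream for perceived latency. LCEL chains stream end-to-end with no extra code as long as the steps after the model can work on partial output: the built-in string and JSON parsers can, but an ordinary Python function placed after the model waits for the full response.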
Keep an exit ramp. Don't hide tokens, prompts, or model APIs behind so many layers that you can't drop down to raw HTTP when you need to. LangChain is most useful when you can still see the prompt that ultimately reached the model.

When NOT to use LangChain: If your project has one or two prompts, switches model providers rarely, and doesn't need RAG or agents, plain Python with a small prompt-template helper is usually clearer and easier to debug. LangChain is most valuable when you have multiple pipelines, multiple model providers, or need streaming, batching, and tracing across the whole system.
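
For the small-project case, a prompt-template helper can be a few lines of standard-library Python. The sketch below is one possible shape, with hypothetical names (`SimplePrompt`, `qa_prompt`); it fills named slots and refuses to render when a slot is missing, which is the main safety a template class provides.

```python
from string import Formatter


class SimplePrompt:
    """Tiny prompt template: named slots plus a check for missing values."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect slot names, e.g. "{context}" -> "context".
        self.variables = {
            name for _, name, _, _ in Formatter().parse(template) if name
        }

    def render(self, **values: str) -> str:
        missing = self.variables - set(values)
        if missing:
            raise KeyError(f"Missing prompt variables: {sorted(missing)}")
        return self.template.format(**values)


qa_prompt = SimplePrompt(
    "Answer ONLY from the context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
rendered = qa_prompt.render(
    context="Returns are accepted within 30 days.",
    question="What is the return window?",
)
```

The rendered string can be passed straight to whichever model client you already use, and the prompt stays visible in one place.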

6. Practice Exercises

Exercise 1

Rebuild your simplest existing prompt as an LCEL chain: template → model → parser. Compare the lines of code, the readability, and the ease of switching models, to your previous implementation.

Exercise 2

Wire a parallel step into the chain — for example, fetch retrieved context and the user's previous answer at the same time using RunnableParallel. Pipe both into the prompt template. This pattern unlocks most of LangChain's real power.

Exercise 3

Add a tracing or callback handler that logs every prompt that hits the model. Run the chain on five questions. Inspect the logged prompts — you will almost certainly find one surprise. That is exactly the point of tracing.

7. Key Takeaways

LangChain's value is composition — everything is a runnable, and runnables snap together with |.
The three primitives to know are PromptTemplate, Runnable, and LCEL composition. Many pipelines need little beyond these.
Use it when you have multiple chains, multiple model providers, or need streaming, batching, and tracing across a system.
Avoid hiding prompts behind so many layers that you can't inspect or rewrite them quickly. Always keep an exit ramp to raw API calls.
For small, single-prompt projects, plain Python is often the better tool — frameworks are taxes you pay in exchange for scale benefits.
