How Context Windows Work and Why They Matter

The context window is the AI's working memory. It is the single most important constraint to understand once you start working on real projects — long documents, long conversations, large amounts of data.

1. Introduction

You may have noticed that AI sometimes "forgets" what you said earlier, or refuses to read a giant document. That is the context window at work. This tutorial shows what it is, how big it gets in modern models, and the practical habits that let you make the most of it.

2. The Concept Explained

Every AI model has a fixed limit on how much text it can attend to at once. That limit, measured in tokens (chunks of text averaging roughly three-quarters of an English word), is the context window. Everything you send (the system prompt, the conversation history, your latest message, plus any attached documents) shares that same window. The reply the AI generates also counts towards it.

When the conversation gets too long, most chat tools start dropping the oldest messages from the window. The AI can no longer see them, so it cannot "remember" them. This is why long chats can feel forgetful.
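The drop-oldest behaviour can be sketched in a few lines. This is only an illustration: `estimate_tokens` and `trim_history` are hypothetical names, the token count is a crude estimate of about 0.75 words per token rather than a real tokenizer, and actual tools may summarise or reject over-long input instead of silently trimming.

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough estimate: about 0.75 words per token (assumption, not a real tokenizer)."""
    return math.ceil(len(text.split()) / 0.75)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit within budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the oldest messages are dropped first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

history = [
    "My name is Priya and I prefer British spelling.",
    "Here is the first draft of the report.",
    "Please shorten the second paragraph.",
]
trimmed = trim_history(history, budget=20)
print(trimmed)
```

With a budget of 20 estimated tokens, only the two most recent messages survive, so the spelling preference stated in the first message is no longer visible to the model.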

The window holds your system prompt, history, latest input, and the reply — all together.
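As a rough sketch of that shared budget, the check below adds up the four parts and compares the total with the window size. The function name and all token counts are made-up figures for illustration.

```python
def fits_in_window(system: int, history: int, user_input: int,
                   reply_reserve: int, window: int) -> tuple[bool, int]:
    """Return (fits, remaining); every argument is a token count."""
    used = system + history + user_input + reply_reserve
    return used <= window, window - used

fits, remaining = fits_in_window(
    system=500, history=6_000, user_input=1_200,
    reply_reserve=1_000, window=8_000,
)
print(fits, remaining)  # False -700
```

A negative remainder means something has to go: trim the history, shorten the input, or ask for a shorter reply.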

Common context window sizes (rough)

Small (4K – 16K tokens): Older or budget models. Roughly 3K – 12K words. Fine for chats, short documents.
Medium (32K – 128K tokens): Most modern chat models. Can handle a long article, a chapter of a book, or thousands of lines of code.
Large (200K – 1M+ tokens): Long-context models. Can ingest entire books, whole codebases, or huge transcripts in one shot.

The exact numbers move every few months. The principle does not.
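To turn any of these sizes into an approximate word count, a small helper is enough. It assumes the common rule of thumb of about 0.75 English words per token; the real ratio varies with the language and the tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text (assumption)

def tokens_to_words(tokens: int) -> int:
    """Convert a token count to an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)

for label, tokens in [("4K", 4_000), ("16K", 16_000),
                      ("128K", 128_000), ("1M", 1_000_000)]:
    print(f"{label} tokens is roughly {tokens_to_words(tokens):,} words")
```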

3. What Goes Wrong Without Awareness

Long chats start "forgetting" early instructions or facts.
Pasting a huge document leaves no room for the reply — output gets cut off.
Important context gets buried in the middle of the window, where models tend to recall it least reliably.
Costs balloon: in APIs, you pay per token in and out, so unnecessarily long contexts cost real money.
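The cost point is easy to see with a quick calculation. The prices below are hypothetical placeholders, not any provider's real rates; substitute the figures from your own provider's price list.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
full_doc = request_cost(150_000, 1_000, 3.0, 15.0)
one_section = request_cost(5_000, 1_000, 3.0, 15.0)
print(f"whole document: ${full_doc:.3f}, one section: ${one_section:.3f}")
```

Under these assumed prices, sending the whole document costs about fifteen times as much per request as sending the one section that matters.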

4. Practical Habits for Working With Context Windows

1 — Paste only what is relevant

If your document is 50 pages but only chapter 3 matters, paste only chapter 3. Don't dump everything and hope the AI sorts it out.

2 — Summarise older history when chats get long

Ask the AI itself:

Summarise everything important from our chat so far into a single message I can reuse as context.

Then start a fresh chat with that summary at the top.
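If you work through an API rather than a chat app, the same habit might look like the sketch below. `ask_model` stands in for whatever function sends a prompt to your model, and `fake_model` is a stub so the example runs on its own; neither is a real library call.

```python
from typing import Callable

SUMMARY_PROMPT = (
    "Summarise everything important from our chat so far "
    "into a single message I can reuse as context."
)

def restart_with_summary(history: list[str],
                         ask_model: Callable[[str], str]) -> list[str]:
    """Return the opening message list of a fresh chat seeded with a summary."""
    transcript = "\n".join(history)
    summary = ask_model(f"{transcript}\n\n{SUMMARY_PROMPT}")
    return [f"Context from our previous chat:\n{summary}"]

# Stub standing in for a real model call (hypothetical).
def fake_model(prompt: str) -> str:
    return "User is drafting a pricing page; tone is friendly; deadline is Friday."

new_chat = restart_with_summary(["msg 1", "msg 2", "msg 3"], fake_model)
print(new_chat[0])
```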

3 — Put critical info at the start AND the end

Models tend to recall material at the start and end of a long prompt more reliably than material in the middle. State your most important constraint at the top, then repeat it near the bottom.
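One way to make this a habit is a tiny prompt builder that wraps the long material with the constraint on both sides. The function name and the "Important" and "Reminder" labels are illustrative choices, not a required format.

```python
def build_prompt(constraint: str, body: str) -> str:
    """Place the key constraint before and after a long body of text."""
    return (
        f"Important: {constraint}\n\n"
        f"{body}\n\n"
        f"Reminder: {constraint}"
    )

prompt = build_prompt(
    constraint="Answer in under 100 words.",
    body="(long reference material goes here)",
)
print(prompt)
```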

4 — Reserve room for the reply

If you ask for a 2,000-word essay, you need around 2,700 tokens of free space (2,000 ÷ 0.75). Leave it.
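The arithmetic behind that reserve, sketched with the rule of thumb of about 0.75 words per token (an approximation, so treat the results as estimates). The 8,000-token window is an arbitrary example.

```python
import math

def reply_tokens_needed(words: int, words_per_token: float = 0.75) -> int:
    """Approximate tokens required for a reply of the given length in words."""
    return math.ceil(words / words_per_token)

def max_input_tokens(window: int, reply_words: int) -> int:
    """Tokens left for prompt and history after reserving room for the reply."""
    return window - reply_tokens_needed(reply_words)

print(reply_tokens_needed(2_000))      # 2667
print(max_input_tokens(8_000, 2_000))  # 5333
```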

5 — Use file attachments wisely

Many tools turn attached PDFs into tokens behind the scenes. A 100-page PDF can eat tens of thousands of tokens.
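A back-of-the-envelope estimate shows why. The 400 words per page is an assumed figure for a typical text-heavy page, and the 0.75 words per token is the usual rule of thumb; dense or image-heavy PDFs can differ widely.

```python
import math

def estimate_pdf_tokens(pages: int, words_per_page: int = 400,
                        words_per_token: float = 0.75) -> int:
    """Rough token estimate for a text PDF (both defaults are assumptions)."""
    return math.ceil(pages * words_per_page / words_per_token)

print(estimate_pdf_tokens(100))  # 53334
```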

Filling the window blindly

Here's a 200-page PDF — please find the
section about pricing strategy and rewrite
it in our company tone.

The AI may run out of space, miss the relevant section, or produce a truncated answer.

Working within the window

Below is the pricing strategy section
(pages 47–52) from our internal handbook.

Rewrite it in our company tone:
- friendly, plain English
- short paragraphs
- bullet lists where helpful
- no jargon

Section:
"""
… paste only pages 47–52 here …
"""

Smaller, focused context. The AI now has plenty of room to write a good rewrite.
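If the document text is already available page by page, selecting the relevant range can be automated. The sketch below assumes the pages are held as a plain list of strings; `handbook` is a stand-in for text you have already extracted, and no PDF library is involved.

```python
def build_rewrite_prompt(pages: list[str], first_page: int, last_page: int) -> str:
    """Build a prompt from pages first_page..last_page (1-indexed, inclusive)."""
    section = "\n".join(pages[first_page - 1:last_page])
    return (
        "Rewrite the section below in our company tone:\n"
        "- friendly, plain English\n"
        "- short paragraphs\n"
        "- no jargon\n\n"
        f'Section:\n"""\n{section}\n"""'
    )

# Stand-in for text already extracted from a 200-page handbook.
handbook = [f"Text of page {n}" for n in range(1, 201)]
prompt = build_rewrite_prompt(handbook, 47, 52)
print(prompt)
```

Only six pages reach the model, so nearly all of the window stays free for the rewrite.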

5. Step-by-Step: Long-Conversation Survival Kit

Notice when the chat feels "forgetful". That is the window getting full.
Ask the AI to summarise the chat so far. One short summary captures the essentials.
Open a fresh chat and paste the summary first. You just compressed 30 messages into one.
Continue working with full context restored. Repeat as needed.
For huge documents, split them. Process chunk by chunk and stitch results together.
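The chunk-and-stitch step can be sketched as below. Splitting by word count is the simplest possible approach and is used here only for illustration; in practice you would probably prefer to split at section or paragraph boundaries. The `handle` argument is a placeholder for whatever you do with each chunk, such as sending it to a model.

```python
from typing import Callable

def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

def process_in_chunks(text: str, max_words: int,
                      handle: Callable[[str], str]) -> str:
    """Apply handle() to each chunk and join the results in order."""
    return "\n".join(handle(chunk) for chunk in split_into_chunks(text, max_words))

document = " ".join(f"word{n}" for n in range(1, 11))  # 10-word stand-in document
chunks = split_into_chunks(document, max_words=4)
print(chunks)

# upper() stands in for real per-chunk processing.
result = process_in_chunks(document, 4, lambda chunk: chunk.upper())
print(result)
```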

6. Practice Exercises

Exercise 1

Look up the context window of the AI tool you use most. Estimate how many words that is (tokens × 0.75).

Exercise 2

Take a long chat you have had with AI. Ask it:

Summarise everything important from this conversation into a single message I can paste into a new chat.

Use the summary to start fresh.

Exercise 3

Take a 10-page document. First paste the whole thing and ask a specific question. Then paste only the single relevant page and ask the same question. Compare quality.

7. Key Takeaways

The context window is a hard limit on what the AI can "see" at once.
System prompt + history + your input + reply all share the same window.
When the window fills, older content drops off — that is the source of "forgetting".
Paste only what is relevant. Summarise long histories. Put key info at the edges.
Bigger windows are not always better: focused context usually beats dumped context.
