In this project you will turn a raw CSV into a polished data analysis report — profile, hypotheses, insights, charts, and an executive summary — using a chain of prompts. The deliverable is a Markdown or PDF report that reads like work from a junior analyst, generated from a dataset and a structured brief.
Analysts spend most of their time not on analysis but on framing — understanding the data, picking the right questions, and writing up findings so a non-technical reader cares. AI can compress all three. You still need to think; the AI just removes the boilerplate.
We will use a fictional but realistic dataset: orders.csv from a small e-commerce business with 12 columns and 18 months of order data. The workflow generalises to any tabular data — sales, marketing, HR, support tickets.
A good analysis report comes out of five stages: profiling the data, framing hypotheses, running analyses, visualising the findings, and writing the executive summary. Each stage maps to a separate prompt. Running them as a chain keeps the model focused on one task at a time and gives you a checkpoint at every stage, so you can correct a mistake before it propagates into later steps.
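The chain-with-checkpoints idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `run_chain`, `call_model`, `approve`, and the step names are all hypothetical, and `call_model` stands in for whatever chat tool or API you actually use.

```python
from typing import Callable, Dict, List

# Hypothetical step names; the real prompts are the ones given in Steps 1-5.
STEPS: List[str] = ["profile", "hypotheses", "analyses", "charts", "summary"]


def run_chain(
    prompts: Dict[str, str],
    call_model: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Run the prompts in order, feeding earlier outputs into later steps."""
    outputs: Dict[str, str] = {}
    for step in STEPS:
        # Prepend every earlier output so the prompt sees the context so far.
        context = "\n\n".join(f"[{name}]\n{text}" for name, text in outputs.items())
        full_prompt = f"{context}\n\n{prompts[step]}".strip()
        result = call_model(full_prompt)
        # Checkpoint: stop the chain if the reviewer rejects this stage.
        if not approve(step, result):
            raise RuntimeError(f"Stage '{step}' rejected; revise the prompt and rerun.")
        outputs[step] = result
    return outputs


# Usage with a stub model, so the sketch runs without any API access.
def fake_model(prompt: str) -> str:
    return f"output for a prompt of {len(prompt)} characters"


demo_prompts = {step: f"Do the {step} step." for step in STEPS}
results = run_chain(demo_prompts, fake_model, approve=lambda step, text: True)
print(list(results))
```

In practice the `approve` callback is you reading the output, which is the checkpoint the chain is designed to provide.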
One-shot analysis
Here is my CSV. Analyse it and write a report.
What you get is a soft, generic essay full of phrases like "the data suggests interesting trends". There are no specific numbers, no charts, no hypotheses, and worst of all — no auditable calculations. The model has to guess everything about your business in one shot.
Step 1 — Profile prompt
You are a senior data analyst. Profile the dataset below.
Return:
- Row count, column count
- For each column: data type, % missing, 3-row sample, and any
obvious anomaly (e.g. negative prices, future dates)
- A "data quality flags" section with ≤5 issues that would block
analysis
Then propose 5 sensible cleaning steps. Do not run them yet.
Context:
- Business: small e-commerce store selling home decor
- 18 months of order data
- Goal of the analysis: understand which months and product lines
drive revenue
Dataset (first 20 rows shown, full file attached):
"""
order_id,order_date,customer_id,country,product_id,product_line,
units,unit_price,discount_pct,shipping_cost,returned,refund_amount
1001,2024-09-12,C204,UK,P-DEC-019,Lighting,2,29.99,0,4.5,FALSE,0
...
"""
The model returns a clean profile with column types, missing-value percentages, and three or four concrete data-quality flags. You read it and either approve the cleaning steps or adjust them.
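Before approving anything, it can help to cross-check the model's profile against a quick one computed locally. The sketch below assumes pandas is installed and uses a three-row stand-in for orders.csv; only the first data row comes from the prompt above, the other two are invented so that the anomaly checks have something to catch. The `profile` helper is hypothetical.

```python
import io

import pandas as pd

# Stand-in for orders.csv. Rows 1002 and 1003 are invented for illustration.
SAMPLE = """order_id,order_date,customer_id,country,product_id,product_line,units,unit_price,discount_pct,shipping_cost,returned,refund_amount
1001,2024-09-12,C204,UK,P-DEC-019,Lighting,2,29.99,0,4.5,FALSE,0
1002,2024-09-13,C377,UK,P-DEC-044,Rugs,1,-15.00,10,6.0,FALSE,0
1003,2024-09-14,,DE,P-DEC-019,Lighting,3,29.99,0,4.5,TRUE,89.97
"""


def profile(df: pd.DataFrame) -> dict:
    """Return row/column counts, per-column dtype and % missing, and simple flags."""
    per_column = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    flags = []
    # The two anomaly examples named in the profile prompt.
    if (df["unit_price"] < 0).any():
        flags.append("negative unit_price")
    if (df["order_date"] > pd.Timestamp.today()).any():
        flags.append("order_date in the future")
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "per_column": per_column,
        "flags": flags,
    }


orders = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["order_date"])
report = profile(orders)
print(report["rows"], report["columns"])
print(report["per_column"])
print(report["flags"])
```

If the model's row count or missing-value percentages disagree with this local check, treat that as a reason to re-run the profile prompt before moving on.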
Step 2 — Hypothesis prompt
Based on the profile above and our business goal
(understand which months and product lines drive revenue),
propose 6 hypotheses worth testing. For each:
- one-line statement
- the columns and aggregation needed
- a falsifiable expected outcome
- priority (high / medium / low) with one-line reason
Avoid trivial hypotheses ("higher units = higher revenue"). Look
for hypotheses that, if true or false, change a business decision.
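Sample output (abbreviated): "H1: Revenue from Lighting drops in Q1 by > 30%. Columns: order_date, product_line, units × unit_price × (1 − discount_pct). Priority: high (affects Q1 stocking…)" You now have a real research plan instead of a vague "analyse it". Check the revenue formula against the profile before you use it: if discount_pct is stored as a percentage (10 rather than 0.10), it must be divided by 100 first.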
Step 3 — Analyses prompt
For each high-priority hypothesis, do two things:
1) Generate the pandas (or DuckDB) code that would test it.
Use the column names from the profile. Annotate each line.
2) Given the summary statistics I'll paste back below, write a
2-sentence "what we found" verdict per hypothesis. State
explicitly whether the hypothesis was supported, partially
supported, or rejected.
When stating any number, round to 1 decimal and include units
(£, %, units). Never write "approximately" — write a number.
You run the generated code yourself in a notebook (or ask a code-capable AI to run it), then paste the resulting tables back in. The model writes the verdicts. This separation (the model writes code, you run it, the model interprets the results) is what makes the analysis trustworthy: every number in the report traces back to code you executed, not to a figure the model made up.
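As an illustration of what the generated test code might look like for a hypothesis such as "Lighting revenue drops by more than 30% in Q1", here is a small pandas sketch. The rows are invented for the example, and the code assumes discount_pct is stored on a 0-100 scale; verify both the column names and that assumption against your own profile.

```python
import pandas as pd

# Invented rows, for illustration only; the real input would be orders.csv.
orders = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2024-10-05", "2024-11-20", "2024-12-15",
        "2025-01-10", "2025-02-18", "2025-03-02",
    ]),
    "product_line": ["Lighting"] * 6,
    "units": [10, 10, 10, 5, 5, 5],
    "unit_price": [30.0] * 6,
    "discount_pct": [0, 0, 0, 0, 0, 10],
})

# Assumption: discount_pct is a percentage (0-100), so divide by 100.
orders["revenue"] = (
    orders["units"] * orders["unit_price"] * (1 - orders["discount_pct"] / 100)
)

# Sum Lighting revenue per calendar quarter.
lighting = orders[orders["product_line"] == "Lighting"]
by_quarter = lighting.groupby(lighting["order_date"].dt.to_period("Q"))["revenue"].sum()

# Compare Q1 with the preceding Q4 and round to 1 decimal, as the prompt asks.
q4 = by_quarter[pd.Period("2024Q4")]
q1 = by_quarter[pd.Period("2025Q1")]
change_pct = round((q1 - q4) / q4 * 100, 1)
verdict = "supported" if change_pct < -30 else "rejected"

print(by_quarter)
print(f"Q1 vs Q4 change: {change_pct}% -> {verdict}")
```

The printed quarterly table is what you would paste back to the model so it can write the two-sentence verdict.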
Step 4 — Charts prompt
For each supported or partially-supported hypothesis, design one
chart. Return:
- Chart type and why (e.g. line chart — time series; bar chart —
ranked categorical values)
- The matplotlib code to generate it
- A 1-sentence "alt text" describing what the chart shows
- A 1-sentence caption suitable for a business reader
Constraints:
- One idea per chart. No double-axes.
- Title states the finding, not the variable.
Good: "Lighting revenue drops 38% in Q1"
Bad: "Revenue by product line over time"
The titles-as-findings rule is the single biggest upgrade you can make to any business chart. The AI will follow it once you tell it.
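A minimal matplotlib sketch of the titles-as-findings rule follows. The quarterly totals are invented so that the title matches the "good" example above, and the output filename is arbitrary; the point is that the title is built from the computed number rather than typed by hand.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Invented quarterly totals, for illustration only.
quarters = ["2024 Q4", "2025 Q1"]
revenue = [900.0, 558.0]

# Derive the headline number from the data so title and chart cannot disagree.
drop_pct = round((revenue[0] - revenue[1]) / revenue[0] * 100)
title = f"Lighting revenue drops {drop_pct}% in Q1"

# One idea per chart, single axis, finding as the title.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(quarters, revenue, color="#4c72b0")
ax.set_title(title)
ax.set_ylabel("Revenue (£)")
fig.savefig("lighting_q1.png", dpi=150, bbox_inches="tight")
```

Computing the title from the same values that are plotted also guards against a stale headline if the data is refreshed later.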
Step 5 — Executive summary prompt
Write the executive summary for the report. Audience: the founder,
non-technical, 5 minutes to read.
Structure:
- 1-line headline finding
- 3 bullet "what we learned"
- 3 bullet "what to do next" (each must be a concrete action,
not a platitude)
- 1 paragraph on limitations and what to investigate later
Use the hypotheses verdicts and chart titles as your source of
truth. Do not introduce new numbers. Do not use the words
"leverage", "synergy", or "in conclusion".
Stitch the outputs of steps 1–5 into one Markdown file. Put the executive summary at the top, then follow it with the profile, hypotheses, analyses, and charts, in that order. That's the report.
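The stitching step is simple enough to script. The sketch below uses only the standard library; `build_report`, the section names, the placeholder section bodies, and the output filename are all hypothetical, and the section texts would be the outputs you saved from each prompt.

```python
from pathlib import Path
from typing import Dict

# Executive summary first, then the stages in the order they were produced.
SECTION_ORDER = ["Executive summary", "Data profile", "Hypotheses", "Analyses", "Charts"]


def build_report(sections: Dict[str, str], title: str) -> str:
    """Assemble saved prompt outputs into a single Markdown document."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"Missing sections: {', '.join(missing)}")
    parts = [f"# {title}"]
    for name in SECTION_ORDER:
        parts.append(f"## {name}\n\n{sections[name].strip()}")
    return "\n\n".join(parts) + "\n"


# Placeholder bodies; replace with the text returned at each step.
saved_outputs = {name: f"(paste the {name.lower()} output here)" for name in SECTION_ORDER}
report_md = build_report(saved_outputs, title="Orders analysis")
Path("report.md").write_text(report_md, encoding="utf-8")
```

Raising an error on a missing section keeps an incomplete chain from silently producing a report with a gap in it.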
Find a public dataset (Kaggle, the UK government open data portal, your own export from a tool you use). Run the profile prompt and read the output carefully. Note any anomalies the model spotted that you would have missed.
Generate hypotheses for the same dataset twice — once with a vague "find interesting insights" prompt, and once with the structured hypothesis prompt above. Compare the two lists. The contrast is the most powerful argument for using a chain.
Add a "robustness check" step between analyses and charts: "For each finding, list three reasons it might be wrong (data quality, sample bias, confounders) and one quick check that would rule each out." This single step transforms naive analysis into defensible analysis.