Writing queries is one of the highest-value uses of AI — joins, aggregations, and pipeline stages are exactly the kind of repetitive logic AI handles well. But query generation is also where AI hallucinates the most, inventing column names and functions that do not exist. The fix is rigorous schema-grounding.
Whether you are writing a join across five tables in PostgreSQL or an aggregation pipeline in MongoDB, the same principle holds: AI needs to know exactly what your data looks like before it can query it. This tutorial walks through the schema-first prompt pattern that produces queries you can paste straight into your database client, for both SQL and NoSQL stacks.
A query has three parts: the shape of the data (schema), the question being asked (intent), and the flavour of database (dialect). Skipping any of these forces the AI to guess. Skipping all three is why "write me a query for sales" tends to return SQL that references a sales table you do not have.
Think of yourself as the human translator. You know the data; the AI knows the language. You hand it the dictionary (schema), say what you want to say (intent), and tell it which language to translate into (dialect). Skip any of those and you get nonsense, no matter how fluent the AI sounds.
The shortcut is to describe the query in plain English and hope for the best. The AI then hallucinates plausible-sounding column names that almost never match your schema.
Weak prompt
write a sql query to get the top 5 customers by total purchase
amount in the last 30 days
You will get a confident-looking query that references columns like customers.name and orders.total — names the AI guessed. When you paste it into your client, you get "column does not exist" errors and have to rewrite half of it.
Strong prompt (SQL)
Dialect: PostgreSQL 15
Tables (only relevant columns shown):
customers (
id uuid PRIMARY KEY,
display_name text NOT NULL,
email text NOT NULL,
created_at timestamptz NOT NULL
)
orders (
id uuid PRIMARY KEY,
customer_id uuid REFERENCES customers(id),
status text CHECK (status IN ('pending','paid','refunded')),
total_cents integer NOT NULL,
placed_at timestamptz NOT NULL
)
Goal: return the top 5 customers by total of `paid` order amounts in the
last 30 days. Include customer id, display_name, total spend in dollars
(2 decimal places), and order count.
Constraints:
- Use `now() - interval '30 days'` for the window
- Only `status = 'paid'` orders count
- Order by total spend desc, then by display_name asc as tie-breaker
- Return at most 5 rows
- Use a CTE if it improves readability
- Parameterise nothing — return the literal query I can paste into psql
The schema is grounded, the intent is precise, the dialect is named, and the output format is specified. The AI returns a query that runs on the first try.
Strong prompt (NoSQL — MongoDB)
Database: MongoDB 7 aggregation pipeline (using the Node driver, but I just
want the pipeline array I can paste in)
Collection: orders
Sample document:
{
_id: ObjectId,
customerId: ObjectId,
status: "paid" | "pending" | "refunded",
totalCents: 4250,
placedAt: ISODate,
customer: { displayName: string, email: string } // denormalised
}
Goal: top 5 customers by total spend of `paid` orders in the last 30 days.
Output an aggregation pipeline (array of stages) that returns documents like:
{ customerId, displayName, totalDollars (number, 2 decimals), orderCount }
Sort by totalDollars desc, then displayName asc. Limit 5.
For NoSQL, sharing one realistic sample document is more useful than a formal schema — it shows nesting, naming, and types in one go.
Tip: For production queries, always ask the AI to flag any operations that could scan the entire table. This catches missing indexes and accidental cartesian joins before they hit your database.
Pick a real query you wrote in the past month. Reconstruct the prompt that would have generated it. Run that prompt. Compare AI's version to your hand-written one — which is clearer? Which is faster? Which handles edge cases better?
Take any reporting question your team gets often ("show me X grouped by Y over Z period"). Generate the SQL once with schema, once without. Note how many corrections the no-schema version needs.
For a query you already have, paste it back to the AI with: "Identify any performance risks. Suggest indexes or rewrites that would help, and explain the trade-offs." Try one of its suggestions in a non-production environment.
Sign in to join the discussion and post comments.
Sign inPrompt Engineering for Education & Learning
Use AI as your personal tutor. Learn how to study faster, create lesson plans, generate practice questions, master languages, and prepare for competitive exams with smart prompts.
Prompt Engineering for Content & Copywriting
Write blogs, ads, emails, and social media content ten times faster with AI. 13 practical tutorials on prompt engineering for content creators and copywriters.
Prompt Engineering for Image Generation
Turn words into stunning visuals. Master AI image generation tools like Midjourney, DALL·E 3, and Stable Diffusion with 18 focused tutorials — from first prompt to full brand identity.
Prompt Engineering for Specific AI Tools
Tool-by-tool mastery — deep dives into ChatGPT, Claude, Gemini, GitHub Copilot, Midjourney, Stable Diffusion, and more. Learn the exact prompting techniques each platform rewards.
Prompt Engineering for Business & Productivity
Use AI to work smarter — automate tasks, make better decisions, and communicate professionally. 12 practical business prompt tutorials for professionals.
Prompt Engineering for Data Science & Analytics
Supercharge your data workflows with AI. 15 practical tutorials on using prompt engineering for data cleaning, EDA, machine learning, SQL, visualisation, and more.