How to Use AI for Database Query Generation (SQL & NoSQL)

Writing queries is one of the highest-value uses of AI: joins, aggregations, and pipeline stages are exactly the kind of repetitive logic AI handles well. But query generation is also prone to hallucination. Without the schema in front of it, the AI invents column names and functions that do not exist. The fix is rigorous schema grounding.

1. Introduction

Whether you are writing a join across five tables in PostgreSQL or an aggregation pipeline in MongoDB, the same principle holds: AI needs to know exactly what your data looks like before it can query it. This tutorial walks through the schema-first prompt pattern for both SQL and NoSQL stacks. The aim is queries that need little or no rewriting before you paste them into your database client.

2. The Concept Explained

A query has three parts: the shape of the data (schema), the question being asked (intent), and the flavour of database (dialect). Skipping any of these forces the AI to guess. Skipping all three is why "write me a query for sales" tends to return SQL that references a sales table you do not have.

Schema, intent, and dialect are the three inputs every query-generation prompt needs.

Think of yourself as the human translator. You know the data; the AI knows the language. You hand it the dictionary (schema), say what you want to say (intent), and tell it which language to translate into (dialect). Skip any of those and you get nonsense, no matter how fluent the AI sounds.
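
As a minimal sketch of the three-input idea, the Python helper below assembles a prompt from dialect, schema, and intent, and refuses to build one if any of the three is missing. The function name build_query_prompt and the section labels it emits are illustrative assumptions, not a standard API.

```python
from textwrap import dedent


def build_query_prompt(dialect: str, schema: str, goal: str,
                       constraints: list[str]) -> str:
    """Assemble a schema-first prompt from dialect, schema, intent, and edge rules."""
    # All three core inputs are mandatory: a blank one would force the AI to guess.
    for name, value in (("dialect", dialect), ("schema", schema), ("goal", goal)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")

    parts = [
        f"Dialect: {dialect.strip()}",
        "Tables (only relevant columns shown):\n" + dedent(schema).strip(),
        f"Goal: {goal.strip()}",
    ]
    # Edge rules are optional, but when present they go in as a bullet list.
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {rule}" for rule in constraints))
    return "\n\n".join(parts)
```

Calling build_query_prompt("PostgreSQL 15", schema_text, "top 5 customers by paid spend in the last 30 days", ["Return at most 5 rows"]) yields a prompt whose sections appear in a fixed order: dialect, tables, goal, constraints.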

3. The Problem Without This Technique

The shortcut is to describe the query in plain English and hope for the best. With no schema to go on, the AI fills the gap with plausible-sounding column names that rarely match yours.

Weak prompt

write a sql query to get the top 5 customers by total purchase
amount in the last 30 days

You will get a confident-looking query that references columns like customers.name and orders.total — names the AI guessed. When you paste it into your client, you get "column does not exist" errors and have to rewrite half of it.
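
One way to catch guessed names before the database does is to compare the query's table.column references against the real schema. The Python sketch below is a rough illustration: GUESSED_QUERY is a hypothetical example of what the weak prompt might return, and the regex only handles fully qualified references (it does not resolve aliases such as c.name, and it does not parse SQL properly).

```python
import re

# The real schema, as a mapping of table name to its column names.
SCHEMA = {
    "customers": {"id", "display_name", "email", "created_at"},
    "orders": {"id", "customer_id", "status", "total_cents", "placed_at"},
}

# Matches qualified references of the form table.column.
QUALIFIED_REF = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b")

# Hypothetical output of the weak prompt, using guessed column names.
GUESSED_QUERY = """
SELECT customers.name, SUM(orders.total) AS total_spend
FROM customers
JOIN orders ON orders.customer_id = customers.id
GROUP BY customers.name
ORDER BY total_spend DESC
LIMIT 5
"""


def unknown_columns(query: str, schema: dict[str, set[str]]) -> list[str]:
    """Return table.column references whose table is known but whose column is not."""
    missing: list[str] = []
    for table, column in QUALIFIED_REF.findall(query):
        if table in schema and column not in schema[table]:
            ref = f"{table}.{column}"
            if ref not in missing:  # report each bad reference once
                missing.append(ref)
    return missing


# unknown_columns(GUESSED_QUERY, SCHEMA) returns ['customers.name', 'orders.total']
```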

4. The Solution

Strong prompt (SQL)

Dialect: PostgreSQL 15

Tables (only relevant columns shown):

customers (
  id            uuid PRIMARY KEY,
  display_name  text NOT NULL,
  email         text NOT NULL,
  created_at    timestamptz NOT NULL
)

orders (
  id            uuid PRIMARY KEY,
  customer_id   uuid REFERENCES customers(id),
  status        text CHECK (status IN ('pending','paid','refunded')),
  total_cents   integer NOT NULL,
  placed_at     timestamptz NOT NULL
)

Goal: return the top 5 customers by total of `paid` order amounts in the
last 30 days. Include customer id, display_name, total spend in dollars
(2 decimal places), and order count.

Constraints:
- Use `now() - interval '30 days'` for the window
- Only `status = 'paid'` orders count
- Order by total spend desc, then by display_name asc as tie-breaker
- Return at most 5 rows
- Use a CTE if it improves readability
- Parameterise nothing — return the literal query I can paste into psql

The schema is grounded, the intent is precise, the dialect is named, and the output format is specified. With nothing left to guess, the AI is far more likely to return a query that runs on the first try.
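
Even a well-grounded query is worth running against a handful of known rows before it touches real data. The Python sketch below shows the kind of query this prompt is aiming for and checks it against a tiny fixture. It uses an in-memory SQLite database as a stand-in, which is an assumption made only so the example is self-contained: ids are plain text instead of uuid, and the window is written as datetime('now', '-30 days') where PostgreSQL would use now() - interval '30 days'. The fixture rows are invented for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

TOP_CUSTOMERS_SQL = """
WITH paid_recent AS (
    SELECT customer_id, total_cents
    FROM orders
    WHERE status = 'paid'
      AND placed_at >= datetime('now', '-30 days')  -- PostgreSQL: now() - interval '30 days'
)
SELECT c.id,
       c.display_name,
       ROUND(SUM(p.total_cents) / 100.0, 2) AS total_dollars,
       COUNT(*) AS order_count
FROM paid_recent AS p
JOIN customers AS c ON c.id = p.customer_id
GROUP BY c.id, c.display_name
ORDER BY total_dollars DESC, c.display_name ASC
LIMIT 5
"""


def days_ago(n: int) -> str:
    """UTC timestamp n days back, in the text format SQLite's datetime() returns."""
    return (datetime.now(timezone.utc) - timedelta(days=n)).strftime("%Y-%m-%d %H:%M:%S")


def make_fixture() -> sqlite3.Connection:
    """Build an in-memory database with two customers and five orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id TEXT PRIMARY KEY, display_name TEXT NOT NULL,
                                email TEXT NOT NULL, created_at TEXT NOT NULL);
        CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT REFERENCES customers(id),
                             status TEXT CHECK (status IN ('pending','paid','refunded')),
                             total_cents INTEGER NOT NULL, placed_at TEXT NOT NULL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
        ("c1", "Ada", "ada@example.com", days_ago(400)),
        ("c2", "Bo", "bo@example.com", days_ago(300)),
    ])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
        ("o1", "c1", "paid", 4250, days_ago(2)),
        ("o2", "c1", "paid", 1000, days_ago(5)),
        ("o3", "c1", "refunded", 9999, days_ago(1)),  # wrong status, must be ignored
        ("o4", "c2", "paid", 5250, days_ago(3)),
        ("o5", "c2", "paid", 7000, days_ago(45)),     # outside the window, must be ignored
    ])
    return conn


def top_customers(conn: sqlite3.Connection) -> list[tuple]:
    """Run the query and return (id, display_name, total_dollars, order_count) rows."""
    return conn.execute(TOP_CUSTOMERS_SQL).fetchall()
```

The fixture is built so that both customers total 52.50 in paid orders inside the window, which exercises the display_name tie-breaker as well as the status and date filters.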

Strong prompt (NoSQL — MongoDB)

Database: MongoDB 7 aggregation pipeline (using the Node driver, but I just
want the pipeline array I can paste in)

Collection: orders
Sample document:
{
  _id: ObjectId,
  customerId: ObjectId,
  status: "paid" | "pending" | "refunded",
  totalCents: 4250,
  placedAt: ISODate,
  customer: { displayName: string, email: string }   // denormalised
}

Goal: top 5 customers by total spend of `paid` orders in the last 30 days.

Output an aggregation pipeline (array of stages) that returns documents like:
  { customerId, displayName, totalDollars (number, 2 decimals), orderCount }

Sort by totalDollars desc, then displayName asc. Limit 5.

For NoSQL, sharing one realistic sample document is more useful than a formal schema — it shows nesting, naming, and types in one go.
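
For comparison, here is a sketch of the kind of pipeline this prompt could produce, written as Python dicts so that it maps one-to-one onto the JavaScript array the prompt asks for. The function name top_customers_pipeline is an assumption for illustration, and the pipeline has not been run against a live MongoDB instance here. It relies on displayName being denormalised onto each order, as in the sample document.

```python
from datetime import datetime, timedelta, timezone


def top_customers_pipeline(now: datetime) -> list[dict]:
    """Build the aggregation stages for top 5 customers by paid spend in the last 30 days."""
    cutoff = now - timedelta(days=30)
    return [
        # Keep only paid orders placed inside the 30-day window.
        {"$match": {"status": "paid", "placedAt": {"$gte": cutoff}}},
        # One group per customer: sum the cents and count the orders.
        {"$group": {
            "_id": "$customerId",
            "displayName": {"$first": "$customer.displayName"},
            "totalCents": {"$sum": "$totalCents"},
            "orderCount": {"$sum": 1},
        }},
        # Reshape to the requested output and convert cents to dollars.
        {"$project": {
            "_id": 0,
            "customerId": "$_id",
            "displayName": 1,
            "totalDollars": {"$round": [{"$divide": ["$totalCents", 100]}, 2]},
            "orderCount": 1,
        }},
        # Highest spend first, display name as the tie-breaker.
        {"$sort": {"totalDollars": -1, "displayName": 1}},
        {"$limit": 5},
    ]
```

With PyMongo, the result could be passed to db.orders.aggregate(top_customers_pipeline(datetime.now(timezone.utc))), where db is assumed to be an open database handle.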

5. Step-by-Step Breakdown

- State the dialect and version. For example PostgreSQL 15, MySQL 8, MongoDB 7, or DynamoDB. Function names and syntax differ between them.
- Paste the schema or a sample document. Tables and columns for SQL; one realistic document for document stores. Include types and constraints.
- State the question in one sentence. "Top 5 customers by spend in last 30 days" is sharper than three paragraphs of explanation.
- Pin down the edge rules. Tie-breakers, which statuses count, what time window, currency handling.
- Specify the output shape. Column names, formatting, ordering, limits. This is the contract the query must meet.
- Ask for an EXPLAIN review afterwards. Once the query runs, follow up: "Here is the EXPLAIN output. Can you suggest an index that would speed this up?"

Tip: For production queries, always ask the AI to flag any operations that could scan the entire table. This can catch missing indexes and accidental cartesian joins before they hit your database.
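
The same full-scan check can be done mechanically from the query plan. The Python sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, an assumption made to keep the example self-contained; PostgreSQL's EXPLAIN output has a different format, where the equivalent signal is a "Seq Scan" node. The index name idx_orders_status_placed_at is hypothetical.

```python
import sqlite3


def full_scans(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return the plan steps that walk a whole table (or a whole index)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # Each row is (id, parent, notused, detail). Steps that read everything
    # start with "SCAN"; steps that seek via an index start with "SEARCH".
    return [detail for _, _, _, detail in rows if detail.startswith("SCAN")]


def demo() -> tuple[list[str], list[str]]:
    """Compare the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, "
        "status TEXT, total_cents INTEGER, placed_at TEXT)"
    )
    query = (
        "SELECT customer_id, total_cents FROM orders "
        "WHERE status = 'paid' AND placed_at >= '2024-01-01'"
    )
    before = full_scans(conn, query)
    # Hypothetical index covering the two filter columns.
    conn.execute("CREATE INDEX idx_orders_status_placed_at ON orders (status, placed_at)")
    after = full_scans(conn, query)
    return before, after
```

Before the index exists, the plan reports a scan of orders; afterwards the list of full scans is empty, because the planner can seek on status and placed_at instead.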

6. Practice Exercises

Exercise 1

Pick a real query you wrote in the past month. Reconstruct the prompt that would have generated it. Run that prompt. Compare the AI's version to your hand-written one: which is clearer? Which is faster? Which handles edge cases better?

Exercise 2

Take any reporting question your team gets often ("show me X grouped by Y over Z period"). Generate the SQL once with schema, once without. Note how many corrections the no-schema version needs.

Exercise 3

For a query you already have, paste it back to the AI with: "Identify any performance risks. Suggest indexes or rewrites that would help, and explain the trade-offs." Try one of its suggestions in a non-production environment.

7. Key Takeaways

- Query generation needs three inputs: schema, intent, and dialect. Leave any one out and the AI has to guess, which is where hallucinations come from.
- For SQL, paste relevant tables with column types. For NoSQL, paste one realistic sample document.
- State edge rules explicitly: tie-breakers, included statuses, time windows.
- Define the output shape before asking for the query. It sets the destination, not just the journey.
- Use AI for query review and index suggestions, not just generation. The follow-up question is often the most valuable.
