Indexing and Performance Optimization in SQL - Interview Questions and Answers
An index is a database structure that improves query performance by enabling faster data retrieval.
Example: Creating an index on Aadhaar_Number
for fast lookups
CREATE INDEX idx_aadhaar ON citizens(aadhaar_number);
- Clustered Index: Data is physically sorted based on the index. (Each table can have only one clustered index.)
- Non-clustered Index: Stores pointers to the data, allowing multiple indexes.
- Unique Index: Ensures no duplicate values in a column.
- Full-text Index: Used for searching text-based data efficiently.
Instead of scanning the entire table, an index lets the DBMS find rows faster, reducing I/O operations.
Example: Searching for an PAN number:
SELECT * FROM tax_payers WHERE pan_number = 'ABCDE1234F';
With an index on pan_number
, lookup is instant, instead of scanning millions of records.
- A clustered index determines the physical order of rows in a table.
- Example: An Indian Railways Reservation System, where
PNR_Number
is a clustered index:
CREATE CLUSTERED INDEX idx_pnr ON railway_tickets(pnr_number);
A non-clustered index keeps a separate structure from the data and stores pointers
Example: A bank database where account numbers need quick search
CREATE NONCLUSTERED INDEX idx_acc ON bank_accounts(account_number);
- Use a clustered index for columns frequently used in sorting/searching (e.g., Aadhaar numbers, PNR numbers).
- Use a non-clustered index when a column is used only for lookups but not sorting (e.g., bank account balance).
Only one, since it defines the physical order of data.
A table can have multiple non-clustered indexes.
CREATE INDEX idx_product_name ON products(product_name);
CREATE INDEX idx_category ON products(category);
A covering index contains all the columns needed for a query, avoiding additional table lookups.
Example:
CREATE INDEX idx_cover ON transactions(customer_id, transaction_amount, transaction_date);
An index on multiple columns to optimize specific queries.
Example:
CREATE INDEX idx_order ON orders(customer_id, order_date);
- Slows down INSERTs/UPDATEs as indexes need updating.
- Consumes more storage.
Indexes speed up SELECT queries but slow down INSERT
, UPDATE
, and DELETE
because every index must also be updated
Example: In a bank_accounts
table, if we update balance
, a non-clustered index on balance
must also be updated:
UPDATE bank_accounts SET balance = balance + 500 WHERE account_number = '1234567890';
Too many indexes = performance degradation for updates.
- Implicit Index: Created automatically (e.g., Primary Key and Unique constraints).
- Explicit Index: Created manually using
CREATE INDEX
.
Example (Implicit Index):
CREATE TABLE employees (
emp_id INT PRIMARY KEY, -- Automatically indexed
emp_name VARCHAR(50)
);
Example (Explicit Index)
CREATE INDEX idx_emp_name ON employees(emp_name);
Speeds up lookups by reducing the number of rows scanned.
Example: Searching for an PAN number in an income tax database:
SELECT * FROM tax_payers WHERE pan_number = 'ABCDE1234F';
With an index on pan_number
, lookup is instant, avoiding a full table scan.
- On small tables (as full table scans may be faster).
- On highly updated columns (as updates slow down).
- On columns with many duplicate values (like
gender
orstatus
).
Indexing computed values to optimize function-based queries.
Example: Indexing on UPPERCASE email in an banking system
SELECT * FROM customers WHERE UPPER(email) = 'RAHUL@GMAIL.COM';
This optimizes case-insensitive searches
SELECT * FROM customers WHERE UPPER(email) = 'RAHUL@GMAIL.COM';
Example: Dropping an index on pan_number
:
DROP INDEX idx_pan ON tax_payers;
MySQL
SHOW INDEX FROM bank_accounts;
PostgreSQL
SELECT indexname FROM pg_indexes WHERE tablename = 'bank_accounts';
Used for fast searching in text-heavy columns.
Example: Searching in a large database of Indian legal case records
CREATE FULLTEXT INDEX idx_case_desc ON court_cases(case_description);
Query optimized for full-text search
SELECT * FROM court_cases WHERE MATCH(case_description) AGAINST ('property dispute');
Indexing only a subset of rows for performance optimization.
Example: Indexing only active users in an e-commerce system:
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
This makes searching faster for active users while ignoring inactive ones.
MySQL
SHOW INDEX FROM bank_accounts;
PostgreSQL
SELECT * FROM pg_stat_user_indexes WHERE relname = 'transactions';
Index fragmentation happens when indexes become inefficient due to frequent updates/deletes.
Solution:
Rebuild the index in SQL Server
ALTER INDEX idx_acc_number ON bank_accounts REBUILD;
Optimize in MySQL
OPTIMIZE TABLE bank_accounts;
Reverses the stored value to improve lookup speeds for sequential values (e.g., Aadhaar numbers).
Example:
CREATE INDEX idx_reverse_phone ON users(REVERSE(phone_number));
Yes, but it depends on the database system.
PostgreSQL example (Indexing a JSON field in an Indian Aadhaar verification database):
CREATE INDEX idx_json ON citizens USING gin(aadhaar_details);
This speeds up queries on JSON fields.
It shows how a query is executed, helping identify bottlenecks.
Example:
EXPLAIN SELECT * FROM bank_transactions WHERE transaction_id = 'TXN12345';
- Look for Full Table Scans - Use indexes.
- Identify high-cost joins - Optimize with proper keys.
WHERE
filters before aggregation, while HAVING
filters after aggregation.
Example:
SELECT state, COUNT(*) FROM citizens GROUP BY state HAVING COUNT(*) > 100000;
Index the column used in ORDER BY
.
Example:
CREATE INDEX idx_salary ON employees(salary);
SELECT * FROM employees ORDER BY salary DESC;
Use indexes on JOIN keys.
Example (E-commerce database):
CREATE INDEX idx_customer ON orders(customer_id);
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
CREATE INDEX idx_upi ON transactions(upi_id, transaction_time);
CREATE INDEX idx_aadhaar ON citizens(aadhaar_number);
Partitioning divides large tables into smaller pieces for better performance.
Example: Partitioning UPI transactions by month:
CREATE TABLE upi_transactions (
transaction_id INT,
upi_id VARCHAR(20),
transaction_date DATE
) PARTITION BY RANGE (YEAR(transaction_date));
Normalization reduces redundancy but can slow JOIN-heavy queries.
- Stores query results to speed up repeated queries.
- Example: Frequently used UPI report queries should be cached.
It fetches unnecessary columns, increasing execution time.
Indexed columns improve GROUP BY performance.
SELECT * FROM orders FORCE INDEX(idx_customer) WHERE customer_id = 101;
- Partitioning divides a large table into smaller manageable parts.
- Indexes apply to each partition separately, improving query performance.
Example: Partitioning a sales table by year in an e-commerce database:
CREATE TABLE sales (
sale_id INT,
sale_date DATE,
amount DECIMAL(10,2),
region VARCHAR(50)
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024)
);
Effect: Queries for 2023
skip irrelevant partitions, improving speed.
- Ordering matters in composite indexes.
- The most frequently filtered column should come first.
Example:
CREATE INDEX idx_customer_region ON orders(customer_id, region);
If queries mostly filter by customer_id
, the index is efficient
SELECT * FROM orders WHERE customer_id = 101;
But if filtering by region
, the index may not be used efficiently.
Indexed Views store results physically, reducing computation on large datasets.
Example: Precomputed total sales per city in an Indian retail system
CREATE VIEW sales_summary AS
SELECT city, SUM(amount) AS total_sales
FROM sales GROUP BY city;
CREATE UNIQUE CLUSTERED INDEX idx_sales_summary ON sales_summary(city);
Benefit: Faster aggregated queries without recalculating each time.
- Index SEEK: Uses an index to quickly find data.
- Index SCAN: Reads the entire index/table.
Example:
SEEK (Efficient)
SELECT * FROM customers WHERE customer_id = 123;
SCAN (Inefficient on large tables)
SELECT * FROM customers WHERE LOWER(name) = 'rahul';
Solution: Create a function-based index
CREATE INDEX idx_lower_name ON customers(LOWER(name));
A covering index includes all columns needed for a query, avoiding extra lookups.
Example: Optimizing customer_orders
query in an Indian UPI payments database.
CREATE INDEX idx_orders ON transactions (customer_id, order_date, amount);
Now this query is fully covered
SELECT order_date, amount FROM transactions WHERE customer_id = 567;
No need to access the table, only the index!
- Probabilistic data structure used for fast lookups in large datasets.
- Reduces disk IO by eliminating unnecessary searches.
- Used in distributed SQL engines (e.g., Apache Spark, PostgreSQL).
- Example: Fast Aadhaar number lookups in voter databases.
Indexing JSON fields optimizes searching inside JSON data
Example: Indexing customer preferences stored in JSON
CREATE INDEX idx_json ON users USING GIN (preferences);
Query optimized for a specific key
SELECT * FROM users WHERE preferences @> '{"language": "Hindi"}';
Use columnstore indexes for analytical workloads.
Example: Optimizing Indian stock market transactions:
CREATE COLUMNSTORE INDEX idx_trades ON stock_trades (trade_date, stock_id, volume);
Benefit: Compresses data, speeds up aggregation queries.
Use pg_stat_user_indexes
to check index usage:
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname = 'transactions';
High idx_scan
count = Index is effective.
Horizontal Partitioning: Splitting rows (e.g., by state in India)
CREATE TABLE customers_delhi AS SELECT * FROM customers WHERE state = 'Delhi';
Vertical Partitioning: Splitting columns (e.g., separating frequently used columns)
CREATE TABLE customer_contact AS SELECT customer_id, phone, email FROM customers;
- Solution 1: Add indexes for frequent filters.
- Solution 2: Use
LIMIT
for large datasets.
Example: Query optimized for a large tax_payers
table in India
SELECT * FROM tax_payers WHERE state = 'Maharashtra' LIMIT 100;
- MySQL Query Cache stores the result of SELECT queries.
- Redis can cache database results for repeated queries.
Example: Caching an Indian movie ticket booking report
SET SESSION query_cache_type = ON;
SELECT * FROM bookings WHERE city = 'Mumbai';
Next time, it retrieves data instantly from cache.
Sharding splits data across multiple databases for scalability.
Example: Indian railway booking system.
CREATE TABLE bookings_shard_1 PARTITION OF bookings FOR VALUES FROM ('1') TO ('500000');
CREATE TABLE bookings_shard_2 PARTITION OF bookings FOR VALUES FROM ('500001') TO ('1000000');
Indexing per shard speeds up queries on large distributed datasets.
Tutorials
Random Blogs
- Why to learn Digital Marketing?
- Data Analytics: The Power of Data-Driven Decision Making
- Important Mistakes to Avoid While Advertising on Facebook
- Variable Assignment in Python
- Datasets for analyze in Tableau
- OLTP vs. OLAP Databases: Advanced Insights and Query Optimization Techniques
- Python Challenging Programming Exercises Part 1
- Where to Find Free Datasets for Your Next Machine Learning & Data Science Project
- Understanding AI, ML, Data Science, and More: A Beginner's Guide to Choosing Your Career Path
- Top 10 Blogs of Digital Marketing you Must Follow