Indexing and Performance Optimization in SQL - Interview Questions and Answers

An index is a database structure that improves query performance by enabling faster data retrieval.

Example: Creating an index on Aadhaar_Number for fast lookups

CREATE INDEX idx_aadhaar ON citizens(aadhaar_number);

 

  • Clustered Index: Data is physically sorted based on the index. (Each table can have only one clustered index.)
  • Non-clustered Index: Stores pointers to the data, allowing multiple indexes.
  • Unique Index: Ensures no duplicate values in a column.
  • Full-text Index: Used for searching text-based data efficiently.

Instead of scanning the entire table, an index lets the DBMS find rows faster, reducing I/O operations.

Example: Searching for an  PAN number:

SELECT * FROM tax_payers WHERE pan_number = 'ABCDE1234F';

With an index on pan_number, lookup is instant, instead of scanning millions of records.

  • A clustered index determines the physical order of rows in a table.
  • Example: An Indian Railways Reservation System, where PNR_Number is a clustered index:
CREATE CLUSTERED INDEX idx_pnr ON railway_tickets(pnr_number);

 

A non-clustered index keeps a separate structure from the data and stores pointers

Example: A bank database where account numbers need quick search

CREATE NONCLUSTERED INDEX idx_acc ON bank_accounts(account_number);

 

  • Use a clustered index for columns frequently used in sorting/searching (e.g., Aadhaar numbers, PNR numbers).
  • Use a non-clustered index when a column is used only for lookups but not sorting (e.g., bank account balance).

Only one, since it defines the physical order of data.

A table can have multiple non-clustered indexes.

CREATE INDEX idx_product_name ON products(product_name);
CREATE INDEX idx_category ON products(category);

 

A covering index contains all the columns needed for a query, avoiding additional table lookups.

Example:

CREATE INDEX idx_cover ON transactions(customer_id, transaction_amount, transaction_date);

 

An index on multiple columns to optimize specific queries.

Example:

CREATE INDEX idx_order ON orders(customer_id, order_date);

 

  • Slows down INSERTs/UPDATEs as indexes need updating.
  • Consumes more storage.

Indexes speed up SELECT queries but slow down INSERT, UPDATE, and DELETE because every index must also be updated

Example: In a bank_accounts table, if we update balance, a non-clustered index on balance must also be updated:

UPDATE bank_accounts SET balance = balance + 500 WHERE account_number = '1234567890';

Too many indexes = performance degradation for updates.

  • Implicit Index: Created automatically (e.g., Primary Key and Unique constraints).
  • Explicit Index: Created manually using CREATE INDEX.

Example (Implicit Index):

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,  -- Automatically indexed
    emp_name VARCHAR(50)
);

Example (Explicit Index)

CREATE INDEX idx_emp_name ON employees(emp_name);

 

Speeds up lookups by reducing the number of rows scanned.

Example: Searching for an  PAN number in an income tax database:

SELECT * FROM tax_payers WHERE pan_number = 'ABCDE1234F';

With an index on pan_number, lookup is instant, avoiding a full table scan.

  • On small tables (as full table scans may be faster).
  • On highly updated columns (as updates slow down).
  • On columns with many duplicate values (like gender or status).

Indexing computed values to optimize function-based queries.

Example: Indexing on UPPERCASE email in an  banking system

SELECT * FROM customers WHERE UPPER(email) = 'RAHUL@GMAIL.COM';

This optimizes case-insensitive searches

SELECT * FROM customers WHERE UPPER(email) = 'RAHUL@GMAIL.COM';

 

Example: Dropping an index on pan_number:

DROP INDEX idx_pan ON tax_payers;

 

MySQL

SHOW INDEX FROM bank_accounts;

PostgreSQL

SELECT indexname FROM pg_indexes WHERE tablename = 'bank_accounts';

 

Used for fast searching in text-heavy columns.

Example: Searching in a large database of Indian legal case records

CREATE FULLTEXT INDEX idx_case_desc ON court_cases(case_description);

Query optimized for full-text search

SELECT * FROM court_cases WHERE MATCH(case_description) AGAINST ('property dispute');

 

Indexing only a subset of rows for performance optimization.

Example: Indexing only active users in an  e-commerce system:

CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

This makes searching faster for active users while ignoring inactive ones.

MySQL

SHOW INDEX FROM bank_accounts;

PostgreSQL

SELECT * FROM pg_stat_user_indexes WHERE relname = 'transactions';

 

Index fragmentation happens when indexes become inefficient due to frequent updates/deletes.

Solution:

Rebuild the index in SQL Server

ALTER INDEX idx_acc_number ON bank_accounts REBUILD;

Optimize in MySQL

OPTIMIZE TABLE bank_accounts;

 

Reverses the stored value to improve lookup speeds for sequential values (e.g., Aadhaar numbers).

Example:

CREATE INDEX idx_reverse_phone ON users(REVERSE(phone_number));

 

Yes, but it depends on the database system.

PostgreSQL example (Indexing a JSON field in an Indian Aadhaar verification database):

CREATE INDEX idx_json ON citizens USING gin(aadhaar_details);

This speeds up queries on JSON fields.

It shows how a query is executed, helping identify bottlenecks.

Example:

EXPLAIN SELECT * FROM bank_transactions WHERE transaction_id = 'TXN12345';

 

  • Look for Full Table Scans - Use indexes.
  • Identify high-cost joins - Optimize with proper keys.

WHERE filters before aggregation, while HAVING filters after aggregation.

Example:

SELECT state, COUNT(*) FROM citizens GROUP BY state HAVING COUNT(*) > 100000;

 

Index the column used in ORDER BY.

Example:

CREATE INDEX idx_salary ON employees(salary);
SELECT * FROM employees ORDER BY salary DESC;

 

Use indexes on JOIN keys.

Example (E-commerce database):

CREATE INDEX idx_customer ON orders(customer_id);
SELECT o.order_id, c.customer_name 
FROM orders o 
JOIN customers c ON o.customer_id = c.customer_id;

 

CREATE INDEX idx_upi ON transactions(upi_id, transaction_time);

 

CREATE INDEX idx_aadhaar ON citizens(aadhaar_number);

 

Partitioning divides large tables into smaller pieces for better performance.

Example: Partitioning UPI transactions by month:

CREATE TABLE upi_transactions (
    transaction_id INT,
    upi_id VARCHAR(20),
    transaction_date DATE
) PARTITION BY RANGE (YEAR(transaction_date));

 

Normalization reduces redundancy but can slow JOIN-heavy queries.

  • Stores query results to speed up repeated queries.
  • Example: Frequently used UPI report queries should be cached.

It fetches unnecessary columns, increasing execution time.

Indexed columns improve GROUP BY performance.

SELECT * FROM orders FORCE INDEX(idx_customer) WHERE customer_id = 101;

 

  • Partitioning divides a large table into smaller manageable parts.
  • Indexes apply to each partition separately, improving query performance.

Example: Partitioning a sales table by year in an  e-commerce database:

CREATE TABLE sales (
    sale_id INT,
    sale_date DATE,
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024)
);

Effect: Queries for 2023 skip irrelevant partitions, improving speed.

  • Ordering matters in composite indexes.
  • The most frequently filtered column should come first.

Example:

CREATE INDEX idx_customer_region ON orders(customer_id, region);

If queries mostly filter by customer_id, the index is efficient

SELECT * FROM orders WHERE customer_id = 101;

But if filtering by region, the index may not be used efficiently.

Indexed Views store results physically, reducing computation on large datasets.

Example: Precomputed total sales per city in an Indian retail system

CREATE VIEW sales_summary AS
SELECT city, SUM(amount) AS total_sales
FROM sales GROUP BY city;
CREATE UNIQUE CLUSTERED INDEX idx_sales_summary ON sales_summary(city);

Benefit: Faster aggregated queries without recalculating each time.

  • Index SEEK: Uses an index to quickly find data.
  • Index SCAN: Reads the entire index/table.

Example:

SEEK (Efficient)

SELECT * FROM customers WHERE customer_id = 123;

SCAN (Inefficient on large tables)

SELECT * FROM customers WHERE LOWER(name) = 'rahul';

Solution: Create a function-based index

CREATE INDEX idx_lower_name ON customers(LOWER(name));

 

A covering index includes all columns needed for a query, avoiding extra lookups.

Example: Optimizing customer_orders query in an Indian UPI payments database.

CREATE INDEX idx_orders ON transactions (customer_id, order_date, amount);

Now this query is fully covered

SELECT order_date, amount FROM transactions WHERE customer_id = 567;

No need to access the table, only the index!

  • Probabilistic data structure used for fast lookups in large datasets.
  • Reduces disk IO by eliminating unnecessary searches.
  • Used in distributed SQL engines (e.g., Apache Spark, PostgreSQL).
  • Example: Fast Aadhaar number lookups in voter databases.

Indexing JSON fields optimizes searching inside JSON data

Example: Indexing customer preferences stored in JSON

CREATE INDEX idx_json ON users USING GIN (preferences);

Query optimized for a specific key

SELECT * FROM users WHERE preferences @> '{"language": "Hindi"}';

 

Use columnstore indexes for analytical workloads.

Example: Optimizing Indian stock market transactions:

CREATE COLUMNSTORE INDEX idx_trades ON stock_trades (trade_date, stock_id, volume);

Benefit: Compresses data, speeds up aggregation queries.

Use pg_stat_user_indexes to check index usage:

SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch 
FROM pg_stat_user_indexes WHERE relname = 'transactions';

High idx_scan count = Index is effective.

Horizontal Partitioning: Splitting rows (e.g., by state in India)

CREATE TABLE customers_delhi AS SELECT * FROM customers WHERE state = 'Delhi';

Vertical Partitioning: Splitting columns (e.g., separating frequently used columns)

CREATE TABLE customer_contact AS SELECT customer_id, phone, email FROM customers;

 

  • Solution 1: Add indexes for frequent filters.
  • Solution 2: Use LIMIT for large datasets.

Example: Query optimized for a large tax_payers table in India

SELECT * FROM tax_payers WHERE state = 'Maharashtra' LIMIT 100;

 

  • MySQL Query Cache stores the result of SELECT queries.
  • Redis can cache database results for repeated queries.

Example: Caching an Indian movie ticket booking report

SET SESSION query_cache_type = ON;
SELECT * FROM bookings WHERE city = 'Mumbai';

Next time, it retrieves data instantly from cache.

Sharding splits data across multiple databases for scalability.

Example: Indian railway booking system.

CREATE TABLE bookings_shard_1 PARTITION OF bookings FOR VALUES FROM ('1') TO ('500000');
CREATE TABLE bookings_shard_2 PARTITION OF bookings FOR VALUES FROM ('500001') TO ('1000000');

Indexing per shard speeds up queries on large distributed datasets.

Share   Share