SQL Interview Questions and Answers for 2–5 Years of Experience
Partition by date (e.g., monthly), cluster on the timestamp, use a composite primary key (device_id, ts), and consider a time-series-optimized engine or extension (e.g., InfluxDB, or TimescaleDB on PostgreSQL).
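A minimal sketch of that layout in PostgreSQL, assuming a hypothetical sensor_readings table keyed by device_id and ts (partition names and date ranges are illustrative):

```sql
-- Time-series table partitioned by month on the timestamp column
CREATE TABLE sensor_readings (
    device_id BIGINT           NOT NULL,
    ts        TIMESTAMPTZ      NOT NULL,
    reading   DOUBLE PRECISION,
    PRIMARY KEY (device_id, ts)          -- composite key; includes the partition column
) PARTITION BY RANGE (ts);

-- One partition per month; create upcoming partitions ahead of time (or via a scheduled job)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```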
Surrogate keys (e.g., auto-increment IDs) decouple the schema from business data and simplify joins; natural keys enforce real-world uniqueness but can be composite and change over time.
Add a boolean is_deleted flag and a deleted_at timestamp, and filter out deleted rows in application queries or via a view.
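A short sketch of the soft-delete pattern, assuming a hypothetical orders table (PostgreSQL-flavored syntax):

```sql
-- Add the soft-delete columns
ALTER TABLE orders
    ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN deleted_at TIMESTAMPTZ;

-- "Delete" by flagging the row instead of removing it
UPDATE orders
SET    is_deleted = TRUE,
       deleted_at = now()
WHERE  order_id = 42;

-- View that application queries can read to hide deleted rows
CREATE VIEW active_orders AS
SELECT * FROM orders WHERE is_deleted = FALSE;
```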
Narrow tables (fewer columns) improve I/O and fit more rows per page/block; wide tables (many seldom-used columns) may benefit from vertical partitioning or EAV models.
Star: fact table linked to denormalized dimension tables—simpler, faster for OLAP. Snowflake: dimensions normalized into hierarchies—saves space but adds join complexity.
Create a junction table with foreign keys to each parent table, composite PK on both FKs, and optional additional attributes.
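For example, a junction table linking hypothetical students and courses tables might look like this:

```sql
-- Assumes students and courses tables already exist
CREATE TABLE student_course (
    student_id  INT  NOT NULL REFERENCES students (student_id),
    course_id   INT  NOT NULL REFERENCES courses (course_id),
    enrolled_on DATE NOT NULL,                  -- optional attribute on the relationship itself
    PRIMARY KEY (student_id, course_id)         -- composite PK prevents duplicate pairings
);
```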
Range partitioning (e.g., by date) enables partition pruning; list/hash partitioning suits categorical keys or uniform distribution; the trade-off is maintenance overhead and complexity.
Use a migration tool (Flyway, Liquibase), write idempotent scripts, apply backward-compatible changes first (add columns), then cut over, then drop old objects.
Adjacency list (parent_id), materialized path, nested sets, or using a closure table—choose based on query patterns and update frequency.
For semi-structured data with varying attributes per row; store it as JSONB in Postgres or similar, and index JSON paths for frequent queries.
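A brief PostgreSQL sketch, assuming a hypothetical products table whose variable attributes live in a JSONB column:

```sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    attrs      JSONB  NOT NULL
);

-- GIN index for general containment queries, e.g. attrs @> '{"color": "red"}'
CREATE INDEX idx_products_attrs ON products USING GIN (attrs);

-- Expression index for one frequently filtered path
CREATE INDEX idx_products_color ON products ((attrs ->> 'color'));

SELECT product_id
FROM   products
WHERE  attrs ->> 'color' = 'red';
```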
Columnstore (e.g., SQL Server’s clustered columnstore) excels at analytics and scans on specific columns; rowstore is better for OLTP, point reads, and updates.
Use temporal tables (system-versioned), triggers to copy old rows to history tables, or event sourcing patterns.
Choose a shard key (e.g., customer_id mod N), route queries based on hash/range, maintain a lookup service, ensure cross-shard transactions are minimized.
Use the smallest appropriate size, prefer fixed-length over variable-length where practical, and avoid unnecessary precision (e.g., DECIMAL(10,2) only when needed).
They support adding nullable columns; for more complex changes, create new tables, backfill them with ETL pipelines, then swap views.
Use deferred constraints, triggers, or scheduled batch jobs that validate and alert on violations.
Use a tenant_id column on all tables with appropriate row-level security (RLS) or schemas per tenant.
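A minimal PostgreSQL sketch of tenant isolation via RLS, assuming a hypothetical invoices table and an app.current_tenant session setting chosen for this example:

```sql
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Every query on invoices is filtered to the tenant set for the current session
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.current_tenant')::INT);

-- The application sets the tenant before running queries
SET app.current_tenant = '42';
SELECT * FROM invoices;   -- only tenant 42's rows are visible
```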
Precompute expensive joins/aggregates and store results; refresh periodically or on demand, balancing staleness vs. performance.
Clustering keys impact physical sort order—choose for frequent range scans; secondary indexes for selective lookups on other columns.
Centralizes table/column definitions, lineage, and business glossaries. Tools like Apache Atlas or custom tables help governance and discovery.
Analyze the execution plan for table scans, check the missing-index DMVs (sys.dm_db_missing_index_* in SQL Server), and create targeted non-clustered indexes.
An index covers a query when it includes all columns needed by the SELECT—validate by checking that the optimizer uses only the index and no lookups.
Indexes with a WHERE clause that include only rows you query often (e.g., WHERE is_active = 1), reducing index size and improving seek performance.
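For example, in SQL Server a filtered index on a hypothetical users table might look like this:

```sql
-- Index only the active rows, which are the ones queries usually touch
CREATE NONCLUSTERED INDEX ix_users_active_email
    ON dbo.users (email)
    INCLUDE (display_name)        -- optional covering column
    WHERE is_active = 1;
```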
Check fragmentation via DMVs (sys.dm_db_index_physical_stats), then rebuild or reorganize indexes based on fragmentation thresholds (e.g., rebuild above roughly 30%).
Lower fill factor leaves free space on pages for inserts, reducing page splits at the cost of larger index size; choose based on insert patterns.
Statistics describe data distribution; stale stats lead the optimizer to choose poor plans—update them manually or set AUTO_UPDATE_STATISTICS.
Use query hints (OPTION (RECOMPILE)), local variables, or OPTIMIZE FOR UNKNOWN to prevent the first parameter's plan from being reused inappropriately.
Discrepancies indicate cardinality estimation issues—address by updating stats, using histogram buckets, or query rewrites.
Hash for large unsorted sets, merge for pre-sorted data with range predicates, nested loops for small outer and indexed inner.
Use row-versioning isolation (e.g., SNAPSHOT), keep transactions short, and avoid range scans without supporting indexes.
Snapshot isolation uses row-versioning, eliminating readers blocking writers; read-committed uses locks, allowing phantoms.
Gather its execution plan, check wait stats, measure I/O and CPU, identify missing indexes or expensive operators.
Predicates that wrap columns in functions (e.g., WHERE UPPER(name) = 'JOHN') prevent index seeks; rewrite them to be sargable.
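A quick illustration of the rewrite, assuming a hypothetical customers table; note the second form only matches the first exactly under a case-insensitive collation (or with an index on the expression):

```sql
-- Non-sargable: the function on the column prevents an index seek
SELECT customer_id FROM customers WHERE UPPER(name) = 'JOHN';

-- Sargable: compare the bare column so the optimizer can seek on an index over name
SELECT customer_id FROM customers WHERE name = 'John';
```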
Break into chunks, use minimal logging (bulk-logged recovery), disable/enable indexes, and wrap in manageable transactions.
When predicates on partition keys allow the optimizer to skip entire partitions—improves scan performance.
Spread tempdb across multiple files, monitor contention on allocation bitmaps, and size appropriately.
Clears data and plan caches to simulate cold runs and verify performance impact of caching.
Tools to cap CPU, memory, or I/O per workload group (e.g., SQL Server Resource Governor; PostgreSQL typically relies on connection pooling or extensions for workload management).
Use tools like sysbench, BenchmarkSQL, or native profilers; capture metrics such as TPS, latency, CPU, and I/O.
As a last resort to guide the optimizer when statistics are misleading; use sparingly and document rationale.
ROWS counts physical rows; RANGE groups by value, so a RANGE frame can include ties beyond a fixed number of rows.
Use LAG() to detect gaps in timestamps greater than a threshold, then a SUM() over a CASE expression to assign session IDs.
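A sketch of that sessionization pattern in PostgreSQL-style SQL, assuming a hypothetical events table with user_id and event_ts and a 30-minute inactivity threshold:

```sql
WITH flagged AS (
    SELECT user_id,
           event_ts,
           -- 1 marks the start of a new session: gap from the previous event exceeds 30 minutes
           CASE
               WHEN event_ts - LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts)
                    > INTERVAL '30 minutes'
               THEN 1 ELSE 0
           END AS new_session_flag
    FROM events
)
SELECT user_id,
       event_ts,
       -- running sum of the flags numbers the sessions per user
       SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_id
FROM flagged
ORDER BY user_id, event_ts;
```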
Use window functions for moving averages, gaps-and-islands techniques for runs, and LEAD()/LAG() for deltas or event detection.
Use change data capture or change tracking to pull only new/changed rows and apply MERGE operations on the target.
Use keyset (seek) pagination with WHERE id > last_id ORDER BY id LIMIT n to avoid large offsets.
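For example, with a hypothetical articles table (PostgreSQL/MySQL-style LIMIT):

```sql
-- Resume from the last id the client saw, so the engine can seek instead of skipping rows
SELECT id, title
FROM   articles
WHERE  id > 1000          -- last_id from the previous page
ORDER  BY id
LIMIT  20;
```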
Allows invoking a subquery per row of the outer table—useful for top-N per group or dynamic computations.
Use a correlated subquery, e.g. WHERE Salary IN (SELECT DISTINCT TOP n Salary FROM … WHERE Dept = … ORDER BY Salary DESC), or rank within each department with a window function.
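A window-function alternative for top-N per group, assuming a hypothetical Employees table (this swaps the correlated subquery for DENSE_RANK):

```sql
-- Top 3 salaries in each department; change 3 to the N you need
SELECT department_id, employee_id, salary
FROM (
    SELECT department_id,
           employee_id,
           salary,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM Employees
) ranked
WHERE rnk <= 3;
```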
Combines INSERT/UPDATE/DELETE in one operation based on a join—use for upsert patterns.
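A minimal SQL Server-style MERGE upsert, assuming hypothetical customers and customers_staging tables:

```sql
MERGE dbo.customers AS tgt
USING dbo.customers_staging AS src
      ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.email      = src.email,
               tgt.updated_at = src.updated_at
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_id, email, updated_at)
    VALUES (src.customer_id, src.email, src.updated_at);
```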
Use sketch algorithms for cardinality or membership checks at scale—implement via UDFs or built-in features.
Maintain history with effective and expiry dates, surrogate keys, and a current flag—update by closing old rows and inserting new.
When fact data arrives before its dimension row, insert a placeholder (inferred-member) row in the dimension table, then update it when the real data arrives.
Check partition sizes and task durations; if one partition processes much more data, repartition or adjust distribution keys.
Scalar UDFs can force row-by-row execution—inline or rewrite as parameterized views or table-valued functions.
Build the pivot SQL string in a variable by querying distinct values, then execute dynamic SQL.
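A sketch in SQL Server (assuming 2017+ for STRING_AGG) against a hypothetical sales table with region, category, and amount columns:

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the pivot column list from the distinct category values
SELECT @cols = STRING_AGG(CONVERT(NVARCHAR(MAX), QUOTENAME(category)), ', ')
FROM   (SELECT DISTINCT category FROM dbo.sales) AS c;

-- Assemble and run the dynamic PIVOT
SET @sql = N'SELECT region, ' + @cols + N'
FROM (SELECT region, category, amount FROM dbo.sales) AS src
PIVOT (SUM(amount) FOR category IN (' + @cols + N')) AS p;';

EXEC sys.sp_executesql @sql;
```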
Compress repeated values via codebooks, reducing I/O and enabling in-memory analytics.
Use functions like MASKED WITH (FUNCTION = 'partial()') in SQL Server or apply hashing/encryption UDFs on sensitive columns.
Use SQL for relational consistency; complement with NoSQL/document stores for unstructured or flexible data—integrate via pipelines.
Use keyset pagination with resumable cursors (the last key seen) or precomputed page tokens stored in cache.
Use system-versioned temporal tables and FOR SYSTEM_TIME AS OF to query historical state.
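For example, against a hypothetical system-versioned orders table in SQL Server:

```sql
-- Return the rows exactly as they existed at the given point in time
SELECT order_id, status, total
FROM   dbo.orders
       FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00'
WHERE  customer_id = 42;
```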
Track changes for fast incremental refresh of materialized views.
- Log Shipping: Manual failover, latency based on backup frequency.
- Mirroring: Near real-time synchronous/asynchronous, automatic failover in high safety.
- AGs: Multiple secondaries, read-scale, flexible failover, multi-database.
Configure streaming replication, promote replicas for reporting, monitor replication lag.
Synchronous waits for replicas to commit before returning; asynchronous returns immediately—trade-off between safety and latency.
Ensure writes acknowledged by a majority of nodes to prevent split-brain; similar quorum concept in SQL clusters.
Use planned manual failover on secondary, verify clients reconnect, then fail back.
Restore full backup and apply transaction logs up to a specified timestamp, located via log sequence numbers.
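A SQL Server-flavored sketch, with hypothetical backup file paths and target timestamp:

```sql
-- Restore the last full backup without recovering, then roll logs forward to the target time
RESTORE DATABASE Sales
    FROM DISK = N'/backups/sales_full.bak'
    WITH NORECOVERY;

RESTORE LOG Sales
    FROM DISK = N'/backups/sales_log_01.trn'
    WITH STOPAT = '2024-06-01T10:15:00', RECOVERY;
```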
Define how long to keep full/differential/log backups for compliance and recovery window.
Restore to a staging instance and run consistency checks (e.g., DBCC CHECKDB in SQL Server; verify the archive and a clean restore with pg_restore --list in PostgreSQL).
- Differential: Changes since last full backup.
- Incremental: Changes since last incremental—requires chain of backups.
For large databases where file-system or storage array snapshots provide fast, consistent backups with minimal impact.
Track replication lag, error logs, apply rate vs. write rate, alert on thresholds.
Multi-master allows writes on multiple nodes (risk of conflicts), single-master centralizes writes for simplicity.
More frequent checkpoints reduce recovery time at the cost of increased I/O during normal operation.
Certain DDLs require manual intervention or database in single-user mode; test scripts beforehand.
Blue-green deployments, feature toggles, online schema migration tools (gh-ost, pt-online-schema-change).
Use multi-region replicas with asynchronous replication, handle conflicts or use conflict-free replication models.
Occurs when nodes lose quorum and operate independently—prevent with majority quorum, fencing mechanisms.
Provide tie-breaking votes in clustering/quorum to avoid split-brain.
Use incremental backups, data archiving, cloud snapshots, and parallel backup tools to reduce window.
Step-by-step guide: RTO/RPO targets, recovery steps, contact lists, failover procedures, validation tests.
- PostgreSQL: CREATE POLICY with FOR SELECT USING (predicate).
- SQL Server: security policies bound to predicate functions (inline table-valued functions).
- TDE: Encrypts data at rest on disk.
- Always Encrypted: Client-side encryption; data stays encrypted in transit and at rest, and the database engine never sees the plaintext.
Masks sensitive columns at query time based on user roles, without altering stored data.
Implement data minimization, encryption, masking, audit logging, and support right-to-be-forgotten via soft delete or pseudonymization.
Use least privilege, Just-In-Time (JIT) elevation, multi-factor authentication, and audit all elevated sessions.
Use built-in change data capture/track features (CDC in SQL Server, logical decoding in PostgreSQL).
Store keys in HSM or Azure Key Vault/AWS KMS, rotate regularly, use infrastructure-as-code for provisioning.
Split roles so no single person has both development and production access, use role-based access control (RBAC).
Use audit logs to identify usage, revoke at schema or object level, test impact in staging before production.
Monitor error logs for suspicious patterns, use WAFs, enable query auditing, and, above all, enforce parameterized queries (prepared statements) in application code rather than relying on blocking dangerous characters.
Integrate with identity providers (Azure AD, LDAP), use MFA for RDP/SSH and certificate-based authentication.
Enable TLS/SSL encryption for client-server connections, use certificates from trusted CAs.
Grant only necessary permissions; regularly review roles and permissions using DMVs or catalog views, automate with scripts.
Hardware or software-based secure enclaves to process sensitive data in-memory without exposing it to OS or DBAs.
Use metadata repositories, comments, automated lineage tools (Apache Atlas, Collibra), and maintain documentation.
Policies use native DB audit frameworks for consistent logging; triggers are table-specific and can miss cross-DB events.
Use backup partitioning by date, encrypt backups, document retention policies, and restore+delete+re-backup affected sets.
Encrypt backup files, store in access-controlled repositories (S3 with KMS), and monitor integrity.
Real-time tracking of SQL statements, access patterns, anomaly detection—tools like IBM Guardium or open source equivalents.
Conduct regular DR drills, security penetration tests, compliance audits (SOC 2, ISO 27001), and document findings.