SQL Interview Questions and Answers for 2–5 Years of Experience
Partition by date (e.g., monthly), cluster on the timestamp, use a composite primary key (device_id, ts), and consider a time-series-optimized engine or extension (e.g., InfluxDB, or TimescaleDB on PostgreSQL).
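A minimal sketch of that layout in PostgreSQL, assuming a hypothetical sensor_readings table keyed by device_id and ts (partition names and date ranges are illustrative):

```sql
-- Time-series table partitioned by month on the timestamp column
CREATE TABLE sensor_readings (
    device_id BIGINT           NOT NULL,
    ts        TIMESTAMPTZ      NOT NULL,
    reading   DOUBLE PRECISION,
    PRIMARY KEY (device_id, ts)          -- composite key; includes the partition column
) PARTITION BY RANGE (ts);

-- One partition per month; create upcoming partitions ahead of time (or via a scheduled job)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```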
Surrogate keys (e.g., auto-increment IDs) decouple the schema from business data and simplify joins; natural keys enforce real-world uniqueness but can be composite and change over time.
Add a boolean is_deleted flag and a deleted_at timestamp, and filter out deleted rows in application queries or via a view.
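A short sketch of the soft-delete pattern, assuming a hypothetical orders table (PostgreSQL-flavored syntax):

```sql
-- Add the soft-delete columns
ALTER TABLE orders
    ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN deleted_at TIMESTAMPTZ;

-- "Delete" by flagging the row instead of removing it
UPDATE orders
SET    is_deleted = TRUE,
       deleted_at = now()
WHERE  order_id = 42;

-- View that application queries can read to hide deleted rows
CREATE VIEW active_orders AS
SELECT * FROM orders WHERE is_deleted = FALSE;
```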
Narrow tables (fewer columns) improve I/O and fit more rows per page/block; wide tables (many seldom-used columns) may benefit from vertical partitioning or EAV models.
Star: fact table linked to denormalized dimension tables—simpler, faster for OLAP. Snowflake: dimensions normalized into hierarchies—saves space but adds join complexity.
Create a junction table with foreign keys to each parent table, composite PK on both FKs, and optional additional attributes.
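For example, a junction table linking hypothetical students and courses tables might look like this:

```sql
-- Assumes students and courses tables already exist
CREATE TABLE student_course (
    student_id  INT  NOT NULL REFERENCES students (student_id),
    course_id   INT  NOT NULL REFERENCES courses (course_id),
    enrolled_on DATE NOT NULL,                  -- optional attribute on the relationship itself
    PRIMARY KEY (student_id, course_id)         -- composite PK prevents duplicate pairings
);
```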
Range partitioning (e.g., by date) enables partition pruning; list/hash partitioning suits categorical keys or uniform distribution; the trade-off is maintenance overhead and complexity.
Use a migration tool (Flyway, Liquibase), write idempotent scripts, apply backward-compatible changes first (add columns), then cut over, then drop old objects.
Adjacency list (parent_id), materialized path, nested sets, or using a closure table—choose based on query patterns and update frequency.
For semi-structured data with varying attributes per row; store it as JSONB in Postgres or similar, and index JSON paths for frequent queries.
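A brief PostgreSQL sketch, assuming a hypothetical products table whose variable attributes live in a JSONB column:

```sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    attrs      JSONB  NOT NULL
);

-- GIN index for general containment queries, e.g. attrs @> '{"color": "red"}'
CREATE INDEX idx_products_attrs ON products USING GIN (attrs);

-- Expression index for one frequently filtered path
CREATE INDEX idx_products_color ON products ((attrs ->> 'color'));

SELECT product_id
FROM   products
WHERE  attrs ->> 'color' = 'red';
```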
Columnstore (e.g., SQL Server’s clustered columnstore) excels at analytics and scans on specific columns; rowstore is better for OLTP, point reads, and updates.
Use temporal tables (system-versioned), triggers to copy old rows to history tables, or event sourcing patterns.
Choose a shard key (e.g., customer_id mod N), route queries based on hash/range, maintain a lookup service, ensure cross-shard transactions are minimized.
Use the smallest appropriate size, prefer fixed-length over variable-length where practical, and avoid unnecessary precision (e.g., DECIMAL(10,2) only when needed).
They support adding nullable columns; for more complex changes, create new tables, backfill them with ETL pipelines, then swap views.
Use deferred constraints, triggers, or scheduled batch jobs that validate and alert on violations.
Use a tenant_id column on all tables with appropriate row-level security (RLS) or schemas per tenant.
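A minimal PostgreSQL sketch of tenant isolation via RLS, assuming a hypothetical invoices table and an app.current_tenant session setting chosen for this example:

```sql
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Every query on invoices is filtered to the tenant set for the current session
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.current_tenant')::INT);

-- The application sets the tenant before running queries
SET app.current_tenant = '42';
SELECT * FROM invoices;   -- only tenant 42's rows are visible
```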
Precompute expensive joins/aggregates and store results; refresh periodically or on demand, balancing staleness vs. performance.
Clustering keys impact physical sort order—choose for frequent range scans; secondary indexes for selective lookups on other columns.
Centralizes table/column definitions, lineage, and business glossaries. Tools like Apache Atlas or custom tables help governance and discovery.
Analyze the execution plan for table scans, check the missing-index DMVs (sys.dm_db_missing_index_* in SQL Server), and create targeted non-clustered indexes.
An index covers a query when it includes all columns needed by the SELECT—validate by checking that the optimizer uses only the index and no lookups.
Indexes with a WHERE clause that include only rows you query often (e.g., WHERE is_active = 1), reducing index size and improving seek performance.
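For example, in SQL Server a filtered index on a hypothetical users table might look like this:

```sql
-- Index only the active rows, which are the ones queries usually touch
CREATE NONCLUSTERED INDEX ix_users_active_email
    ON dbo.users (email)
    INCLUDE (display_name)        -- optional covering column
    WHERE is_active = 1;
```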
Check fragmentation via DMVs (sys.dm_db_index_physical_stats), then rebuild or reorganize indexes based on fragmentation thresholds (e.g., rebuild above roughly 30%).
Lower fill factor leaves free space on pages for inserts, reducing page splits at the cost of larger index size; choose based on insert patterns.
Statistics describe data distribution; stale stats lead the optimizer to choose poor plans—update them manually or set AUTO_UPDATE_STATISTICS.
Use query hints (OPTION (RECOMPILE)), local variables, or OPTIMIZE FOR UNKNOWN to prevent the first parameter's plan from being reused inappropriately.
Discrepancies indicate cardinality estimation issues—address by updating stats, using histogram buckets, or query rewrites.
Hash for large unsorted sets, merge for pre-sorted data with range predicates, nested loops for small outer and indexed inner.
Use row-versioning isolation (e.g., SNAPSHOT), keep transactions short, and avoid range scans without supporting indexes.
Snapshot isolation uses row-versioning, eliminating readers blocking writers; read-committed uses locks, allowing phantoms.
Gather its execution plan, check wait stats, measure I/O and CPU, identify missing indexes or expensive operators.
Predicates that wrap columns in functions (e.g., WHERE UPPER(name) = 'JOHN') prevent index seeks; rewrite them to be sargable.
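A quick illustration of the rewrite, assuming a hypothetical customers table; note the second form only matches the first exactly under a case-insensitive collation (or with an index on the expression):

```sql
-- Non-sargable: the function on the column prevents an index seek
SELECT customer_id FROM customers WHERE UPPER(name) = 'JOHN';

-- Sargable: compare the bare column so the optimizer can seek on an index over name
SELECT customer_id FROM customers WHERE name = 'John';
```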
Break into chunks, use minimal logging (bulk-logged recovery), disable/enable indexes, and wrap in manageable transactions.
When predicates on partition keys allow the optimizer to skip entire partitions—improves scan performance.
Spread tempdb across multiple files, monitor contention on allocation bitmaps, and size appropriately.
Clears data and plan caches to simulate cold runs and verify performance impact of caching.
Tools to cap CPU, memory, or I/O per workload group (e.g., SQL Server Resource Governor; PostgreSQL typically relies on connection pooling or extensions for workload management).
Use tools like sysbench, BenchmarkSQL, or native profilers; capture metrics such as TPS, latency, CPU, and I/O.
As a last resort to guide the optimizer when statistics are misleading; use sparingly and document rationale.
ROWS counts physical rows; RANGE groups by value, so a RANGE frame can include ties beyond a fixed number of rows.
Use LAG() to detect gaps in timestamps greater than a threshold, then a SUM() over a CASE expression to assign session IDs.
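A sketch of that sessionization pattern in PostgreSQL-style SQL, assuming a hypothetical events table with user_id and event_ts and a 30-minute inactivity threshold:

```sql
WITH flagged AS (
    SELECT user_id,
           event_ts,
           -- 1 marks the start of a new session: gap from the previous event exceeds 30 minutes
           CASE
               WHEN event_ts - LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts)
                    > INTERVAL '30 minutes'
               THEN 1 ELSE 0
           END AS new_session_flag
    FROM events
)
SELECT user_id,
       event_ts,
       -- running sum of the flags numbers the sessions per user
       SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_id
FROM flagged
ORDER BY user_id, event_ts;
```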
Use window functions for moving averages, gaps-and-islands techniques for runs, and LEAD()/LAG() for deltas or event detection.
Use change data capture or change tracking to pull only new/changed rows and apply MERGE operations on the target.
Use keyset (seek) pagination with WHERE id > last_id ORDER BY id LIMIT n to avoid large offsets.
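For example, with a hypothetical articles table (PostgreSQL/MySQL-style LIMIT):

```sql
-- Resume from the last id the client saw, so the engine can seek instead of skipping rows
SELECT id, title
FROM   articles
WHERE  id > 1000          -- last_id from the previous page
ORDER  BY id
LIMIT  20;
```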
Allows invoking a subquery per row of the outer table—useful for top-N per group or dynamic computations.
Use a correlated subquery, e.g. WHERE Salary IN (SELECT DISTINCT TOP n Salary FROM … WHERE Dept = … ORDER BY Salary DESC), or rank within each department with a window function.
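A window-function alternative for top-N per group, assuming a hypothetical Employees table (this swaps the correlated subquery for DENSE_RANK):

```sql
-- Top 3 salaries in each department; change 3 to the N you need
SELECT department_id, employee_id, salary
FROM (
    SELECT department_id,
           employee_id,
           salary,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM Employees
) ranked
WHERE rnk <= 3;
```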
Combines INSERT/UPDATE/DELETE in one operation based on a join—use for upsert patterns.
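A minimal SQL Server-style MERGE upsert, assuming hypothetical customers and customers_staging tables:

```sql
MERGE dbo.customers AS tgt
USING dbo.customers_staging AS src
      ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.email      = src.email,
               tgt.updated_at = src.updated_at
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_id, email, updated_at)
    VALUES (src.customer_id, src.email, src.updated_at);
```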
Use sketch algorithms for cardinality or membership checks at scale—implement via UDFs or built-in features.
Maintain history with effective and expiry dates, surrogate keys, and a current flag—update by closing old rows and inserting new.
When fact data arrives before its dimension row, insert a placeholder (inferred-member) row in the dimension table, then update it when the real data arrives.
Check partition sizes and task durations; if one partition processes much more data, repartition or adjust distribution keys.
Scalar UDFs can force row-by-row execution—inline or rewrite as parameterized views or table-valued functions.
Build the pivot SQL string in a variable by querying distinct values, then execute dynamic SQL.
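A sketch in SQL Server (assuming 2017+ for STRING_AGG) against a hypothetical sales table with region, category, and amount columns:

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the pivot column list from the distinct category values
SELECT @cols = STRING_AGG(CONVERT(NVARCHAR(MAX), QUOTENAME(category)), ', ')
FROM   (SELECT DISTINCT category FROM dbo.sales) AS c;

-- Assemble and run the dynamic PIVOT
SET @sql = N'SELECT region, ' + @cols + N'
FROM (SELECT region, category, amount FROM dbo.sales) AS src
PIVOT (SUM(amount) FOR category IN (' + @cols + N')) AS p;';

EXEC sys.sp_executesql @sql;
```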
Compress repeated values via codebooks, reducing I/O and enabling in-memory analytics.
Use functions like MASKED WITH (FUNCTION = 'partial()') in SQL Server or apply hashing/encryption UDFs on sensitive columns.
Use SQL for relational consistency; complement with NoSQL/document stores for unstructured or flexible data—integrate via pipelines.
Use keyset pagination with resumable cursors (the last key seen) or precomputed page tokens stored in cache.
Use system-versioned temporal tables and FOR SYSTEM_TIME AS OF to query historical state.
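For example, against a hypothetical system-versioned orders table in SQL Server:

```sql
-- Return the rows exactly as they existed at the given point in time
SELECT order_id, status, total
FROM   dbo.orders
       FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00'
WHERE  customer_id = 42;
```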
Track changes for fast incremental refresh of materialized views.
- Log Shipping: Manual failover, latency based on backup frequency.
- Mirroring: Near real-time synchronous/asynchronous, automatic failover in high safety.
- AGs: Multiple secondaries, read-scale, flexible failover, multi-database.
Configure streaming replication, promote replicas for reporting, monitor replication lag.
Synchronous waits for replicas to commit before returning; asynchronous returns immediately—trade-off between safety and latency.
Ensure writes acknowledged by a majority of nodes to prevent split-brain; similar quorum concept in SQL clusters.
Use planned manual failover on secondary, verify clients reconnect, then fail back.
Restore full backup and apply transaction logs up to a specified timestamp, located via log sequence numbers.
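A SQL Server-flavored sketch, with hypothetical backup file paths and target timestamp:

```sql
-- Restore the last full backup without recovering, then roll logs forward to the target time
RESTORE DATABASE Sales
    FROM DISK = N'/backups/sales_full.bak'
    WITH NORECOVERY;

RESTORE LOG Sales
    FROM DISK = N'/backups/sales_log_01.trn'
    WITH STOPAT = '2024-06-01T10:15:00', RECOVERY;
```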
Define how long to keep full/differential/log backups for compliance and recovery window.
Restore to a staging instance and run consistency checks (e.g., DBCC CHECKDB in SQL Server; verify the archive and a clean restore with pg_restore --list in PostgreSQL).
- Differential: Changes since last full backup.
- Incremental: Changes since last incremental—requires chain of backups.
For large databases where file-system or storage array snapshots provide fast, consistent backups with minimal impact.
Track replication lag, error logs, apply rate vs. write rate, alert on thresholds.
Multi-master allows writes on multiple nodes (risk of conflicts), single-master centralizes writes for simplicity.
More frequent checkpoints reduce recovery time at the cost of increased I/O during normal operation.
Certain DDLs require manual intervention or database in single-user mode; test scripts beforehand.
Blue-green deployments, feature toggles, online schema migration tools (gh-ost, pt-online-schema-change).
Use multi-region replicas with asynchronous replication, handle conflicts or use conflict-free replication models.
Occurs when nodes lose quorum and operate independently—prevent with majority quorum, fencing mechanisms.
Provide tie-breaking votes in clustering/quorum to avoid split-brain.
Use incremental backups, data archiving, cloud snapshots, and parallel backup tools to reduce window.
Step-by-step guide: RTO/RPO targets, recovery steps, contact lists, failover procedures, validation tests.
- PostgreSQL: CREATE POLICY with FOR SELECT USING (predicate).
- SQL Server: security policies bound to predicate functions (inline table-valued functions).
- TDE: Encrypts data at rest on disk.
- Always Encrypted: Client-side encryption; data stays encrypted in transit and at rest, and the database engine never sees the plaintext.
Masks sensitive columns at query time based on user roles, without altering stored data.
Implement data minimization, encryption, masking, audit logging, and support right-to-be-forgotten via soft delete or pseudonymization.
Use least privilege, Just-In-Time (JIT) elevation, multi-factor authentication, and audit all elevated sessions.
Use built-in change data capture/track features (CDC in SQL Server, logical decoding in PostgreSQL).
Store keys in HSM or Azure Key Vault/AWS KMS, rotate regularly, use infrastructure-as-code for provisioning.
Split roles so no single person has both development and production access, use role-based access control (RBAC).
Use audit logs to identify usage, revoke at schema or object level, test impact in staging before production.
Monitor error logs for suspicious patterns, use WAFs, enable query auditing, and, above all, enforce parameterized queries (prepared statements) in application code rather than relying on blocking dangerous characters.
Integrate with identity providers (Azure AD, LDAP), use MFA for RDP/SSH and certificate-based authentication.
Enable TLS/SSL encryption for client-server connections, use certificates from trusted CAs.
Grant only necessary permissions; regularly review roles and permissions using DMVs or catalog views, automate with scripts.
Hardware or software-based secure enclaves to process sensitive data in-memory without exposing it to OS or DBAs.
Use metadata repositories, comments, automated lineage tools (Apache Atlas, Collibra), and maintain documentation.
Policies use native DB audit frameworks for consistent logging; triggers are table-specific and can miss cross-DB events.
Use backup partitioning by date, encrypt backups, document retention policies, and restore+delete+re-backup affected sets.
Encrypt backup files, store in access-controlled repositories (S3 with KMS), and monitor integrity.
Real-time tracking of SQL statements, access patterns, anomaly detection—tools like IBM Guardium or open source equivalents.
Conduct regular DR drills, security penetration tests, compliance audits (SOC 2, ISO 27001), and document findings.