SQL Interview Questions and Answers for 5+ Years Experience
Use a distributed SQL database (e.g., Google Spanner or Azure Cosmos DB with SQL API) with synchronous replication and Paxos/Raft consensus to guarantee consistency across regions.
Apply backward-compatible changes first (add nullable columns, create new tables), deploy code to use new fields, backfill data asynchronously, then remove old fields once fully migrated.
Normalize for data integrity and write performance; denormalize (materialized views, aggregated tables) for read-heavy workloads and reporting, balancing storage and ETL costs.
Use a tenant_id column everywhere plus Row-Level Security (RLS) policies to enforce isolation, combined with separate connection pools per tenant for throttling and monitoring.
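A minimal PostgreSQL sketch of this pattern, assuming a hypothetical orders table and an app.tenant_id session setting supplied by the application:

```sql
-- Hypothetical orders table; the app sets the tenant per session,
-- e.g. SET app.tenant_id = '42', before running queries.
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::int);

-- With the policy in place, this returns only the current tenant's rows.
SELECT * FROM orders WHERE status = 'open';
```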
- Type 1: Overwrite old values (no history).
- Type 2: Insert a new row with versioning columns (effective_date, expiry_date).
- Type 3: Add new columns for “previous” values.
Choose based on audit requirements, storage constraints, and query complexity.
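A minimal Type 2 sketch (PostgreSQL-style), assuming a hypothetical dim_customer dimension where the open row carries expiry_date = '9999-12-31':

```sql
-- Close the current version of the customer...
UPDATE dim_customer
SET    expiry_date = CURRENT_DATE
WHERE  customer_id = 1001
  AND  expiry_date = DATE '9999-12-31';

-- ...then insert the new version as the open row.
INSERT INTO dim_customer (customer_id, name, city, effective_date, expiry_date)
VALUES (1001, 'Acme Ltd', 'Berlin', CURRENT_DATE, DATE '9999-12-31');
```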
- Adjacency list: Simple, efficient for inserts/updates, requires recursive queries.
- Nested sets: Fast ancestor/descendant queries, expensive for updates.
- Closure table: Best balance for both, stores all paths, requires more storage.
Use a normalized OLTP database for transactional workloads, ETL data into a columnstore OLAP store (e.g., Synapse, Redshift), and use change data capture to keep it near-real-time.
Tag PII columns, build anonymization routines (replace with tokens), and maintain deletion cascades via foreign keys or application logic, plus scrub backups as needed.
UUIDs avoid coordination and allow offline ID generation, but are larger and unordered; sequences give compact, ordered keys but need central coordination or hi/lo algorithms.
Use system-versioned temporal tables (built into SQL Server; available in PostgreSQL via extensions such as temporal_tables) to automatically track history, or add valid_from/valid_to columns and enforce them with triggers or application logic.
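A SQL Server sketch of a system-versioned table, using hypothetical object names:

```sql
CREATE TABLE dbo.Product
(
    ProductId int            NOT NULL PRIMARY KEY,
    Price     decimal(10, 2) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));

-- Point-in-time query; the engine reads the history table automatically.
SELECT * FROM dbo.Product FOR SYSTEM_TIME AS OF '2024-01-01';
```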
Avoid cross-shard FKs—denormalize or use application-level joins; or implement distributed transactions and two-phase commit, accepting the performance trade-off.
Columnar: best for analytics, compression, bulk scans. Row: best for OLTP, point lookups, low-latency transactions.
Use spatial types (GEOGRAPHY, GEOMETRY), create spatial indexes, and leverage built-in functions (ST_Distance, ST_Within) for efficient queries.
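A PostGIS sketch, assuming a hypothetical stores table with a geography column named location:

```sql
-- Spatial (GiST) index so proximity searches avoid full scans.
CREATE INDEX idx_stores_location ON stores USING GIST (location);

-- Stores within 5 km of a point (geography distances are in meters).
SELECT name
FROM   stores
WHERE  ST_DWithin(location,
                  ST_SetSRID(ST_MakePoint(-122.33, 47.61), 4326)::geography,
                  5000);
```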
Partition old data by time, move partitions to cheaper storage (archive tables or external systems), and provide views/unions for transparent access.
Implement backpressure via flow-control, increase acknowledgment timeouts, or switch to asynchronous replication with conflict resolution for non-critical data.
Use graph stores (Neo4j, Cosmos Gremlin) for deep traversals and pattern matching; use SQL with recursive CTEs for shallow to moderate hierarchies.
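A PostgreSQL-style recursive CTE over a hypothetical adjacency-list employees table (employee_id, manager_id):

```sql
WITH RECURSIVE org AS (
    -- Anchor: the root(s) of the hierarchy.
    SELECT employee_id, manager_id, name, 1 AS depth
    FROM   employees
    WHERE  manager_id IS NULL
    UNION ALL
    -- Recursive step: attach direct reports of rows found so far.
    SELECT e.employee_id, e.manager_id, e.name, o.depth + 1
    FROM   employees e
    JOIN   org o ON e.manager_id = o.employee_id
)
SELECT * FROM org ORDER BY depth;
```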
Use partitioning by time, minimal indexes to speed writes, and clustered columnstore index for analytics on historical data.
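A PostgreSQL sketch of time-based range partitioning for a hypothetical sensor_readings table (the columnstore piece is SQL Server / Synapse specific and not shown here):

```sql
CREATE TABLE sensor_readings (
    reading_ts timestamptz NOT NULL,
    sensor_id  int         NOT NULL,
    value      double precision
) PARTITION BY RANGE (reading_ts);

-- One partition per month; old months can be detached and archived cheaply.
CREATE TABLE sensor_readings_2024_01
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```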
Project growth based on historical trends, monitor key metrics (storage, IOPS, CPU), simulate load tests, and pre-provision extra capacity or scale-out clusters.
Store flags and targeting rules in tables, use UDFs to evaluate user/context, cache results in application, and refresh on changes.
Reverse-engineer current usage, plan incremental migrations (add new table(s)), sync data via triggers/ETL, update code, and deprecate old tables.
Monitor I/O/CPU per object via DMVs (SQL Server) or pg_stat_user_tables, look for high latch waits, and analyze index usage patterns.
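For example, a quick PostgreSQL check for heavily scanned or bloated tables, using columns as they appear in pg_stat_user_tables:

```sql
SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup
FROM   pg_stat_user_tables
ORDER  BY seq_scan DESC
LIMIT  10;
```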
Used OPTION (RECOMPILE) for that query, introduced OPTIMIZE FOR UNKNOWN, or used local variables to avoid reusing a bad plan.
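A SQL Server sketch of the two hints, assuming a hypothetical dbo.Orders table and a @CustomerId parameter declared by the surrounding stored procedure:

```sql
-- Recompile this statement on every run so a skewed plan is never reused.
SELECT OrderId, OrderDate, TotalAmount
FROM   dbo.Orders
WHERE  CustomerId = @CustomerId
OPTION (RECOMPILE);

-- Or build one plan from average density instead of the first sniffed value.
SELECT OrderId, OrderDate, TotalAmount
FROM   dbo.Orders
WHERE  CustomerId = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN);
```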
Increased number of tempdb files to match CPU cores, monitored PAGELATCH waits, and optimized code to minimize temporary object creation.
Check wait stats: SOS_SCHEDULER_YIELD for CPU, PAGEIOLATCH_* for I/O; correlate with perf counters and OS metrics.
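A typical SQL Server starting point, filtering out a few benign idle waits (the exclusion list here is illustrative, not exhaustive):

```sql
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       waiting_tasks_count
FROM   sys.dm_os_wait_stats
WHERE  wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP')
ORDER  BY wait_time_ms DESC;
```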
Rewrite predicates to be sargable (move functions to constants), create computed columns and index them, or use persisted computed columns.
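For example, with a hypothetical orders table indexed on order_date:

```sql
-- Non-sargable: the function on the column prevents an index seek.
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Sargable rewrite: move the computation to the constant side.
SELECT *
FROM   orders
WHERE  order_date >= '2024-01-01'
  AND  order_date <  '2025-01-01';
```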
Identified the columns the query touches (SELECT list, WHERE, ORDER BY), created a non-clustered index on the key columns, and INCLUDEd the remaining columns to eliminate key lookups.
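A covering-index sketch for a hypothetical query filtering on CustomerId and sorting by OrderDate:

```sql
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)   -- seek + sort columns
INCLUDE (Status, TotalAmount);          -- selected columns, so no key lookup
```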
Migrated hot tables/procedures to memory-optimized tables and natively compiled procedures, reducing latencies for high-throughput workloads.
Many fine-grained row or page locks can escalate to a table (or partition) lock; use ALTER TABLE … SET (LOCK_ESCALATION = AUTO|DISABLE|TABLE) or batch operations to prevent escalation.
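For example (SQL Server, hypothetical table names; the escalation threshold is roughly 5,000 locks per statement):

```sql
-- Keep locks at row/page granularity for this table.
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = DISABLE);

-- Or stay under the threshold by working in batches; repeat until 0 rows affected.
DELETE TOP (4000)
FROM   dbo.OrderArchive
WHERE  archived_at < '2020-01-01';
```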
Use Resource Governor/workload management to cap CPU/memory, implement query timeouts or cancellation policies.
Use plan guides, optimize parameterization, precompile commonly executed statements, and maintain a plan cache with appropriate size.
Trace flags (e.g., 4199 in SQL Server to enable query optimizer hotfixes) or session SET options can change cardinality estimation, plan choices, or locking behavior; use them carefully.
Use benchmarks like TPC-C or HammerDB, simulate concurrent users, measure transactions/sec, latency, and resource utilization.
Use GIN or GiST indexes on JSONB in Postgres, create expression indexes on specific JSON paths, and avoid full document scans.
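A PostgreSQL sketch, assuming a hypothetical events table with a JSONB payload column:

```sql
-- GIN index to serve containment (@>) and existence queries.
CREATE INDEX idx_events_payload ON events USING GIN (payload);

SELECT * FROM events WHERE payload @> '{"type": "login"}';

-- Expression index for one frequently filtered path.
CREATE INDEX idx_events_user ON events ((payload->>'user_id'));
```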
Use parallel bulk loads, minimal logging, partition switching, pre-staged data in external tables, and staged transforms in staging schema.
Use optimistic concurrency, row-versioning isolation levels, minimal locking, appropriate indexing, and scale-out via sharding or read replicas.
Synchronous ensures zero data loss but adds commit latency; asynchronous reduces latency but risks data loss on primary failure.
Use OPTION (OPTIMIZE FOR (@param = value)) hints, or add WITH RECOMPILE only for specific error-prone branches.
Created statistics only on active subsets (e.g., WHERE is_active = 1), improving cardinality estimates for those queries without global stat overhead.
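A SQL Server filtered-statistics sketch with hypothetical object names:

```sql
CREATE STATISTICS stats_orders_active
ON dbo.Orders (CustomerId, OrderDate)
WHERE is_active = 1;   -- statistics describe only the active subset
```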
Manually sample data, update statistics with fullscan or filter, and use multi-column stats or extended statistics for correlated columns.
Setting short lock timeouts (SET LOCK_TIMEOUT) allows queries to fail quickly instead of blocking indefinitely; use where appropriate.
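For example (SQL Server, hypothetical table):

```sql
-- Fail after 3 seconds of blocking instead of waiting indefinitely.
SET LOCK_TIMEOUT 3000;

UPDATE dbo.Inventory
SET    quantity = quantity - 1
WHERE  product_id = 42;
-- If the lock is not granted in time, the statement fails with error 1222
-- ("Lock request time out period exceeded") and the caller can retry.
```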
Used them for ad-hoc analytics on data lakes, paying only per-query, and combining with materialized views or BI tools for frequent queries.
Inventory features, use Data Migration Assistant to assess compatibility, choose VM vs. managed DB, perform schema/data migration, validate performance.
Configure geo-replication, set up auto-failover groups (Azure) or global clusters (Aurora), test failover drills, and monitor RPO/RTO.
Use partition elimination via proper predicate on partition keys, align partitions with underlying storage units, and co-locate hot partitions.
Enable TDE or storage encryption on managed service, enforce TLS connections, manage keys in cloud KMS, and audit certificate usage.
Right-size compute, use reserved instances or savings plans, shut down dev/test outside business hours, archive cold data to cheaper storage.
Use built-in monitoring (Azure Metrics, CloudWatch), set alerts on key metrics (DTU/CPU, I/O latency, replication lag), and integrate with PagerDuty.
Use transactional replication or data sync for on-prem to Azure SQL, implement VPN/ExpressRoute, and ensure network security (NSGs, firewalls).
Deploy rolling migrations via feature flags, version the API, schedule small changes, and use techniques like expand-contract-remove.
Use Redis or Memcached for hot query results, implement CDN for static data, and apply cache-aside patterns in application.
Use parameterized calls from functions, manage connection pooling (e.g., RDS Proxy), and secure credentials via secrets manager.
Some managed services support multi-region writes (e.g., Azure Cosmos DB multi-region writes), while Aurora Global Database keeps a single writer region with global read replicas and write forwarding; handle conflict resolution, monitor cross-region latency, and route traffic at the application level.
Ensure data is stored only in approved regions, use regional endpoints, and implement row-level tagging in multi-region setups.
Store credentials in AWS Secrets Manager / Azure Key Vault, rotate regularly, and grant least privilege to applications.
Can impact OLTP latency—offload to replicas or use read scale-out; consider separate analytic pools or data warehouse.
Rely on automated backups, define retention policies, test point-in-time restores, and handle cross-region restores for DR.
Use Azure CLI / AWS CLI in scripts or Terraform, automate failover testing, scale operations, health checks via ARM/CloudFormation.
Use batching, write sharding, provisioned IOPS, and optimize schema/indexes for bulk loads.
Managed offloads patching/backup/scaling; self-managed gives full control but requires maintenance overhead.
Created asynchronous replicas, directed read-only traffic via connection strings or proxies, and monitored replication lag to avoid stale reads.
Use tools like Flyway or Liquibase as part of build, run migrations in test stage, promote scripts through environments, and include rollback scripts.
Compare live schema vs. expected (source control) via tools (Schema Compare, Redgate), alert on discrepancies, and optionally auto-sync.
Use disposable environments (containers, ephemeral dev DBs), run migration scripts, execute integration tests against real-like data.
Inject secrets at runtime via environment variables or secret managers; avoid hard-coded credentials in Git; use templates with placeholders.
Keep code in source control, tag releases, script out object definitions, and include change history in commit messages.
Use idempotent down-scripts, wrap in transaction (where supported), or restore from backup/snapshot taken before migration.
Regularly run schema comparison jobs, enforce policies in CI/CD, block deployments if drift detected.
Schedule jobs to capture execution stats, wait stats, and plan cache metrics; store in history tables and visualize trends.
Use PR-based reviews for migration scripts, automated testing, and gated deployments requiring manual sign-off for prod.
Use anonymized subsets of prod data, synthetic data generation, or data virtualization to ensure realistic testing without PII exposure.
Deploy new version behind feature flag, route a small percentage of traffic for canary testing, monitor errors/performance before full rollout.
Use quick snapshots or clones (Azure SQL clone DB, AWS snapshots), teardown and recreate per pipeline run to ensure consistency.
Log duration, success/failure, schema changes count, rollback rates; integrate with dashboards (Grafana, CloudWatch).
Define DB servers, firewalls, backups, and scaling policies in ARM templates, Terraform, or CloudFormation for reproducible environments.
Periodic audits using policy as code (Azure Policy, AWS Config), auto-remediation scripts, and alerts on non-compliance.
Use seed scripts for schema + reference data, containerized DB images, and migration tools to bring to latest version.
Run representative queries against a scaled-down test DB, measure latency, and fail build if thresholds are breached.
Inject failures (lock contention, network partition, node crashes) in non-prod to validate resilience, use tools like Chaos Toolkit.
Use metrics-based autoscaling rules (e.g., CPU > 70%), scripts or managed autoscale policies for compute and storage.
Identify unused tables/procs via access stats, announce deprecation, set read-only views, drop after a grace period.
Use Dynamic Data Masking or column-level encryption with key management; combine with RLS to restrict row access.
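A SQL Server Dynamic Data Masking sketch on a hypothetical customers table and role:

```sql
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

ALTER TABLE dbo.Customers
ALTER COLUMN CreditCardNo ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');

-- Non-privileged readers see masked values; exempt specific roles explicitly.
GRANT UNMASK TO reporting_admins;
```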
Integrate vulnerability scanning (SQL injection patterns, misconfigurations) in CI/CD, static code analysis for SQL scripts, and secure defaults.
Leverage envelope encryption (master key + data keys), rotate data keys regularly, and use key versioning to decrypt old data until re-encryption.
Enable audit logs for DDL, security changes, and SELECT on high-sensitivity tables; ship logs to SIEM; alert on anomalous patterns.
Maintain a compliance matrix, tag data by sensitivity, apply region-based data storage, implement required controls per standard, and schedule audits.
Encrypt backups with service-side or client-side encryption, use secure transfer (TLS/SFTP), and store in IAM-protected buckets.
Run automated scans (e.g., CIS benchmarks, AWS Inspector), remediate findings (unpatched versions, weak passwords), and track in issue tracker.
Maintain an “emergency admin” role audited separately, require multi-party approval and post-incident review for any use.
- Encryption: Reversible with keys; secures entire field.
- Tokenization: Replaces value with token, real data stored in vault; irreversible without lookup, ideal for PCI.
Use automated scanners (e.g., Microsoft Purview), classify columns by sensitivity tags, and enforce policies based on classification.
Follow role-based access control (RBAC), audit usage patterns, rotate credentials, and use managed identities where possible.
Use storage WORM (Write Once Read Many) or S3 Object Lock, and configure backups to be tamper-proof.
Enforce parameterization, use API gateways or WAFs, sanitize inputs, and scan code for concatenation patterns.
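For the parameterization piece, a T-SQL sketch using sp_executesql instead of string concatenation (table and variable names are hypothetical):

```sql
DECLARE @sql   nvarchar(200) = N'SELECT * FROM dbo.Users WHERE email = @email';
DECLARE @email nvarchar(320) = N'alice@example.com';  -- untrusted input stays a parameter

EXEC sp_executesql @sql, N'@email nvarchar(320)', @email = @email;
```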
Capture queries in audit logs, centralize logs, and tag queries by origin; enforce policies via proxies or gateways.
Provide a hardware or software boundary where sensitive operations occur, isolating keys and plaintext from the OS and DBAs.
Keep audit trails immutable, but separate PII from audit logs or pseudonymize it to satisfy both requirements.
(Your personal experience: detection, containment, root-cause analysis, patching, and communication steps.)
Use automated discovery tools, store definitions in a metadata repository, and surface in data catalog for data stewards.
Implement scheduled jobs or TTL settings on tables, track retention metadata per table, and alert on overdue data.
Use policy-as-code tools (e.g., CIS, Prisma Cloud) to scan infra and DB configs before deployment, block violations, and report on dashboards.
Integration of ML models in the database engine (e.g., predictive analytics via MADlib, SQL Server ML Services) and auto-tuning via AI-driven advisors.
Treat schema + migrations as code in Git repos, trigger pipelines on PR merges, use Terraform/ARM for infrastructure, drift detection automated.
Benchmark critical workloads, assess feature gaps (transactions, indexing, SQL dialect), check community/support, and run proof-of-concept.
Cloud platform expertise, data engineering (ETL/ELT), distributed systems fundamentals, infrastructure as code, data governance, and ML/data science basics.
Conduct brown-bag sessions, code reviews of SQL, document standards, pair-program on migrations/performance tuning, and build a knowledge base.