SQLNoSQLACIDBASEDatabase SelectionConsistencySystem Design

Choosing a Database

There is no 'best' database — only the right one for your use case. Learn SQL vs NoSQL trade-offs, ACID vs BASE consistency models, and a decision framework for real-world systems.

25 min read9 sections
01

The Big Picture — Why Database Choice Matters

Choosing a database is one of the most consequential decisions in system design. It affects your data model, query patterns, scaling strategy, consistency guarantees, and operational complexity for years. Changing databases later is like swapping the engine of a car while driving — possible, but painful and expensive.

🚗

The Vehicle Analogy

There's no 'best vehicle' — it depends on the terrain. A sedan (SQL) is reliable, structured, and predictable. It follows well-paved roads (schemas, joins, transactions) and gets you there safely. A pickup truck (document DB) handles rough terrain — flexible cargo, no fixed structure, great for varied loads. A motorcycle (key-value store) is blazing fast for simple trips — one key, one value, no overhead. A bus (wide-column DB) carries massive loads across long distances — built for scale, not for agility. You wouldn't take a motorcycle to move furniture, and you wouldn't take a bus to the corner store. The same logic applies to databases.

The biggest mistake engineers make is choosing a database based on hype or familiarity instead of requirements. "We use MongoDB because it's popular" is not a design decision — it's a default. Real design means analyzing your data patterns, consistency needs, and scale requirements, then picking the tool that fits.

🔥 Key Insight

There is no "best" database. PostgreSQL, MongoDB, Redis, Cassandra, DynamoDB — each excels at specific workloads and struggles at others. The skill is matching the database to the problem, not the other way around.

02

Factors That Influence the Choice

Before comparing SQL vs NoSQL, understand the dimensions that matter. Every database decision is a trade-off across these factors.

🗂️

Data Structure

Is your data highly structured with clear relationships (users → orders → items)? Or is it semi-structured, nested, or schema-less (user profiles with varying fields, event logs)?

📈

Scale Requirements

How much data? How many reads/writes per second? Do you need to scale vertically (bigger machine) or horizontally (more machines)? SQL scales vertically well; NoSQL is built for horizontal scale.

🔒

Consistency Needs

Does every read need to see the latest write (strong consistency)? Or is it OK if data takes a few seconds to propagate (eventual consistency)? Banking needs strong. Social media likes can be eventual.

🔍

Query Patterns

Do you need complex joins and aggregations? Or simple key-based lookups? SQL excels at complex queries. NoSQL excels at simple, high-throughput access patterns.

FactorFavors SQLFavors NoSQL
Data structureStructured, relational, stable schemaSemi-structured, nested, evolving schema
RelationshipsComplex relationships, many JOINsFew relationships, denormalized data
ConsistencyStrong consistency required (ACID)Eventual consistency acceptable (BASE)
ScaleModerate scale, vertical scaling OKMassive scale, horizontal scaling needed
Query complexityComplex queries, aggregations, reportsSimple lookups by key or partition
Schema stabilitySchema is well-defined and stableSchema changes frequently or varies per record

💡 The Real Question

Don't ask "SQL or NoSQL?" Ask: "What are my data access patterns, consistency requirements, and scale needs?" The answer to those questions tells you which database to use.

03

SQL vs NoSQL — Deep Dive

SQL — Relational Databases

SQL databases store data in tables with fixed schemas, enforce relationships through foreign keys, and guarantee consistency through ACID transactions. They've been the backbone of production systems for 40+ years.

✅ How SQL Works

  • Fixed schema — columns defined upfront (name VARCHAR, price DECIMAL)
  • Tables linked by foreign keys (orders.user_id → users.id)
  • Powerful query language — JOINs, aggregations, subqueries
  • ACID transactions — all-or-nothing operations
  • Examples: PostgreSQL, MySQL, SQL Server, Oracle

📊 SQL Strengths & Weaknesses

  • ✅ Strong consistency (ACID)
  • ✅ Complex queries (JOINs, GROUP BY, window functions)
  • ✅ Data integrity (constraints, foreign keys)
  • ✅ Mature ecosystem (40+ years of tooling)
  • ❌ Harder to scale horizontally (sharding is complex)
  • ❌ Rigid schema (migrations needed for changes)
  • ❌ Not ideal for unstructured or highly nested data

NoSQL — Non-Relational Databases

NoSQL is not one thing — it's a family of databases designed for specific access patterns. They trade the generality of SQL for performance and scalability in their niche.

🔑

Key-Value Stores

Simplest model: one key, one value. Blazing fast for lookups. No queries beyond 'get by key'. Use case: caching, sessions, feature flags. Examples: Redis, Memcached, DynamoDB.

📄

Document Databases

Store JSON-like documents with flexible schemas. Each document can have different fields. Good for content management, user profiles, catalogs. Examples: MongoDB, CouchDB, Firestore.

📊

Wide-Column Stores

Tables with rows and dynamic columns. Optimized for massive write throughput and time-series data. Use case: analytics, IoT, event logging. Examples: Cassandra, HBase, ScyllaDB.

🕸️

Graph Databases

Store nodes and edges (relationships). Optimized for traversing connections. Use case: social networks, recommendation engines, fraud detection. Examples: Neo4j, Amazon Neptune.

NoSQL Strengths

  • Horizontal scaling — add more machines, not bigger ones
  • Flexible schema — no migrations for new fields
  • High throughput for specific access patterns
  • Built for distributed systems (replication, partitioning)
  • Low latency for simple lookups (key-value, document by ID)

NoSQL Weaknesses

  • No JOINs (or very limited) — denormalization required
  • Eventual consistency by default (not always acceptable)
  • Less mature tooling for complex queries and reporting
  • Data duplication — same data stored in multiple places
  • Harder to enforce data integrity (no foreign keys)

Head-to-Head Comparison

DimensionSQLNoSQL
Data modelTables with fixed schemaDocuments, key-value, columns, graphs
SchemaRigid (defined upfront)Flexible (schema-less or schema-on-read)
RelationshipsFirst-class (foreign keys, JOINs)Application-level (denormalized, embedded)
ConsistencyStrong (ACID)Eventual (BASE) — configurable
ScalingVertical (bigger machine)Horizontal (more machines)
Query languageSQL (standardized, powerful)Varies per database (no standard)
TransactionsMulti-row, multi-table ACIDLimited (single-document or partition)
Best forComplex relationships, reports, integrityHigh scale, simple access, flexible data

🎯 Interview Insight — When to Choose Which

SQL when: you need transactions (banking, e-commerce orders), complex queries (reporting, analytics), or strong data integrity (healthcare, finance). NoSQL when: you need massive horizontal scale (social media feeds), flexible schemas (user-generated content), or ultra-low latency lookups (caching, sessions).

04

ACID vs BASE

ACID and BASE are two consistency models that represent opposite ends of a spectrum. ACID prioritizes correctness. BASE prioritizes availability and performance. Understanding this trade-off is fundamental to database selection.

ACID — The Relational Mindset

🏦

The Bank Vault

ACID is like a bank vault. Every transaction is carefully controlled: the door opens, you deposit money, the ledger is updated, the door locks. If anything goes wrong mid-transaction — power outage, system crash — the vault rolls back to its previous state. Nothing is half-done. The money is never lost, never duplicated, never in an inconsistent state. Safe, reliable, predictable.

⚛️

Atomicity

All or nothing. Transfer $500: deduct from A AND add to B, or do neither. No partial state where money disappears.

Consistency

Every transaction moves the database from one valid state to another. Constraints are always satisfied. Balance can't go negative if there's a CHECK constraint.

🔒

Isolation

Concurrent transactions don't see each other's uncommitted changes. Two people transferring money simultaneously won't corrupt the balance.

💾

Durability

Once committed, data survives crashes. Written to disk (WAL) before the commit is acknowledged. Power goes out? Data is still there.

BASE — The Distributed Mindset

📱

The Social Media Feed

BASE is like a social media feed. When you post a photo, your friend in Tokyo might not see it for 2 seconds. That's fine — nobody notices or cares. The system prioritizes being available (you can always post, always scroll) over being perfectly consistent (everyone sees the exact same feed at the exact same millisecond). Eventually, everyone sees the photo. The system is 'basically available' and 'eventually consistent'.

🟢

Basically Available

The system is always responsive, even during failures. Some nodes might be down, but the system as a whole keeps serving requests. Availability over perfection.

🔄

Soft State

The system's state can change over time even without new input — because data is still propagating between replicas. The state is 'soft', not fixed.

Eventual Consistency

If no new updates are made, all replicas will eventually converge to the same state. 'Eventually' might be milliseconds or seconds — but not instant.

ACID vs BASE — Side by Side

DimensionACIDBASE
PhilosophyCorrectness firstAvailability first
ConsistencyStrong — every read sees the latest writeEventual — reads may be slightly stale
AvailabilityMay sacrifice availability for consistencyAlways available, even during partitions
TransactionsMulti-row, multi-table atomic operationsSingle-partition or no transactions
ScaleHarder to distribute (coordination overhead)Built for distributed systems
LatencyHigher (coordination between nodes)Lower (no coordination needed)
Use casesBanking, e-commerce orders, healthcareSocial feeds, caching, analytics, IoT
ExamplesPostgreSQL, MySQL, SQL ServerCassandra, DynamoDB, MongoDB, Redis

Strong vs Eventual Consistency — Visualized

Consistency Models — What Happens After a Writetext
Strong Consistency (ACID):
  Client writes: "balance = $500"
  ┌─────────┐     ┌─────────┐     ┌─────────┐
Node A  │ ──→ │ Node B  │ ──→ │ Node C
$500 ✅ │     │ $500 ✅ │     │ $500 ✅ │
  └─────────┘     └─────────┘     └─────────┘
  All nodes updated BEFORE write is acknowledged.
  Any read from any node returns $500 immediately.

Eventual Consistency (BASE):
  Client writes: "likes = 1001"
  ┌─────────┐     ┌─────────┐     ┌─────────┐
Node A  │     │ Node B  │     │ Node C
1001 ✅ │     │ 1000 ⏳ │     │ 1000 ⏳ │
  └─────────┘     └─────────┘     └─────────┘
  Write acknowledged after Node A. B and C update async.
  Read from B might return 1000 for a few milliseconds.
  Eventually (ms to seconds), all nodes show 1001.

🎯 Interview Insight — Why Distributed Systems Lean BASE

In a distributed system, network partitions are inevitable (the CAP theorem). During a partition, you must choose: block all writes until nodes reconnect (consistency → ACID), or keep accepting writes and reconcile later (availability → BASE). Most large-scale systems choose availability because downtime costs more than brief inconsistency. That's why DynamoDB, Cassandra, and most NoSQL databases default to eventual consistency.

05

Decision Framework

Use this framework in interviews and real design decisions. Ask these questions in order — each answer narrows the choice.

1

Do I need strong consistency (ACID transactions)?

If yes → SQL (PostgreSQL, MySQL). Financial transactions, inventory management, booking systems — anywhere 'double-spending' or 'double-booking' is catastrophic. If eventual consistency is acceptable → NoSQL is an option.

2

Do I need complex queries (JOINs, aggregations, reports)?

If yes → SQL. Relational databases are unmatched for complex queries across multiple tables. If your access pattern is simple (get by ID, get by partition key) → NoSQL can be faster and simpler.

3

What is my scale?

Moderate scale (millions of rows, thousands of QPS) → SQL handles this fine with proper indexing and read replicas. Massive scale (billions of rows, hundreds of thousands of QPS) → NoSQL databases like Cassandra or DynamoDB are designed for this.

4

Is my schema stable or evolving?

Stable, well-defined schema → SQL (schema enforces correctness). Rapidly evolving, varies per record, or deeply nested → Document DB (MongoDB, Firestore). Schema-less data (logs, events) → Wide-column (Cassandra) or key-value (Redis).

5

What is my primary access pattern?

Key-based lookups → Key-value store (Redis, DynamoDB). Document retrieval → Document DB (MongoDB). Relationship traversal → Graph DB (Neo4j). Time-series / append-heavy → Wide-column (Cassandra). Everything else → SQL.

Decision Flowcharttext
Need ACID transactions?
  ├── YESSQL (PostgreSQL, MySQL)
  └── NO

Need complex JOINs / aggregations?
  ├── YESSQL (PostgreSQL, MySQL)
  └── NO

Primary access pattern?
  ├── Key-value lookupsRedis, DynamoDB
  ├── Document by IDMongoDB, Firestore
  ├── Relationship traversalNeo4j, Neptune
  ├── Time-series / logsCassandra, TimescaleDB
  └── High-throughput writesCassandra, ScyllaDB

Scale requirement?
  ├── Moderate (< 100K QPS) → SQL with read replicas
  └── Massive (> 100K QPS)  → NoSQL (DynamoDB, Cassandra)

Still unsure? → Start with PostgreSQL.
It handles 90% of use cases well. Optimize later.

🔥 The Default Choice

If you're unsure, start with PostgreSQL. It's the most versatile database — it handles relational data, JSON documents, full-text search, and even time-series with extensions. Move to a specialized database only when PostgreSQL's limitations become a measured bottleneck.

06

Real-World Scenarios

Let's apply the framework to real systems. For each scenario, the reasoning matters more than the answer.

🏦 Banking System

Choose: SQL (PostgreSQL / Oracle)

Why: ACID is non-negotiable. A bank transfer must be atomic — deduct from A and add to B, or do neither. Strong consistency is required — you can't show a stale balance. Complex queries are needed for reporting, auditing, and fraud detection. The schema is well-defined and stable (accounts, transactions, users). Scale is moderate — even large banks don't need millions of writes per second.

Why not NoSQL: Eventual consistency is unacceptable. If a user sees $1000 but the real balance is $500, they could overdraw. No transactions across multiple records means you can't atomically transfer money.

🛒 E-Commerce Platform

Choose: SQL for orders/payments + NoSQL for catalog/sessions

Why SQL for orders: Order placement needs ACID — reduce stock, create order, charge payment atomically. Inventory must be strongly consistent to prevent overselling.

Why NoSQL for catalog: Product catalog has varied schemas (electronics have specs, clothing has sizes). Read-heavy, rarely updated. A document DB (MongoDB) or search engine (Elasticsearch) handles this well. Sessions and cart data → Redis (key-value, fast, ephemeral).

Architecture: PostgreSQL for transactional data + MongoDB/Elasticsearch for catalog + Redis for sessions/caching.

💬 Chat Application

Choose: NoSQL (Cassandra or DynamoDB) for messages + SQL for user accounts

Why NoSQL for messages: Messages are append-heavy (write-heavy workload). Access pattern is simple: get messages by conversation_id, sorted by timestamp. No complex JOINs needed. Scale is massive — billions of messages. Eventual consistency is fine — if a message appears 100ms late, nobody notices.

Why SQL for accounts: User registration, authentication, and profile management need ACID. Schema is stable. Scale is moderate.

Architecture: PostgreSQL for users/auth + Cassandra for messages + Redis for online presence/typing indicators.

📊 Analytics / Logging System

Choose: Wide-column store (Cassandra) or time-series DB (TimescaleDB)

Why: Logs and events are append-only (never updated). Write throughput is massive — millions of events per second. Access pattern: query by time range and service name. No JOINs, no transactions needed. Eventual consistency is fine — a log entry appearing 1 second late doesn't matter.

Why not SQL: A single PostgreSQL instance can't handle millions of writes per second. Sharding SQL for append-only data is over-engineering when Cassandra does it natively.

Architecture: Cassandra for raw event storage + Elasticsearch for search/filtering + ClickHouse or BigQuery for analytical queries.

07

Trade-offs & Decision Making

Consistency vs Availability

DimensionStrong ConsistencyEventual Consistency
GuaranteeEvery read returns the latest writeReads may return stale data briefly
LatencyHigher (must coordinate across nodes)Lower (read from nearest replica)
AvailabilityLower (blocks during partitions)Higher (always serves requests)
ComplexitySimpler application logicApplication must handle stale reads
Use whenMoney, inventory, bookingsLikes, feeds, analytics, caching

Flexibility vs Structure

DimensionStructured (SQL)Flexible (NoSQL)
Schema changesRequires migrations (ALTER TABLE)Add fields anytime (schema-less)
Data validationDatabase enforces constraintsApplication must validate
QueryingPowerful (SQL, JOINs, aggregations)Limited (varies by database)
Data integrityHigh (foreign keys, constraints)Low (no referential integrity)
Development speedSlower initially (schema design)Faster initially (just store JSON)
Long-term maintenanceEasier (schema is documentation)Harder (schema drift, inconsistency)

Performance vs Correctness

DimensionCorrectness First (ACID)Performance First (BASE)
Write speedSlower (coordination, locking)Faster (no coordination)
Read speedConsistent but may wait for locksFast (read from any replica)
ThroughputLower (serialization overhead)Higher (parallel, distributed)
Data guaranteesAlways correctEventually correct
Failure handlingRollback on failureReconcile on recovery

🎯 The Polyglot Persistence Pattern

Most production systems use multiple databases — each for what it does best. PostgreSQL for transactions, Redis for caching, Elasticsearch for search, Cassandra for logs. This is called polyglot persistence. In interviews, showing you can pick the right database for each part of the system is more impressive than picking one database for everything.

08

Interview Questions

Conceptual, scenario-based, and comparison questions you're likely to encounter.

Q:Why would you choose NoSQL over SQL?

A: When the access pattern is simple (key-based lookups, not complex JOINs), the schema is flexible or varies per record, horizontal scaling is needed (millions of writes/sec), and eventual consistency is acceptable. Examples: caching (Redis), user-generated content with varying fields (MongoDB), high-throughput event logging (Cassandra). The key insight: NoSQL isn't 'better' — it's optimized for specific patterns that SQL handles poorly at scale.

Q:What is eventual consistency?

A: After a write, not all replicas are updated immediately. For a brief period (milliseconds to seconds), different replicas may return different values. Eventually, all replicas converge to the same state. Example: you post a photo on social media. Your friend in another country might not see it for 2 seconds because their read hits a replica that hasn't received the update yet. This is acceptable for social media but catastrophic for banking.

Q:When is ACID mandatory?

A: Whenever incorrect data causes real harm: financial transactions (money can't disappear), inventory management (can't sell more than you have), booking systems (can't double-book a seat), healthcare records (wrong data = wrong treatment). The common thread: the cost of inconsistency is higher than the cost of slower performance. If a brief inconsistency is harmless (social media likes, view counts, analytics), ACID is overkill.

1

You're designing a ride-sharing app like Uber

Which databases would you use and why?

Answer: PostgreSQL for user accounts, payment records, and trip history (ACID for payments, complex queries for reporting). Redis for real-time driver locations (key-value: driver_id → {lat, lng}, ultra-low latency, ephemeral data). Cassandra or DynamoDB for trip events and logs (high write throughput, time-series access pattern). Elasticsearch for location-based search ('find drivers near me'). This is polyglot persistence — each database handles what it's best at.

2

A startup asks: 'Should we use MongoDB or PostgreSQL?'

How would you advise them?

Answer: Start with PostgreSQL unless you have a specific reason not to. It handles relational data, JSON documents (JSONB), full-text search, and even geospatial queries. It's ACID-compliant, has a massive ecosystem, and scales to millions of rows easily. Choose MongoDB only if: your data is genuinely document-oriented (nested, varying schemas), you don't need JOINs or transactions across documents, and you need the schema flexibility from day one. Most startups that chose MongoDB early end up needing relational features later.

3

Your system needs to handle 500K writes per second

Can SQL handle this? What would you recommend?

Answer: A single SQL instance cannot handle 500K writes/sec. Options: (1) If the data is append-only (logs, events) → Cassandra or ScyllaDB, designed for this exact workload. (2) If you need ACID for some writes → use SQL for transactional data (orders, payments) and NoSQL for high-throughput data (events, analytics). (3) If all writes need ACID → shard PostgreSQL with Citus, or use CockroachDB (distributed SQL). The answer depends on whether all 500K writes need strong consistency or just a subset.

09

Common Mistakes

These mistakes lead to painful migrations, data loss, and architectural dead ends.

🎯

Choosing NoSQL without understanding the trade-offs

'MongoDB is web-scale' is not a design decision. Teams pick NoSQL for the flexible schema, then discover they need JOINs, transactions, and data integrity — all things SQL gives you for free. Six months later, they're reimplementing half of PostgreSQL in application code.

Start with SQL (PostgreSQL) unless you have a specific, measured reason to use NoSQL. Flexible schema sounds great until you have 50 microservices writing different shapes to the same collection with no validation.

📊

Overusing SQL for high-scale write workloads

Trying to push 100K writes/sec into a single PostgreSQL instance for event logging or analytics. SQL databases are optimized for transactional workloads with moderate write throughput, not for firehose-level append-only data.

Use the right tool: Cassandra, Kafka, or ClickHouse for high-throughput writes. Keep SQL for transactional data where ACID matters. Don't force a screwdriver to be a hammer.

🔒

Ignoring consistency requirements

Using an eventually consistent database for data that requires strong consistency. Example: using DynamoDB (eventual consistency by default) for inventory management, then wondering why items are oversold during flash sales.

Map each data type to its consistency requirement. Orders, payments, inventory → strong consistency (SQL). Likes, views, recommendations → eventual consistency is fine (NoSQL). Don't apply one consistency model to all data.

🤔

Misunderstanding BASE as 'no consistency'

BASE doesn't mean 'data is wrong.' It means 'data is temporarily stale but will converge.' Eventual consistency still guarantees that all replicas will eventually agree. It's not chaos — it's a deliberate trade-off for availability and performance.

Understand the consistency spectrum: strong → bounded staleness → session → eventual. Most NoSQL databases let you configure the consistency level per query. DynamoDB has 'strongly consistent reads' as an option. Cassandra has tunable consistency (ONE, QUORUM, ALL).

🔧

Using one database for everything

Forcing PostgreSQL to be your cache, search engine, message queue, and analytics store. Or forcing MongoDB to handle transactions, reporting, and real-time analytics. Every database has a sweet spot — pushing it beyond that creates performance problems and operational nightmares.

Embrace polyglot persistence. Use each database for what it does best. PostgreSQL for transactions, Redis for caching, Elasticsearch for search, Cassandra for logs. The complexity of managing multiple databases is lower than the complexity of forcing one database to do everything poorly.