SQLNoSQLACIDBASEDatabase SelectionConsistencySystem Design

Choosing a Database

There is no 'best' database — only the right one for your use case. Learn SQL vs NoSQL trade-offs, ACID vs BASE consistency models, and a decision framework for real-world systems.

25 min read9 sections

The Big Picture — Why Database Choice Matters

Choosing a database is one of the most consequential decisions in system design. It affects your data model, query patterns, scaling strategy, , and operational complexity for years. Changing databases later is like swapping the engine of a car while driving — possible, but painful and expensive.

🚗

The Vehicle Analogy

There's no 'best vehicle' — it depends on the terrain. A sedan (SQL) is reliable, structured, and predictable. It follows well-paved roads (schemas, joins, transactions) and gets you there safely. A pickup truck (document DB) handles rough terrain — flexible cargo, no fixed structure, great for varied loads. A motorcycle (key-value store) is blazing fast for simple trips — one key, one value, no overhead. A bus (wide-column DB) carries massive loads across long distances — built for scale, not for agility. You wouldn't take a motorcycle to move furniture, and you wouldn't take a bus to the corner store. The same logic applies to databases.

The biggest mistake engineers make is choosing a database based on hype or familiarity instead of requirements. "We use MongoDB because it's popular" is not a design decision — it's a default. Real design means analyzing your data patterns, consistency needs, and scale requirements, then picking the tool that fits.

🔥 Key Insight

There is no "best" database. PostgreSQL, MongoDB, Redis, Cassandra, DynamoDB — each excels at specific workloads and struggles at others. The skill is matching the database to the problem, not the other way around.

Factors That Influence the Choice

Before comparing SQL vs NoSQL, understand the dimensions that matter. Every database decision is a trade-off across these factors.

🗂️

Data Structure

Is your data highly structured with clear relationships (users → orders → items)? Or is it semi-structured, nested, or schema-less (user profiles with varying fields, event logs)?

📈

Scale Requirements

How much data? How many reads/writes per second? Do you need to scale vertically (bigger machine) or horizontally (more machines)? SQL scales vertically well; NoSQL is built for horizontal scale.

🔒

Consistency Needs

Does every read need to see the latest write (strong consistency)? Or is it OK if data takes a few seconds to propagate (eventual consistency)? Banking needs strong. Social media likes can be eventual.

🔍

Query Patterns

Do you need complex joins and aggregations? Or simple key-based lookups? SQL excels at complex queries. NoSQL excels at simple, high-throughput access patterns.

Factor	Favors SQL	Favors NoSQL
Data structure	Structured, relational, stable schema	Semi-structured, nested, evolving schema
Relationships	Complex relationships, many JOINs	Few relationships, denormalized data
Consistency	Strong consistency required (ACID)	Eventual consistency acceptable (BASE)
Scale	Moderate scale, vertical scaling OK	Massive scale, horizontal scaling needed
Query complexity	Complex queries, aggregations, reports	Simple lookups by key or partition
Schema stability	Schema is well-defined and stable	Schema changes frequently or varies per record

💡 The Real Question

Don't ask "SQL or NoSQL?" Ask: "What are my data access patterns, consistency requirements, and scale needs?" The answer to those questions tells you which database to use.

SQL vs NoSQL — Deep Dive

SQL — Relational Databases

SQL databases store data in tables with fixed schemas, enforce relationships through foreign keys, and guarantee consistency through . They've been the backbone of production systems for 40+ years.

✅ How SQL Works

Fixed schema — columns defined upfront (name VARCHAR, price DECIMAL)
Tables linked by foreign keys (orders.user_id → users.id)
Powerful query language — JOINs, aggregations, subqueries
ACID transactions — all-or-nothing operations
Examples: PostgreSQL, MySQL, SQL Server, Oracle

📊 SQL Strengths & Weaknesses

✅ Strong consistency (ACID)
✅ Complex queries (JOINs, GROUP BY, window functions)
✅ Data integrity (constraints, foreign keys)
✅ Mature ecosystem (40+ years of tooling)
❌ Harder to scale horizontally (sharding is complex)
❌ Rigid schema (migrations needed for changes)
❌ Not ideal for unstructured or highly nested data

NoSQL — Non-Relational Databases

NoSQL is not one thing — it's a family of databases designed for specific access patterns. They trade the generality of SQL for performance and scalability in their niche.

🔑

Key-Value Stores

Simplest model: one key, one value. Blazing fast for lookups. No queries beyond 'get by key'. Use case: caching, sessions, feature flags. Examples: Redis, Memcached, DynamoDB.

📄

Document Databases

Store JSON-like documents with flexible schemas. Each document can have different fields. Good for content management, user profiles, catalogs. Examples: MongoDB, CouchDB, Firestore.

📊

Wide-Column Stores

Tables with rows and dynamic columns. Optimized for massive write throughput and time-series data. Use case: analytics, IoT, event logging. Examples: Cassandra, HBase, ScyllaDB.

🕸️

Graph Databases

Store nodes and edges (relationships). Optimized for traversing connections. Use case: social networks, recommendation engines, fraud detection. Examples: Neo4j, Amazon Neptune.

NoSQL database types — key-value, document, wide-column, and graph with examples and primary use cases for each

NoSQL Strengths

✅Horizontal scaling — add more machines, not bigger ones
✅Flexible schema — no migrations for new fields
✅High throughput for specific access patterns
✅Built for distributed systems (replication, partitioning)
✅Low latency for simple lookups (key-value, document by ID)

NoSQL Weaknesses

❌No JOINs (or very limited) — denormalization required
❌Eventual consistency by default (not always acceptable)
❌Less mature tooling for complex queries and reporting
❌Data duplication — same data stored in multiple places
❌Harder to enforce data integrity (no foreign keys)

Head-to-Head Comparison

Dimension	SQL	NoSQL
Data model	Tables with fixed schema	Documents, key-value, columns, graphs
Schema	Rigid (defined upfront)	Flexible (schema-less or schema-on-read)
Relationships	First-class (foreign keys, JOINs)	Application-level (denormalized, embedded)
Consistency	Strong (ACID)	Eventual (BASE) — configurable
Scaling	Vertical (bigger machine)	Horizontal (more machines)
Query language	SQL (standardized, powerful)	Varies per database (no standard)
Transactions	Multi-row, multi-table ACID	Limited (single-document or partition)
Best for	Complex relationships, reports, integrity	High scale, simple access, flexible data

SQL vs NoSQL comparison — relational tables with fixed schema and JOINs vs flexible document, key-value, and graph models

🎯 Interview Insight — When to Choose Which

SQL when: you need transactions (banking, e-commerce orders), complex queries (reporting, analytics), or strong data integrity (healthcare, finance). NoSQL when: you need massive horizontal scale (social media feeds), flexible schemas (user-generated content), or ultra-low latency lookups (caching, sessions).

ACID vs BASE

ACID and BASE are two consistency models that represent opposite ends of a spectrum. ACID prioritizes correctness. BASE prioritizes availability and performance. Understanding this trade-off is fundamental to database selection.

ACID — The Relational Mindset

🏦

The Bank Vault

ACID is like a bank vault. Every transaction is carefully controlled: the door opens, you deposit money, the ledger is updated, the door locks. If anything goes wrong mid-transaction — power outage, system crash — the vault rolls back to its previous state. Nothing is half-done. The money is never lost, never duplicated, never in an inconsistent state. Safe, reliable, predictable.

⚛️

Atomicity

All or nothing. Transfer $500: deduct from A AND add to B, or do neither. No partial state where money disappears.

✅

Consistency

Every transaction moves the database from one valid state to another. Constraints are always satisfied. Balance can't go negative if there's a CHECK constraint.

🔒

Isolation

Concurrent transactions don't see each other's uncommitted changes. Two people transferring money simultaneously won't corrupt the balance.

💾

Durability

Once committed, data survives crashes. Written to disk (WAL) before the commit is acknowledged. Power goes out? Data is still there.

BASE — The Distributed Mindset

📱

The Social Media Feed

BASE is like a social media feed. When you post a photo, your friend in Tokyo might not see it for 2 seconds. That's fine — nobody notices or cares. The system prioritizes being available (you can always post, always scroll) over being perfectly consistent (everyone sees the exact same feed at the exact same millisecond). Eventually, everyone sees the photo. The system is 'basically available' and 'eventually consistent'.

🟢

Basically Available

The system is always responsive, even during failures. Some nodes might be down, but the system as a whole keeps serving requests. Availability over perfection.

🔄

Soft State

The system's state can change over time even without new input — because data is still propagating between replicas. The state is 'soft', not fixed.

⏳

Eventual Consistency

If no new updates are made, all replicas will eventually converge to the same state. 'Eventually' might be milliseconds or seconds — but not instant.

ACID vs BASE — Side by Side

Dimension	ACID	BASE
Philosophy	Correctness first	Availability first
Consistency	Strong — every read sees the latest write	Eventual — reads may be slightly stale
Availability	May sacrifice availability for consistency	Always available, even during partitions
Transactions	Multi-row, multi-table atomic operations	Single-partition or no transactions
Scale	Harder to distribute (coordination overhead)	Built for distributed systems
Latency	Higher (coordination between nodes)	Lower (no coordination needed)
Use cases	Banking, e-commerce orders, healthcare	Social feeds, caching, analytics, IoT
Examples	PostgreSQL, MySQL, SQL Server	Cassandra, DynamoDB, MongoDB, Redis

ACID vs BASE consistency models — ACID prioritizing correctness with strong consistency vs BASE prioritizing availability with eventual consistency

Strong vs Eventual Consistency — Visualized

Consistency Models — What Happens After a Writetext

Strong Consistency (ACID):
  Client writes: "balance = $500"
  ┌─────────┐     ┌─────────┐     ┌─────────┐
  │ Node A  │ ──→ │ Node B  │ ──→ │ Node C  │
  │ $500 ✅ │     │ $500 ✅ │     │ $500 ✅ │
  └─────────┘     └─────────┘     └─────────┘
  All nodes updated BEFORE write is acknowledged.
  Any read from any node returns $500 immediately.

Eventual Consistency (BASE):
  Client writes: "likes = 1001"
  ┌─────────┐     ┌─────────┐     ┌─────────┐
  │ Node A  │     │ Node B  │     │ Node C  │
  │ 1001 ✅ │     │ 1000 ⏳ │     │ 1000 ⏳ │
  └─────────┘     └─────────┘     └─────────┘
  Write acknowledged after Node A. B and C update async.
  Read from B might return 1000 for a few milliseconds.
  Eventually (ms to seconds), all nodes show 1001.

Strong vs eventual consistency — all nodes updated before acknowledgment vs write acknowledged after primary node with async replication

🎯 Interview Insight — Why Distributed Systems Lean BASE

In a distributed system, network partitions are inevitable (the ). During a partition, you must choose: block all writes until nodes reconnect (consistency → ACID), or keep accepting writes and reconcile later (availability → BASE). Most large-scale systems choose availability because downtime costs more than brief inconsistency. That's why DynamoDB, Cassandra, and most NoSQL databases default to eventual consistency.

Decision Framework

Use this framework in interviews and real design decisions. Ask these questions in order — each answer narrows the choice.

Do I need strong consistency (ACID transactions)?

If yes → SQL (PostgreSQL, MySQL). Financial transactions, inventory management, booking systems — anywhere 'double-spending' or 'double-booking' is catastrophic. If eventual consistency is acceptable → NoSQL is an option.

Do I need complex queries (JOINs, aggregations, reports)?

If yes → SQL. Relational databases are unmatched for complex queries across multiple tables. If your access pattern is simple (get by ID, get by partition key) → NoSQL can be faster and simpler.

What is my scale?

Moderate scale (millions of rows, thousands of QPS) → SQL handles this fine with proper indexing and . Massive scale (billions of rows, hundreds of thousands of QPS) → NoSQL databases like Cassandra or DynamoDB are designed for this.

Is my schema stable or evolving?

Stable, well-defined schema → SQL (schema enforces correctness). Rapidly evolving, varies per record, or deeply nested → Document DB (MongoDB, Firestore). Schema-less data (logs, events) → Wide-column (Cassandra) or key-value (Redis).

What is my primary access pattern?

Key-based lookups → Key-value store (Redis, DynamoDB). Document retrieval → Document DB (MongoDB). Relationship traversal → Graph DB (Neo4j). Time-series / append-heavy → Wide-column (Cassandra). Everything else → SQL.

Decision Flowcharttext

Need ACID transactions?
  ├── YES → SQL (PostgreSQL, MySQL)
  └── NO ↓

Need complex JOINs / aggregations?
  ├── YES → SQL (PostgreSQL, MySQL)
  └── NO ↓

Primary access pattern?
  ├── Key-value lookups     → Redis, DynamoDB
  ├── Document by ID        → MongoDB, Firestore
  ├── Relationship traversal → Neo4j, Neptune
  ├── Time-series / logs    → Cassandra, TimescaleDB
  └── High-throughput writes → Cassandra, ScyllaDB

Scale requirement?
  ├── Moderate (< 100K QPS) → SQL with read replicas
  └── Massive (> 100K QPS)  → NoSQL (DynamoDB, Cassandra)

Still unsure? → Start with PostgreSQL.
It handles 90% of use cases well. Optimize later.

Database decision flowchart — step-by-step guide from ACID needs through query complexity, access patterns, and scale to database selection

🔥 The Default Choice

If you're unsure, start with . It's the most versatile database — it handles relational data, JSON documents, full-text search, and even time-series with extensions. Move to a specialized database only when PostgreSQL's limitations become a measured bottleneck.

Real-World Scenarios

Let's apply the framework to real systems. For each scenario, the reasoning matters more than the answer.

🏦 Banking System

Choose: SQL (PostgreSQL / Oracle)

Why: ACID is non-negotiable. A bank transfer must be atomic — deduct from A and add to B, or do neither. Strong consistency is required — you can't show a stale balance. Complex queries are needed for reporting, auditing, and fraud detection. The schema is well-defined and stable (accounts, transactions, users). Scale is moderate — even large banks don't need millions of writes per second.

Why not NoSQL: Eventual consistency is unacceptable. If a user sees $1000 but the real balance is $500, they could overdraw. No transactions across multiple records means you can't atomically transfer money.

🛒 E-Commerce Platform

Choose: SQL for orders/payments + NoSQL for catalog/sessions

Why SQL for orders: Order placement needs ACID — reduce stock, create order, charge payment atomically. Inventory must be strongly consistent to prevent overselling.

Why NoSQL for catalog: Product catalog has varied schemas (electronics have specs, clothing has sizes). Read-heavy, rarely updated. A document DB (MongoDB) or search engine (Elasticsearch) handles this well. Sessions and cart data → Redis (key-value, fast, ephemeral).

Architecture: PostgreSQL for transactional data + MongoDB/Elasticsearch for catalog + Redis for sessions/caching.

Polyglot persistence architecture — PostgreSQL for transactions, MongoDB for catalog, Redis for caching, and Elasticsearch for search in one system

💬 Chat Application

Choose: NoSQL (Cassandra or DynamoDB) for messages + SQL for user accounts

Why NoSQL for messages: Messages are append-heavy (write-heavy workload). Access pattern is simple: get messages by conversation_id, sorted by timestamp. No complex JOINs needed. Scale is massive — billions of messages. Eventual consistency is fine — if a message appears 100ms late, nobody notices.

Why SQL for accounts: User registration, authentication, and profile management need ACID. Schema is stable. Scale is moderate.

Architecture: PostgreSQL for users/auth + Cassandra for messages + Redis for online presence/typing indicators.

📊 Analytics / Logging System

Choose: Wide-column store (Cassandra) or time-series DB (TimescaleDB)

Why: Logs and events are append-only (never updated). Write throughput is massive — millions of events per second. Access pattern: query by time range and service name. No JOINs, no transactions needed. Eventual consistency is fine — a log entry appearing 1 second late doesn't matter.

Why not SQL: A single PostgreSQL instance can't handle millions of writes per second. Sharding SQL for append-only data is over-engineering when Cassandra does it natively.

Architecture: Cassandra for raw event storage + Elasticsearch for search/filtering + ClickHouse or BigQuery for analytical queries.

Trade-offs & Decision Making

Consistency vs Availability

Dimension	Strong Consistency	Eventual Consistency
Guarantee	Every read returns the latest write	Reads may return stale data briefly
Latency	Higher (must coordinate across nodes)	Lower (read from nearest replica)
Availability	Lower (blocks during partitions)	Higher (always serves requests)
Complexity	Simpler application logic	Application must handle stale reads
Use when	Money, inventory, bookings	Likes, feeds, analytics, caching

Flexibility vs Structure

Dimension	Structured (SQL)	Flexible (NoSQL)
Schema changes	Requires migrations (ALTER TABLE)	Add fields anytime (schema-less)
Data validation	Database enforces constraints	Application must validate
Querying	Powerful (SQL, JOINs, aggregations)	Limited (varies by database)
Data integrity	High (foreign keys, constraints)	Low (no referential integrity)
Development speed	Slower initially (schema design)	Faster initially (just store JSON)
Long-term maintenance	Easier (schema is documentation)	Harder (schema drift, inconsistency)

Performance vs Correctness

Dimension	Correctness First (ACID)	Performance First (BASE)
Write speed	Slower (coordination, locking)	Faster (no coordination)
Read speed	Consistent but may wait for locks	Fast (read from any replica)
Throughput	Lower (serialization overhead)	Higher (parallel, distributed)
Data guarantees	Always correct	Eventually correct
Failure handling	Rollback on failure	Reconcile on recovery

🎯 The Polyglot Persistence Pattern

Most production systems use multiple databases — each for what it does best. PostgreSQL for transactions, Redis for caching, Elasticsearch for search, Cassandra for logs. This is called polyglot persistence. In interviews, showing you can pick the right database for each part of the system is more impressive than picking one database for everything.

Interview Questions

Conceptual, scenario-based, and comparison questions you're likely to encounter.

Q:Why would you choose NoSQL over SQL?

A: When the access pattern is simple (key-based lookups, not complex JOINs), the schema is flexible or varies per record, horizontal scaling is needed (millions of writes/sec), and eventual consistency is acceptable. Examples: caching (Redis), user-generated content with varying fields (MongoDB), high-throughput event logging (Cassandra). The key insight: NoSQL isn't 'better' — it's optimized for specific patterns that SQL handles poorly at scale.

Q:What is eventual consistency?

A: After a write, not all replicas are updated immediately. For a brief period (milliseconds to seconds), different replicas may return different values. Eventually, all replicas converge to the same state. Example: you post a photo on social media. Your friend in another country might not see it for 2 seconds because their read hits a replica that hasn't received the update yet. This is acceptable for social media but catastrophic for banking.

Q:When is ACID mandatory?

A: Whenever incorrect data causes real harm: financial transactions (money can't disappear), inventory management (can't sell more than you have), booking systems (can't double-book a seat), healthcare records (wrong data = wrong treatment). The common thread: the cost of inconsistency is higher than the cost of slower performance. If a brief inconsistency is harmless (social media likes, view counts, analytics), ACID is overkill.

You're designing a ride-sharing app like Uber

Which databases would you use and why?

Answer: PostgreSQL for user accounts, payment records, and trip history (ACID for payments, complex queries for reporting). Redis for real-time driver locations (key-value: driver_id → {lat, lng}, ultra-low latency, ephemeral data). Cassandra or DynamoDB for trip events and logs (high write throughput, time-series access pattern). Elasticsearch for location-based search ('find drivers near me'). This is polyglot persistence — each database handles what it's best at.

A startup asks: 'Should we use MongoDB or PostgreSQL?'

How would you advise them?

Answer: Start with PostgreSQL unless you have a specific reason not to. It handles relational data, JSON documents (JSONB), full-text search, and even geospatial queries. It's ACID-compliant, has a massive ecosystem, and scales to millions of rows easily. Choose MongoDB only if: your data is genuinely document-oriented (nested, varying schemas), you don't need JOINs or transactions across documents, and you need the schema flexibility from day one. Most startups that chose MongoDB early end up needing relational features later.

Your system needs to handle 500K writes per second

Can SQL handle this? What would you recommend?

Answer: A single SQL instance cannot handle 500K writes/sec. Options: (1) If the data is append-only (logs, events) → Cassandra or ScyllaDB, designed for this exact workload. (2) If you need ACID for some writes → use SQL for transactional data (orders, payments) and NoSQL for high-throughput data (events, analytics). (3) If all writes need ACID → shard PostgreSQL with Citus, or use CockroachDB (distributed SQL). The answer depends on whether all 500K writes need strong consistency or just a subset.

Common Mistakes

These mistakes lead to painful migrations, data loss, and architectural dead ends.

🎯

Choosing NoSQL without understanding the trade-offs

'MongoDB is web-scale' is not a design decision. Teams pick NoSQL for the flexible schema, then discover they need JOINs, transactions, and data integrity — all things SQL gives you for free. Six months later, they're reimplementing half of PostgreSQL in application code.

✅Start with SQL (PostgreSQL) unless you have a specific, measured reason to use NoSQL. Flexible schema sounds great until you have 50 microservices writing different shapes to the same collection with no validation.

📊

Overusing SQL for high-scale write workloads

Trying to push 100K writes/sec into a single PostgreSQL instance for event logging or analytics. SQL databases are optimized for transactional workloads with moderate write throughput, not for firehose-level append-only data.

✅Use the right tool: , , or ClickHouse for high-throughput writes. Keep SQL for transactional data where ACID matters. Don't force a screwdriver to be a hammer.

🔒

Ignoring consistency requirements

Using an eventually consistent database for data that requires strong consistency. Example: using DynamoDB (eventual consistency by default) for inventory management, then wondering why items are oversold during flash sales.

✅Map each data type to its consistency requirement. Orders, payments, inventory → strong consistency (SQL). Likes, views, recommendations → eventual consistency is fine (NoSQL). Don't apply one consistency model to all data.

🤔

Misunderstanding BASE as 'no consistency'

BASE doesn't mean 'data is wrong.' It means 'data is temporarily stale but will converge.' Eventual consistency still guarantees that all replicas will eventually agree. It's not chaos — it's a deliberate trade-off for availability and performance.

✅Understand the consistency spectrum: strong → bounded staleness → session → eventual. Most NoSQL databases let you configure the consistency level per query. DynamoDB has 'strongly consistent reads' as an option. Cassandra has tunable consistency (ONE, QUORUM, ALL).

🔧

Using one database for everything

Forcing PostgreSQL to be your cache, search engine, message queue, and analytics store. Or forcing MongoDB to handle transactions, reporting, and real-time analytics. Every database has a sweet spot — pushing it beyond that creates performance problems and operational nightmares.

✅Embrace polyglot persistence. Use each database for what it does best. PostgreSQL for transactions, Redis for caching, Elasticsearch for search, Cassandra for logs. The complexity of managing multiple databases is lower than the complexity of forcing one database to do everything poorly.