Replication

Master the core data replication strategies — leader-follower, multi-leader, and leaderless. Understand replication lag, read-your-writes consistency, and how to keep data in sync across distributed nodes.

01

Intuition — Why Replication?

Replication means keeping copies of the same data on multiple machines. If you have one database server and it dies, your entire application goes down. If you have three copies of that data on three different machines, one can fail and the other two keep serving traffic. That's the core idea.

But replication isn't just about surviving failures. It also lets you spread read traffic across multiple machines (read scalability) and place copies closer to users in different regions (lower latency). The challenge is keeping all copies in sync — and that's where things get interesting.

📄

The Shared Document Analogy

Imagine a company with offices in New York, London, and Tokyo. Each office needs a copy of the employee handbook. The NY office is the 'master copy' — all edits happen there. Every night, updated copies are sent to London and Tokyo. This works well until someone in London edits their copy locally. Now London's version differs from NY's. Which one is correct? What if Tokyo also made edits? You now have three conflicting versions. This is the fundamental challenge of replication: keeping copies in sync when changes can happen at different places and network delays exist.

🛡️ High Availability

If one node goes down, others continue serving requests. No single point of failure.

⚡ Read Scalability

Spread read queries across replicas. One leader handles writes, many followers handle reads.

🌍 Fault Tolerance

Data survives hardware failures, network partitions, and even entire datacenter outages.

🔥 Key Insight

Replication is not optional in production systems. Every serious database — PostgreSQL, MySQL, MongoDB, Cassandra — supports replication out of the box. The question isn't whether to replicate, but which replication model fits your system's needs.

02

Big Picture Overview

When data is replicated across nodes, every write must eventually reach all copies. The fundamental question is: who accepts writes, and how do those writes propagate?

Write Path vs Read Path
WRITE PATH (how data gets to replicas):
  Client → Write to node(s) → Propagate to other replicas

  Key questions:
    - Which node(s) accept writes?
    - Is propagation synchronous or asynchronous?
    - What happens if a replica is down during propagation?

READ PATH (how clients read data):
  Client → Read from any replica (or a specific one)

  Key questions:
    - Can the replica have stale data?
    - How stale is acceptable?
    - Does the client need to read its own writes?

THE TENSION:
  Synchronous replication  → strong consistency, higher latency
  Asynchronous replication → lower latency, risk of stale reads

  Every replication model makes a different trade-off here.

💡 The Core Challenge

Network delays are unavoidable. A write on Node A takes time to reach Node B. During that window, Node B has stale data. Every replication strategy is fundamentally about managing this window — making it smaller, hiding it from users, or accepting it as a trade-off.

03

Leader-Follower Replication

Leader-follower (also called primary-replica or master-slave) is the most common replication model. One node is the leader — it handles all writes. The other nodes are followers — they receive a copy of every write from the leader and serve read queries.

Leader-Follower — How It Works
WRITE FLOW:
  Client → Leader (primary)
  Leader writes to its local storage
  Leader sends write to Follower 1, Follower 2, Follower 3

  Synchronous: Leader waits for follower ACK before confirming to client
  Asynchronous: Leader confirms immediately, followers catch up later

READ FLOW:
  Client → Any follower (or the leader)
  Follower returns data from its local copy

  If async: follower may return slightly stale data
  If sync: follower is guaranteed up-to-date

EXAMPLE — PostgreSQL Streaming Replication:
  Primary (leader):  handles all INSERT/UPDATE/DELETE
  Replica 1:         read-only, streams WAL from primary
  Replica 2:         read-only, streams WAL from primary

  Application routes:
    Writes → primary (port 5432)
    Reads  → replica pool (port 5433) via load balancer
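
To make this routing concrete, here is a minimal Python sketch of application-side write/read splitting. The DSN strings and the is_write() heuristic are illustrative assumptions, not a real driver API; real deployments usually delegate this to a proxy or the driver's connection pooling.

Read/Write Routing — Python Sketch
import random

# Hypothetical connection strings — adjust for your environment.
PRIMARY_DSN = "host=db-primary port=5432"
REPLICA_DSNS = [
    "host=db-replica-1 port=5433",
    "host=db-replica-2 port=5433",
]

def is_write(sql: str) -> bool:
    # Crude heuristic: statements that mutate data go to the primary.
    return sql.lstrip().split(None, 1)[0].upper() in {"INSERT", "UPDATE", "DELETE"}

def route(sql: str) -> str:
    # Writes → primary; reads → a randomly chosen replica.
    return PRIMARY_DSN if is_write(sql) else random.choice(REPLICA_DSNS)

print(route("UPDATE users SET name = 'Alice V2' WHERE id = 1"))  # primary
print(route("SELECT name FROM users WHERE id = 1"))              # a replica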

Strengths

  • Simple to understand and operate
  • Stronger consistency (all writes go through one node)
  • Read scaling — add more followers for more read throughput
  • Well-supported by all major databases
  • Clear failover path (promote a follower to leader)

Weaknesses

  • Leader is a write bottleneck (single point for writes)
  • Failover is complex (detect failure, elect new leader, redirect clients)
  • Async followers can serve stale data
  • Leader failure can lose recent writes (if async)
  • No write scaling — one leader handles all writes

Failover — The Hard Part
Leader dies. What happens?

1. DETECT the failure
   - Followers notice leader stopped sending heartbeats
   - Timeout: typically 10-30 seconds
   - Problem: is the leader dead, or just slow? (false positive risk)

2. ELECT a new leader
   - Choose the follower with the most up-to-date data
   - If async replication: the most up-to-date follower may still
     be behind the old leader (data loss risk)

3. REDIRECT clients
   - Update DNS or load balancer to point writes to new leader
   - Old leader comes back → must become a follower (split-brain risk)

SPLIT-BRAIN DANGER:
   Old leader recovers, thinks it's still the leader
   Two nodes accept writes → conflicting data
   Solution: fencing (old leader is forcibly shut down)
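
Fencing is easier to see with epoch (term) numbers: every election bumps an epoch, and replicas reject writes stamped with a stale one. The sketch below is a toy illustration of that idea, not any particular database's protocol.

Epoch-Based Fencing — Python Sketch
class Replica:
    def __init__(self):
        self.epoch = 0   # highest leader epoch seen so far
        self.log = []

    def accept_write(self, leader_epoch: int, record: str) -> bool:
        # Writes from a leader with a stale epoch are fenced off.
        if leader_epoch < self.epoch:
            return False
        self.epoch = leader_epoch
        self.log.append(record)
        return True

replica = Replica()
replica.accept_write(leader_epoch=1, record="x=1")   # original leader
replica.accept_write(leader_epoch=2, record="x=2")   # new leader after failover
# Old leader recovers, still believes it is epoch 1 — its write is rejected:
assert replica.accept_write(leader_epoch=1, record="x=99") is False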

🎯 Interview Insight

Leader-follower is the most commonly used replication model. PostgreSQL, MySQL, MongoDB (replica sets), and Redis all use it. When an interviewer asks about replication, start here. Mention the failover complexity and split-brain risk to show depth.

04

Multi-Leader Replication

Multi-leader replication allows multiple nodes to accept writes. Each leader processes writes independently and then replicates changes to the other leaders. This is useful when you need low-latency writes in multiple geographic regions.

🏢

The Multi-Office Editing Analogy

Three offices (NY, London, Tokyo) each have their own copy of a shared spreadsheet. Each office can edit their copy locally — no waiting for the NY office to approve changes. At the end of each hour, all offices sync their changes. But what if NY changed cell A1 to '100' and London changed A1 to '200' at the same time? That's a conflict. Someone (or some rule) must decide which value wins. This is the fundamental challenge of multi-leader replication: concurrent writes to different leaders can conflict.

Multi-Leader — How It Works
TOPOLOGY:
  Leader A (US-East)  ←→  Leader B (EU-West)  ←→  Leader C (AP-Tokyo)
  Each leader has its own followers for read scaling.

WRITE FLOW:
  User in Tokyo → writes to Leader C (low latency, ~5ms)
  Leader C replicates to Leader A and Leader B (async, ~100-200ms)

  vs Leader-Follower:
  User in Tokyo → writes to Leader A in US-East (~150ms round trip)
  Much higher latency for geo-distributed users.

CONFLICT EXAMPLE:
  Time T1: User in US updates username to "alice_v2" on Leader A
  Time T1: User in EU updates username to "alice_eu" on Leader B

  Both writes succeed locally. When they sync:
  Leader A has "alice_v2", Leader B has "alice_eu"
  → CONFLICT. Which one wins?

CONFLICT RESOLUTION STRATEGIES:
  1. Last-write-wins (LWW): highest timestamp wins (data loss risk) — sketched below
  2. Merge values: combine both ("alice_v2 | alice_eu")
  3. Custom logic: application-specific resolution
  4. Prompt user: "which version do you want to keep?"
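
As a sketch of strategy 1, here is a minimal last-write-wins resolver in Python. It assumes each leader stamps writes with a (timestamp, leader_id) pair; the leader_id tie-break is an added assumption to keep the outcome deterministic when clocks collide.

Last-Write-Wins — Python Sketch
def lww_resolve(a, b):
    # Each value is (payload, timestamp, leader_id); the highest pair wins.
    return max(a, b, key=lambda v: (v[1], v[2]))

us_write = ("alice_v2", 1700000000.120, "leader-A")
eu_write = ("alice_eu", 1700000000.120, "leader-B")   # identical timestamps

print(lww_resolve(us_write, eu_write))  # leader-B wins on the tie-break
# Note the data-loss risk: "alice_v2" is silently discarded.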

Strengths

  • Low-latency writes in multiple regions
  • High availability — each region operates independently
  • Tolerates entire datacenter failures
  • Good for geo-distributed systems
  • No single write bottleneck

Weaknesses

  • Write conflicts are inevitable and hard to resolve
  • Data inconsistency risk between leaders
  • Complex to implement and debug
  • Conflict resolution logic can be application-specific
  • Not supported well by all databases

🎯 Interview Insight

Multi-leader replication is used when low-latency writes are required globally — think Google Docs (collaborative editing), multi-region databases, or CRDTs. Mention conflict resolution as the key challenge. If the interviewer asks "how would you handle writes in multiple regions?" — this is the answer.

05

Leaderless Replication

Leaderless replication has no designated leader. Clients send writes to multiple nodes simultaneously, and reads also query multiple nodes. Consistency is achieved through quorum-based voting — a write succeeds if enough nodes acknowledge it, and a read is correct if enough nodes agree on the value.

🗳️

The Voting Analogy

Imagine 5 judges scoring a gymnastics routine. To 'confirm' a score, at least 3 judges must agree (a majority quorum). If you ask 3 judges for the score and 2 say '9.5' and 1 says '9.3', you trust the majority — the score is 9.5. Even if 2 judges are absent, you can still get a valid score from the remaining 3. This is how leaderless replication works: write to W nodes, read from R nodes, and as long as W + R > N (total nodes), you're guaranteed to read at least one up-to-date copy.

Leaderless — Quorum Mechanics
SETUP: 5 nodes (N=5)

WRITE: send to W=3 nodes
  Client → Node 1 ✅ (ACK)
  Client → Node 2 ✅ (ACK)
  Client → Node 3 ✅ (ACK)
  Client → Node 4 ❌ (down)
  Client → Node 5 ❌ (slow)

  W=3 ACKs received → write succeeds

READ: query R=3 nodes
  Client → Node 1: returns value V2 (latest)
  Client → Node 2: returns value V2 (latest)
  Client → Node 3: returns value V1 (stale — missed the write)

  Majority says V2 → return V2 to client

QUORUM RULE: W + R > N
  W=3, R=3, N=5 → 3+3=6 > 5
  Guarantees at least one node in the read set has the latest write.

COMMON CONFIGURATIONS:
  N=3, W=2, R=2 → balanced (tolerates 1 failure)
  N=5, W=3, R=3 → higher availability (tolerates 2 failures)
  N=3, W=3, R=1 → fast reads, slow writes
  N=3, W=1, R=3 → fast writes, slow reads
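
The overlap guarantee is easy to simulate. The toy Python sketch below models N=5 replicas with plain version numbers standing in for real version vectors: nodes that miss a write keep the old version, but any R-node read set still intersects the W-node write set.

Quorum Reads and Writes — Python Sketch
N, W, R = 5, 3, 3
assert W + R > N  # the quorum overlap rule

replicas = [{"version": 1, "value": "V1"} for _ in range(N)]

def quorum_write(value, version, up_nodes):
    # Succeeds once W reachable nodes have stored the new version.
    acks = 0
    for i in up_nodes:
        replicas[i] = {"version": version, "value": value}
        acks += 1
        if acks == W:
            return True
    return False

def quorum_read(nodes):
    # Query R nodes and trust the freshest version among the responses.
    responses = [replicas[i] for i in nodes[:R]]
    return max(responses, key=lambda r: r["version"])

quorum_write("V2", version=2, up_nodes=[0, 1, 2])  # nodes 3 and 4 miss it
print(quorum_read([2, 3, 4]))  # sees V2: the read set overlaps the write set at node 2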

Strengths

  • No single point of failure (no leader to lose)
  • High availability — tolerates multiple node failures
  • No failover needed (no leader election)
  • Tunable consistency via W and R values
  • Good for write-heavy workloads

Weaknesses

  • Complex consistency handling (sloppy quorums, read repair)
  • No strong consistency guarantee (even with quorums, edge cases exist)
  • Higher read/write latency (must contact multiple nodes)
  • Conflict resolution still needed for concurrent writes
  • Harder to reason about than leader-follower

🎯 Interview Insight

Leaderless replication is used in Dynamo-style databases — Amazon DynamoDB, Apache Cassandra, and Riak. Mention the quorum formula (W + R > N) and explain that it's used in highly distributed systems where availability matters more than strong consistency. This shows you understand the CAP trade-off in practice.

06

Replication Lag & Read-Your-Writes

Replication lag is the delay between a write on the leader and that write appearing on a follower. In async replication, this lag can range from milliseconds to seconds (or even minutes under heavy load). During this window, followers serve stale data.

Replication Lag — The Problem
SCENARIO: User updates their profile name

  T=0ms:  User sends UPDATE name='Alice V2' → Leader
  T=1ms:  Leader writes to local storage
  T=1ms:  Leader responds "success" to user
  T=2ms:  Leader sends change to Follower 1 (async)
  T=50ms: Follower 1 applies the change

  T=5ms:  User refreshes profile page
          → Load balancer routes read to Follower 1
          → Follower 1 still has old name 'Alice V1' (lag = 50ms)
          → User sees OLD name despite just updating it 😡

  T=50ms: Follower 1 finally has 'Alice V2'
          → Now reads return correct data

THE WINDOW (T=1ms to T=50ms):
  Leader has 'Alice V2'
  Follower has 'Alice V1'
  User is confused: "I just changed my name, why is it still old?"

Read-Your-Writes Consistency

Read-your-writes (also called read-after-write) consistency guarantees that if a user writes data, subsequent reads by that same user will see the write. Other users may still see stale data, but the writer always sees their own changes.

1

Read from the leader after a write

After a user writes, route their subsequent reads to the leader (which always has the latest data) for a short window (e.g., 10 seconds). After the window, reads can go back to followers. Simple but increases leader load.

2

Session-based routing (sticky sessions)

Track which follower a user is reading from. After a write, route that user's reads to a follower that has caught up to the write's position in the replication log. Use the write's log sequence number (LSN) as a marker.

3

Write-through caching

After a write, update a cache (Redis) with the new value. Reads check the cache first. If the cache has the value, return it (fresh). If not, read from a follower. The cache acts as a bridge during the replication lag window.
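
Here is a sketch of that cache bridge, using an in-memory dict with TTLs as a stand-in for Redis. The read_from_follower callback is a stub for a (possibly lagging) replica query, and the 10-second TTL is an assumed bound on replication lag.

Write-Through Cache Bridge — Python Sketch
import time

CACHE = {}          # key -> (value, expires_at)
CACHE_TTL = 10.0    # assumed upper bound on replication lag

def write(key, value):
    # 1) write to the leader (omitted here)  2) populate the cache
    CACHE[key] = (value, time.time() + CACHE_TTL)

def read(key, read_from_follower):
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                 # fresh value bridges the lag window
    return read_from_follower(key)      # lag has (probably) resolved by now

write("user:1:name", "Alice V2")
print(read("user:1:name", lambda k: "Alice V1"))  # cache hit → 'Alice V2'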

Session-Based Routing — Implementation
WRITE:
  Client → Leader: UPDATE name='Alice V2'
  Leader returns: { success: true, lsn: 12847 }
  Client stores lsn=12847 in session/cookie

SUBSEQUENT READ:
  Client → Load Balancer: GET /profile (header: X-Min-LSN: 12847)
  Load Balancer checks followers:
    Follower 1: current LSN = 12840 (behind) ❌
    Follower 2: current LSN = 12850 (caught up) ✅
  Route to Follower 2 → returns 'Alice V2'

FALLBACK:
  If no follower has caught up → route to Leader
  After 10 seconds → clear the LSN requirement (lag has resolved)
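
The follower-selection step above fits in a few lines. In this Python sketch the follower LSNs are hard-coded; in practice they would come from the database's replication-status view.

LSN-Aware Read Routing — Python Sketch
FOLLOWERS = {"follower-1": 12840, "follower-2": 12850}  # current replay LSNs
LEADER = "leader"

def route_read(min_lsn=None):
    # Pick any follower that has replayed past min_lsn; else fall back.
    if min_lsn is None:
        return next(iter(FOLLOWERS))   # no freshness requirement
    for name, lsn in FOLLOWERS.items():
        if lsn >= min_lsn:
            return name                # caught up — safe to serve the read
    return LEADER                      # nobody caught up: read from the leader

print(route_read(min_lsn=12847))  # follower-2 (12850 >= 12847)
print(route_read(min_lsn=12860))  # leader (no follower is caught up)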

🎯 Interview Insight

Replication lag is one of the most commonly asked topics in system design interviews. The classic scenario is "user updates profile but doesn't see the change." Know the three solutions (read from leader, session routing, cache) and their trade-offs. This shows you understand the practical implications of async replication.

07

End-to-End Scenario

Let's design the replication strategy for a social media platform with 200M users across 3 regions (US, EU, Asia).

📱 Social Media — 200M Users, 3 Regions

Read:write ratio: 100:1 (reads dominate).

Users expect to see their own posts immediately.

Global users — latency matters for all regions.

99.99% availability target.

1

Choose replication model: leader-follower per region with cross-region async replication

Each region has a leader-follower setup. US-East is the primary leader for writes. EU and Asia have read replicas that stream from US-East asynchronously. For the 100:1 read-heavy workload, followers handle 99% of traffic. The leader handles the 1% writes.

2

Handle replication lag: read-your-writes via session routing

After a user creates a post, store the write's LSN in their session. For the next 10 seconds, route that user's reads to a follower that has caught up past that LSN. If no follower is ready, fall back to the leader. This ensures users always see their own posts.

3

Handle regional latency: local read replicas

EU users read from EU replicas (~10ms). Asia users read from Asia replicas (~10ms). Without replicas, EU users would read from US-East (~100ms). The 10x latency improvement justifies the replication complexity.

4

Handle leader failure: automated failover

If US-East leader fails, promote the most up-to-date EU follower to leader. Use a consensus-based leader election (Raft) to avoid split-brain. Redirect writes to the new leader via DNS update (TTL: 30s). Accept that a few seconds of writes may be lost (async replication trade-off).

5

Future: consider multi-leader for write-heavy features

If the platform adds real-time messaging (write-heavy), consider multi-leader replication for the messaging service. Each region accepts message writes locally, syncs async. Use last-write-wins for conflict resolution (acceptable for chat — latest message wins). Keep leader-follower for the core feed (read-heavy, consistency matters more).

Architecture Summary
REPLICATION TOPOLOGY:
  US-East (Leader) → EU-West (Follower)  → 3 read replicas
                   → AP-Tokyo (Follower) → 3 read replicas
  US-East local → 5 read replicas

WRITE PATH:
  Any user → writes routed to US-East Leader
  Leader replicates async to EU and Asia followers (~100-200ms lag)

READ PATH:
  US users   → US-East read replicas (5ms)
  EU users   → EU-West read replicas (10ms)
  Asia users → AP-Tokyo read replicas (10ms)

READ-YOUR-WRITES:
  After write → session stores LSN
  Next 10s    → reads routed to caught-up replica or leader
  After 10s   → normal replica routing

FAILOVER:
  Leader down → promote EU follower (Raft election)
  DNS update (30s TTL) → writes redirect to new leader
  Old leader recovers → becomes follower

08

Trade-offs & Decision Making

Dimension       | Leader-Follower                    | Multi-Leader                     | Leaderless
Write handling  | Single leader                      | Multiple leaders                 | Any node (quorum)
Read scaling    | Excellent (add followers)          | Good (read from local leader)    | Good (read from any node)
Consistency     | Strong (sync) or eventual (async)  | Eventual (conflicts possible)    | Eventual (quorum-based)
Write latency   | Higher for remote users            | Low (local leader)               | Medium (write to W nodes)
Failover        | Complex (leader election)          | Automatic (other leaders exist)  | Not needed (no leader)
Conflict risk   | None (single writer)               | High (concurrent writes)         | Medium (concurrent writes)
Complexity      | Low                                | High                             | High
Best for        | Most applications, read-heavy      | Geo-distributed writes           | High availability, write-heavy

Consistency vs Availability vs Latency

The Trade-off Triangle
SYNCHRONOUS REPLICATION:
  + Strong consistency (all replicas agree)
  - Higher write latency (wait for all ACKs)
  - Lower availability (one slow replica blocks writes)
  Use when: financial transactions, inventory counts

ASYNCHRONOUS REPLICATION:
  + Low write latency (don't wait for replicas)
  + High availability (replicas don't block writes)
  - Eventual consistency (stale reads possible)
  Use when: social media feeds, analytics, logs

SEMI-SYNCHRONOUS (practical middle ground):
  + One replica is synchronous (guaranteed backup)
  + Other replicas are async (don't block writes)
  + Balances durability and performance
  Use when: most production databases (opt-in in PostgreSQL via
  synchronous_standby_names; plain async is the default)
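
To see how the three modes differ on the write path, here is a toy Python sketch: the send callback stands in for shipping a change to a follower and waiting for its ACK, and the return value marks the point at which the leader may confirm to the client.

Propagation Modes — Python Sketch
def replicate(mode, followers, send):
    # Returns when the leader may confirm the write to the client.
    if mode == "sync":
        for f in followers:
            send(f)                       # wait for every follower's ACK
        return "confirmed after ALL acks"
    if mode == "semi-sync":
        send(followers[0])                # one guaranteed synchronous copy
        return "confirmed after ONE ack"  # the rest catch up in background
    return "confirmed immediately"        # async: replicas catch up later

acked = []
print(replicate("semi-sync", ["f1", "f2", "f3"], acked.append))
print("synchronously replicated to:", acked)  # ['f1']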

🎯 Decision Framework

Start with leader-follower (simplest, covers 90% of cases). Move to multi-leader only if you need low-latency writes in multiple regions. Use leaderless only if you need extreme availability and can tolerate eventual consistency. Don't over-engineer — most systems never need anything beyond leader-follower with async replication.

09

Interview Questions

Q: What is replication lag and why does it matter?

A: Replication lag is the delay between a write on the leader and that write appearing on a follower. It matters because during the lag window, followers serve stale data. A user who just updated their profile might see the old version if their read hits a lagging follower. Solutions include reading from the leader after writes, session-based routing using LSN markers, or write-through caching.

Q: When would you use multi-leader replication?

A: When you need low-latency writes in multiple geographic regions. For example, a global e-commerce platform where users in Europe, Asia, and the US all need fast write access. Each region has its own leader that accepts writes locally (~5ms) instead of routing to a single leader across the ocean (~150ms). The trade-off is conflict resolution — concurrent writes to different leaders can conflict and must be resolved (last-write-wins, merge, or custom logic).

Q: How do you ensure read-your-writes consistency?

A: Three approaches: (1) After a write, route the user's reads to the leader for a short window (simple but increases leader load). (2) Session-based routing — store the write's LSN in the user's session and route reads to a follower that has caught up past that LSN. (3) Write-through cache — update a cache (Redis) on write, reads check cache first. The best approach depends on your read/write ratio and infrastructure.

Q: Explain the quorum formula W + R > N.

A: In leaderless replication with N nodes, W is the number of nodes that must acknowledge a write, and R is the number of nodes queried for a read. If W + R > N, at least one node in the read set must have the latest write (pigeonhole principle). Example: N=5, W=3, R=3 → 3+3=6 > 5, so reads always see the latest write. Tuning W and R lets you trade off read vs write latency and consistency vs availability.

Q: What is split-brain and how do you prevent it?

A: Split-brain occurs when two nodes both believe they are the leader and accept writes independently, leading to conflicting data. It happens during failover if the old leader recovers before being demoted. Prevention: use fencing (forcibly shut down the old leader via STONITH), use a consensus algorithm (Raft/Paxos) for leader election, and use epoch/term numbers so nodes reject writes from stale leaders.

Q: Compare synchronous vs asynchronous replication.

A: Synchronous: the leader waits for follower acknowledgment before confirming the write. Guarantees no data loss on leader failure, but increases write latency and reduces availability (a slow follower blocks writes). Asynchronous: the leader confirms immediately and replicates in the background. Lower latency and higher availability, but recent writes can be lost if the leader fails before replication completes. Most production systems use semi-synchronous — one sync replica for durability, rest async for performance.

10

Common Mistakes

⏱️

Assuming immediate consistency with async replication

Using async replication and expecting all replicas to have the latest data instantly.

Acknowledge that async replication has lag. Design for it — use read-your-writes patterns, cache recent writes, or accept eventual consistency where appropriate.

👁️

Ignoring replication lag in the design

Not mentioning replication lag when designing a read-heavy system with replicas.

Always address replication lag. Explain how users will experience it and what mitigation strategies you'll use (session routing, read from leader, caching).

🔀

Choosing multi-leader when leader-follower suffices

Using multi-leader replication for a single-region application because it sounds more scalable.

Leader-follower handles 90% of use cases. Multi-leader adds conflict resolution complexity that's only justified for geo-distributed writes. Start simple.

💥

Not handling conflicts in multi-leader setups

Setting up multi-leader replication without a conflict resolution strategy.

Always define how conflicts are resolved: last-write-wins, merge, custom logic, or user prompt. Unresolved conflicts lead to silent data corruption.

🔄

Forgetting about failover complexity

Saying 'if the leader dies, a follower takes over' without explaining the mechanics.

Explain the full failover process: failure detection (heartbeat timeout), leader election (consensus), client redirection (DNS/LB update), and split-brain prevention (fencing).