Replication & Caching
Scale read-heavy systems with read replicas, multi-level caching, and CDN for static assets. Learn how to push reads as far from the database as possible.
The Big Picture – Why Reads Are the Bottleneck
Most systems are read-heavy. A social media feed is read 100x for every post written. A product page is viewed 10,000x for every price update. If every read hits the primary database, the database becomes the bottleneck – no matter how powerful the machine.
The Restaurant Analogy
A popular restaurant has one kitchen (primary database). During peak hours, 500 customers want food simultaneously. The kitchen can prepare 50 dishes per minute – the line stretches around the block. Solution: add multiple serving counters (read replicas) that serve pre-made dishes. Add a buffet table (cache) with the most popular items ready to grab. Put a food truck outside (CDN) serving drinks and snacks. Now the kitchen only handles new orders and restocking. 90% of customers never interact with the kitchen directly.
🔥 Key Insight
The goal of read scaling is simple: push reads as far away from the primary database as possible. Every read that hits a cache instead of the database is a read the database doesn't have to handle. Every image served from a CDN is a request the origin server never sees.
The Layered Architecture
Read scaling works in layers. Each layer absorbs a percentage of reads, so only a tiny fraction reaches the primary database.
User (makes a request) → CDN (static assets) → App Cache (Redis / Memcached) → Read Replica (DB copy) → Primary DB (source of truth)
Incoming: 100,000 reads/sec

Layer 1 – CDN (images, CSS, JS, static HTML):
  Absorbs: 60% of all requests
  Remaining: 40,000 reads/sec → next layer

Layer 2 – Application Cache (Redis):
  Absorbs: 90% of remaining API requests
  Remaining: 4,000 reads/sec → next layer

Layer 3 – Read Replicas (3 replicas):
  Absorbs: 75% of remaining DB queries
  Remaining: 1,000 reads/sec → primary DB

Layer 4 – Primary Database:
  Handles: 1,000 reads/sec (+ all writes)

Original load: 100,000 → Actual load: 1,000 → 99% reduction in primary DB load
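The layer-by-layer arithmetic above is easy to sketch in code. A minimal Python version, using the illustrative absorption rates from the walkthrough (60% CDN, 90% Redis, 75% replicas), not measured figures:

```python
def remaining_after_layers(incoming, absorb_rates):
    """Return the read traffic left after each layer absorbs its share."""
    remaining = []
    load = incoming
    for rate in absorb_rates:
        load = round(load * (1 - rate))  # round to whole requests/sec
        remaining.append(load)
    return remaining

# CDN 60%, Redis 90%, read replicas 75% (figures from the walkthrough above)
loads = remaining_after_layers(100_000, [0.60, 0.90, 0.75])
print(loads)  # [40000, 4000, 1000] -> only 1% of reads reach the primary
```

Changing any single rate shows how sensitive the primary's load is to the upper layers: dropping the CDN hit rate from 60% to 40% raises every downstream number by 50%.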
💡 The Principle
Each layer is cheaper to serve from and closer to the user than the one below it. The CDN serves in ~10ms from the edge; Redis serves in ~1ms from memory once a request reaches your servers; read replicas serve in ~5ms from local SSD; the primary DB serves in ~50ms. Push as much traffic as possible to the upper layers.
Read Replicas
A read replica is a copy of the primary database that serves read queries. The primary handles all writes and streams changes to replicas. Replicas handle reads, distributing the load across multiple machines.
How Read Replicas Work
Primary handles all writes
INSERT, UPDATE, DELETE go to the primary database only. The primary is the single source of truth for writes.
Changes stream to replicas
The primary streams its write-ahead log (WAL) to all replicas. Replicas apply the changes to stay in sync. This can be synchronous (wait for replica to confirm) or asynchronous (fire and forget).
Reads are distributed across replicas
The application routes read queries to replicas using round-robin, least-connections, or random selection. Each replica handles a fraction of the total read load.
                 Writes
                   │
                   ▼
             ┌───────────┐
             │ Primary DB│
             └─────┬─────┘
                   │ WAL stream
        ┌──────────┼──────────┐
        ▼          ▼          ▼
  ┌──────────┐┌──────────┐┌──────────┐
  │ Replica 1││ Replica 2││ Replica 3│
  └──────────┘└──────────┘└──────────┘
        ▲          ▲          ▲
        └──────────┼──────────┘
                   │
                 Reads
           (load balanced)

Capacity:
  Primary alone:  5,000 reads/sec
  + 3 replicas:  20,000 reads/sec (4x improvement)
  + 10 replicas: 55,000 reads/sec (11x improvement)
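The routing step ("reads are distributed across replicas") can be sketched as a small router. This is an illustrative stand-in, not any particular driver's API: the connection objects are plain strings, and the write/read classification is deliberately naive.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary, round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def connection_for(self, sql):
        # Naive classification: anything that isn't a SELECT is a write.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.connection_for("SELECT * FROM users"))   # replica-1
print(router.connection_for("UPDATE users SET ..."))  # primary
print(router.connection_for("SELECT 1"))              # replica-2
```

Production routers also health-check replicas and weight by load (least-connections), but the shape is the same: one write target, many read targets.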
Replication Lag – The Trade-off
Asynchronous replication is faster but introduces lag – a replica might be milliseconds to seconds behind the primary. A user updates their profile on the primary, then reads from a replica that hasn't received the update yet. They see the old data.
Async Replication (Common)
- Primary doesn't wait for replica confirmation
- Lower write latency (primary responds immediately)
- Replication lag: 10ms to seconds
- Risk: stale reads from replicas
- Used by: most production systems
Sync Replication (Strict)
- Primary waits for at least 1 replica to confirm
- Higher write latency (wait for network round trip)
- No replication lag (replicas always up-to-date)
- Risk: write availability depends on replica health
- Used by: financial systems, critical data
Problem:
  User updates profile → writes to Primary
  User refreshes page  → reads from Replica (stale!)
  User sees old name   → "My update didn't save!"

Solutions:
  1. Read-your-own-writes: After a write, route that user's reads to the
     Primary for the next N seconds. Other users read from replicas.
  2. Causal consistency: Track the write's WAL position. Route reads to a
     replica that has caught up to at least that position.
  3. Sticky sessions: Route a user to the same replica consistently. If they
     wrote to Primary, route reads to Primary briefly.
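Solution 1 (read-your-own-writes) can be sketched with a per-user "last write" timestamp. The 5-second pin window and the in-memory store are assumptions for illustration; a real system would keep this state in a shared store or a session cookie.

```python
import time

PIN_SECONDS = 5  # assumed window: route a user's reads to the primary this long after a write
_last_write = {}  # user_id -> monotonic time of their last write

def record_write(user_id, now=None):
    _last_write[user_id] = now if now is not None else time.monotonic()

def read_target(user_id, now=None):
    """'primary' if the user wrote recently (replicas may lag), else 'replica'."""
    now = now if now is not None else time.monotonic()
    if now - _last_write.get(user_id, float("-inf")) < PIN_SECONDS:
        return "primary"
    return "replica"

record_write("alice", now=100.0)
print(read_target("alice", now=102.0))  # primary (within the pin window)
print(read_target("alice", now=110.0))  # replica (window expired)
print(read_target("bob",   now=102.0))  # replica (never wrote)
```

The trade-off: recent writers briefly add read load to the primary, but everyone else keeps the cheap replica path.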
Strengths
- ✓ Linear read scaling – add more replicas for more throughput
- ✓ Improved availability – if primary fails, promote a replica
- ✓ Geographic distribution – replicas in different regions
- ✓ Offload analytics – run heavy reports on a replica, not primary
- ✓ First step in scaling reads (before caching)
Limitations
- ✗ Doesn't scale writes (all writes still go to primary)
- ✗ Replication lag causes stale reads
- ✗ More replicas = more replication traffic on primary
- ✗ Failover complexity (promoting replica to primary)
- ✗ Each replica is a full copy of the database (storage cost)
🎯 Interview Insight
Read replicas are the first answer to "how do you scale reads?" They're simple, well-understood, and supported by every major database. But they're not enough alone – you still need caching on top to handle truly massive read loads. Replicas scale reads 5-10x. Caching scales reads 100x.
Multi-Level Caching
Caching doesn't happen at one layer – it happens at every layer between the user and the database. Each layer catches requests that the layer above missed, so only a tiny fraction reaches the database.
Request: GET /api/products/42

L1 – Browser Cache (client-side):
  Browser checks: "Do I have this cached?"
  Cache-Control: max-age=60 → cached for 60 seconds
  HIT  → return instantly (0ms, no network request)
  MISS → proceed to L2

L2 – CDN Edge Cache (Cloudflare, CloudFront):
  CDN edge in user's region checks its cache
  HIT  → return from edge (10-30ms)
  MISS → proceed to L3

L3 – Application Cache (Redis / Memcached):
  App server checks Redis: GET product:42
  HIT  → return from Redis (1-5ms)
  MISS → proceed to L4

L4 – Database (PostgreSQL / MySQL):
  Query: SELECT * FROM products WHERE id = 42
  Return result (20-100ms)
  Store in Redis for next request (cache-aside)

Typical hit rates:
  L1 (browser):  30-50% of requests never leave the browser
  L2 (CDN):      60-80% of remaining requests served from edge
  L3 (Redis):    90-95% of remaining requests served from cache
  L4 (database): handles only 1-5% of original traffic
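The L3 step ("store in Redis for next request") is the cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and a stub query function (both are assumptions for illustration, not a real client or driver):

```python
cache = {}  # stand-in for Redis; a real setup would also set a TTL on each key

def query_db(product_id):
    # Stub for: SELECT * FROM products WHERE id = %s
    return {"id": product_id, "name": f"product-{product_id}"}

def get_product(product_id):
    key = f"product:{product_id}"
    if key in cache:            # cache hit: ~1-5ms in a real Redis
        return cache[key]
    row = query_db(product_id)  # cache miss: fall through to the database
    cache[key] = row            # populate so the next request hits
    return row

get_product(42)        # miss -> hits the DB, fills the cache
print(get_product(42)) # hit  -> served from the cache
```

Cache-aside keeps the cache passive: the application owns the read-through and refill logic, which is why it works with any key-value store.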
L1 – Browser Cache
Controlled by Cache-Control and ETag headers. Static assets (JS, CSS, images) cached for hours/days. API responses cached for seconds/minutes. Zero latency ā no network request at all.
L2 – CDN Edge Cache
Globally distributed servers cache content close to users. A user in Tokyo gets data from a Tokyo edge, not from Virginia. 10-30ms latency. Handles images, videos, and cacheable API responses.
L3 – Application Cache (Redis)
In-memory cache on the server side. Stores API responses, database query results, computed values. 1-5ms latency. The most impactful layer for dynamic content.
L4 – Database
The source of truth. Only hit when all cache layers miss. 20-100ms latency. With proper caching, handles 1-5% of original read traffic.
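The browser layer (L1) is driven entirely by response headers. A sketch of computing them server-side, with a content-hash ETag and conditional-request handling; the max-age value and truncated hash length are illustrative choices, not recommendations:

```python
import hashlib

def cache_headers(body, if_none_match=None):
    """Return (status, headers) for a response with browser caching enabled."""
    # Content-based ETag: same body -> same tag, so revalidation is cheap.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"Cache-Control": "public, max-age=60", "ETag": etag}
    if if_none_match == etag:
        return 304, headers  # browser's copy is still valid: send no body
    return 200, headers

status, headers = cache_headers(b'{"id": 42}')
print(status)  # 200 on the first request
status, _ = cache_headers(b'{"id": 42}', if_none_match=headers["ETag"])
print(status)  # 304 once the browser revalidates with If-None-Match
```

For the 60 seconds of max-age the browser makes no request at all; after that, revalidation costs one round trip but no body transfer unless the content changed.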
💡 Key Insight – Closer Cache = Faster Response
Browser cache: 0ms (no network). CDN: 10ms (same region). Redis: 1ms (same data center). Database: 50ms (disk I/O). Every layer you add between the user and the database reduces latency and load. The best request is one that never reaches your servers.
CDN for Static Assets
A CDN (Content Delivery Network) is a globally distributed network of edge servers that cache and serve static content close to users. For images, videos, CSS, and JavaScript – the CDN is the single biggest performance win.
Without CDN:
  User (Mumbai) → Origin Server (Virginia)
  Round trip: ~200ms
  Every image, every CSS file, every JS bundle: 200ms each

With CDN:
  User (Mumbai) → CDN Edge (Mumbai)
  Round trip: ~15ms
  Images, CSS, JS served from local edge

First request (cache miss):
  CDN Edge → Origin → CDN caches it → serves to user
  Latency: ~200ms (one-time cost)

Subsequent requests (cache hit):
  CDN Edge serves directly
  Latency: ~15ms (93% faster)

For a page with 30 assets:
  Without CDN: 30 × 200ms = 6 seconds (sequential)
  With CDN:    30 × 15ms  = 450ms → 13x faster page load
What to Put on CDN
CDN-friendly (static, cacheable)
- ✓ Images (product photos, avatars, banners)
- ✓ Videos (streaming content, previews)
- ✓ CSS and JavaScript bundles
- ✓ Fonts and icons
- ✓ Static HTML pages (marketing, docs)
- ✓ Downloads (PDFs, installers)
Not CDN-friendly (dynamic, personalized)
- ✗ User-specific API responses (my orders, my feed)
- ✗ Real-time data (stock prices, live scores)
- ✗ Authenticated content (dashboard, admin panel)
- ✗ Frequently changing data (inventory counts)
- ✗ Small, unique payloads (not worth caching)
🎯 Interview Insight
When designing any system that serves media (images, videos), always mention CDN. It's the first thing you add. Netflix serves 100% of video from CDN edge servers. E-commerce sites serve all product images from CDN. Without CDN, your origin servers would need 10-100x more bandwidth capacity.
End-to-End Scenario
Let's design the read scaling layer for an Instagram-like feed – one of the most read-heavy systems imaginable.
📱 Instagram Feed – 500M DAU, 200K Read QPS
Each feed load: 1 API call + 20 images + 5 video thumbnails.
Read:Write ratio = 100:1. Writes: 2K posts/sec.
CDN – Serve all media (images, videos, thumbnails)
20 images + 5 thumbnails per feed load = 25 media requests. At 200K feed loads/sec = 5M media requests/sec. CDN absorbs 95%+ of these. Origin handles ~250K media requests/sec (cache misses only). Without CDN: 5M requests/sec to origin – impossible.
Redis Cache – Serve feed API responses
Pre-computed feed for each user stored in Redis: feed:{user_id} → [post_id_1, post_id_2, ...]. Cache hit rate: 90%. 200K API requests/sec → 180K served from Redis (1ms), 20K cache misses go to replicas.
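The feed:{user_id} structure above can be sketched as fan-out-on-write: when a post is created, it is pushed onto each follower's cached list. A plain dict stands in for Redis here, and the cap of 100 posts and all names are illustrative assumptions:

```python
feed_cache = {}  # stand-in for Redis: "feed:{user_id}" -> list of post ids
FEED_LEN = 100   # assumed cap on cached feed length

def fanout_post(post_id, follower_ids):
    """On write, prepend the new post to each follower's cached feed."""
    for uid in follower_ids:
        feed = feed_cache.setdefault(f"feed:{uid}", [])
        feed.insert(0, post_id)   # newest first
        del feed[FEED_LEN:]       # trim, like LPUSH + LTRIM in Redis

def get_feed(user_id):
    return feed_cache.get(f"feed:{user_id}")  # None = cache miss -> replicas

fanout_post(1001, ["alice", "bob"])
fanout_post(1002, ["alice"])
print(get_feed("alice"))  # [1002, 1001]
print(get_feed("carol"))  # None -> fall back to a read replica
```

The write path does more work (one push per follower), but that is the point: writes are 100x rarer than reads, so it's cheap to pre-pay there.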
Read Replicas – Handle cache misses
5 read replicas handle the 20K reads/sec that miss Redis. Each replica handles 4K reads/sec – well within capacity. Replicas also serve analytics queries and background jobs without impacting the primary.
Primary DB – Writes only
Primary handles 2K writes/sec (new posts, likes, comments). No read traffic – all reads are absorbed by the layers above. Primary has headroom for spikes and complex write operations.
Original traffic: 200K API reads/sec + 5M media reads/sec

Layer 1 – CDN:
  5M media reads → 4.75M served from edge (95%)
  250K cache misses → origin

Layer 2 – Redis Cache:
  200K API reads → 180K served from Redis (90%)
  20K cache misses → read replicas

Layer 3 – Read Replicas (5x):
  20K reads → distributed across 5 replicas
  4K reads/sec per replica

Layer 4 – Primary DB:
  0 reads (all absorbed above)
  2K writes/sec only

Summary:
  Total incoming: 5.2M requests/sec
  Primary DB handles: 2K writes/sec
  → 99.96% of traffic never touches the primary database
🔥 The Key Takeaway
99.96% of traffic is handled before it reaches the primary database. This is how Instagram, Twitter, and Netflix operate. The primary database is protected by layers of caching and replication. It only handles writes and the rare cache miss.
Trade-offs & Decision Making
Cache vs Read Replica
| Dimension | Application Cache (Redis) | Read Replica |
|---|---|---|
| Latency | 1-5ms (in-memory) | 5-50ms (disk-based) |
| Data freshness | Stale until TTL/invalidation | Stale by replication lag (ms-seconds) |
| Capacity | Limited by RAM (expensive) | Full database copy (cheaper per GB) |
| Query flexibility | Key-value lookups only | Full SQL queries |
| Best for | Hot data, simple lookups, API responses | Complex queries, analytics, full dataset access |
| Scaling factor | 100x read reduction | 5-10x read reduction |
CDN vs Backend Serving
| Dimension | CDN Edge | Origin Server |
|---|---|---|
| Latency | 10-30ms (local edge) | 100-300ms (cross-region) |
| Cost | Cheap per GB (bandwidth pricing) | Expensive (server + bandwidth) |
| Content type | Static (images, JS, CSS, video) | Dynamic (API responses, personalized) |
| Scalability | Effectively unlimited (globally distributed) | Limited (your server capacity) |
| Cache control | TTL, cache busting, purge API | Full control (your code) |
Freshness vs Performance
| Strategy | Freshness | Latency | DB Load |
|---|---|---|---|
| No caching, no replicas | Always fresh | 50-100ms | 100% (all reads hit DB) |
| Read replicas only | Lag: ms-seconds | 5-50ms | ~20% (distributed across replicas) |
| Redis cache only | Stale up to TTL | 1-5ms | ~5-10% (cache misses only) |
| CDN + Redis + replicas | Varies by layer | 1-30ms | ~1-5% (only cache misses) |
🎯 Decision Framework
Start with read replicas (simplest, scales reads 5-10x). Add Redis caching for hot data (scales reads 100x). Add CDN for static assets (eliminates media traffic from origin). Each layer is additive – you don't choose one, you stack them.
Interview Questions
Q: How do you scale read-heavy systems?
A: Layer by layer: (1) Add read replicas to distribute DB reads across multiple machines – 5-10x improvement. (2) Add application-level caching (Redis) for hot data – 90%+ of reads served from memory in 1ms. (3) Add CDN for static assets – images, videos, JS/CSS served from edge servers globally. (4) Use browser caching (Cache-Control headers) to eliminate repeat requests entirely. The result: 99%+ of reads never touch the primary database.
Q: Why use read replicas?
A: Read replicas offload read traffic from the primary database. The primary handles writes; replicas handle reads. Benefits: (1) Linear read scaling – add more replicas for more throughput. (2) Improved availability – promote a replica if primary fails. (3) Geographic distribution – replicas in different regions for lower latency. (4) Offload analytics – run heavy reports on a replica without impacting production. Trade-off: replication lag means replicas may serve slightly stale data.
Q: What is multi-level caching?
A: Caching at every layer between the user and the database: L1 (browser cache, 0ms), L2 (CDN edge, 10-30ms), L3 (Redis/Memcached, 1-5ms), L4 (database, 50ms+). Each layer catches requests the layer above missed. With 90% hit rate at each layer, only 0.1% of requests reach the database. The key insight: closer cache = faster response. The best request is one that never leaves the browser.
Q: When should you use a CDN?
A: Whenever you serve static content to a geographically distributed audience. Images, videos, CSS, JS, fonts, downloads – all belong on a CDN. The latency difference is dramatic: 200ms from a cross-continent origin vs 15ms from a local edge. CDN is essential for: e-commerce (product images), streaming (video delivery), any global web app. Not suitable for: personalized API responses, real-time data, authenticated content.
Pitfalls
Ignoring cache invalidation
Adding Redis caching with a 30-minute TTL and never invalidating on writes. A user updates their profile, but the old data shows for 30 minutes. For prices and inventory, this causes real business problems – customers see wrong prices or buy out-of-stock items.
✓ Use delete-on-write as the primary invalidation strategy. When data changes in the DB, delete the cache key immediately. TTL is a safety net, not the strategy. For critical data (prices, stock), use write-through caching or very short TTLs (5-10 seconds).
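Delete-on-write is a two-line change to the write path. A sketch under the same assumptions as before (a dict standing in for Redis, a dict standing in for the database table):

```python
cache = {}                               # stand-in for Redis
db = {42: {"id": 42, "price": 10}}       # stand-in for the products table

def update_price(product_id, price):
    db[product_id]["price"] = price           # 1. write the source of truth
    cache.pop(f"product:{product_id}", None)  # 2. delete the cache key immediately

def get_product(product_id):
    key = f"product:{product_id}"
    if key not in cache:
        cache[key] = dict(db[product_id])     # cache-aside refill on miss
    return cache[key]

get_product(42)                  # fills the cache with price 10
update_price(42, 12)             # invalidates: next read refetches
print(get_product(42)["price"])  # 12, not the stale 10
```

Deleting (rather than updating) the key on write avoids a subtle race: two concurrent writers can leave an updated-in-place cache holding the older value, whereas a deleted key simply refills from the database on the next read.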
Over-relying on replicas without caching
Adding 10 read replicas to handle 100K reads/sec instead of adding a cache. Each replica is a full database copy – expensive to maintain, and you still have replication lag. A single Redis instance could handle 100K reads/sec for hot data at 1ms latency.
✓ Use replicas for complex queries and full dataset access. Use Redis for hot, simple lookups (user profiles, product details, session data). Redis handles 100K+ reads/sec on a single instance. You might need 2 replicas + Redis instead of 10 replicas.
Not using CDN for static assets
Serving product images directly from your application servers. Each image request consumes server CPU, bandwidth, and a connection slot. At scale, your servers spend more time serving images than processing API requests.
✓ Put ALL static assets on a CDN from day one. It's cheap, easy to set up, and the performance improvement is immediate. Cloudflare, CloudFront, or Fastly – any CDN is better than serving images from your origin. This is the lowest-effort, highest-impact optimization.
Poor cache layering
Caching everything in Redis but not using browser caching or CDN. Every request still hits your servers, even for data that hasn't changed. Or: using CDN for everything including dynamic API responses that change per user.
✓ Match the cache layer to the content type. Browser cache: static assets the user has already loaded. CDN: static assets for all users (images, JS, CSS). Redis: dynamic but cacheable data (product details, user profiles). Database: only for cache misses and writes. Each layer has a purpose.