Replication & Caching
Scale read-heavy systems with read replicas, multi-level caching, and CDN for static assets. Learn how to push reads as far from the database as possible.
The Big Picture – Why Reads Are the Bottleneck
Most systems are read-heavy. A social media feed is read 100x for every post written. A product page is viewed 10,000x for every price update. If every read hits the primary database, the database becomes the bottleneck – no matter how powerful the machine.
The Restaurant Analogy
A popular restaurant has one kitchen (primary database). During peak hours, 500 customers want food simultaneously. The kitchen can prepare 50 dishes per minute – the line stretches around the block. Solution: add multiple serving counters (read replicas) that serve pre-made dishes. Add a buffet table (cache) with the most popular items ready to grab. Put a food truck outside (CDN) serving drinks and snacks. Now the kitchen only handles new orders and restocking. 90% of customers never interact with the kitchen directly.
🔥 Key Insight
The goal of read scaling is simple: push reads as far away from the primary database as possible. Every read that hits a cache instead of the database is a read the database doesn't have to handle. Every image served from a CDN is a request the origin server never sees.
The Layered Architecture
Read scaling works in layers. Each layer absorbs a percentage of reads, so only a tiny fraction reaches the primary database.
User (makes a request) → CDN (static assets) → App Cache (Redis / Memcached) → Read Replica (DB copy) → Primary DB (source of truth)
Incoming: 100,000 reads/sec

Layer 1 – CDN (images, CSS, JS, static HTML):
  Absorbs: 60% of all requests
  Remaining: 40,000 reads/sec → next layer

Layer 2 – Application Cache (Redis):
  Absorbs: 90% of remaining API requests
  Remaining: 4,000 reads/sec → next layer

Layer 3 – Read Replicas (3 replicas):
  Absorbs: 75% of remaining DB queries
  Remaining: 1,000 reads/sec → primary DB

Layer 4 – Primary Database:
  Handles: 1,000 reads/sec (+ all writes)

Original load: 100,000 → Actual load: 1,000 → 99% reduction in primary DB load
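The layer-by-layer arithmetic above is easy to sketch in code. A minimal Python version, using the illustrative absorption rates from the walkthrough (60% CDN, 90% Redis, 75% replicas), not measured figures:

```python
def remaining_after_layers(incoming, absorb_rates):
    """Return the read traffic left after each layer absorbs its share."""
    remaining = []
    load = incoming
    for rate in absorb_rates:
        load = round(load * (1 - rate))  # round to whole requests/sec
        remaining.append(load)
    return remaining

# CDN 60%, Redis 90%, read replicas 75% (figures from the walkthrough above)
loads = remaining_after_layers(100_000, [0.60, 0.90, 0.75])
print(loads)  # [40000, 4000, 1000] -> only 1% of reads reach the primary
```

Changing any single rate shows how sensitive the primary's load is to the upper layers: dropping the CDN hit rate from 60% to 40% raises every downstream number by 50%.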
💡 The Principle
Each layer is cheaper to serve from and closer to the user than the one below it. The CDN serves in ~10ms from the edge; Redis serves in ~1ms from memory once a request reaches your servers; read replicas serve in ~5ms from local SSD; the primary DB serves in ~50ms. Push as much traffic as possible to the upper layers.
Read Replicas
A read replica is a copy of the primary database that serves read queries. The primary handles all writes and streams changes to replicas. Replicas handle reads, distributing the load across multiple machines.
How Read Replicas Work
Primary handles all writes
INSERT, UPDATE, DELETE go to the primary database only. The primary is the single source of truth for writes.
Changes stream to replicas
The primary streams its write-ahead log (WAL) to all replicas. Replicas apply the changes to stay in sync. This can be synchronous (wait for replica to confirm) or asynchronous (fire and forget).
Reads are distributed across replicas
The application routes read queries to replicas using round-robin, least-connections, or random selection. Each replica handles a fraction of the total read load.
                 Writes
                   │
                   ▼
             ┌───────────┐
             │ Primary DB│
             └─────┬─────┘
                   │ WAL stream
        ┌──────────┼──────────┐
        ▼          ▼          ▼
  ┌──────────┐┌──────────┐┌──────────┐
  │ Replica 1││ Replica 2││ Replica 3│
  └──────────┘└──────────┘└──────────┘
        ▲          ▲          ▲
        └──────────┼──────────┘
                   │
                 Reads
           (load balanced)

Capacity:
  Primary alone:  5,000 reads/sec
  + 3 replicas:  20,000 reads/sec (4x improvement)
  + 10 replicas: 55,000 reads/sec (11x improvement)
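The routing step ("reads are distributed across replicas") can be sketched as a small router. This is an illustrative stand-in, not any particular driver's API: the connection objects are plain strings, and the write/read classification is deliberately naive.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary, round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def connection_for(self, sql):
        # Naive classification: anything that isn't a SELECT is a write.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.connection_for("SELECT * FROM users"))   # replica-1
print(router.connection_for("UPDATE users SET ..."))  # primary
print(router.connection_for("SELECT 1"))              # replica-2
```

Production routers also health-check replicas and weight by load (least-connections), but the shape is the same: one write target, many read targets.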
Replication Lag – The Trade-off
Asynchronous replication is faster but introduces lag – a replica might be milliseconds to seconds behind the primary. A user updates their profile on the primary, then reads from a replica that hasn't received the update yet. They see the old data.
Async Replication (Common)
- Primary doesn't wait for replica confirmation
- Lower write latency (primary responds immediately)
- Replication lag: 10ms to seconds
- Risk: stale reads from replicas
- Used by: most production systems
Sync Replication (Strict)
- Primary waits for at least 1 replica to confirm
- Higher write latency (wait for network round trip)
- No replication lag (replicas always up-to-date)
- Risk: write availability depends on replica health
- Used by: financial systems, critical data
Problem:
  User updates profile → writes to Primary
  User refreshes page  → reads from Replica (stale!)
  User sees old name   → "My update didn't save!"

Solutions:
  1. Read-your-own-writes: After a write, route that user's reads to the
     Primary for the next N seconds. Other users read from replicas.
  2. Causal consistency: Track the write's WAL position. Route reads to a
     replica that has caught up to at least that position.
  3. Sticky sessions: Route a user to the same replica consistently. If they
     wrote to Primary, route reads to Primary briefly.
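Solution 1 (read-your-own-writes) can be sketched with a per-user "last write" timestamp. The 5-second pin window and the in-memory store are assumptions for illustration; a real system would keep this state in a shared store or a session cookie.

```python
import time

PIN_SECONDS = 5  # assumed window: route a user's reads to the primary this long after a write
_last_write = {}  # user_id -> monotonic time of their last write

def record_write(user_id, now=None):
    _last_write[user_id] = now if now is not None else time.monotonic()

def read_target(user_id, now=None):
    """'primary' if the user wrote recently (replicas may lag), else 'replica'."""
    now = now if now is not None else time.monotonic()
    if now - _last_write.get(user_id, float("-inf")) < PIN_SECONDS:
        return "primary"
    return "replica"

record_write("alice", now=100.0)
print(read_target("alice", now=102.0))  # primary (within the pin window)
print(read_target("alice", now=110.0))  # replica (window expired)
print(read_target("bob",   now=102.0))  # replica (never wrote)
```

The trade-off: recent writers briefly add read load to the primary, but everyone else keeps the cheap replica path.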
Strengths
- ✓ Linear read scaling – add more replicas for more throughput
- ✓ Improved availability – if primary fails, promote a replica
- ✓ Geographic distribution – replicas in different regions
- ✓ Offload analytics – run heavy reports on a replica, not primary
- ✓ First step in scaling reads (before caching)
Limitations
- ✗ Doesn't scale writes (all writes still go to primary)
- ✗ Replication lag causes stale reads
- ✗ More replicas = more replication traffic on primary
- ✗ Failover complexity (promoting replica to primary)
- ✗ Each replica is a full copy of the database (storage cost)
🎯 Interview Insight
Read replicas are the first answer to "how do you scale reads?" They're simple, well-understood, and supported by every major database. But they're not enough alone – you still need caching on top to handle truly massive read loads. Replicas scale reads 5-10x. Caching scales reads 100x.
Multi-Level Caching
Caching doesn't happen at one layer – it happens at every layer between the user and the database. Each layer catches requests that the layer above missed, so only a tiny fraction reaches the database.
Request: GET /api/products/42

L1 – Browser Cache (client-side):
  Browser checks: "Do I have this cached?"
  Cache-Control: max-age=60 → cached for 60 seconds
  HIT  → return instantly (0ms, no network request)
  MISS → proceed to L2

L2 – CDN Edge Cache (Cloudflare, CloudFront):
  CDN edge in user's region checks its cache
  HIT  → return from edge (10-30ms)
  MISS → proceed to L3

L3 – Application Cache (Redis / Memcached):
  App server checks Redis: GET product:42
  HIT  → return from Redis (1-5ms)
  MISS → proceed to L4

L4 – Database (PostgreSQL / MySQL):
  Query: SELECT * FROM products WHERE id = 42
  Return result (20-100ms)
  Store in Redis for next request (cache-aside)

Typical hit rates:
  L1 (browser):  30-50% of requests never leave the browser
  L2 (CDN):      60-80% of remaining requests served from edge
  L3 (Redis):    90-95% of remaining requests served from cache
  L4 (database): handles only 1-5% of original traffic
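The L3 step ("store in Redis for next request") is the cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and a stub query function (both are assumptions for illustration, not a real client or driver):

```python
cache = {}  # stand-in for Redis; a real setup would also set a TTL on each key

def query_db(product_id):
    # Stub for: SELECT * FROM products WHERE id = %s
    return {"id": product_id, "name": f"product-{product_id}"}

def get_product(product_id):
    key = f"product:{product_id}"
    if key in cache:            # cache hit: ~1-5ms in a real Redis
        return cache[key]
    row = query_db(product_id)  # cache miss: fall through to the database
    cache[key] = row            # populate so the next request hits
    return row

get_product(42)        # miss -> hits the DB, fills the cache
print(get_product(42)) # hit  -> served from the cache
```

Cache-aside keeps the cache passive: the application owns the read-through and refill logic, which is why it works with any key-value store.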
L1 – Browser Cache
Controlled by Cache-Control and ETag headers. Static assets (JS, CSS, images) cached for hours/days. API responses cached for seconds/minutes. Zero latency ā no network request at all.
L2 – CDN Edge Cache
Globally distributed servers cache content close to users. A user in Tokyo gets data from a Tokyo edge, not from Virginia. 10-30ms latency. Handles images, videos, and cacheable API responses.
L3 – Application Cache (Redis)
In-memory cache on the server side. Stores API responses, database query results, computed values. 1-5ms latency. The most impactful layer for dynamic content.
L4 – Database
The source of truth. Only hit when all cache layers miss. 20-100ms latency. With proper caching, handles 1-5% of original read traffic.
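The browser layer (L1) is driven entirely by response headers. A sketch of computing them server-side, with a content-hash ETag and conditional-request handling; the max-age value and truncated hash length are illustrative choices, not recommendations:

```python
import hashlib

def cache_headers(body, if_none_match=None):
    """Return (status, headers) for a response with browser caching enabled."""
    # Content-based ETag: same body -> same tag, so revalidation is cheap.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"Cache-Control": "public, max-age=60", "ETag": etag}
    if if_none_match == etag:
        return 304, headers  # browser's copy is still valid: send no body
    return 200, headers

status, headers = cache_headers(b'{"id": 42}')
print(status)  # 200 on the first request
status, _ = cache_headers(b'{"id": 42}', if_none_match=headers["ETag"])
print(status)  # 304 once the browser revalidates with If-None-Match
```

For the 60 seconds of max-age the browser makes no request at all; after that, revalidation costs one round trip but no body transfer unless the content changed.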
💡 Key Insight – Closer Cache = Faster Response
Browser cache: 0ms (no network). CDN: 10ms (same region). Redis: 1ms (same data center). Database: 50ms (disk I/O). Every layer you add between the user and the database reduces latency and load. The best request is one that never reaches your servers.
CDN for Static Assets
A CDN (Content Delivery Network) is a globally distributed network of edge servers that cache and serve static content close to users. For images, videos, CSS, and JavaScript – the CDN is the single biggest performance win.
Without CDN:
  User (Mumbai) → Origin Server (Virginia)
  Round trip: ~200ms
  Every image, every CSS file, every JS bundle: 200ms each

With CDN:
  User (Mumbai) → CDN Edge (Mumbai)
  Round trip: ~15ms
  Images, CSS, JS served from local edge

First request (cache miss):
  CDN Edge → Origin → CDN caches it → serves to user
  Latency: ~200ms (one-time cost)

Subsequent requests (cache hit):
  CDN Edge serves directly
  Latency: ~15ms (93% faster)

For a page with 30 assets:
  Without CDN: 30 × 200ms = 6 seconds (sequential)
  With CDN:    30 × 15ms  = 450ms → 13x faster page load
What to Put on CDN
CDN-friendly (static, cacheable)
- ✓ Images (product photos, avatars, banners)
- ✓ Videos (streaming content, previews)
- ✓ CSS and JavaScript bundles
- ✓ Fonts and icons
- ✓ Static HTML pages (marketing, docs)
- ✓ Downloads (PDFs, installers)
Not CDN-friendly (dynamic, personalized)
- ✗ User-specific API responses (my orders, my feed)
- ✗ Real-time data (stock prices, live scores)
- ✗ Authenticated content (dashboard, admin panel)
- ✗ Frequently changing data (inventory counts)
- ✗ Small, unique payloads (not worth caching)
🎯 Interview Insight
When designing any system that serves media (images, videos), always mention CDN. It's the first thing you add. Netflix serves 100% of video from CDN edge servers. E-commerce sites serve all product images from CDN. Without CDN, your origin servers would need 10-100x more bandwidth capacity.
End-to-End Scenario
Let's design the read scaling layer for an Instagram-like feed – one of the most read-heavy systems imaginable.
📱 Instagram Feed – 500M DAU, 200K Read QPS
Each feed load: 1 API call + 20 images + 5 video thumbnails.
Read:Write ratio = 100:1. Writes: 2K posts/sec.
CDN – Serve all media (images, videos, thumbnails)
20 images + 5 thumbnails per feed load = 25 media requests. At 200K feed loads/sec = 5M media requests/sec. CDN absorbs 95%+ of these. Origin handles ~250K media requests/sec (cache misses only). Without CDN: 5M requests/sec to origin – impossible.
Redis Cache – Serve feed API responses
Pre-computed feed for each user stored in Redis: feed:{user_id} → [post_id_1, post_id_2, ...]. Cache hit rate: 90%. 200K API requests/sec → 180K served from Redis (1ms), 20K cache misses go to replicas.
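The feed:{user_id} structure above can be sketched as fan-out-on-write: when a post is created, it is pushed onto each follower's cached list. A plain dict stands in for Redis here, and the cap of 100 posts and all names are illustrative assumptions:

```python
feed_cache = {}  # stand-in for Redis: "feed:{user_id}" -> list of post ids
FEED_LEN = 100   # assumed cap on cached feed length

def fanout_post(post_id, follower_ids):
    """On write, prepend the new post to each follower's cached feed."""
    for uid in follower_ids:
        feed = feed_cache.setdefault(f"feed:{uid}", [])
        feed.insert(0, post_id)   # newest first
        del feed[FEED_LEN:]       # trim, like LPUSH + LTRIM in Redis

def get_feed(user_id):
    return feed_cache.get(f"feed:{user_id}")  # None = cache miss -> replicas

fanout_post(1001, ["alice", "bob"])
fanout_post(1002, ["alice"])
print(get_feed("alice"))  # [1002, 1001]
print(get_feed("carol"))  # None -> fall back to a read replica
```

The write path does more work (one push per follower), but that is the point: writes are 100x rarer than reads, so it's cheap to pre-pay there.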
Read Replicas – Handle cache misses
5 read replicas handle the 20K reads/sec that miss Redis. Each replica handles 4K reads/sec – well within capacity. Replicas also serve analytics queries and background jobs without impacting the primary.
Primary DB – Writes only
Primary handles 2K writes/sec (new posts, likes, comments). No read traffic – all reads are absorbed by the layers above. Primary has headroom for spikes and complex write operations.
Original traffic: 200K API reads/sec + 5M media reads/sec

Layer 1 – CDN:
  5M media reads → 4.75M served from edge (95%)
  250K cache misses → origin

Layer 2 – Redis Cache:
  200K API reads → 180K served from Redis (90%)
  20K cache misses → read replicas

Layer 3 – Read Replicas (5x):
  20K reads → distributed across 5 replicas
  4K reads/sec per replica

Layer 4 – Primary DB:
  0 reads (all absorbed above)
  2K writes/sec only

Summary:
  Total incoming: 5.2M requests/sec
  Primary DB handles: 2K writes/sec
  → 99.96% of traffic never touches the primary database
🔥 The Key Takeaway
99.96% of traffic is handled before it reaches the primary database. This is how Instagram, Twitter, and Netflix operate. The primary database is protected by layers of caching and replication. It only handles writes and the rare cache miss.
Trade-offs & Decision Making
Cache vs Read Replica
| Dimension | Application Cache (Redis) | Read Replica |
|---|---|---|
| Latency | 1-5ms (in-memory) | 5-50ms (disk-based) |
| Data freshness | Stale until TTL/invalidation | Stale by replication lag (ms-seconds) |
| Capacity | Limited by RAM (expensive) | Full database copy (cheaper per GB) |
| Query flexibility | Key-value lookups only | Full SQL queries |
| Best for | Hot data, simple lookups, API responses | Complex queries, analytics, full dataset access |
| Scaling factor | 100x read reduction | 5-10x read reduction |
CDN vs Backend Serving
| Dimension | CDN Edge | Origin Server |
|---|---|---|
| Latency | 10-30ms (local edge) | 100-300ms (cross-region) |
| Cost | Cheap per GB (bandwidth pricing) | Expensive (server + bandwidth) |
| Content type | Static (images, JS, CSS, video) | Dynamic (API responses, personalized) |
| Scalability | Effectively unlimited (globally distributed) | Limited (your server capacity) |
| Cache control | TTL, cache busting, purge API | Full control (your code) |
Freshness vs Performance
| Strategy | Freshness | Latency | DB Load |
|---|---|---|---|
| No caching, no replicas | Always fresh | 50-100ms | 100% (all reads hit DB) |
| Read replicas only | Lag: ms-seconds | 5-50ms | ~20% (distributed across replicas) |
| Redis cache only | Stale up to TTL | 1-5ms | ~5-10% (cache misses only) |
| CDN + Redis + replicas | Varies by layer | 1-30ms | ~1-5% (only cache misses) |
🎯 Decision Framework
Start with read replicas (simplest, scales reads 5-10x). Add Redis caching for hot data (scales reads 100x). Add CDN for static assets (eliminates media traffic from origin). Each layer is additive – you don't choose one, you stack them.
Interview Questions
Q: How do you scale read-heavy systems?
A: Layer by layer: (1) Add read replicas to distribute DB reads across multiple machines – 5-10x improvement. (2) Add application-level caching (Redis) for hot data – 90%+ of reads served from memory in 1ms. (3) Add CDN for static assets – images, videos, JS/CSS served from edge servers globally. (4) Use browser caching (Cache-Control headers) to eliminate repeat requests entirely. The result: 99%+ of reads never touch the primary database.
Q: Why use read replicas?
A: Read replicas offload read traffic from the primary database. The primary handles writes; replicas handle reads. Benefits: (1) Linear read scaling – add more replicas for more throughput. (2) Improved availability – promote a replica if primary fails. (3) Geographic distribution – replicas in different regions for lower latency. (4) Offload analytics – run heavy reports on a replica without impacting production. Trade-off: replication lag means replicas may serve slightly stale data.
Q: What is multi-level caching?
A: Caching at every layer between the user and the database: L1 (browser cache, 0ms), L2 (CDN edge, 10-30ms), L3 (Redis/Memcached, 1-5ms), L4 (database, 50ms+). Each layer catches requests the layer above missed. With 90% hit rate at each layer, only 0.1% of requests reach the database. The key insight: closer cache = faster response. The best request is one that never leaves the browser.
Q: When should you use a CDN?
A: Whenever you serve static content to a geographically distributed audience. Images, videos, CSS, JS, fonts, downloads – all belong on a CDN. The latency difference is dramatic: 200ms from a cross-continent origin vs 15ms from a local edge. CDN is essential for: e-commerce (product images), streaming (video delivery), any global web app. Not suitable for: personalized API responses, real-time data, authenticated content.
Pitfalls
Ignoring cache invalidation
Adding Redis caching with a 30-minute TTL and never invalidating on writes. A user updates their profile, but the old data shows for 30 minutes. For prices and inventory, this causes real business problems – customers see wrong prices or buy out-of-stock items.
✓ Use delete-on-write as the primary invalidation strategy. When data changes in the DB, delete the cache key immediately. TTL is a safety net, not the strategy. For critical data (prices, stock), use write-through caching or very short TTLs (5-10 seconds).
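Delete-on-write is a two-line change to the write path. A sketch under the same assumptions as before (a dict standing in for Redis, a dict standing in for the database table):

```python
cache = {}                               # stand-in for Redis
db = {42: {"id": 42, "price": 10}}       # stand-in for the products table

def update_price(product_id, price):
    db[product_id]["price"] = price           # 1. write the source of truth
    cache.pop(f"product:{product_id}", None)  # 2. delete the cache key immediately

def get_product(product_id):
    key = f"product:{product_id}"
    if key not in cache:
        cache[key] = dict(db[product_id])     # cache-aside refill on miss
    return cache[key]

get_product(42)                  # fills the cache with price 10
update_price(42, 12)             # invalidates: next read refetches
print(get_product(42)["price"])  # 12, not the stale 10
```

Deleting (rather than updating) the key on write avoids a subtle race: two concurrent writers can leave an updated-in-place cache holding the older value, whereas a deleted key simply refills from the database on the next read.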
Over-relying on replicas without caching
Adding 10 read replicas to handle 100K reads/sec instead of adding a cache. Each replica is a full database copy – expensive to maintain, and you still have replication lag. A single Redis instance could handle 100K reads/sec for hot data at 1ms latency.
✓ Use replicas for complex queries and full dataset access. Use Redis for hot, simple lookups (user profiles, product details, session data). Redis handles 100K+ reads/sec on a single instance. You might need 2 replicas + Redis instead of 10 replicas.
Not using CDN for static assets
Serving product images directly from your application servers. Each image request consumes server CPU, bandwidth, and a connection slot. At scale, your servers spend more time serving images than processing API requests.
✓ Put ALL static assets on a CDN from day one. It's cheap, easy to set up, and the performance improvement is immediate. Cloudflare, CloudFront, or Fastly – any CDN is better than serving images from your origin. This is the lowest-effort, highest-impact optimization.
Poor cache layering
Caching everything in Redis but not using browser caching or CDN. Every request still hits your servers, even for data that hasn't changed. Or: using CDN for everything including dynamic API responses that change per user.
✓ Match the cache layer to the content type. Browser cache: static assets the user has already loaded. CDN: static assets for all users (images, JS, CSS). Redis: dynamic but cacheable data (product details, user profiles). Database: only for cache misses and writes. Each layer has a purpose.