Failure Modes
Understand how caching systems fail in production — cache miss storms, cache penetration, and cache avalanche. Learn how to design systems that don't collapse under load.
The Big Picture — Why Cache Failures Are Dangerous
Caching isn't just an optimization — in most production systems, it's a critical dependency. Suppose your service receives 100,000 queries per second and your cache absorbs 99% of reads, so only 1,000 QPS reach a database sized for exactly that load. If the cache fails, 100,000 QPS suddenly hit a database designed for 1,000. The database collapses. The entire system goes down.
This is the fundamental danger: systems are designed around the assumption that the cache works. When it doesn't, the failure cascades through every layer.
The Popular Store Analogy
Imagine a popular electronics store on Black Friday. The cache is the fast checkout counter at the front — it handles 99% of customers instantly. The database is the warehouse in the back — it can fulfill 10 orders per minute.
- Cache miss storm: the checkout counter breaks. All 1,000 customers rush to the warehouse. The warehouse staff is overwhelmed, orders pile up, and the entire store grinds to a halt.
- Cache penetration: customers keep asking for products that don't exist ('Do you have a flying car?'). The checkout counter can't help (it only stocks real products), so every fake request goes to the warehouse. Attackers send thousands of fake requests.
- Cache avalanche: the store opens at 9 AM and every product's price tag expires at exactly 10 AM. At 10:01, every customer needs a fresh price check from the warehouse — simultaneously.
🔥 The Amplification Effect
A cache with a 99% hit rate means the database handles 1% of traffic. If the cache fails, the database doesn't get 2x traffic — it gets 100x. This amplification is why cache failures cause cascading outages, not gradual degradation.
Cache in the Architecture
In a typical architecture, the cache sits between the application and the database. It absorbs the vast majority of read traffic, protecting the database from being overwhelmed.
Client (sends request) → App Server (checks cache first) → Cache (Redis, 99% hit rate) → Database (1% of reads)
Normal operation (cache healthy):
    Incoming traffic:   100,000 requests/sec
    Cache hit rate:     99%
    Cache serves:       99,000 requests/sec (0.2ms each)
    Database receives:  1,000 requests/sec (2ms each)
    Database capacity:  5,000 QPS ✅ Comfortable

Cache failure (any failure mode):
    Incoming traffic:   100,000 requests/sec
    Cache hit rate:     0% (or severely degraded)
    Cache serves:       0 requests/sec
    Database receives:  100,000 requests/sec ← 100x normal load
    Database capacity:  5,000 QPS ❌ OVERWHELMED

Result: database connection pool exhausted, queries time out, error rates spike, cascading failure across all services.
Cache Miss Storm
A single hot key expires or is evicted. Thousands of concurrent requests for that key all miss the cache simultaneously and hit the database.
Cache Penetration
Requests for data that doesn't exist bypass the cache entirely. Every request goes to the database, which also returns nothing — wasted work.
Cache Avalanche
Many cache entries expire at the same time. The database is suddenly flooded with requests to rebuild all of them simultaneously.
Cache Miss Storms
A cache miss storm (also called a thundering herd or cache stampede) happens when a popular cache entry expires or is evicted, and hundreds or thousands of concurrent requests all try to rebuild it at the same time.
Everyone Asking the Warehouse for the Same Item
The checkout counter runs out of the most popular item (iPhone). 500 customers all ask the warehouse for it simultaneously. The warehouse only has one forklift — it can fulfill one request at a time. But all 500 requests arrive at once, creating a massive queue. Meanwhile, the forklift fetches the same pallet 500 times instead of once. The fix: one person goes to the warehouse, everyone else waits for that one result.
How It Happens
T=0:      Cache entry "product:42" expires (TTL reached)
T=0.001:  Request A arrives → cache MISS → queries database
T=0.002:  Request B arrives → cache MISS → queries database
T=0.003:  Request C arrives → cache MISS → queries database
...
T=0.010:  500 requests arrived → ALL query the database
T=0.015:  Database receives 500 identical queries simultaneously
T=0.020:  Database struggles, response time spikes from 2ms to 500ms
T=0.025:  Request A gets result, writes to cache
T=0.026:  Requests B-500 also get results (redundant work)
          Cache now has the value again

Problem: 500 identical database queries when 1 would suffice.
At scale: this happens for every hot key, every TTL expiration.
Solutions
Request Coalescing (Single Flight)
When a cache miss occurs, the first request acquires a lock and fetches from the database. All subsequent requests for the same key wait for the first request to complete, then share the result. 500 requests → 1 database query. This is the most effective solution.
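Below is a minimal in-process sketch of the single-flight idea in Python, assuming a single application server; `loader` and `query_db` stand in for whatever function actually queries the database. In a distributed deployment the same pattern needs a shared lock (for example in Redis), which the locking variant below describes.

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = {}   # key -> {"done": Event, "result": value}

    def load(self, key, loader):
        with self._mutex:
            entry = self._in_flight.get(key)
            if entry is None:
                entry = {"done": threading.Event(), "result": None}
                self._in_flight[key] = entry
                leader = True
            else:
                leader = False

        if leader:
            try:
                entry["result"] = loader(key)      # the only database query for this key
            finally:
                entry["done"].set()
                with self._mutex:
                    self._in_flight.pop(key, None)
            return entry["result"]

        entry["done"].wait()                       # followers wait for the leader's result
        return entry["result"]

# Usage sketch: 500 concurrent calls for the same key trigger one loader call.
# flight = SingleFlight()
# product = flight.load("product:42", lambda k: query_db(k))   # query_db is hypothetical
```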
Locking with Short TTL
When a cache miss occurs, set a short-lived lock key (e.g., 'lock:product:42' with 5-second TTL). Only the request that acquires the lock queries the database. Others either wait and retry, or serve stale data if available.
Early Refresh (Background Revalidation)
Before the TTL expires, proactively refresh the cache in the background. If TTL is 60 seconds, start refreshing at 50 seconds. The cache never actually expires — it's always warm. Trade-off: slightly stale data during the refresh window.
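A rough sketch of refresh-ahead, assuming a single process and a hypothetical `loader` function; a real implementation would live inside your cache client or a background worker.

```python
import threading
import time

REFRESH_AFTER = 50            # with a 60s TTL, start refreshing once an entry is 50s old

_store = {}                   # key -> (value, stored_at); stands in for the real cache
_refreshing = set()
_mutex = threading.Lock()

def get_with_refresh_ahead(key, loader):
    now = time.time()
    entry = _store.get(key)
    if entry is None:                              # cold start: load synchronously once
        value = loader(key)
        _store[key] = (value, time.time())
        return value

    value, stored_at = entry
    if now - stored_at >= REFRESH_AFTER:
        with _mutex:
            start = key not in _refreshing
            if start:
                _refreshing.add(key)
        if start:                                  # at most one background refresh per key
            def refresh():
                try:
                    _store[key] = (loader(key), time.time())
                finally:
                    with _mutex:
                        _refreshing.discard(key)
            threading.Thread(target=refresh, daemon=True).start()
    return value                                   # may be slightly stale, never a miss storm
```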
Cache Hot Data Aggressively
For the top 1% of keys that receive 50% of traffic, use very long TTLs or never-expire policies. Invalidate explicitly when data changes instead of relying on TTL expiration.
function getProduct(id):
    value = cache.get("product:" + id)
    if value != null:
        return value                       // Cache HIT — fast path

    // Cache MISS — check if someone is already fetching
    acquired = lock.acquire("fetch:product:" + id, ttl=5s)
    if acquired:
        // I'm the first — fetch from DB
        value = database.query("SELECT * FROM products WHERE id = ?", id)
        cache.set("product:" + id, value, ttl=60s)
        lock.release("fetch:product:" + id)
        return value
    else:
        // Someone else is fetching — wait and retry
        sleep(50ms)
        return getProduct(id)              // Retry — cache should be populated now
🎯 Interview Insight
Cache miss storms are one of the most commonly asked caching questions. Always mention request coalescing (single flight) as the primary solution. Explain: "The first request fetches from the database, all others wait for that result. This turns N database queries into 1."
Cache Penetration
Cache penetration happens when requests are made for data that doesn't exist — not in the cache AND not in the database. Every request bypasses the cache (nothing to cache) and hits the database (which returns nothing). The cache provides zero protection because there's no data to cache.
Asking for Products That Don't Exist
Customers keep asking: 'Do you have a flying car?' The checkout counter (cache) doesn't have it. The warehouse (database) doesn't have it either. But every single request still requires a warehouse trip to confirm it doesn't exist. An attacker sends 100,000 requests for random non-existent product IDs. Every request penetrates through the cache and hits the database. The cache is useless because there's nothing to cache.
How It Happens
Normal request for existing data:
    GET /api/users/42
    → Cache: MISS → DB: found → Cache: store → Return user
    → Next request: Cache HIT ✅ (cache protects DB)

Penetration request for non-existent data:
    GET /api/users/99999999
    → Cache: MISS → DB: not found → Cache: nothing to store → Return 404
    → Next request: Cache MISS again → DB again → not found again
    → Every request hits the database ❌

Attack scenario:
    Attacker sends 50,000 requests/sec with random user IDs
    → user/a8f3b2c1, user/x7y9z0, user/random123...
    → None exist in cache OR database
    → All 50,000 requests hit the database every second
    → Database overwhelmed by queries that all return empty
Solutions
Cache Null Responses
When the database returns 'not found', cache that result too: cache.set('user:99999999', NULL, ttl=60s). Next request for the same ID gets a cached NULL — no database query. Use a short TTL (60-300 seconds) so real data can appear later.
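A minimal sketch of negative caching, assuming a Redis-style client with `get`/`setex`; `db_find_user` and the sentinel value are illustrative, since most caches cannot store a literal null.

```python
NOT_FOUND = "__NULL__"     # sentinel marking "this ID does not exist"
POSITIVE_TTL = 3600        # real data can live longer
NEGATIVE_TTL = 300         # short, so newly created records show up quickly

def get_user(user_id, cache, db_find_user):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached == NOT_FOUND:
        return None                                    # cached negative result — no DB query
    if cached is not None:
        return cached                                  # normal cache hit

    user = db_find_user(user_id)                       # cache miss → one database query
    if user is None:
        cache.setex(key, NEGATIVE_TTL, NOT_FOUND)      # cache the "not found" answer too
        return None
    cache.setex(key, POSITIVE_TTL, user)
    return user
```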
Bloom Filters
A Bloom filter is a space-efficient data structure that can tell you 'definitely not in the set' or 'probably in the set'. Before checking the cache or database, check the Bloom filter: 'Does user 99999999 exist?' If the Bloom filter says NO, return 404 immediately — no cache or DB lookup needed.
Input Validation
Validate request parameters before they reach the cache layer. If user IDs are UUIDs, reject anything that isn't a valid UUID format. If product IDs are integers 1-10M, reject IDs outside that range. This blocks most attack traffic at the edge.
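A small sketch of edge validation, assuming integer product IDs in a known range; the bound and function names are illustrative.

```python
MAX_PRODUCT_ID = 10_000_000    # assumed upper bound of valid product IDs

def parse_product_id(raw: str):
    """Reject malformed IDs before they reach the cache or the database."""
    if not raw.isdigit():                        # blocks "abc", "-1", "1e9", UUID-shaped junk
        return None
    product_id = int(raw)
    if not 1 <= product_id <= MAX_PRODUCT_ID:
        return None
    return product_id

# In the request handler, answer 404 (or 400) immediately when parse_product_id()
# returns None, so garbage IDs never trigger a cache or database lookup.
```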
| Solution | How It Works | Pros | Cons |
|---|---|---|---|
| Cache NULL | Store 'not found' in cache with short TTL | Simple, effective for repeated IDs | Wastes cache memory if attacker uses random IDs |
| Bloom Filter | Probabilistic check: 'does this key exist?' | Blocks non-existent keys with zero DB load | False positives possible, needs to be updated on writes |
| Input Validation | Reject invalid formats at the API layer | Zero overhead, blocks malformed requests | Doesn't help if attacker uses valid-looking IDs |
Setup (on startup or periodically):
    bloom_filter = new BloomFilter(expected_items=10M, false_positive_rate=0.01)
    for each user_id in database:
        bloom_filter.add(user_id)

Request flow:
    GET /api/users/99999999
    Step 1: bloom_filter.contains(99999999)? → NO
    → Return 404 immediately (no cache or DB hit)

    GET /api/users/42
    Step 1: bloom_filter.contains(42)? → YES (probably exists)
    → Check cache → Check DB

False positive rate: ~1%
    → 1% of non-existent IDs will still reach the DB
    → 99% are blocked at the Bloom filter — massive protection
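The pseudocode above assumes a ready-made BloomFilter class. As a rough illustration of what sits behind it, here is a tiny bit-array version in Python; the sizing (~9.6 bits per element for roughly 1% false positives) is standard, but a production system would use a maintained library.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=96_000_000, num_hashes=7):
        # ~9.6 bits per element ≈ ~1% false positives for 10M elements
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# bloom = BloomFilter()
# for user_id in load_all_user_ids():      # hypothetical: stream valid IDs from the database
#     bloom.add(user_id)
# if not bloom.contains(requested_id):
#     return_404()                          # definitely not in the database — skip cache and DB
```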
🎯 Interview Insight
Cache penetration is a favorite interview topic because it combines caching, security, and data structures. Mention all three solutions: cache NULL values (simple), Bloom filters (elegant), and input validation (practical). Explain that in production, you'd use all three as defense in depth.
Cache Avalanche
A cache avalanche happens when a large number of cache entries expire at the same time, causing a sudden flood of requests to the database. Unlike a miss storm (one hot key), an avalanche involves thousands of different keys all expiring simultaneously.
All Price Tags Expire at Once
The store updates all price tags at 9 AM every day. At 9:01 AM, every single price tag is expired. Every customer who picks up a product needs a fresh price check from the warehouse — simultaneously. 10,000 products × 100 customers = 1,000,000 warehouse requests in one minute. The warehouse collapses. The fix: stagger the price tag updates — some expire at 9:00, some at 9:05, some at 9:10. The load spreads out over time.
How It Happens
Scenario: cache warmed at startup with TTL=3600s (1 hour)

T=0:     System starts, cache is populated
         10,000 product entries cached with TTL=3600s
         All entries expire at T=3600
T=3599:  Cache is healthy, 99% hit rate
         Database handles 100 QPS comfortably
T=3600:  ALL 10,000 entries expire simultaneously
         Next second: 10,000 cache misses
         All 10,000 requests hit the database
         Database capacity: 5,000 QPS → OVERWHELMED
T=3601:  Database connection pool exhausted
         Queries time out, errors cascade
         App servers retry → even more DB load
         System enters a death spiral

Common causes:
→ Cache warmed at startup (all same TTL)
→ Batch cache refresh (all keys refreshed together)
→ Cache server restart (all keys lost at once)
→ Synchronized TTLs (all set to round numbers like 60s, 300s)
Solutions
TTL Randomization (Jitter)
Instead of TTL=3600s for every key, use TTL=3600 + random(0, 600). Keys expire between 3600-4200 seconds — spread over a 10-minute window instead of all at once. This is the simplest and most effective solution.
Staggered Expiration
When warming the cache, add incremental offsets: key 1 gets TTL=3600, key 2 gets TTL=3601, key 3 gets TTL=3602, etc. 10,000 keys expire over ~3 hours instead of all at once.
Cache Warming on Restart
When a cache server restarts, don't let all traffic hit the database. Pre-warm the cache by loading hot data from the database before accepting traffic. Use a readiness probe — the server isn't 'ready' until the cache is warm.
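A sketch of the warm-then-ready pattern, assuming a Redis-style cache and a hypothetical `load_hot_products` query; how the readiness endpoint is wired depends on your framework and orchestrator.

```python
import threading

cache_warm = threading.Event()

def warm_cache(cache, load_hot_products):
    """Startup job: pre-load the hottest keys before the server reports ready."""
    for product in load_hot_products(limit=1000):             # hypothetical DB query
        cache.setex(f"product:{product['id']}", 3600, product)
    cache_warm.set()

def readiness_probe():
    # Exposed as /readyz (or your platform's readiness check): the load balancer
    # only routes traffic to this instance once the cache has been warmed.
    return 200 if cache_warm.is_set() else 503
```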
Multi-Layer Caching
Use two cache layers: L1 (local in-process cache, small) and L2 (Redis, large). If L2 fails, L1 absorbs some traffic. If both fail, rate-limit database queries to prevent overload.
Circuit Breaker on Database
If the database is overwhelmed, stop sending it more requests. A circuit breaker detects high error rates and returns cached stale data or a degraded response instead of adding to the overload.
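A minimal count-based circuit breaker sketch to show the idea; in production you would normally reach for an existing library, and the thresholds here are illustrative.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None            # None = closed; a timestamp = open

    def call(self, action, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.recovery_timeout:
                return fallback()        # open: shed load, serve stale/degraded data
            self.opened_at = None        # half-open: allow one trial request through
        try:
            result = action()
            self.failures = 0            # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()   # trip: stop hammering the database
            return fallback()

# breaker = CircuitBreaker()
# product = breaker.call(lambda: query_db(product_id),            # hypothetical DB call
#                        lambda: stale_cache_or_error(product_id))
```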
❌ Without jitter (all expire together):

    cache.set("product:1", data, ttl=3600)
    cache.set("product:2", data, ttl=3600)
    cache.set("product:3", data, ttl=3600)
    // All 10,000 keys expire at T+3600 → avalanche

✅ With jitter (spread over time):

    cache.set("product:1", data, ttl=3600 + random(0, 600))   // 3600-4200s
    cache.set("product:2", data, ttl=3600 + random(0, 600))   // 3600-4200s
    cache.set("product:3", data, ttl=3600 + random(0, 600))   // 3600-4200s
    // 10,000 keys expire over a 10-minute window
    // ~17 keys expire per second instead of 10,000 at once
🎯 Interview Insight
TTL randomization is the answer interviewers are looking for. It's simple, effective, and shows you understand the root cause (synchronized expiration). Always mention it first, then add cache warming and circuit breakers as defense in depth.
End-to-End Scenario
Let's design a high-scale product page system and show how each failure mode can occur and how to mitigate them.
System: E-Commerce Product Page (50K requests/sec)
Traffic: 50,000 product page views per second
Products: 2 million products in database
Hot products: top 1,000 products = 80% of traffic (40K req/s)
Cache: Redis cluster, 99.5% hit rate
Database: PostgreSQL, capacity 2,000 QPS

Normal state: 50,000 req/s × 0.5% miss rate = 250 QPS to database ✅

Failure mode 1 — Miss Storm:
    iPhone 16 page cache expires. 5,000 concurrent requests.
    All 5,000 query PostgreSQL for the same product.
    DB goes from 250 QPS to 5,250 QPS → overwhelmed.

Failure mode 2 — Penetration:
    Bot sends 10,000 req/s for random product IDs (product/-1, product/abc).
    None exist in cache or DB. All 10,000 hit the database.
    DB goes from 250 QPS to 10,250 QPS → overwhelmed.

Failure mode 3 — Avalanche:
    Redis node restarts. 500,000 cached entries lost.
    Next minute: 50,000 req/s × 100% miss rate = 50,000 QPS to DB.
    DB capacity: 2,000 QPS → catastrophic failure.
The Protected Design
Request Coalescing (prevents miss storms)
Use a single-flight pattern: when product:42 cache misses, the first request acquires a lock and fetches from DB. All other requests for product:42 wait for that result. 5,000 concurrent requests → 1 DB query.
Bloom Filter + NULL Caching (prevents penetration)
A Bloom filter loaded with all valid product IDs blocks 99% of non-existent ID requests at the edge. For the 1% that pass (false positives), cache the NULL result with a 5-minute TTL. Input validation rejects non-integer IDs.
TTL Jitter + Cache Warming (prevents avalanche)
All cache entries use TTL = 3600 + random(0, 600). On Redis restart, a warm-up job pre-loads the top 1,000 hot products before the server accepts traffic. A circuit breaker on the DB connection limits concurrent queries to 1,500.
Multi-Layer Cache (defense in depth)
L1: in-process cache (Caffeine/Guava) with 1,000 entries, 30-second TTL. L2: Redis cluster. If Redis is down, L1 absorbs the hottest 80% of traffic. The database only sees the long-tail cold requests.
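A sketch of the L1/L2 read path, assuming a dict-based in-process L1 and a Redis-style L2 client; error handling and eviction are simplified.

```python
import time

L1_TTL = 30                    # seconds; small per-server layer
_l1 = {}                       # key -> (value, expires_at); stands in for Caffeine/Guava

def get_product(key, redis_l2, load_from_db):
    now = time.time()

    hit = _l1.get(key)                         # L1: survives even if Redis is down
    if hit is not None and hit[1] > now:
        return hit[0]

    try:
        value = redis_l2.get(key)              # L2: shared across app servers
    except ConnectionError:
        value = None                           # Redis down — fall through; L1 still shields hot keys

    if value is None:
        value = load_from_db(key)              # last resort: the database
        try:
            redis_l2.setex(key, 3600, value)
        except ConnectionError:
            pass

    _l1[key] = (value, now + L1_TTL)
    return value
```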
💡 This Is Production-Grade Caching
Every large-scale system (Amazon, Netflix, Twitter) uses all of these protections. It's not about picking one — it's about layering them. Request coalescing + Bloom filters + TTL jitter + multi-layer caching + circuit breakers = a system that survives cache failures gracefully.
Trade-offs & Decision Making
Every protection mechanism adds complexity. The skill is knowing which protections are worth the cost for your system.
| Protection | Complexity | Memory Cost | When Worth It |
|---|---|---|---|
| Request Coalescing | Medium (distributed lock) | Low (lock keys only) | Always — any system with hot keys |
| Cache NULL Values | Low (one-line change) | Low-Medium (depends on attack volume) | Always — trivial to implement, high value |
| Bloom Filter | Medium (build + maintain) | Low (~1.2 bytes per element) | When facing penetration attacks or large key spaces |
| TTL Jitter | Low (one-line change) | None | Always — zero cost, prevents avalanche |
| Cache Warming | Medium (warm-up job) | None (uses existing cache) | When cache restarts are common or hot data is predictable |
| Multi-Layer Cache | High (two cache systems) | Medium (L1 memory per server) | When Redis downtime is unacceptable |
| Circuit Breaker | Medium (library/config) | None | Always — protects DB from cascading failure |
Minimum Viable Protection
✅ Always Do (zero/low cost)
- TTL jitter on all cache entries
- Cache NULL values for non-existent keys
- Input validation on API parameters
- Circuit breaker on database connections
🔧 Add When Needed (medium cost)
- Request coalescing for hot keys
- Bloom filter for large key spaces
- Cache warming on restart
- Multi-layer caching for high availability
🎯 Interview Framework
When discussing caching in an interview, proactively mention failure modes: "I'd add TTL jitter to prevent avalanche, cache NULL values to prevent penetration, and use request coalescing for hot keys to prevent miss storms." This shows you think about failure, not just the happy path.
Interview Questions
These questions test whether you understand how caching fails and how to protect against it.
Q: What is cache penetration and how do you prevent it?
A: Cache penetration occurs when requests are made for data that doesn't exist in the cache OR the database. Every request bypasses the cache and hits the database, which also returns nothing. Prevention: (1) Cache NULL values — store 'not found' in the cache with a short TTL (60-300s). Next request gets a cached NULL instead of hitting the DB. (2) Bloom filter — a probabilistic data structure that can definitively say 'this key does NOT exist.' Check the Bloom filter before the cache/DB. (3) Input validation — reject obviously invalid IDs at the API layer. In production, use all three as defense in depth.
Q: How do you handle cache avalanche?
A: Cache avalanche happens when many cache entries expire simultaneously, flooding the database. The primary solution is TTL randomization (jitter): instead of TTL=3600s, use TTL=3600+random(0,600). This spreads expirations over a 10-minute window. Additional protections: (1) Cache warming — pre-load hot data on restart before accepting traffic. (2) Staggered expiration — add incremental offsets when bulk-loading cache. (3) Circuit breaker — if the DB is overwhelmed, stop sending requests and serve stale data or degraded responses.
Q: What happens if the cache fails completely?
A: If Redis goes down entirely, all traffic hits the database — typically 50-100x the normal load. The database will be overwhelmed within seconds. Protections: (1) Multi-layer cache — an in-process L1 cache (Caffeine/Guava) absorbs the hottest traffic even when Redis is down. (2) Circuit breaker — detect the overload and start rejecting or degrading requests instead of cascading the failure. (3) Rate limiting — limit database queries to its capacity (e.g., 2,000 QPS) and queue or reject the rest. (4) Graceful degradation — serve stale data, show cached pages, or display a 'temporarily unavailable' message instead of crashing.
Your Redis cluster restarts and your database immediately crashes
What went wrong and how do you prevent it?
Answer: This is a cache avalanche caused by total cache loss. When Redis restarted, all cached entries were lost. 100% of traffic hit the database, which was sized for 1-2% of traffic. Prevention: (1) Redis persistence (RDB/AOF) — data survives restarts. (2) Cache warming — a startup job pre-loads hot keys before the server accepts traffic. Use a readiness probe. (3) Circuit breaker — limit concurrent DB queries to prevent overload. (4) Multi-layer cache — L1 in-process cache absorbs hot traffic during Redis downtime.
An attacker sends millions of requests with random user IDs
How do you protect your system?
Answer: This is cache penetration — none of the random IDs exist, so every request bypasses the cache and hits the database. Defense: (1) Input validation — reject IDs that don't match the expected format (UUID, integer range). (2) Rate limiting — limit requests per IP/API key. (3) Bloom filter — loaded with all valid user IDs, blocks 99% of non-existent IDs at zero DB cost. (4) Cache NULL values — for the 1% that pass the Bloom filter, cache the 'not found' result. (5) WAF — detect and block the attack pattern at the edge.
Common Mistakes
These mistakes have caused real production outages at companies of every size.
Ignoring failure scenarios entirely
Teams implement caching for the happy path — cache hit returns fast, cache miss queries the database. They never consider: what if 10,000 requests miss simultaneously? What if the cache server restarts? What if an attacker sends requests for non-existent data? The system works perfectly until it doesn't — and then it fails catastrophically.
✅ For every cache you add, ask three questions: What happens on a miss storm (hot key expires)? What happens on penetration (non-existent keys)? What happens on avalanche (mass expiration)? Add TTL jitter, NULL caching, and request coalescing as baseline protections.
Not caching NULL values
The cache only stores data that exists. Requests for non-existent data always hit the database. An attacker discovers this and sends millions of requests for random IDs. The database is overwhelmed by queries that all return empty — and the cache provides zero protection.
✅ Always cache negative results. When the database returns 'not found', store cache.set(key, NULL, ttl=300). Use a shorter TTL than positive results so real data can appear. This one-line change blocks the most common penetration attack.
Using fixed TTL everywhere
Every cache entry gets TTL=3600. The cache is warmed at startup. At T+3600, every entry expires simultaneously. The database receives 100x its normal load in one second. The system crashes. This is the textbook cache avalanche.
✅ Always add jitter: TTL = base_ttl + random(0, base_ttl * 0.1). For a 1-hour TTL, entries expire between 3600-3960 seconds — spread over 6 minutes. This costs nothing to implement and prevents the most common avalanche scenario.
Not protecting the database from spikes
The cache fails (restart, network issue, avalanche). All traffic hits the database. The database has no protection — it accepts every connection, every query, until it runs out of connections and crashes. The crash causes retries, which make it worse. Recovery takes minutes to hours.
✅ Add a circuit breaker between the application and database. When error rates exceed a threshold (e.g., 50% of queries failing), the circuit opens — requests are rejected immediately with a fallback response instead of adding to the overload. The database recovers, the circuit closes, and normal operation resumes.