Failure Modes
Understand how caching systems fail in production — cache miss storms, cache penetration, and cache avalanche. Learn how to design systems that don't collapse under load.
The Big Picture — Why Cache Failures Are Dangerous
Caching isn't just an optimization — in most production systems, it's a critical dependency. Suppose your service receives 100,000 queries per second and your cache absorbs 99% of reads, so only 1,000 QPS reach a database sized for exactly that load. If the cache fails, 100,000 QPS suddenly hit a database designed for 1,000. The database collapses. The entire system goes down.
This is the fundamental danger: systems are designed around the assumption that the cache works. When it doesn't, the failure cascades through every layer.
The Popular Store Analogy
Imagine a popular electronics store on Black Friday. The cache is the fast checkout counter at the front — it handles 99% of customers instantly. The database is the warehouse in the back — it can fulfill 10 orders per minute.
- Cache miss storm: the checkout counter breaks. All 1,000 customers rush to the warehouse. The warehouse staff is overwhelmed, orders pile up, and the entire store grinds to a halt.
- Cache penetration: customers keep asking for products that don't exist ('Do you have a flying car?'). The checkout counter can't help (it only stocks real products), so every fake request goes to the warehouse. Attackers send thousands of fake requests.
- Cache avalanche: the store opens at 9 AM and every product's price tag expires at exactly 10 AM. At 10:01, every customer needs a fresh price check from the warehouse — simultaneously.
🔥 The Amplification Effect
A cache with a 99% hit rate means the database handles 1% of traffic. If the cache fails, the database doesn't get 2x traffic — it gets 100x. This amplification is why cache failures cause cascading outages, not gradual degradation.
Cache in the Architecture
In a typical architecture, the cache sits between the application and the database. It absorbs the vast majority of read traffic, protecting the database from being overwhelmed.
Client (sends request) → App Server (checks cache first) → Cache (Redis, 99% hit rate) → Database (1% of reads)
Normal operation (cache healthy):
    Incoming traffic:   100,000 requests/sec
    Cache hit rate:     99%
    Cache serves:       99,000 requests/sec (0.2ms each)
    Database receives:  1,000 requests/sec (2ms each)
    Database capacity:  5,000 QPS ✅ Comfortable

Cache failure (any failure mode):
    Incoming traffic:   100,000 requests/sec
    Cache hit rate:     0% (or severely degraded)
    Cache serves:       0 requests/sec
    Database receives:  100,000 requests/sec ← 100x normal load
    Database capacity:  5,000 QPS ❌ OVERWHELMED

Result: database connection pool exhausted, queries time out, error rates spike, cascading failure across all services.
Cache Miss Storm
A single hot key expires or is evicted. Thousands of concurrent requests for that key all miss the cache simultaneously and hit the database.
Cache Penetration
Requests for data that doesn't exist bypass the cache entirely. Every request goes to the database, which also returns nothing — wasted work.
Cache Avalanche
Many cache entries expire at the same time. The database is suddenly flooded with requests to rebuild all of them simultaneously.
Cache Miss Storms
A cache miss storm (also called a thundering herd or cache stampede) happens when a popular cache entry expires or is evicted, and hundreds or thousands of concurrent requests all try to rebuild it at the same time.
Everyone Asking the Warehouse for the Same Item
The checkout counter runs out of the most popular item (iPhone). 500 customers all ask the warehouse for it simultaneously. The warehouse only has one forklift — it can fulfill one request at a time. But all 500 requests arrive at once, creating a massive queue. Meanwhile, the forklift fetches the same pallet 500 times instead of once. The fix: one person goes to the warehouse, everyone else waits for that one result.
How It Happens
T=0:      Cache entry "product:42" expires (TTL reached)
T=0.001:  Request A arrives → cache MISS → queries database
T=0.002:  Request B arrives → cache MISS → queries database
T=0.003:  Request C arrives → cache MISS → queries database
...
T=0.010:  500 requests arrived → ALL query the database
T=0.015:  Database receives 500 identical queries simultaneously
T=0.020:  Database struggles, response time spikes from 2ms to 500ms
T=0.025:  Request A gets result, writes to cache
T=0.026:  Requests B-500 also get results (redundant work)
          Cache now has the value again

Problem: 500 identical database queries when 1 would suffice.
At scale: this happens for every hot key, every TTL expiration.
Solutions
Request Coalescing (Single Flight)
When a cache miss occurs, the first request acquires a lock and fetches from the database. All subsequent requests for the same key wait for the first request to complete, then share the result. 500 requests → 1 database query. This is the most effective solution.
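Below is a minimal in-process sketch of the single-flight idea in Python, assuming a single application server; `loader` and `query_db` stand in for whatever function actually queries the database. In a distributed deployment the same pattern needs a shared lock (for example in Redis), which the locking variant below describes.

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = {}   # key -> {"done": Event, "result": value}

    def load(self, key, loader):
        with self._mutex:
            entry = self._in_flight.get(key)
            if entry is None:
                entry = {"done": threading.Event(), "result": None}
                self._in_flight[key] = entry
                leader = True
            else:
                leader = False

        if leader:
            try:
                entry["result"] = loader(key)      # the only database query for this key
            finally:
                entry["done"].set()
                with self._mutex:
                    self._in_flight.pop(key, None)
            return entry["result"]

        entry["done"].wait()                       # followers wait for the leader's result
        return entry["result"]

# Usage sketch: 500 concurrent calls for the same key trigger one loader call.
# flight = SingleFlight()
# product = flight.load("product:42", lambda k: query_db(k))   # query_db is hypothetical
```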
Locking with Short TTL
When a cache miss occurs, set a short-lived lock key (e.g., 'lock:product:42' with 5-second TTL). Only the request that acquires the lock queries the database. Others either wait and retry, or serve stale data if available.
Early Refresh (Background Revalidation)
Before the TTL expires, proactively refresh the cache in the background. If TTL is 60 seconds, start refreshing at 50 seconds. The cache never actually expires — it's always warm. Trade-off: slightly stale data during the refresh window.
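A rough sketch of refresh-ahead, assuming a single process and a hypothetical `loader` function; a real implementation would live inside your cache client or a background worker.

```python
import threading
import time

REFRESH_AFTER = 50            # with a 60s TTL, start refreshing once an entry is 50s old

_store = {}                   # key -> (value, stored_at); stands in for the real cache
_refreshing = set()
_mutex = threading.Lock()

def get_with_refresh_ahead(key, loader):
    now = time.time()
    entry = _store.get(key)
    if entry is None:                              # cold start: load synchronously once
        value = loader(key)
        _store[key] = (value, time.time())
        return value

    value, stored_at = entry
    if now - stored_at >= REFRESH_AFTER:
        with _mutex:
            start = key not in _refreshing
            if start:
                _refreshing.add(key)
        if start:                                  # at most one background refresh per key
            def refresh():
                try:
                    _store[key] = (loader(key), time.time())
                finally:
                    with _mutex:
                        _refreshing.discard(key)
            threading.Thread(target=refresh, daemon=True).start()
    return value                                   # may be slightly stale, never a miss storm
```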
Cache Hot Data Aggressively
For the top 1% of keys that receive 50% of traffic, use very long TTLs or never-expire policies. Invalidate explicitly when data changes instead of relying on TTL expiration.
function getProduct(id):
    value = cache.get("product:" + id)
    if value != null:
        return value                       // Cache HIT — fast path

    // Cache MISS — check if someone is already fetching
    acquired = lock.acquire("fetch:product:" + id, ttl=5s)
    if acquired:
        // I'm the first — fetch from DB
        value = database.query("SELECT * FROM products WHERE id = ?", id)
        cache.set("product:" + id, value, ttl=60s)
        lock.release("fetch:product:" + id)
        return value
    else:
        // Someone else is fetching — wait and retry
        sleep(50ms)
        return getProduct(id)              // Retry — cache should be populated now
🎯 Interview Insight
Cache miss storms are one of the most commonly asked caching questions. Always mention request coalescing (single flight) as the primary solution. Explain: "The first request fetches from the database, all others wait for that result. This turns N database queries into 1."
Cache Penetration
Cache penetration happens when requests are made for data that doesn't exist — not in the cache AND not in the database. Every request bypasses the cache (nothing to cache) and hits the database (which returns nothing). The cache provides zero protection because there's no data to cache.
Asking for Products That Don't Exist
Customers keep asking: 'Do you have a flying car?' The checkout counter (cache) doesn't have it. The warehouse (database) doesn't have it either. But every single request still requires a warehouse trip to confirm it doesn't exist. An attacker sends 100,000 requests for random non-existent product IDs. Every request penetrates through the cache and hits the database. The cache is useless because there's nothing to cache.
How It Happens
Normal request for existing data:
    GET /api/users/42
    → Cache: MISS → DB: found → Cache: store → Return user
    → Next request: Cache HIT ✅ (cache protects DB)

Penetration request for non-existent data:
    GET /api/users/99999999
    → Cache: MISS → DB: not found → Cache: nothing to store → Return 404
    → Next request: Cache MISS again → DB again → not found again
    → Every request hits the database ❌

Attack scenario:
    Attacker sends 50,000 requests/sec with random user IDs
    → user/a8f3b2c1, user/x7y9z0, user/random123...
    → None exist in cache OR database
    → All 50,000 requests hit the database every second
    → Database overwhelmed by queries that all return empty
Solutions
Cache Null Responses
When the database returns 'not found', cache that result too: cache.set('user:99999999', NULL, ttl=60s). Next request for the same ID gets a cached NULL — no database query. Use a short TTL (60-300 seconds) so real data can appear later.
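A minimal sketch of negative caching, assuming a Redis-style client with `get`/`setex`; `db_find_user` and the sentinel value are illustrative, since most caches cannot store a literal null.

```python
NOT_FOUND = "__NULL__"     # sentinel marking "this ID does not exist"
POSITIVE_TTL = 3600        # real data can live longer
NEGATIVE_TTL = 300         # short, so newly created records show up quickly

def get_user(user_id, cache, db_find_user):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached == NOT_FOUND:
        return None                                    # cached negative result — no DB query
    if cached is not None:
        return cached                                  # normal cache hit

    user = db_find_user(user_id)                       # cache miss → one database query
    if user is None:
        cache.setex(key, NEGATIVE_TTL, NOT_FOUND)      # cache the "not found" answer too
        return None
    cache.setex(key, POSITIVE_TTL, user)
    return user
```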
Bloom Filters
A Bloom filter is a space-efficient data structure that can tell you 'definitely not in the set' or 'probably in the set'. Before checking the cache or database, check the Bloom filter: 'Does user 99999999 exist?' If the Bloom filter says NO, return 404 immediately — no cache or DB lookup needed.
Input Validation
Validate request parameters before they reach the cache layer. If user IDs are UUIDs, reject anything that isn't a valid UUID format. If product IDs are integers 1-10M, reject IDs outside that range. This blocks most attack traffic at the edge.
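A small sketch of edge validation, assuming integer product IDs in a known range; the bound and function names are illustrative.

```python
MAX_PRODUCT_ID = 10_000_000    # assumed upper bound of valid product IDs

def parse_product_id(raw: str):
    """Reject malformed IDs before they reach the cache or the database."""
    if not raw.isdigit():                        # blocks "abc", "-1", "1e9", UUID-shaped junk
        return None
    product_id = int(raw)
    if not 1 <= product_id <= MAX_PRODUCT_ID:
        return None
    return product_id

# In the request handler, answer 404 (or 400) immediately when parse_product_id()
# returns None, so garbage IDs never trigger a cache or database lookup.
```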
| Solution | How It Works | Pros | Cons |
|---|---|---|---|
| Cache NULL | Store 'not found' in cache with short TTL | Simple, effective for repeated IDs | Wastes cache memory if attacker uses random IDs |
| Bloom Filter | Probabilistic check: 'does this key exist?' | Blocks non-existent keys with zero DB load | False positives possible, needs to be updated on writes |
| Input Validation | Reject invalid formats at the API layer | Zero overhead, blocks malformed requests | Doesn't help if attacker uses valid-looking IDs |
Setup (on startup or periodically):
    bloom_filter = new BloomFilter(expected_items=10M, false_positive_rate=0.01)
    for each user_id in database:
        bloom_filter.add(user_id)

Request flow:
    GET /api/users/99999999
    Step 1: bloom_filter.contains(99999999)? → NO
    → Return 404 immediately (no cache or DB hit)

    GET /api/users/42
    Step 1: bloom_filter.contains(42)? → YES (probably exists)
    → Check cache → Check DB

False positive rate: ~1%
    → 1% of non-existent IDs will still reach the DB
    → 99% are blocked at the Bloom filter — massive protection
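The pseudocode above assumes a ready-made BloomFilter class. As a rough illustration of what sits behind it, here is a tiny bit-array version in Python; the sizing (~9.6 bits per element for roughly 1% false positives) is standard, but a production system would use a maintained library.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=96_000_000, num_hashes=7):
        # ~9.6 bits per element ≈ ~1% false positives for 10M elements
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# bloom = BloomFilter()
# for user_id in load_all_user_ids():      # hypothetical: stream valid IDs from the database
#     bloom.add(user_id)
# if not bloom.contains(requested_id):
#     return_404()                          # definitely not in the database — skip cache and DB
```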
🎯 Interview Insight
Cache penetration is a favorite interview topic because it combines caching, security, and data structures. Mention all three solutions: cache NULL values (simple), Bloom filters (elegant), and input validation (practical). Explain that in production, you'd use all three as defense in depth.
Cache Avalanche
A cache avalanche happens when a large number of cache entries expire at the same time, causing a sudden flood of requests to the database. Unlike a miss storm (one hot key), an avalanche involves thousands of different keys all expiring simultaneously.
All Price Tags Expire at Once
The store updates all price tags at 9 AM every day. At 9:01 AM, every single price tag is expired. Every customer who picks up a product needs a fresh price check from the warehouse — simultaneously. 10,000 products × 100 customers = 1,000,000 warehouse requests in one minute. The warehouse collapses. The fix: stagger the price tag updates — some expire at 9:00, some at 9:05, some at 9:10. The load spreads out over time.
How It Happens
Scenario: cache warmed at startup with TTL=3600s (1 hour)

T=0:     System starts, cache is populated
         10,000 product entries cached with TTL=3600s
         All entries expire at T=3600
T=3599:  Cache is healthy, 99% hit rate
         Database handles 100 QPS comfortably
T=3600:  ALL 10,000 entries expire simultaneously
         Next second: 10,000 cache misses
         All 10,000 requests hit the database
         Database capacity: 5,000 QPS → OVERWHELMED
T=3601:  Database connection pool exhausted
         Queries time out, errors cascade
         App servers retry → even more DB load
         System enters a death spiral

Common causes:
→ Cache warmed at startup (all same TTL)
→ Batch cache refresh (all keys refreshed together)
→ Cache server restart (all keys lost at once)
→ Synchronized TTLs (all set to round numbers like 60s, 300s)
Solutions
TTL Randomization (Jitter)
Instead of TTL=3600s for every key, use TTL=3600 + random(0, 600). Keys expire between 3600-4200 seconds — spread over a 10-minute window instead of all at once. This is the simplest and most effective solution.
Staggered Expiration
When warming the cache, add incremental offsets: key 1 gets TTL=3600, key 2 gets TTL=3601, key 3 gets TTL=3602, etc. 10,000 keys expire over ~3 hours instead of all at once.
Cache Warming on Restart
When a cache server restarts, don't let all traffic hit the database. Pre-warm the cache by loading hot data from the database before accepting traffic. Use a readiness probe — the server isn't 'ready' until the cache is warm.
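A sketch of the warm-then-ready pattern, assuming a Redis-style cache and a hypothetical `load_hot_products` query; how the readiness endpoint is wired depends on your framework and orchestrator.

```python
import threading

cache_warm = threading.Event()

def warm_cache(cache, load_hot_products):
    """Startup job: pre-load the hottest keys before the server reports ready."""
    for product in load_hot_products(limit=1000):             # hypothetical DB query
        cache.setex(f"product:{product['id']}", 3600, product)
    cache_warm.set()

def readiness_probe():
    # Exposed as /readyz (or your platform's readiness check): the load balancer
    # only routes traffic to this instance once the cache has been warmed.
    return 200 if cache_warm.is_set() else 503
```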
Multi-Layer Caching
Use two cache layers: L1 (local in-process cache, small) and L2 (Redis, large). If L2 fails, L1 absorbs some traffic. If both fail, rate-limit database queries to prevent overload.
Circuit Breaker on Database
If the database is overwhelmed, stop sending it more requests. A circuit breaker detects high error rates and returns cached stale data or a degraded response instead of adding to the overload.
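A minimal count-based circuit breaker sketch to show the idea; in production you would normally reach for an existing library, and the thresholds here are illustrative.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None            # None = closed; a timestamp = open

    def call(self, action, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.recovery_timeout:
                return fallback()        # open: shed load, serve stale/degraded data
            self.opened_at = None        # half-open: allow one trial request through
        try:
            result = action()
            self.failures = 0            # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()   # trip: stop hammering the database
            return fallback()

# breaker = CircuitBreaker()
# product = breaker.call(lambda: query_db(product_id),            # hypothetical DB call
#                        lambda: stale_cache_or_error(product_id))
```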
❌ Without jitter (all expire together):

    cache.set("product:1", data, ttl=3600)
    cache.set("product:2", data, ttl=3600)
    cache.set("product:3", data, ttl=3600)
    // All 10,000 keys expire at T+3600 → avalanche

✅ With jitter (spread over time):

    cache.set("product:1", data, ttl=3600 + random(0, 600))   // 3600-4200s
    cache.set("product:2", data, ttl=3600 + random(0, 600))   // 3600-4200s
    cache.set("product:3", data, ttl=3600 + random(0, 600))   // 3600-4200s
    // 10,000 keys expire over a 10-minute window
    // ~17 keys expire per second instead of 10,000 at once
🎯 Interview Insight
TTL randomization is the answer interviewers are looking for. It's simple, effective, and shows you understand the root cause (synchronized expiration). Always mention it first, then add cache warming and circuit breakers as defense in depth.
End-to-End Scenario
Let's design a high-scale product page system and show how each failure mode can occur and how to mitigate them.
System: E-Commerce Product Page (50K requests/sec)
Traffic: 50,000 product page views per second
Products: 2 million products in database
Hot products: top 1,000 products = 80% of traffic (40K req/s)
Cache: Redis cluster, 99.5% hit rate
Database: PostgreSQL, capacity 2,000 QPS

Normal state: 50,000 req/s × 0.5% miss rate = 250 QPS to database ✅

Failure mode 1 — Miss Storm:
    iPhone 16 page cache expires. 5,000 concurrent requests.
    All 5,000 query PostgreSQL for the same product.
    DB goes from 250 QPS to 5,250 QPS → overwhelmed.

Failure mode 2 — Penetration:
    Bot sends 10,000 req/s for random product IDs (product/-1, product/abc).
    None exist in cache or DB. All 10,000 hit the database.
    DB goes from 250 QPS to 10,250 QPS → overwhelmed.

Failure mode 3 — Avalanche:
    Redis node restarts. 500,000 cached entries lost.
    Next minute: 50,000 req/s × 100% miss rate = 50,000 QPS to DB.
    DB capacity: 2,000 QPS → catastrophic failure.
The Protected Design
Request Coalescing (prevents miss storms)
Use a single-flight pattern: when product:42 cache misses, the first request acquires a lock and fetches from DB. All other requests for product:42 wait for that result. 5,000 concurrent requests → 1 DB query.
Bloom Filter + NULL Caching (prevents penetration)
A Bloom filter loaded with all valid product IDs blocks 99% of non-existent ID requests at the edge. For the 1% that pass (false positives), cache the NULL result with a 5-minute TTL. Input validation rejects non-integer IDs.
TTL Jitter + Cache Warming (prevents avalanche)
All cache entries use TTL = 3600 + random(0, 600). On Redis restart, a warm-up job pre-loads the top 1,000 hot products before the server accepts traffic. A circuit breaker on the DB connection limits concurrent queries to 1,500.
Multi-Layer Cache (defense in depth)
L1: in-process cache (Caffeine/Guava) with 1,000 entries, 30-second TTL. L2: Redis cluster. If Redis is down, L1 absorbs the hottest 80% of traffic. The database only sees the long-tail cold requests.
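A sketch of the L1/L2 read path, assuming a dict-based in-process L1 and a Redis-style L2 client; error handling and eviction are simplified.

```python
import time

L1_TTL = 30                    # seconds; small per-server layer
_l1 = {}                       # key -> (value, expires_at); stands in for Caffeine/Guava

def get_product(key, redis_l2, load_from_db):
    now = time.time()

    hit = _l1.get(key)                         # L1: survives even if Redis is down
    if hit is not None and hit[1] > now:
        return hit[0]

    try:
        value = redis_l2.get(key)              # L2: shared across app servers
    except ConnectionError:
        value = None                           # Redis down — fall through; L1 still shields hot keys

    if value is None:
        value = load_from_db(key)              # last resort: the database
        try:
            redis_l2.setex(key, 3600, value)
        except ConnectionError:
            pass

    _l1[key] = (value, now + L1_TTL)
    return value
```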
💡 This Is Production-Grade Caching
Every large-scale system (Amazon, Netflix, Twitter) uses all of these protections. It's not about picking one — it's about layering them. Request coalescing + Bloom filters + TTL jitter + multi-layer caching + circuit breakers = a system that survives cache failures gracefully.
Trade-offs & Decision Making
Every protection mechanism adds complexity. The skill is knowing which protections are worth the cost for your system.
| Protection | Complexity | Memory Cost | When Worth It |
|---|---|---|---|
| Request Coalescing | Medium (distributed lock) | Low (lock keys only) | Always — any system with hot keys |
| Cache NULL Values | Low (one-line change) | Low-Medium (depends on attack volume) | Always — trivial to implement, high value |
| Bloom Filter | Medium (build + maintain) | Low (~1.2 bytes per element) | When facing penetration attacks or large key spaces |
| TTL Jitter | Low (one-line change) | None | Always — zero cost, prevents avalanche |
| Cache Warming | Medium (warm-up job) | None (uses existing cache) | When cache restarts are common or hot data is predictable |
| Multi-Layer Cache | High (two cache systems) | Medium (L1 memory per server) | When Redis downtime is unacceptable |
| Circuit Breaker | Medium (library/config) | None | Always — protects DB from cascading failure |
Minimum Viable Protection
✅ Always Do (zero/low cost)
- TTL jitter on all cache entries
- Cache NULL values for non-existent keys
- Input validation on API parameters
- Circuit breaker on database connections
🔧 Add When Needed (medium cost)
- Request coalescing for hot keys
- Bloom filter for large key spaces
- Cache warming on restart
- Multi-layer caching for high availability
🎯 Interview Framework
When discussing caching in an interview, proactively mention failure modes: "I'd add TTL jitter to prevent avalanche, cache NULL values to prevent penetration, and use request coalescing for hot keys to prevent miss storms." This shows you think about failure, not just the happy path.
Interview Questions
These questions test whether you understand how caching fails and how to protect against it.
Q: What is cache penetration and how do you prevent it?
A: Cache penetration occurs when requests are made for data that doesn't exist in the cache OR the database. Every request bypasses the cache and hits the database, which also returns nothing. Prevention: (1) Cache NULL values — store 'not found' in the cache with a short TTL (60-300s). Next request gets a cached NULL instead of hitting the DB. (2) Bloom filter — a probabilistic data structure that can definitively say 'this key does NOT exist.' Check the Bloom filter before the cache/DB. (3) Input validation — reject obviously invalid IDs at the API layer. In production, use all three as defense in depth.
Q: How do you handle cache avalanche?
A: Cache avalanche happens when many cache entries expire simultaneously, flooding the database. The primary solution is TTL randomization (jitter): instead of TTL=3600s, use TTL=3600+random(0,600). This spreads expirations over a 10-minute window. Additional protections: (1) Cache warming — pre-load hot data on restart before accepting traffic. (2) Staggered expiration — add incremental offsets when bulk-loading cache. (3) Circuit breaker — if the DB is overwhelmed, stop sending requests and serve stale data or degraded responses.
Q: What happens if the cache fails completely?
A: If Redis goes down entirely, all traffic hits the database — typically 50-100x the normal load. The database will be overwhelmed within seconds. Protections: (1) Multi-layer cache — an in-process L1 cache (Caffeine/Guava) absorbs the hottest traffic even when Redis is down. (2) Circuit breaker — detect the overload and start rejecting or degrading requests instead of cascading the failure. (3) Rate limiting — limit database queries to its capacity (e.g., 2,000 QPS) and queue or reject the rest. (4) Graceful degradation — serve stale data, show cached pages, or display a 'temporarily unavailable' message instead of crashing.
Your Redis cluster restarts and your database immediately crashes
What went wrong and how do you prevent it?
Answer: This is a cache avalanche caused by total cache loss. When Redis restarted, all cached entries were lost. 100% of traffic hit the database, which was sized for 1-2% of traffic. Prevention: (1) Redis persistence (RDB/AOF) — data survives restarts. (2) Cache warming — a startup job pre-loads hot keys before the server accepts traffic. Use a readiness probe. (3) Circuit breaker — limit concurrent DB queries to prevent overload. (4) Multi-layer cache — L1 in-process cache absorbs hot traffic during Redis downtime.
An attacker sends millions of requests with random user IDs
How do you protect your system?
Answer: This is cache penetration — none of the random IDs exist, so every request bypasses the cache and hits the database. Defense: (1) Input validation — reject IDs that don't match the expected format (UUID, integer range). (2) Rate limiting — limit requests per IP/API key. (3) Bloom filter — loaded with all valid user IDs, blocks 99% of non-existent IDs at zero DB cost. (4) Cache NULL values — for the 1% that pass the Bloom filter, cache the 'not found' result. (5) WAF — detect and block the attack pattern at the edge.
Common Mistakes
These mistakes have caused real production outages at companies of every size.
Ignoring failure scenarios entirely
Teams implement caching for the happy path — cache hit returns fast, cache miss queries the database. They never consider: what if 10,000 requests miss simultaneously? What if the cache server restarts? What if an attacker sends requests for non-existent data? The system works perfectly until it doesn't — and then it fails catastrophically.
✅ For every cache you add, ask three questions: What happens on a miss storm (hot key expires)? What happens on penetration (non-existent keys)? What happens on avalanche (mass expiration)? Add TTL jitter, NULL caching, and request coalescing as baseline protections.
Not caching NULL values
The cache only stores data that exists. Requests for non-existent data always hit the database. An attacker discovers this and sends millions of requests for random IDs. The database is overwhelmed by queries that all return empty — and the cache provides zero protection.
✅ Always cache negative results. When the database returns 'not found', store cache.set(key, NULL, ttl=300). Use a shorter TTL than positive results so real data can appear. This one-line change blocks the most common penetration attack.
Using fixed TTL everywhere
Every cache entry gets TTL=3600. The cache is warmed at startup. At T+3600, every entry expires simultaneously. The database receives 100x its normal load in one second. The system crashes. This is the textbook cache avalanche.
✅ Always add jitter: TTL = base_ttl + random(0, base_ttl * 0.1). For a 1-hour TTL, entries expire between 3600-3960 seconds — spread over 6 minutes. This costs nothing to implement and prevents the most common avalanche scenario.
Not protecting the database from spikes
The cache fails (restart, network issue, avalanche). All traffic hits the database. The database has no protection — it accepts every connection, every query, until it runs out of connections and crashes. The crash causes retries, which make it worse. Recovery takes minutes to hours.
✅ Add a circuit breaker between the application and database. When error rates exceed a threshold (e.g., 50% of queries failing), the circuit opens — requests are rejected immediately with a fallback response instead of adding to the overload. The database recovers, the circuit closes, and normal operation resumes.