Eviction & Invalidation
Master cache eviction policies, invalidation strategies, cache stampede prevention, and cache warming — the fundamentals of keeping caches fast and correct.
The Big Picture — Why Caches Can't Store Everything
A cache is fast because it lives in memory (RAM). But memory is limited and expensive. A Redis instance with 64 GB of RAM can't hold a 2 TB database. So the cache must decide: what stays, what goes, and when does stored data become too old to trust?
The Fridge Analogy
Your fridge has limited shelf space (memory). You can't store every item from the grocery store. Eviction is deciding what to throw out when the fridge is full — you toss the item you haven't touched in weeks (LRU), or the one you rarely use (LFU), or the one past its expiry date (TTL). Invalidation is checking if something is still good — the milk says 'best before Dec 10' but it smells fine (TTL-based), or you know the store recalled that batch so you throw it out immediately (event-based invalidation). Keeping expired food is dangerous (stale data). Throwing out everything daily is wasteful (over-invalidation). The skill is finding the balance.
🔥 Key Insight
Eviction answers: "The cache is full — what do I remove to make room?" Invalidation answers: "This data might be wrong — should I remove or refresh it?" Both are essential. A cache without eviction runs out of memory. A cache without invalidation serves stale data.
Cache Lifecycle Overview
1. ADD → Data is written to cache (on miss or on write)
2. USE → Data is read from cache (cache hit)
3. UPDATE → Data is refreshed (invalidation + re-fetch)
4. EVICT → Data is removed (memory full, TTL expired, or invalidated)

                ┌──────────────────────────────┐
                │         CACHE MEMORY         │
  ADD ──────→   │  key:value (TTL: 300s)       │   ──────→ EVICT
                │  key:value (TTL: 60s)        │   (LRU / LFU / TTL)
  USE ←──────   │  key:value (TTL: 600s)       │
                └──────────────────────────────┘
                               ↑
                    INVALIDATE (data changed in DB)
Eviction
Cache is full → remove something. The eviction policy decides WHAT to remove. LRU removes the least recently accessed entry, LFU removes the least frequently used one, and TTL removes entries whose time-to-live has expired.
Invalidation
Data changed in the database → the cached copy is now wrong. Invalidation removes or refreshes the stale entry. Can be triggered by writes, events, or time.
Expiration (TTL)
Each entry has a time-to-live. After TTL seconds, the entry is considered stale and eligible for removal. A safety net that limits how long stale data can persist.
💡 The Freshness vs Performance Trade-off
Aggressive invalidation = fresh data, more cache misses, higher DB load. Lazy invalidation (long TTL) = stale data, fewer cache misses, lower DB load. Every caching decision is a point on this spectrum.
Eviction Policies
When the cache is full and a new entry needs space, the eviction policy decides which existing entry to remove. The right policy keeps hot data in cache and removes cold data.
LRU — Least Recently Used
The Desk Stack
You have a stack of books on your desk. Every time you use a book, you put it on top. When the desk is full and you need space, you remove the book at the bottom — the one you haven't touched in the longest time. That's LRU: the item that was accessed least recently gets evicted first.
Cache capacity: 4 items
Access sequence: A, B, C, D, E, A, F

State after each access:
  A → [A]
  B → [B, A]
  C → [C, B, A]
  D → [D, C, B, A]    ← cache full
  E → [E, D, C, B]    ← A evicted (least recently used)
  A → [A, E, D, C]    ← B evicted, A moves to front
  F → [F, A, E, D]    ← C evicted

Key insight: recently accessed items stay, old items go. Redis uses an approximated LRU (samples random keys, evicts the oldest).
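The same behavior fits in a few lines of code. Below is a minimal, illustrative LRU sketch in TypeScript that relies on a Map's insertion order to find the least recently used key — real caches like Redis approximate this rather than tracking it exactly.

```typescript
// Minimal LRU cache sketch: a Map's insertion order doubles as recency order.
class LRUCache<K, V> {
  private entries = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in insertion order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}
```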
LFU — Least Frequently Used
The Popularity Contest
Instead of tracking WHEN you last used something, LFU tracks HOW MANY TIMES you've used it. The item with the fewest total accesses gets evicted. A book you read once last week gets evicted before a book you've read 50 times — even if the 50-time book hasn't been touched today.
Cache capacity: 3 items
Access sequence: A, A, A, B, B, C, D

State (item: access count):
  A(1) → A(2) → A(3) → A(3),B(1) → A(3),B(2) → A(3),B(2),C(1)   ← full
  D arrives → evict C (count=1, lowest) → A(3),B(2),D(1)

Key insight: frequently accessed items stay, rarely used items go. Problem: a once-popular item that's no longer relevant stays forever (high count but no recent accesses). Fix: decay counters over time, or use a hybrid LRU-LFU approach.
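A toy LFU looks similar, except eviction scans access counts instead of recency. This sketch deliberately keeps the naive behavior — counters never decay — which is exactly the stale-popularity problem described above.

```typescript
// Minimal LFU sketch: evicts the key with the lowest access count.
class LFUCache<K, V> {
  private values = new Map<K, V>();
  private counts = new Map<K, number>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.values.has(key)) return undefined;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    return this.values.get(key);
  }

  set(key: K, value: V): void {
    if (!this.values.has(key) && this.values.size >= this.capacity) {
      // Find and evict the least frequently used key.
      let victim: K | undefined;
      let minCount = Infinity;
      for (const [k, count] of this.counts) {
        if (count < minCount) { minCount = count; victim = k; }
      }
      if (victim !== undefined) {
        this.values.delete(victim);
        this.counts.delete(victim);
      }
    }
    this.values.set(key, value);
    this.counts.set(key, this.counts.get(key) ?? 1); // new keys start at count 1
  }
}
```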
TTL — Time-To-Live
TTL isn't strictly an eviction policy — it's an expiration mechanism. Each entry has a timestamp. After TTL seconds, the entry is considered expired and removed (either lazily on next access or actively by a background sweep).
SET user:42 = {name: "Alice"}  TTL 300  (expires in 5 minutes)

  T=0s:    GET user:42 → {name: "Alice"} ✅ (fresh)
  T=200s:  GET user:42 → {name: "Alice"} ✅ (still valid)
  T=301s:  GET user:42 → null ❌ (expired, cache miss)
           → Fetch from DB, re-cache with new TTL

TTL is a safety net: even if invalidation fails, data won't be stale for more than TTL seconds.
  Short TTL (10s) = fresher data, more DB hits
  Long TTL (1hr) = staler data, fewer DB hits
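Lazy expiration — checking the timestamp on read and treating an expired entry as a miss — is simple enough to sketch directly. The background sweep that active expiration would add is omitted here.

```typescript
// Sketch of lazy TTL expiration: an entry is checked on read and dropped if expired.
interface Entry<V> { value: V; expiresAt: number; }

class TTLCache<K, V> {
  private entries = new Map<K, Entry<V>>();

  set(key: K, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired → treat as a cache miss
      return undefined;
    }
    return entry.value;
  }
}
```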
| Policy | Evicts Based On | Best For | Weakness |
|---|---|---|---|
| LRU | Least recently accessed | General purpose, most workloads | Scan pollution (one-time access pushes out hot data) |
| LFU | Least frequently accessed | Stable hot sets (popular products) | Stale popular items never evicted; cold start problem |
| TTL | Time since creation/last refresh | Data with known freshness requirements | Doesn't consider access patterns; all entries treated equally |
| Random | Random selection | When access patterns are uniform | Might evict hot data |
🎯 Interview Insight — Why LRU Is Most Common
LRU is the default in Redis, Memcached, and most caching systems. It works well for the majority of workloads because recent access is a strong predictor of future access. LFU is better when you have a stable set of popular items (top 100 products), but it's more complex and has the stale-popularity problem. In interviews, default to LRU unless there's a specific reason for LFU.
Cache Invalidation Strategies
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Cache invalidation is hard because you must keep two data stores (cache and database) in sync, and any gap means users see wrong data.
Strategy 1: TTL-Based (Time Expiration)
How it works
- ✅Every cache entry has a TTL (e.g., 5 minutes)
- ✅After TTL expires, entry is removed
- ✅Next read triggers a fresh fetch from DB
- ✅Simplest approach — no coordination needed
Trade-offs
- ❌Data is stale for up to TTL duration
- ❌Short TTL = more DB load, fresher data
- ❌Long TTL = less DB load, staler data
- ❌Doesn't react to actual data changes
Strategy 2: Write-Through Invalidation
User updates profile name:
  1. App writes to DB:    UPDATE users SET name='Alicia' WHERE id=42
  2. App writes to cache: SET user:42 = {name: "Alicia"}
     (or: DELETE user:42 to force re-fetch on next read)

Result:     cache is always in sync with DB
Trade-off:  every write is slower (must update both)
Risk:       if step 2 fails, cache has stale data
            → Use "delete" instead of "update" — safer (forces re-fetch)
Strategy 3: Cache-Aside Invalidation (Delete on Write)
function updateUser(userId, data) {
  // 1. Write to database
  db.query("UPDATE users SET name = ? WHERE id = ?", data.name, userId);

  // 2. Delete from cache (NOT update)
  cache.delete("user:" + userId);

  // Next read will miss → fetch fresh data from DB → re-cache
}

Why DELETE instead of UPDATE?
→ If the DB write succeeds but the cache update fails, the cache has stale data
→ If you DELETE and it fails, worst case: the cache still has old data (TTL safety net)
→ DELETE is idempotent and safer than UPDATE
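The matching read path completes the picture: on a miss, fetch from the DB and re-cache with a TTL safety net. This is a sketch using the same hypothetical `cache` and `db` clients as the write path above (declared here only so the sketch stands alone); the 300-second TTL is an illustrative choice.

```typescript
// Hypothetical client interfaces, mirroring the write-path snippet above.
declare const cache: {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
};
declare const db: { query(sql: string, ...params: unknown[]): Promise<any> };

async function getUser(userId: string) {
  const cached = await cache.get("user:" + userId);
  if (cached !== null) return JSON.parse(cached); // cache hit

  // Cache miss → fetch from the database, then re-cache with a TTL safety net.
  const user = await db.query("SELECT * FROM users WHERE id = ?", userId);
  await cache.set("user:" + userId, JSON.stringify(user), 300); // TTL 300s
  return user;
}
```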
Strategy 4: Event-Based Invalidation
Architecture:
  App Server A writes to DB
  App Server A publishes event: "user:42:updated"
  Redis Pub/Sub (or Kafka) broadcasts to all app servers
  All app servers delete "user:42" from their local cache

Flow: DB write → publish event → all caches invalidated

Benefits:
  ✅ Works across multiple app servers / cache instances
  ✅ Near real-time invalidation
  ✅ Decoupled — writers don't need to know about caches

Trade-offs:
  ⚠️ More infrastructure (message broker)
  ⚠️ Events can be delayed or lost
  ⚠️ TTL still needed as a safety net
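A minimal sketch of the pattern, assuming the ioredis client for Redis Pub/Sub and a plain Map standing in for each server's in-process cache:

```typescript
import Redis from "ioredis";

const publisher = new Redis();
const subscriber = new Redis();
const localCache = new Map<string, unknown>(); // stand-in for each server's local cache

// Writer side: after a successful DB write, announce which key changed.
async function onUserUpdated(userId: string): Promise<void> {
  await publisher.publish("cache-invalidation", `user:${userId}`);
}

// Every app server runs a subscriber that drops the key from its local cache.
subscriber.subscribe("cache-invalidation").catch(console.error);
subscriber.on("message", (_channel, key) => {
  localCache.delete(key); // the entry's TTL remains the safety net if this event is lost
});
```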
| Strategy | Freshness | Complexity | DB Load | Best For |
|---|---|---|---|---|
| TTL only | Stale up to TTL | Very low | Moderate | Data where brief staleness is OK |
| Delete on write | Fresh after next read | Low | Low (only on miss) | Most CRUD applications |
| Write-through | Always fresh | Medium | Low | Consistency-critical data |
| Event-based | Near real-time | High | Low | Multi-server, distributed caches |
🎯 Interview Insight
The safest default: delete-on-write + TTL as safety net. When data changes, delete the cache key. If the delete fails, TTL ensures the stale entry expires eventually. This covers 90% of use cases with minimal complexity.
Cache Stampede
A cache stampede (also called thundering herd) happens when a popular cache entry expires and hundreds or thousands of requests simultaneously hit the database to regenerate it. The database gets overwhelmed, response times spike, and the system can crash.
The Shop Opening Analogy
A popular store opens at 9 AM. At 8:59, 500 people are waiting outside. The moment the doors open, everyone rushes in at once — the entrance is jammed, shelves are overwhelmed, staff can't cope. That's a cache stampede. The 'store' is your database, the 'doors opening' is the cache entry expiring, and the '500 people' are concurrent requests that all miss the cache at the same instant.
Popular cache key: "homepage:trending" (TTL: 60 seconds)
Traffic: 10,000 requests/second for this key

Normal operation:
  9,999 requests → cache HIT (1ms)
  1 request → cache MISS → fetch from DB → re-cache

At TTL expiration (T=60s):
  Cache key expires
  Next 100 requests (within 50ms) ALL miss the cache
  All 100 hit the database simultaneously
  DB query takes 200ms (expensive aggregation)
  DB is now handling 100 identical queries at once
  DB CPU spikes to 100%, other queries slow down
  Cascading failure: more timeouts → more retries → more load

  ┌──────────┐
  │  Cache   │  key expired!
  └────┬─────┘
       │ MISS × 100
       ▼
  ┌──────────┐
  │ Database │  💥 100 identical queries at once
  └──────────┘
Solutions
Request Coalescing (Single Flight)
When multiple requests miss the same key simultaneously, only ONE request fetches from the database. All other requests wait for that single fetch to complete and share the result. 100 concurrent misses → 1 DB query instead of 100. Libraries: Go's singleflight, Node's async-sema.
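The core of request coalescing is a map of in-flight promises keyed by cache key. A minimal TypeScript sketch — `singleFlight` and the `loadTrending` fetcher in the usage comment are illustrative names, not a specific library:

```typescript
// Concurrent misses for the same key share one fetch instead of each hitting the DB.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(key: string, fetchFn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // join the fetch already in progress

  const promise = fetchFn().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Usage: 100 concurrent calls for "homepage:trending" trigger one loadTrending().
// const data = await singleFlight("homepage:trending", loadTrending);
```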
Locking (Mutex on Cache Key)
When a cache miss occurs, the first request acquires a lock on the key, fetches from DB, and populates the cache. Other requests for the same key wait for the lock to release, then read from cache. Prevents duplicate DB queries. Risk: if the lock holder crashes, others wait forever — use a lock with a timeout.
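A sketch of the pattern using a timed Redis lock (SET with NX and EX), with ioredis assumed as the client; the 10-second lock expiry is the timeout that protects against a crashed lock holder.

```typescript
import Redis from "ioredis";

const redis = new Redis();

async function getWithLock(key: string, loadFromDb: () => Promise<string>): Promise<string> {
  const cached = await redis.get(key);
  if (cached !== null) return cached;

  // Try to become the one request that rebuilds the entry. The 10s expiry means
  // a crashed lock holder cannot block everyone else forever.
  const acquired = await redis.set(`lock:${key}`, "1", "EX", 10, "NX");
  if (acquired === "OK") {
    const fresh = await loadFromDb();
    await redis.set(key, fresh, "EX", 300); // re-cache with a TTL
    await redis.del(`lock:${key}`);
    return fresh;
  }

  // Someone else holds the lock: wait briefly, then try the cache again.
  await new Promise((resolve) => setTimeout(resolve, 100));
  return getWithLock(key, loadFromDb);
}
```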
Staggered Expiration (Jitter)
Instead of all keys expiring at exactly TTL seconds, add random jitter: TTL = 300 + random(0, 60). Keys expire at different times, spreading the DB load. Simple and effective for preventing synchronized expiration of many keys.
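In code this is a one-liner; the base and jitter values below simply mirror the example above.

```typescript
// Staggered expiration: base TTL of 300s plus up to 60s of random jitter,
// so keys cached at the same moment don't all expire at the same moment.
function ttlWithJitter(baseSeconds = 300, maxJitterSeconds = 60): number {
  return baseSeconds + Math.floor(Math.random() * maxJitterSeconds);
}

// Example: cache.set("product:42", payload, ttlWithJitter()) → a TTL between 300s and 359s.
```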
Early Refresh (Stale-While-Revalidate)
Before the TTL expires, a background process refreshes the cache. The key never actually expires — it's always fresh. Requests always hit the cache. The refresh happens asynchronously. Most sophisticated but eliminates stampedes entirely.
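A sketch of the idea, assuming ioredis and a hypothetical `loadTrending()` aggregation: a background timer rewrites the key every 30 seconds, well inside the 60-second logical TTL, so readers never see a miss.

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Placeholder for the expensive DB aggregation that builds the trending list.
async function loadTrending(): Promise<string> {
  return JSON.stringify([]);
}

// Refresh every 30s. The key is written with a 90s TTL so that even a single
// missed refresh doesn't let the entry expire and trigger a stampede.
setInterval(async () => {
  const fresh = await loadTrending();
  await redis.set("homepage:trending", fresh, "EX", 90);
}, 30_000);
```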
| Solution | Complexity | Effectiveness | Trade-off |
|---|---|---|---|
| Request coalescing | Low | High | Requires in-process coordination |
| Locking | Medium | High | Lock contention, timeout handling |
| Staggered TTL | Very low | Medium | Doesn't prevent single-key stampede |
| Early refresh | High | Very high | Background process, extra DB reads |
⚠️ This Is a Real Production Problem
Cache stampedes have caused outages at major companies. The fix is usually request coalescing (simplest, most effective). Add staggered TTL as a baseline for all keys. Use early refresh for your hottest keys (homepage, trending content).
Cache Warming
Cache warming is the process of preloading frequently accessed data into the cache before users request it. Instead of waiting for the first request to trigger a cache miss (cold start), you fill the cache proactively.
When Cache Warming Matters
After Deployment
A new server starts with an empty cache. The first 1000 requests all miss → DB is hammered. Warming preloads hot data before the server takes traffic.
After Cache Failure
Redis crashes and restarts with empty memory. Suddenly 100% of requests hit the DB. Warming restores the hot dataset quickly.
Predictable Traffic
You know the homepage, top products, and trending content will be requested. Preload them at startup or on a schedule instead of waiting for misses.
Approach 1: Startup Warming
  On server start, before accepting traffic:
  → Fetch top 1000 products from DB → cache them
  → Fetch homepage data → cache it
  → Fetch user session data for active users → cache it
  Then: start accepting traffic (cache is warm)

Approach 2: Scheduled Warming
  Every 5 minutes, a background job:
  → Queries DB for most-accessed keys (from access logs)
  → Preloads them into cache
  → Refreshes keys that are about to expire

Approach 3: Peer Warming
  New server joins the cluster:
  → Copies hot keys from an existing server's cache
  → No DB queries needed
  → Fastest approach but requires cache-to-cache transfer
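As a concrete illustration of Approach 1, a startup-warming sketch: the `db`/`cache` interfaces, the SQL, and the key names are assumptions, and the boot-order comment is a placeholder for whatever web framework is in use.

```typescript
// Hypothetical clients, declared so the sketch stands alone.
declare const cache: { set(key: string, value: string, ttlSeconds: number): Promise<void> };
declare const db: { query(sql: string, ...params: unknown[]): Promise<any[]> };

async function warmCache(): Promise<void> {
  // Preload the hot set identified from access logs — not the whole dataset.
  const topProducts = await db.query(
    "SELECT * FROM products ORDER BY views DESC LIMIT 1000"
  );
  for (const product of topProducts) {
    await cache.set(`product:${product.id}`, JSON.stringify(product), 1800); // TTL 30 min
  }

  const homepage = await db.query("SELECT * FROM homepage_sections");
  await cache.set("homepage:data", JSON.stringify(homepage), 300); // TTL 5 min
}

// Boot order: warm first, only then start accepting traffic.
// await warmCache(); server.listen(8080);
```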
Benefits
- ✅Eliminates cold-start latency for users
- ✅Prevents DB overload after deployment or cache failure
- ✅Predictable performance from the first request
- ✅Reduces cache stampede risk during startup
Trade-offs
- ❌Extra computation and DB load during warming
- ❌Might preload data that's never actually requested
- ❌Delays server startup (if warming is synchronous)
- ❌Stale data if warming happens too far in advance
💡 Practical Tip
Warm the top 1-5% of keys that generate 80% of traffic. Don't try to warm the entire dataset — that defeats the purpose of caching. Use access logs or analytics to identify the hot set.
End-to-End Scenario
Let's design the eviction, invalidation, and stampede prevention for a social media feed — one of the most cache-intensive systems.
📱 Social Media Feed — 50M DAU
Hot data: trending posts, popular user profiles, feed cache per user.
Cache size: 128 GB Redis cluster. Total data: 5 TB in PostgreSQL.
Read QPS: 200,000. Write QPS: 5,000 (new posts, likes).
Eviction Policy: LRU
128 GB can't hold all 50M users' feeds. LRU ensures active users' feeds stay cached. A user who hasn't opened the app in 3 days gets evicted. When they return, their feed is rebuilt from DB (cache miss). LRU is the right choice because recent access strongly predicts future access in social media.
Invalidation: Delete-on-Write + TTL Safety Net
When a user posts, delete their followers' feed cache keys. Each follower's next feed request rebuilds from DB with the new post included. TTL of 5 minutes as a safety net — even if invalidation fails, feeds are never more than 5 minutes stale.
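A sketch of that fan-out delete, assuming ioredis and a hypothetical `getFollowerIds` lookup; at celebrity-scale follower counts this would be batched or pushed onto a background queue rather than done inline.

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Placeholder for the real follower lookup.
async function getFollowerIds(authorId: string): Promise<string[]> {
  return [];
}

async function onNewPost(authorId: string): Promise<void> {
  const followerIds = await getFollowerIds(authorId);
  const keys = followerIds.map((id) => `feed:${id}`);
  if (keys.length > 0) await redis.del(...keys); // each follower's next read rebuilds their feed
}
```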
Stampede Prevention: Request Coalescing + Early Refresh
Trending posts are requested 10,000 times/sec. Use request coalescing: if the key expires, only 1 request fetches from DB while others wait. For the top 100 trending posts, use early refresh: a background job refreshes them 30 seconds before TTL expires. These keys never actually expire.
Cache Warming: Startup + Scheduled
On Redis restart: preload the top 10,000 trending posts and the feeds of the 100,000 most active users. Background job every minute: refresh trending content and soon-to-expire hot keys. This prevents cold-start latency and keeps the hottest data always fresh.
Redis Cluster (128 GB, LRU eviction):
├── feed:{userId}           → user's feed (TTL 5min, invalidate on new post)
├── trending:posts          → top posts (early refresh, never expires)
├── user:{userId}:profile   → user profile (TTL 10min, delete on update)
└── post:{postId}           → post content (TTL 30min)

Invalidation:
  New post → delete feed:{followerId} for all followers
  Profile update → delete user:{userId}:profile
  Safety net: TTL on every key

Stampede prevention:
  All keys: request coalescing (singleflight)
  Top 100 trending: early refresh (background job)
  All TTLs: jitter ±10% (staggered expiration)

Warming:
  On startup: preload trending + top 100K user feeds
  Every minute: refresh trending, refresh expiring hot keys
Trade-offs & Decision Making
Freshness vs Performance
| Approach | Freshness | Performance | Complexity |
|---|---|---|---|
| No cache | Always fresh | Slowest (every read hits DB) | None |
| Long TTL (1hr) | Stale up to 1 hour | Fastest (99% hit rate) | Low |
| Short TTL (10s) | Stale up to 10 seconds | Fast (70-80% hit rate) | Low |
| Delete on write | Fresh after next read | Fast (high hit rate) | Medium |
| Event-based invalidation | Near real-time | Fast | High |
| Write-through | Always fresh | Fast reads, slow writes | Medium |
Memory Usage vs Hit Rate
| Cache Size | Hit Rate | DB Load | Cost |
|---|---|---|---|
| 10% of dataset | ~70-80% | 20-30% of original | Low |
| 20% of dataset | ~90-95% | 5-10% of original | Medium |
| 50% of dataset | ~98-99% | 1-2% of original | High |
| 100% of dataset | ~100% | Near zero | Very high (defeats purpose) |
🎯 The Sweet Spot
Cache 10-20% of your dataset for 90%+ hit rate. Beyond that, you're paying exponentially more memory for diminishing returns. Use LRU to automatically keep the right 10-20% — the data that's actually being accessed.
Interview Questions
Q: Why is cache invalidation hard?
A: Because you have two sources of truth (cache and database) that must stay in sync. Any write to the database makes the cache stale. If you invalidate too aggressively, you lose the performance benefit (too many misses). If you invalidate too lazily, users see wrong data. Race conditions make it worse: what if a read re-caches old data between a DB write and cache delete? There's no perfect solution — only trade-offs between freshness, performance, and complexity.
Q: What is a cache stampede and how do you prevent it?
A: A cache stampede occurs when a popular cache key expires and many concurrent requests all miss the cache simultaneously, overwhelming the database with identical queries. Prevention: (1) Request coalescing — only one request fetches from DB, others wait and share the result. (2) Locking — first request acquires a lock, others wait. (3) Staggered TTL — add random jitter so keys don't expire at the same time. (4) Early refresh — refresh keys before they expire so they never actually go stale.
Q: LRU vs LFU — when would you use each?
A: LRU (Least Recently Used): default choice for most workloads. Works well when recent access predicts future access — social feeds, user sessions, web pages. Simple, low overhead. LFU (Least Frequently Used): better when you have a stable set of popular items — top 100 products, trending content. Items that are accessed thousands of times stay cached even if not accessed in the last minute. Trade-off: LFU has a 'stale popularity' problem — once-popular items with high counts never get evicted. Use LFU with decay or a hybrid approach.
After deploying a new Redis instance, your API latency spikes for 10 minutes
What's happening and how do you fix it?
Answer: Cold cache. The new Redis instance has no data — every request is a cache miss hitting the database. Fix: cache warming. Before routing traffic to the new instance, preload the hot dataset (top products, active user sessions, trending content). Use access logs to identify the top 1-5% of keys. Warm them synchronously before the instance takes traffic. Also: use staggered rollouts — don't replace all cache instances at once.
Your cache hit rate dropped from 95% to 60% overnight
What could cause this and how do you investigate?
Answer: Possible causes: (1) Cache size too small — data grew, LRU is evicting hot keys. Check eviction metrics. Fix: increase cache size. (2) TTL too short — keys expire before being re-accessed. Check TTL distribution. (3) Access pattern changed — a new feature or traffic spike introduced new keys that displaced hot data. (4) Cache warming stopped — a background job failed. Check job logs. Investigate: check Redis INFO stats (evicted_keys, hit_rate, memory usage), compare with the previous day.
Common Mistakes
Ignoring invalidation entirely
Setting a TTL and assuming the cache will 'figure it out.' A user changes their email, but the old email shows for 30 minutes because the cache wasn't invalidated. For some data (prices, inventory), this causes real business problems.
✅Use delete-on-write as the primary invalidation mechanism. TTL is a safety net, not the strategy. When data changes in the DB, explicitly delete the cache key. The next read will fetch fresh data.
Using only TTL blindly
Setting the same TTL (e.g., 5 minutes) for all cache keys regardless of the data's characteristics. Product descriptions (change monthly) and inventory counts (change every second) should not have the same TTL.
✅Match TTL to the data's change frequency and staleness tolerance. Static content: hours. Product details: 10-30 minutes. Inventory: 5-10 seconds. User sessions: match session timeout. Trending content: 1-2 minutes.
Not handling cache stampede
A popular key expires and 1,000 requests hit the database simultaneously. The DB query takes 500ms, so for 500ms you have 1,000 concurrent identical queries. DB CPU spikes, other queries slow down, cascading failure begins.
✅Implement request coalescing as a baseline for all cache reads. Add staggered TTL (jitter) to prevent synchronized expiration. For your hottest keys (homepage, trending), use early refresh so they never expire.
Overloading cache with unnecessary data
Caching every DB query result, including data accessed once per day. The cache fills with cold data, evicting hot data. Hit rate drops. You're paying for 128 GB of Redis that's 80% full of data nobody reads.
✅Cache only hot data. Monitor hit rates per key pattern. If a key pattern has <50% hit rate, stop caching it. Use LRU eviction to automatically prioritize hot data. The goal is 90%+ hit rate on 10-20% of your dataset.