Redis Common Patterns
How Redis is used in real systems — distributed locking, rate limiting, caching patterns, session storage, leaderboards, Pub/Sub, Lua scripting, and transactions.
Distributed Locking
In a distributed system, multiple processes or servers often need exclusive access to a shared resource — a database row, a file, or an external API. Redis is commonly used to implement distributed locks because it's fast, single-threaded (commands are atomic), and widely deployed.
The Hotel Room Key
A distributed lock is like a hotel room key card. Only one guest can hold the key at a time. If you have the key, you can enter the room (access the resource). When you're done, you return the key (release the lock). If you lose the key or disappear, the hotel re-issues it after a timeout (lock expiry). The naive approach is like writing your name on a sticky note on the door — someone else can overwrite it between the time you check and the time you write.
The Naive Approach — SETNX + EXPIRE (Broken)
The first instinct is to use two separate commands: SETNX to acquire the lock, then EXPIRE to set a timeout. This has a critical race condition.
```
# Step 1: Try to acquire the lock
SETNX my-lock "process-A"
# Returns 1 (success) — lock acquired

# Step 2: Set expiry so the lock doesn't live forever
EXPIRE my-lock 10

# THE PROBLEM:
# What if the process crashes BETWEEN Step 1 and Step 2?
# → SETNX succeeds (lock is held)
# → EXPIRE never runs (process died)
# → Lock is held FOREVER — deadlock
# → No other process can ever acquire it
```
🚨 This Is a Real Production Bug
The SETNX + EXPIRE race condition has caused real outages. If the process crashes, loses its network connection, or gets killed between the two commands, the lock is never released. Every other process waits forever.
The Correct Approach — Atomic SET with NX and EX
Redis 2.6.12+ supports setting the value, the NX flag, and the expiry in a single atomic command. No race condition possible.
```
# Acquire lock — atomic, no race condition
SET my-lock "process-A-uuid" NX EX 10
# NX = only set if key does NOT exist (acquire)
# EX = expire after 10 seconds (auto-release)
# Returns "OK" if acquired, nil if already held

# The value should be a unique identifier (UUID)
# so only the holder can release the lock

# Release lock — only if we still hold it
# (Must use Lua script for atomicity)
EVAL "
  if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
  else
    return 0
  end
" 1 my-lock "process-A-uuid"
```
Acquire: SET key uuid NX EX seconds
Atomically sets the lock key with a unique value (UUID) and an expiry. NX ensures only one process wins. EX ensures the lock auto-releases if the holder dies.
Do Work: Process the critical section
The lock holder performs the exclusive operation. The lock's TTL must be longer than the expected work duration.
Release: Lua script checks value before DEL
A Lua script atomically checks that the lock value matches the holder's UUID before deleting. This prevents accidentally releasing someone else's lock if yours expired.
Redlock Algorithm
A single Redis instance is a single point of failure. The Redlock algorithm, proposed by Salvatore Sanfilippo (antirez), uses N independent Redis nodes (typically 5) to create a more fault-tolerant lock.
```
Setup: 5 independent Redis instances (not replicas — independent)

Acquire lock:
  1. Record current time T1
  2. Try to acquire the lock on ALL 5 instances
     SET resource-lock uuid NX EX 30   (on each instance)
  3. Record current time T2
  4. Lock is acquired if:
     a. Majority (≥3 of 5) instances granted the lock
     b. Total time (T2 - T1) < lock TTL
        (if it took 29 seconds to acquire a 30-second lock,
         it's already almost expired — reject it)
  5. Effective lock validity = TTL - (T2 - T1)

Release lock:
  Send DEL (via Lua script) to ALL 5 instances
  (even ones that didn't grant the lock — cleanup)

Failure tolerance:
  → 2 of 5 instances can fail and the lock still works
  → No single point of failure
```
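A minimal sketch of the acquire phase in Python, assuming the redis-py client; `clients` is a list of five independent connections, and `redlock_acquire` is an illustrative helper, not a production client (use a maintained Redlock library in practice):

```python
import time
import uuid
import redis  # assumes redis-py is installed

def redlock_acquire(clients, resource, ttl_ms):
    """Try to acquire a Redlock across N independent Redis instances.

    Returns (token, validity_ms) on success, (None, 0) on failure.
    """
    token = str(uuid.uuid4())
    quorum = len(clients) // 2 + 1          # majority, e.g. 3 of 5
    start = time.monotonic()

    granted = 0
    for client in clients:
        try:
            # SET resource token NX PX ttl_ms -- atomic acquire per node
            if client.set(resource, token, nx=True, px=ttl_ms):
                granted += 1
        except redis.RedisError:
            pass                             # a down node simply doesn't count

    elapsed_ms = (time.monotonic() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms

    if granted >= quorum and validity_ms > 0:
        return token, validity_ms

    # Failed: clean up on ALL nodes, then report failure
    for client in clients:
        try:
            client.delete(resource)  # real code should token-check via Lua first
        except redis.RedisError:
            pass
    return None, 0
```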
⚠️ The Redlock Controversy
Martin Kleppmann (author of "Designing Data-Intensive Applications") published a critique arguing Redlock is fundamentally unsafe. His argument: if a process holding the lock pauses (GC pause, network delay) longer than the TTL, the lock expires, another process acquires it, and both processes operate on the shared resource simultaneously. Kleppmann recommends fencing tokens — a monotonically increasing number attached to each lock acquisition that the resource checks before accepting writes. Antirez responded defending Redlock. The debate remains unresolved. In interviews, mention both sides.
Lock Expiry and Holder Death
Key Considerations for Lock Expiry
- ⚠️TTL too short: lock expires while work is still in progress — two holders simultaneously
- ⚠️TTL too long: if the holder crashes, the resource is blocked until TTL expires
- ✅Lock renewal: the holder can extend the TTL periodically (watchdog pattern) to prevent premature expiry; see the sketch after this list
- ✅Fencing tokens: attach a monotonic sequence number to each lock — the resource rejects stale tokens
- ✅Always use a unique value (UUID) so only the holder can release the lock
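A minimal sketch of the watchdog pattern with redis-py, assuming the lock was acquired with a UUID token as above; the renewal interval of TTL/3 and the helper names are illustrative:

```python
import threading
import redis  # assumes redis-py

r = redis.Redis()

# Renew the TTL only if we still hold the lock (token must match).
RENEW_SCRIPT = r.register_script("""
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('EXPIRE', KEYS[1], ARGV[2])
else
    return 0
end
""")

def start_watchdog(lock_key, token, ttl_seconds, stop_event):
    """Extend the lock TTL every TTL/3 seconds while the holder is alive.

    If the process dies, the thread dies with it and the lock
    expires naturally after ttl_seconds.
    """
    def renew():
        while not stop_event.wait(ttl_seconds / 3):
            if RENEW_SCRIPT(keys=[lock_key], args=[token, ttl_seconds]) == 0:
                break  # we no longer hold the lock -- stop renewing
    t = threading.Thread(target=renew, daemon=True)
    t.start()
    return t
```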
Rate Limiting
Rate limiting controls how many requests a client can make in a given time window. Redis is ideal for this because it's fast (sub-millisecond), atomic (INCR is single-threaded), and supports TTL natively.
The Traffic Light
Rate limiting is like a traffic light at a highway on-ramp. Cars (requests) arrive continuously, but the light only lets one through every few seconds. Without the light, all cars merge at once and the highway (your server) jams. The fixed window counter is like counting cars per minute. The sliding window is like a rolling 60-second view. The token bucket is like a jar of tokens — each car takes one, and tokens refill at a steady rate.
Fixed Window Counter
The simplest approach: count requests per time window using INCR and EXPIRE.
```
# Key format: rate_limit:{user_id}:{window}
# Window = current minute (e.g., 2024-01-15T10:30)

# For each incoming request:
key = "rate_limit:user123:2024-01-15T10:30"
count = INCR key
# First call: creates key with value 1 and returns 1
# Subsequent calls: increments and returns the new value

# Set expiry only on first creation (when INCR returns 1)
if count == 1:
    EXPIRE key 60   # Key auto-deletes after 60 seconds

# Check the count
if count > 100:
    # Rate limit exceeded — reject request (HTTP 429)
else:
    # Allow request

# Problem: boundary spike
# User sends 100 requests at 10:30:59 (end of window)
# User sends 100 requests at 10:31:01 (start of new window)
# → 200 requests in 2 seconds, but each window sees only 100
```
⚠️ The Boundary Problem
Fixed window counters have a well-known flaw: a burst of requests at the boundary of two windows can allow 2x the intended rate. If the limit is 100/minute, a client can send 100 at 10:30:59 and 100 at 10:31:01 — 200 requests in 2 seconds.
Sliding Window with Sorted Sets
A sorted set tracks each request's timestamp as the score. To check the rate, remove old entries and count remaining ones. This eliminates the boundary problem.
```
# For each incoming request:
key = "rate_limit:user123"
now = current_timestamp_ms()   # e.g., 1705312260000
window = 60000                 # 60 seconds in ms

# Step 1: Remove entries older than the window
ZREMRANGEBYSCORE key 0 (now - window)
# Removes all entries with score < (now - 60s)

# Step 2: Count entries in the current window
count = ZCOUNT key (now - window) now

# Step 3: If under limit, add this request
if count < 100:
    ZADD key now unique_request_id
    EXPIRE key 60   # Cleanup: auto-delete if inactive
    # Allow request
else:
    # Rate limit exceeded — reject (HTTP 429)

# In production, wrap steps 1-3 in a Lua script or MULTI/EXEC
# so no other request interleaves between the count and the add

# Why this works:
# The window "slides" with the current time
# No boundary problem — always looking at the last 60 seconds
# Trade-off: uses more memory (one entry per request)
```
Token Bucket with Lua Scripts
The token bucket algorithm allows bursts while enforcing an average rate. Tokens are added at a fixed rate. Each request consumes one token. If no tokens are available, the request is rejected. A Lua script ensures atomicity.
```lua
-- Token bucket rate limiter (Lua script)
-- KEYS[1] = bucket key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)

local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

-- Get current state
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or max_tokens
local last_refill = tonumber(bucket[2]) or now

-- Calculate tokens to add since last refill
local elapsed = now - last_refill
local new_tokens = elapsed * refill_rate
tokens = math.min(max_tokens, tokens + new_tokens)

-- TTL: two full refill cycles; EXPIRE needs an integer, so round up
local ttl = math.ceil(max_tokens / refill_rate * 2)

-- Try to consume tokens
if tokens >= requested then
  tokens = tokens - requested
  redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, ttl)
  return 1  -- Allowed
else
  redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, ttl)
  return 0  -- Rejected
end
```
🎯 Why Lua Scripts Matter for Atomicity
The token bucket requires reading the current state, computing new tokens, and updating — multiple steps that must happen atomically. Without Lua, another request could read stale state between your GET and SET. Redis executes Lua scripts atomically — no other command runs until the script finishes. This is why Lua is essential for complex rate limiting.
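For reference, invoking the script from application code might look like this with redis-py; `TOKEN_BUCKET_LUA` stands for the Lua script above, and the key format and limits are illustrative:

```python
import time
import redis  # assumes redis-py

r = redis.Redis()

# TOKEN_BUCKET_LUA holds the Lua script shown above (elided here).
TOKEN_BUCKET_LUA = "..."  # paste the token bucket script

# register_script caches the script server-side and transparently
# falls back to EVAL if the script cache is flushed.
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow_request(user_id, capacity=100, refill_rate=10):
    """Returns True if this request may proceed under the token bucket."""
    allowed = token_bucket(
        keys=[f"bucket:{user_id}"],  # illustrative key format
        args=[capacity, refill_rate, int(time.time()), 1],
    )
    return allowed == 1
```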
| Algorithm | Accuracy | Memory | Burst Handling | Complexity |
|---|---|---|---|---|
| Fixed Window | Low (boundary spike) | Low (1 counter per window) | Allows 2x burst at boundary | Very simple |
| Sliding Window | High (no boundary issue) | High (1 entry per request) | Strict enforcement | Medium |
| Token Bucket | High | Low (2 fields per user) | Allows controlled bursts | Medium (Lua script) |
Caching Patterns
Redis is the most widely used caching layer in production systems. But "add a cache" isn't a strategy — the pattern you choose determines consistency, performance, and failure behavior.
Cache-Aside (Lazy Loading)
The application manages the cache explicitly. On read, check the cache first. On miss, query the database, then populate the cache. The cache is only loaded with data that's actually requested.
Read: Check cache
Application calls GET key on Redis. If the value exists (cache hit), return it immediately.
Miss: Query database
If the cache returns nil (cache miss), query the database for the data.
Populate: Write to cache
Store the database result in Redis with a TTL: SET key value EX 3600. Future requests will hit the cache.
Write: Invalidate cache
When data is updated in the database, delete the cache key: DEL key. The next read will trigger a fresh load.
```
# Read path
value = GET "user:42"
if value != nil:
    return deserialize(value)   # Cache HIT

# Cache MISS — load from database
value = db.query("SELECT * FROM users WHERE id = 42")
SET "user:42" serialize(value) EX 3600   # Cache for 1 hour
return value

# Write path — invalidate on update
db.query("UPDATE users SET name = 'Alice' WHERE id = 42")
DEL "user:42"   # Invalidate cache
# Next read will load fresh data from DB
```
Write-Through and Write-Behind
| Pattern | How It Works | Consistency | Write Latency |
|---|---|---|---|
| Cache-Aside | App manages cache. Read: check cache → miss → DB → populate cache. Write: update DB → invalidate cache. | Eventual (stale reads possible between write and invalidation) | Low (write goes to DB only) |
| Write-Through | Every write goes to cache AND database synchronously. Cache is always up to date. | Strong (cache always matches DB) | Higher (two synchronous writes) |
| Write-Behind (Write-Back) | Write to cache immediately, asynchronously flush to database in batches. | Eventual (DB lags behind cache) | Very low (write to cache only) |
```
Write-Through:
  1. Application writes to cache: SET "user:42" new_data
  2. Cache layer synchronously writes to DB
  3. Both cache and DB are updated before returning to client
  → Pro: cache is always consistent with DB
  → Con: every write has double latency (cache + DB)

Write-Behind (Write-Back):
  1. Application writes to cache: SET "user:42" new_data
  2. Return success to client immediately
  3. Background worker flushes dirty cache entries to DB periodically
  → Pro: extremely fast writes (cache-speed)
  → Con: data loss risk if cache crashes before flush
  → Con: DB is temporarily stale
```
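A minimal sketch of the two write paths in Python, assuming redis-py; `db` and its `update_user` method are hypothetical placeholders for your database layer:

```python
import json
import redis  # assumes redis-py

r = redis.Redis()

def write_through(user_id, data, db):
    """Write-through: update DB and cache synchronously, in one call."""
    db.update_user(user_id, data)                        # hypothetical DB call
    r.set(f"user:{user_id}", json.dumps(data), ex=3600)  # cache matches DB

def write_behind(user_id, data):
    """Write-behind: update cache now, queue the DB write for a worker."""
    r.set(f"user:{user_id}", json.dumps(data), ex=3600)
    r.rpush("dirty:users", user_id)  # background worker drains this list
```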
Cache Stampede — The Problem and Solutions
A cache stampede (also called thundering herd) occurs when a popular cache key expires and many concurrent requests all try to rebuild it simultaneously, overwhelming the database.
Mutex Lock
When a cache miss occurs, the first request acquires a lock (SET lock:key uuid NX EX 5) and rebuilds the cache. Other requests wait or serve stale data. Only one database query runs.
Probabilistic Early Expiry (PER)
Each request checks whether it should refresh early: if current_time - (delta * beta * log(random())) exceeds the expiry time, refresh proactively, where delta is the time it takes to recompute the value and beta tunes how early refreshes happen. Because log(random()) is negative, a small randomized fraction of requests refresh before the actual TTL expires, spreading refreshes out and preventing a synchronized stampede (see the sketch after the mutex-lock example below).
Background Refresh
A background job refreshes hot keys before they expire. The cache never actually goes empty — it's always warm. Trade-off: slightly stale data during the refresh window.
```
# Request arrives, cache miss on "product:42"

# Try to acquire rebuild lock
acquired = SET "lock:product:42" uuid NX EX 5
# NX = only if not exists, EX = 5 second timeout

if acquired:
    # I won the lock — rebuild cache
    data = db.query("SELECT * FROM products WHERE id = 42")
    SET "product:42" serialize(data) EX 3600
    DEL "lock:product:42"
    return data
else:
    # Someone else is rebuilding — wait and retry
    SLEEP 50ms
    value = GET "product:42"
    if value != nil:
        return deserialize(value)
    # Still not ready — retry (with backoff)
```
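And a minimal sketch of the probabilistic early expiry check described above; the function name and parameters are illustrative, and `recompute_seconds` (delta) would be measured in practice:

```python
import math
import random
import time

def should_refresh_early(expiry_ts, recompute_seconds, beta=1.0):
    """Probabilistic early expiry: occasionally refresh before the TTL
    ends so rebuilds are spread out instead of synchronized.

    expiry_ts          -- absolute Unix time when the key expires
    recompute_seconds  -- how long rebuilding the value takes (delta)
    beta               -- >1 refreshes earlier, <1 later
    """
    # -log(random()) is exponentially distributed, so a small fraction
    # of requests trigger a refresh, increasingly often near expiry.
    return time.time() - recompute_seconds * beta * math.log(random.random()) >= expiry_ts
```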
Cache Warming Strategies
When and How to Warm the Cache
- ✅On deploy: pre-load the top N most-accessed keys from the database before accepting traffic (see the sketch after this list)
- ✅On cache restart: use a readiness probe — the server isn't 'ready' until hot keys are loaded
- ✅Predictive warming: if you know traffic spikes at 9 AM, warm the cache at 8:55 AM
- ✅Lazy warming with protection: accept traffic immediately but use mutex locks to prevent stampede during cold start
- ✅Never warm everything: only warm the hot set (top 1-5% of keys that serve 80%+ of traffic)
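A minimal on-deploy warming sketch with redis-py; `db.top_products` is a hypothetical query returning the most-accessed rows (e.g., by a popularity counter):

```python
import json
import redis  # assumes redis-py

r = redis.Redis()

def warm_cache(db, top_n=1000):
    """On-deploy warming: pre-load the hottest keys before taking traffic.

    Only the hot set is warmed -- never the whole table.
    """
    pipe = r.pipeline()
    for product in db.top_products(limit=top_n):  # hypothetical DB call
        pipe.set(f"product:{product['id']}", json.dumps(product), ex=3600)
    pipe.execute()  # one round trip for all warm-up writes
```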
Session Storage
HTTP is stateless — every request is independent. Sessions bridge this gap by storing user state (login status, cart contents, preferences) between requests. Redis is the most popular session store for web applications because it's fast, supports TTL natively, and is shared across all application servers.
The Coat Check
A session is like a coat check at a restaurant. You hand over your coat (user state) and get a ticket (session ID cookie). On every visit, you show the ticket and get your coat back. If the coat check is local to one restaurant (sticky sessions), you can only go back to that specific location. If it's a centralized warehouse (Redis), any location can retrieve your coat with the same ticket.
Hashes vs Serialized Strings
```
# Approach 1: Hash (individual fields)
HSET session:abc123 user_id 42
HSET session:abc123 email "alice@example.com"
HSET session:abc123 role "admin"
HSET session:abc123 cart_count 3
EXPIRE session:abc123 1800   # 30-minute TTL

# Read one field:
HGET session:abc123 role
# → "admin"

# Read all fields:
HGETALL session:abc123
# → user_id: 42, email: alice@example.com, role: admin, cart_count: 3

# Update one field without reading/writing the whole session:
HINCRBY session:abc123 cart_count 1

# ---

# Approach 2: Serialized string (JSON blob)
SET session:abc123 '{"user_id":42,"email":"alice@example.com","role":"admin","cart_count":3}' EX 1800

# Read: must deserialize the entire blob
GET session:abc123
# → parse JSON, extract what you need

# Update: must read, deserialize, modify, serialize, write back
# → NOT atomic without Lua
```
| Approach | Partial Read | Partial Update | Memory | Complexity |
|---|---|---|---|---|
| Hash (HSET/HGET) | Yes — HGET one field | Yes — HSET one field, HINCRBY | Slightly more (field overhead) | Simple Redis commands |
| Serialized String | No — must GET and parse entire blob | No — read-modify-write cycle | Slightly less (compact JSON) | Requires serialization logic |
🎯 Use Hashes for Sessions
Hashes are almost always the better choice for sessions. You can read or update individual fields without touching the rest. HINCRBY lets you atomically increment counters (cart count, page views). The memory overhead is negligible for typical session sizes.
TTL-Based Session Expiry
```
# Create session with 30-minute TTL
HSET session:abc123 user_id 42 email "alice@example.com"
EXPIRE session:abc123 1800

# On every request, refresh the TTL (sliding expiration)
EXPIRE session:abc123 1800
# → Session stays alive as long as the user is active
# → Expires 30 minutes after the LAST request

# Check remaining TTL
TTL session:abc123
# → 1742 (seconds remaining)

# Absolute expiration (session dies at a fixed time regardless)
EXPIREAT session:abc123 1705363200
# → Expires at this Unix timestamp, no matter what
```
Sticky Sessions vs Centralized Session Store
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Sticky Sessions | Load balancer routes a user to the same server every time. Session stored in server memory. | No external dependency, very fast reads | Server failure loses all sessions. Can't scale horizontally. Uneven load distribution. |
| Centralized (Redis) | All servers read/write sessions from a shared Redis instance. Any server can handle any request. | Survives server failures. Horizontal scaling. Even load distribution. | Network hop for every session read. Redis is a dependency. |
💡 Production Standard
Centralized session storage with Redis is the industry standard for any system with more than one application server. Sticky sessions are a legacy pattern that breaks auto-scaling, rolling deploys, and fault tolerance.
Leaderboards
Redis Sorted Sets are purpose-built for leaderboards. Each member has a score, and Redis maintains them in sorted order automatically. Rank lookups, score updates, and range queries are all O(log N) — fast enough for millions of entries in real time.
The Scoreboard
A Redis Sorted Set is like a digital scoreboard that automatically re-sorts itself every time a score changes. You don't need to sort manually — just update a player's score and the board instantly reflects the new ranking. You can ask 'What's the top 10?' or 'What rank is player X?' and get answers in microseconds, even with millions of players.
Building a Real-Time Leaderboard
```
# Add players with scores
ZADD leaderboard 1500 "alice"
ZADD leaderboard 2300 "bob"
ZADD leaderboard 1800 "charlie"
ZADD leaderboard 3100 "diana"
ZADD leaderboard 2750 "eve"

# Get top 5 players (highest scores first)
ZREVRANGE leaderboard 0 4 WITHSCORES
# → diana: 3100, eve: 2750, bob: 2300, charlie: 1800, alice: 1500

# Get a specific player's rank (0-indexed, highest first)
ZREVRANK leaderboard "bob"
# → 2 (3rd place, 0-indexed)

# Get a specific player's score
ZSCORE leaderboard "bob"
# → 2300

# Get players ranked 10th to 20th (paginated)
ZREVRANGE leaderboard 9 19 WITHSCORES

# Count total players
ZCARD leaderboard
# → 5
```
Updating Scores Atomically
# Player "alice" scores 200 more points ZINCRBY leaderboard 200 "alice" # → 1700 (returns new score) # The sorted set automatically re-ranks alice # This is atomic — no read-modify-write race condition # Even with 1000 concurrent score updates, every increment is applied # Replace score entirely (ZADD overwrites) ZADD leaderboard 5000 "alice" # → alice now has score 5000, re-ranked to #1 # Conditional update: only if new score is higher (GT flag, Redis 6.2+) ZADD leaderboard GT 4000 "alice" # → Ignored because 4000 < 5000 (current score) ZADD leaderboard GT 6000 "alice" # → Updated to 6000 because 6000 > 5000
Paginated Leaderboards
```
# Page size: 10 players per page

# Page 1 (ranks 1-10)
ZREVRANGE leaderboard 0 9 WITHSCORES

# Page 2 (ranks 11-20)
ZREVRANGE leaderboard 10 19 WITHSCORES

# Page 3 (ranks 21-30)
ZREVRANGE leaderboard 20 29 WITHSCORES

# Generic formula for page N (1-indexed):
# ZREVRANGE leaderboard (N-1)*pageSize N*pageSize-1

# "Show me my neighborhood" — players around a specific rank
rank = ZREVRANK leaderboard "alice"   # e.g., 42
ZREVRANGE leaderboard max(0, rank - 5) (rank + 5) WITHSCORES
# → Shows up to 11 players: 5 above alice, alice, 5 below alice
# (clamp the start at 0 — a negative start index counts from the end)
```
🎯 Interview Insight
Leaderboards are a classic Redis interview question. The key insight: Sorted Sets give you O(log N) for add, update, rank lookup, and range queries. For a leaderboard with 10 million players, that's about 23 operations — microseconds. No SQL ORDER BY can compete at this scale.
Pub/Sub
Redis Pub/Sub is a messaging system where publishers send messages to channels and subscribers receive them in real time. It's fire-and-forget — messages are delivered to connected subscribers instantly but are not persisted.
The Radio Station
Pub/Sub is like a live radio broadcast. The station (publisher) broadcasts a message. Anyone tuned in (subscriber) hears it immediately. But if your radio is off (disconnected), you miss the broadcast entirely — there's no recording, no replay. This is fundamentally different from a message queue (like a voicemail box) where messages wait for you.
Core Commands
```
# Terminal 1 — Subscribe to a channel
SUBSCRIBE notifications
# Waiting for messages...

# Terminal 2 — Subscribe to the same channel
SUBSCRIBE notifications
# Waiting for messages...

# Terminal 3 — Publish a message
PUBLISH notifications "New order received: #12345"
# → (integer) 2 (delivered to 2 subscribers)

# Both Terminal 1 and Terminal 2 receive:
# 1) "message"
# 2) "notifications"
# 3) "New order received: #12345"

# Pattern-based subscription (PSUBSCRIBE)
PSUBSCRIBE orders.*
# Matches: orders.created, orders.shipped, orders.cancelled

PUBLISH orders.created '{"id": 123, "total": 59.99}'
PUBLISH orders.shipped '{"id": 120, "carrier": "FedEx"}'
# Both messages delivered to the pattern subscriber
```
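The same flow from application code might look like this with redis-py; in practice the subscriber loop runs in its own thread or process, separate from the publisher:

```python
import redis  # assumes redis-py

r = redis.Redis()

# Subscriber -- typically its own thread, process, or service
pubsub = r.pubsub()
pubsub.subscribe("notifications")
for message in pubsub.listen():
    # listen() also yields subscribe confirmations; filter real messages
    if message["type"] == "message":
        print("received:", message["data"])
        break  # illustrative; real subscribers loop forever

# Publisher -- from another process/connection
r.publish("notifications", "New order received: #12345")
```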
Pub/Sub vs Streams — The Critical Difference
| Feature | Pub/Sub | Streams |
|---|---|---|
| Persistence | None — fire and forget | Yes — messages stored on disk |
| Missed messages | Lost forever if subscriber is disconnected | Can read from any point in history |
| Consumer groups | No — all subscribers get all messages | Yes — messages distributed across group members |
| Acknowledgment | No — no delivery guarantee | Yes — XACK confirms processing |
| Backpressure | None — slow subscribers get overwhelmed | Built-in — consumers read at their own pace |
| Use case | Real-time notifications, live dashboards | Task queues, event sourcing, reliable messaging |
🚨 Pub/Sub Is NOT a Message Queue
This is the most common misconception. Pub/Sub has no persistence, no acknowledgment, and no replay. If a subscriber disconnects for even a second, it misses every message published during that time. For reliable messaging, use Redis Streams or a dedicated message broker (Kafka, RabbitMQ).
Use Cases
Good Use Cases for Pub/Sub
- ✅Live notifications: push alerts to connected users in real time
- ✅Real-time dashboards: broadcast metric updates to all dashboard clients
- ✅Cache invalidation: publish invalidation events so all app servers clear their local caches
- ✅Chat presence: broadcast 'user online/offline' status to connected clients
- ✅Configuration updates: notify all servers when a feature flag changes
When NOT to Use Pub/Sub
- ❌Task queues: use Streams or a dedicated queue — you need acknowledgment and retry
- ❌Event sourcing: use Streams or Kafka — you need message history
- ❌Critical notifications: if missing a message is unacceptable, Pub/Sub is the wrong tool
- ❌High-volume processing: slow subscribers will drop messages with no backpressure
Lua Scripting & Transactions
Redis is single-threaded — each command is atomic. But what about multi-step operations? If you need to read a value, compute something, and write back, another client could modify the value between your read and write. Redis provides two mechanisms for atomic multi-command execution: Lua scripts and MULTI/EXEC transactions.
Why Lua: Atomic Multi-Command Execution
A Lua script runs entirely on the Redis server. While the script executes, no other command can run — the entire script is atomic. This eliminates race conditions between read and write operations.
```
# Goal: increment a counter only if it's below a limit

# Without Lua — race condition:
#   Client A                    Client B
#   GET counter → 99            GET counter → 99
#   # 99 < 100, proceed         # 99 < 100, proceed
#   INCR counter → 100          INCR counter → 101  ← EXCEEDS LIMIT!

# Both clients read 99, both decide to increment
# Result: counter = 101, limit violated
```
```lua
-- Atomic check-and-increment (Lua script)
-- KEYS[1] = counter key
-- ARGV[1] = limit
local current = tonumber(redis.call('GET', KEYS[1]) or 0)
if current < tonumber(ARGV[1]) then
  return redis.call('INCR', KEYS[1])
else
  return -1  -- limit reached
end

-- Usage:
-- EVAL "..." 1 counter 100
-- This entire script runs atomically — no interleaving possible
```
EVAL and EVALSHA
```
# EVAL — send the full script every time
EVAL "return redis.call('GET', KEYS[1])" 1 mykey
# Works but sends the entire script text on every call

# EVALSHA — send only the script's SHA1 hash
# Step 1: Load the script (returns SHA1 hash)
SCRIPT LOAD "return redis.call('GET', KEYS[1])"
# → "a42059b356c875f0717db19a51f6aaa9161571a2"

# Step 2: Call by hash (much less bandwidth)
EVALSHA "a42059b356c875f0717db19a51f6aaa9161571a2" 1 mykey

# Redis caches loaded scripts in memory
# EVALSHA is preferred in production — less network overhead
# If the script isn't cached (server restart), fall back to EVAL
```
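Client libraries usually automate this fallback dance. For example, redis-py's register_script returns a callable that tries EVALSHA first and transparently falls back to EVAL (re-caching the script) on a NOSCRIPT error:

```python
import redis  # assumes redis-py

r = redis.Redis()

# register_script computes the SHA1 locally and returns a callable.
# Calling it tries EVALSHA; if the server's script cache was flushed
# (e.g. after a restart), it falls back to EVAL automatically.
get_key = r.register_script("return redis.call('GET', KEYS[1])")

value = get_key(keys=["mykey"])
```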
MULTI/EXEC Transactions
MULTI/EXEC groups commands into a transaction. All commands between MULTI and EXEC are queued and executed atomically. But unlike Lua, you cannot read a value and use it in a later command within the same transaction — commands are queued blindly.
```
# Transfer 100 points from alice to bob — atomically
MULTI
DECRBY wallet:alice 100
INCRBY wallet:bob 100
EXEC
# Both commands execute back-to-back — no other client's commands
# interleave, so no partial state is visible
# (Note: Redis does NOT roll back if a command fails at runtime)

# IMPORTANT: You CANNOT do this with MULTI/EXEC:
MULTI
GET wallet:alice          # Returns "QUEUED", not the actual value!
                          # Can't check if alice has enough balance here
DECRBY wallet:alice 100   # Runs blindly
INCRBY wallet:bob 100
EXEC
# → No conditional logic possible inside MULTI/EXEC
```
WATCH for Optimistic Locking
```
# WATCH monitors keys for changes before EXEC
# If any watched key is modified by another client,
# EXEC returns nil (transaction aborted)

WATCH wallet:alice
balance = GET wallet:alice   # → 500

if balance >= 100:
    MULTI
    DECRBY wallet:alice 100
    INCRBY wallet:bob 100
    EXEC
    # If another client modified wallet:alice after WATCH,
    # EXEC returns nil — transaction aborted, retry
else:
    UNWATCH   # Insufficient balance

# This is optimistic locking:
# → Assume no conflict, proceed optimistically
# → If conflict detected at EXEC time, abort and retry
# → Works well when conflicts are rare
```
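A minimal retry loop around WATCH in Python, assuming redis-py, where a pipeline in watch mode raises WatchError if the watched key changed before EXEC; the function name and retry count are illustrative:

```python
import redis  # assumes redis-py

r = redis.Redis()

def transfer(src, dst, amount, retries=5):
    """Optimistic-locking transfer: retry if the watched key changes."""
    for _ in range(retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(src)                    # WATCH src key
                balance = int(pipe.get(src) or 0)  # read runs immediately
                if balance < amount:
                    pipe.unwatch()
                    return False                   # insufficient funds
                pipe.multi()                       # start queuing commands
                pipe.decrby(src, amount)
                pipe.incrby(dst, amount)
                pipe.execute()                     # aborts if src changed
                return True
            except redis.WatchError:
                continue                           # conflict -- retry
    return False
```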
When to Prefer Lua Over MULTI/EXEC
| Feature | Lua Scripts | MULTI/EXEC |
|---|---|---|
| Conditional logic | Yes — full if/else, loops, math | No — commands are queued blindly |
| Read-then-write | Yes — read a value and use it in the same script | No — GET returns 'QUEUED', not the value |
| Atomicity | Full — entire script is atomic | Full — all commands execute together |
| Performance | One round trip (script runs server-side) | Multiple round trips (MULTI, commands, EXEC) |
| Complexity | Requires Lua knowledge | Simple Redis commands |
| Use case | Complex logic: rate limiting, conditional updates, CAS operations | Simple batching: transfer funds, update multiple keys together |
🎯 Rule of Thumb
If you need to read a value and make a decision based on it within the same atomic operation, use Lua. If you just need to batch multiple writes together, MULTI/EXEC is simpler. In practice, Lua scripts are used far more often in production because most real operations involve conditional logic.
Interview Questions
These questions test whether you understand how Redis patterns work in real systems and can reason about their trade-offs.
Q:How would you implement a distributed lock with Redis? What are the pitfalls?
A: Use the atomic SET command: SET lock-key uuid NX EX 30. NX ensures only one client acquires the lock. EX sets an auto-expiry so the lock is released if the holder crashes. The value must be a unique identifier (UUID) so only the holder can release it — release via a Lua script that checks the value before DEL. Pitfalls: (1) SETNX + EXPIRE as two separate commands has a race condition — if the process dies between them, the lock is held forever. (2) Lock expiry while work is still in progress — another client acquires the lock, leading to two holders. Mitigate with a watchdog that extends the TTL. (3) Single Redis instance is a single point of failure — Redlock uses 5 independent instances with majority quorum. (4) Kleppmann's critique: GC pauses or network delays can cause the lock to expire without the holder knowing. Fencing tokens provide an additional safety layer.
Q:Design a rate limiter using Redis. Which algorithm would you choose and why?
A: For most use cases, I'd use the sliding window with Sorted Sets. For each request: (1) ZREMRANGEBYSCORE to remove entries older than the window, (2) ZCOUNT to check the current count, (3) ZADD to record the new request if under the limit. This eliminates the boundary spike problem of fixed window counters. For APIs that need to allow controlled bursts (e.g., 100 requests/minute but allow a burst of 20 in one second), I'd use a token bucket implemented as a Lua script — it tracks tokens and refill rate in a Hash, and the Lua script atomically computes available tokens and consumes them. The Lua script is critical because the read-compute-write cycle must be atomic to prevent race conditions under concurrent requests.
Q:What is cache stampede and how do you prevent it?
A: Cache stampede (thundering herd) occurs when a popular cache key expires and hundreds of concurrent requests all try to rebuild it simultaneously, overwhelming the database with identical queries. Prevention: (1) Mutex lock — the first request acquires a lock (SET lock:key uuid NX EX 5) and rebuilds the cache. Other requests wait or serve stale data. This turns N database queries into 1. (2) Probabilistic early expiry — each request has a small chance of refreshing the cache before the TTL actually expires, spreading the refresh load over time. (3) Background refresh — a worker proactively refreshes hot keys before they expire, so the cache never goes empty. In production, combine mutex locks with background refresh for hot keys.
Q:When would you use Redis Pub/Sub vs Redis Streams?
A: Pub/Sub is fire-and-forget: messages are delivered to connected subscribers instantly but not persisted. If a subscriber disconnects, it misses all messages published during the disconnection. Use Pub/Sub for real-time notifications, live dashboards, cache invalidation broadcasts, and chat presence — scenarios where missing a message is acceptable. Streams are persistent: messages are stored and can be read from any point in history. They support consumer groups (messages distributed across workers), acknowledgment (XACK), and backpressure. Use Streams for task queues, event sourcing, reliable messaging, and any scenario where every message must be processed exactly once. The critical difference: Pub/Sub is a broadcast channel, Streams is a durable log.
Q:How do Redis Sorted Sets power a real-time leaderboard at scale?
A: Sorted Sets maintain members in score-sorted order with O(log N) operations. ZADD adds or updates a player's score. ZINCRBY atomically increments a score (no read-modify-write race). ZREVRANGE returns the top N players by score. ZREVRANK returns a specific player's rank. For a leaderboard with 10 million players: adding a score is O(log 10M) ≈ 23 operations — microseconds. Pagination uses ZREVRANGE with offset: page 2 is ZREVRANGE key 10 19. For 'show my neighborhood,' get the player's rank with ZREVRANK, then ZREVRANGE from rank-5 to rank+5. This is impossible to match with SQL ORDER BY at this scale — a database would need to sort millions of rows on every query.
Common Mistakes
These mistakes are common in production Redis deployments and have caused real outages.
Using SETNX + EXPIRE for distributed locks
Two separate commands to acquire a lock and set its expiry. If the process crashes between SETNX and EXPIRE, the lock is held forever — a deadlock that requires manual intervention. This race condition has caused production outages at companies of every size.
✅Always use the atomic SET command: SET key value NX EX seconds. This sets the value and expiry in a single atomic operation — no race condition possible. For release, use a Lua script that checks the value (UUID) before deleting to prevent releasing someone else's lock.
Using Pub/Sub as a message queue
Teams use Pub/Sub expecting reliable message delivery. A subscriber disconnects for 5 seconds during a deploy, and misses every message published during that window. No retry, no replay, no acknowledgment. Critical events are silently lost.
✅Use Redis Streams for any scenario where messages must be reliably processed. Streams persist messages, support consumer groups for load distribution, and provide XACK for acknowledgment. Pub/Sub is only appropriate for fire-and-forget broadcasts where missing messages is acceptable.
Setting lock TTL too short or too long
TTL too short: the lock expires while the holder is still processing. Another client acquires the lock, and two processes operate on the shared resource simultaneously — data corruption. TTL too long: if the holder crashes, the resource is blocked for the entire TTL duration. Other processes wait minutes for a lock that will never be released.
✅Set the TTL to 2-3x the expected operation duration. Implement a watchdog pattern: a background thread extends the TTL periodically while the holder is still alive. If the holder crashes, the watchdog stops, and the lock expires naturally. Libraries like Redisson implement this automatically.
Not using Lua for multi-step atomic operations
Implementing rate limiting or conditional updates with separate GET and SET commands. Under concurrency, multiple clients read the same stale value and all proceed — the rate limit is violated, the counter overflows, or the conditional check is bypassed.
✅Any operation that reads a value and makes a decision based on it must be atomic. Use a Lua script: the entire read-compute-write cycle runs as a single atomic operation on the Redis server. No other command can interleave. For simple batching without conditional logic, MULTI/EXEC is sufficient.