Redis Common Patterns
How Redis is used in real systems — distributed locking, rate limiting, caching patterns, session storage, leaderboards, Pub/Sub, Lua scripting, and transactions.
Distributed Locking
In a distributed system, multiple processes or servers often need exclusive access to a shared resource — a database row, a file, or an external API. Redis is commonly used to implement distributed locks because it's fast, single-threaded (commands are atomic), and widely deployed.
The Hotel Room Key
A distributed lock is like a hotel room key card. Only one guest can hold the key at a time. If you have the key, you can enter the room (access the resource). When you're done, you return the key (release the lock). If you lose the key or disappear, the hotel re-issues it after a timeout (lock expiry). The naive approach is like writing your name on a sticky note on the door — someone else can overwrite it between the time you check and the time you write.
The Naive Approach — SETNX + EXPIRE (Broken)
The first instinct is to use two separate commands: SETNX to acquire the lock, then EXPIRE to set a timeout. This has a critical race condition.
```
# Step 1: Try to acquire the lock
SETNX my-lock "process-A"
# Returns 1 (success) — lock acquired

# Step 2: Set expiry so the lock doesn't live forever
EXPIRE my-lock 10

# THE PROBLEM:
# What if the process crashes BETWEEN Step 1 and Step 2?
# → SETNX succeeds (lock is held)
# → EXPIRE never runs (process died)
# → Lock is held FOREVER — deadlock
# → No other process can ever acquire it
```
🚨 This Is a Real Production Bug
The SETNX + EXPIRE race condition has caused real outages. If the process crashes, loses its network connection, or gets killed between the two commands, the lock is never released. Every other process waits forever.
The Correct Approach — Atomic SET with NX and EX
Redis 2.6.12+ supports setting the value, the NX flag, and the expiry in a single atomic command. No race condition possible.
```
# Acquire lock — atomic, no race condition
SET my-lock "process-A-uuid" NX EX 10
# NX = only set if key does NOT exist (acquire)
# EX = expire after 10 seconds (auto-release)
# Returns "OK" if acquired, nil if already held

# The value should be a unique identifier (UUID)
# so only the holder can release the lock

# Release lock — only if we still hold it
# (Must use Lua script for atomicity)
EVAL "
  if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
  else
    return 0
  end
" 1 my-lock "process-A-uuid"
```
Acquire: SET key uuid NX EX seconds
Atomically sets the lock key with a unique value (UUID) and an expiry. NX ensures only one process wins. EX ensures the lock auto-releases if the holder dies.
Do Work: Process the critical section
The lock holder performs the exclusive operation. The lock's TTL must be longer than the expected work duration.
Release: Lua script checks value before DEL
A Lua script atomically checks that the lock value matches the holder's UUID before deleting. This prevents accidentally releasing someone else's lock if yours expired.
Redlock Algorithm
A single Redis instance is a single point of failure. The Redlock algorithm, proposed by Salvatore Sanfilippo (antirez), uses N independent Redis nodes (typically 5) to create a more fault-tolerant lock.
```
Setup: 5 independent Redis instances (not replicas — independent)

Acquire lock:
  1. Record current time T1
  2. Try to acquire the lock on ALL 5 instances
     SET resource-lock uuid NX EX 30   (on each instance)
  3. Record current time T2
  4. Lock is acquired if:
     a. Majority (≥3 of 5) instances granted the lock
     b. Total time (T2 - T1) < lock TTL
        (if it took 29 seconds to acquire a 30-second lock,
         it's already almost expired — reject it)
  5. Effective lock validity = TTL - (T2 - T1)

Release lock:
  Send DEL (via Lua script) to ALL 5 instances
  (even ones that didn't grant the lock — cleanup)

Failure tolerance:
  → 2 of 5 instances can fail and the lock still works
  → No single point of failure
```
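A minimal sketch of the acquire phase in Python, assuming the redis-py client; `clients` is a list of five independent connections, and `redlock_acquire` is an illustrative helper, not a production client (use a maintained Redlock library in practice):

```python
import time
import uuid
import redis  # assumes redis-py is installed

def redlock_acquire(clients, resource, ttl_ms):
    """Try to acquire a Redlock across N independent Redis instances.

    Returns (token, validity_ms) on success, (None, 0) on failure.
    """
    token = str(uuid.uuid4())
    quorum = len(clients) // 2 + 1          # majority, e.g. 3 of 5
    start = time.monotonic()

    granted = 0
    for client in clients:
        try:
            # SET resource token NX PX ttl_ms -- atomic acquire per node
            if client.set(resource, token, nx=True, px=ttl_ms):
                granted += 1
        except redis.RedisError:
            pass                             # a down node simply doesn't count

    elapsed_ms = (time.monotonic() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms

    if granted >= quorum and validity_ms > 0:
        return token, validity_ms

    # Failed: clean up on ALL nodes, then report failure
    for client in clients:
        try:
            client.delete(resource)  # real code should token-check via Lua first
        except redis.RedisError:
            pass
    return None, 0
```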
⚠️ The Redlock Controversy
Martin Kleppmann (author of "Designing Data-Intensive Applications") published a critique arguing Redlock is fundamentally unsafe. His argument: if a process holding the lock pauses (GC pause, network delay) longer than the TTL, the lock expires, another process acquires it, and both processes operate on the shared resource simultaneously. Kleppmann recommends fencing tokens — a monotonically increasing number attached to each lock acquisition that the resource checks before accepting writes. Antirez responded defending Redlock. The debate remains unresolved. In interviews, mention both sides.
Lock Expiry and Holder Death
Key Considerations for Lock Expiry
- ⚠️TTL too short: lock expires while work is still in progress — two holders simultaneously
- ⚠️TTL too long: if the holder crashes, the resource is blocked until TTL expires
- ✅Lock renewal: the holder can extend the TTL periodically (watchdog pattern) to prevent premature expiry; see the sketch after this list
- ✅Fencing tokens: attach a monotonic sequence number to each lock — the resource rejects stale tokens
- ✅Always use a unique value (UUID) so only the holder can release the lock
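A minimal sketch of the watchdog pattern with redis-py, assuming the lock was acquired with a UUID token as above; the renewal interval of TTL/3 and the helper names are illustrative:

```python
import threading
import redis  # assumes redis-py

r = redis.Redis()

# Renew the TTL only if we still hold the lock (token must match).
RENEW_SCRIPT = r.register_script("""
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('EXPIRE', KEYS[1], ARGV[2])
else
    return 0
end
""")

def start_watchdog(lock_key, token, ttl_seconds, stop_event):
    """Extend the lock TTL every TTL/3 seconds while the holder is alive.

    If the process dies, the thread dies with it and the lock
    expires naturally after ttl_seconds.
    """
    def renew():
        while not stop_event.wait(ttl_seconds / 3):
            if RENEW_SCRIPT(keys=[lock_key], args=[token, ttl_seconds]) == 0:
                break  # we no longer hold the lock -- stop renewing
    t = threading.Thread(target=renew, daemon=True)
    t.start()
    return t
```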
Rate Limiting
Rate limiting controls how many requests a client can make in a given time window. Redis is ideal for this because it's fast (sub-millisecond), atomic (INCR is single-threaded), and supports TTL natively.
The Traffic Light
Rate limiting is like a traffic light at a highway on-ramp. Cars (requests) arrive continuously, but the light only lets one through every few seconds. Without the light, all cars merge at once and the highway (your server) jams. The fixed window counter is like counting cars per minute. The sliding window is like a rolling 60-second view. The token bucket is like a jar of tokens — each car takes one, and tokens refill at a steady rate.
Fixed Window Counter
The simplest approach: count requests per time window using INCR and EXPIRE.
```
# Key format: rate_limit:{user_id}:{window}
# Window = current minute (e.g., 2024-01-15T10:30)

# For each incoming request:
key = "rate_limit:user123:2024-01-15T10:30"
count = INCR key
# First call: creates key with value 1 and returns 1
# Subsequent calls: increments and returns the new value

# Set expiry only on first creation (when INCR returns 1)
if count == 1:
    EXPIRE key 60   # Key auto-deletes after 60 seconds

# Check the count
if count > 100:
    # Rate limit exceeded — reject request (HTTP 429)
else:
    # Allow request

# Problem: boundary spike
# User sends 100 requests at 10:30:59 (end of window)
# User sends 100 requests at 10:31:01 (start of new window)
# → 200 requests in 2 seconds, but each window sees only 100
```
⚠️ The Boundary Problem
Fixed window counters have a well-known flaw: a burst of requests at the boundary of two windows can allow 2x the intended rate. If the limit is 100/minute, a client can send 100 at 10:30:59 and 100 at 10:31:01 — 200 requests in 2 seconds.
Sliding Window with Sorted Sets
A sorted set tracks each request's timestamp as the score. To check the rate, remove old entries and count remaining ones. This eliminates the boundary problem.
```
# For each incoming request:
key = "rate_limit:user123"
now = current_timestamp_ms()   # e.g., 1705312260000
window = 60000                 # 60 seconds in ms

# Step 1: Remove entries older than the window
ZREMRANGEBYSCORE key 0 (now - window)
# Removes all entries with score < (now - 60s)

# Step 2: Count entries in the current window
count = ZCOUNT key (now - window) now

# Step 3: If under limit, add this request
if count < 100:
    ZADD key now unique_request_id
    EXPIRE key 60   # Cleanup: auto-delete if inactive
    # Allow request
else:
    # Rate limit exceeded — reject (HTTP 429)

# In production, wrap steps 1-3 in a Lua script or MULTI/EXEC
# so no other request interleaves between the count and the add

# Why this works:
# The window "slides" with the current time
# No boundary problem — always looking at the last 60 seconds
# Trade-off: uses more memory (one entry per request)
```
Token Bucket with Lua Scripts
The token bucket algorithm allows bursts while enforcing an average rate. Tokens are added at a fixed rate. Each request consumes one token. If no tokens are available, the request is rejected. A Lua script ensures atomicity.
```lua
-- Token bucket rate limiter (Lua script)
-- KEYS[1] = bucket key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)

local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

-- Get current state
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or max_tokens
local last_refill = tonumber(bucket[2]) or now

-- Calculate tokens to add since last refill
local elapsed = now - last_refill
local new_tokens = elapsed * refill_rate
tokens = math.min(max_tokens, tokens + new_tokens)

-- TTL: two full refill cycles; EXPIRE needs an integer, so round up
local ttl = math.ceil(max_tokens / refill_rate * 2)

-- Try to consume tokens
if tokens >= requested then
  tokens = tokens - requested
  redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, ttl)
  return 1  -- Allowed
else
  redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, ttl)
  return 0  -- Rejected
end
```
🎯 Why Lua Scripts Matter for Atomicity
The token bucket requires reading the current state, computing new tokens, and updating — multiple steps that must happen atomically. Without Lua, another request could read stale state between your GET and SET. Redis executes Lua scripts atomically — no other command runs until the script finishes. This is why Lua is essential for complex rate limiting.
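For reference, invoking the script from application code might look like this with redis-py; `TOKEN_BUCKET_LUA` stands for the Lua script above, and the key format and limits are illustrative:

```python
import time
import redis  # assumes redis-py

r = redis.Redis()

# TOKEN_BUCKET_LUA holds the Lua script shown above (elided here).
TOKEN_BUCKET_LUA = "..."  # paste the token bucket script

# register_script caches the script server-side and transparently
# falls back to EVAL if the script cache is flushed.
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow_request(user_id, capacity=100, refill_rate=10):
    """Returns True if this request may proceed under the token bucket."""
    allowed = token_bucket(
        keys=[f"bucket:{user_id}"],  # illustrative key format
        args=[capacity, refill_rate, int(time.time()), 1],
    )
    return allowed == 1
```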
| Algorithm | Accuracy | Memory | Burst Handling | Complexity |
|---|---|---|---|---|
| Fixed Window | Low (boundary spike) | Low (1 counter per window) | Allows 2x burst at boundary | Very simple |
| Sliding Window | High (no boundary issue) | High (1 entry per request) | Strict enforcement | Medium |
| Token Bucket | High | Low (2 fields per user) | Allows controlled bursts | Medium (Lua script) |
Caching Patterns
Redis is the most widely used caching layer in production systems. But "add a cache" isn't a strategy — the pattern you choose determines consistency, performance, and failure behavior.
Cache-Aside (Lazy Loading)
The application manages the cache explicitly. On read, check the cache first. On miss, query the database, then populate the cache. The cache is only loaded with data that's actually requested.
Read: Check cache
Application calls GET key on Redis. If the value exists (cache hit), return it immediately.
Miss: Query database
If the cache returns nil (cache miss), query the database for the data.
Populate: Write to cache
Store the database result in Redis with a TTL: SET key value EX 3600. Future requests will hit the cache.
Write: Invalidate cache
When data is updated in the database, delete the cache key: DEL key. The next read will trigger a fresh load.
```
# Read path
value = GET "user:42"
if value != nil:
    return deserialize(value)   # Cache HIT

# Cache MISS — load from database
value = db.query("SELECT * FROM users WHERE id = 42")
SET "user:42" serialize(value) EX 3600   # Cache for 1 hour
return value

# Write path — invalidate on update
db.query("UPDATE users SET name = 'Alice' WHERE id = 42")
DEL "user:42"   # Invalidate cache
# Next read will load fresh data from DB
```
Write-Through and Write-Behind
| Pattern | How It Works | Consistency | Write Latency |
|---|---|---|---|
| Cache-Aside | App manages cache. Read: check cache → miss → DB → populate cache. Write: update DB → invalidate cache. | Eventual (stale reads possible between write and invalidation) | Low (write goes to DB only) |
| Write-Through | Every write goes to cache AND database synchronously. Cache is always up to date. | Strong (cache always matches DB) | Higher (two synchronous writes) |
| Write-Behind (Write-Back) | Write to cache immediately, asynchronously flush to database in batches. | Eventual (DB lags behind cache) | Very low (write to cache only) |
```
Write-Through:
  1. Application writes to cache: SET "user:42" new_data
  2. Cache layer synchronously writes to DB
  3. Both cache and DB are updated before returning to client
  → Pro: cache is always consistent with DB
  → Con: every write has double latency (cache + DB)

Write-Behind (Write-Back):
  1. Application writes to cache: SET "user:42" new_data
  2. Return success to client immediately
  3. Background worker flushes dirty cache entries to DB periodically
  → Pro: extremely fast writes (cache-speed)
  → Con: data loss risk if cache crashes before flush
  → Con: DB is temporarily stale
```
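A minimal sketch of the two write paths in Python, assuming redis-py; `db` and its `update_user` method are hypothetical placeholders for your database layer:

```python
import json
import redis  # assumes redis-py

r = redis.Redis()

def write_through(user_id, data, db):
    """Write-through: update DB and cache synchronously, in one call."""
    db.update_user(user_id, data)                        # hypothetical DB call
    r.set(f"user:{user_id}", json.dumps(data), ex=3600)  # cache matches DB

def write_behind(user_id, data):
    """Write-behind: update cache now, queue the DB write for a worker."""
    r.set(f"user:{user_id}", json.dumps(data), ex=3600)
    r.rpush("dirty:users", user_id)  # background worker drains this list
```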
Cache Stampede — The Problem and Solutions
A cache stampede (also called thundering herd) occurs when a popular cache key expires and many concurrent requests all try to rebuild it simultaneously, overwhelming the database.
Mutex Lock
When a cache miss occurs, the first request acquires a lock (SET lock:key uuid NX EX 5) and rebuilds the cache. Other requests wait or serve stale data. Only one database query runs.
Probabilistic Early Expiry (PER)
Each request checks whether it should refresh early: if current_time - (delta * beta * log(random())) exceeds the expiry time, refresh proactively, where delta is the time it takes to recompute the value and beta tunes how early refreshes happen. Because log(random()) is negative, a small randomized fraction of requests refresh before the actual TTL expires, spreading refreshes out and preventing a synchronized stampede (see the sketch after the mutex-lock example below).
Background Refresh
A background job refreshes hot keys before they expire. The cache never actually goes empty — it's always warm. Trade-off: slightly stale data during the refresh window.
```
# Request arrives, cache miss on "product:42"

# Try to acquire rebuild lock
acquired = SET "lock:product:42" uuid NX EX 5
# NX = only if not exists, EX = 5 second timeout

if acquired:
    # I won the lock — rebuild cache
    data = db.query("SELECT * FROM products WHERE id = 42")
    SET "product:42" serialize(data) EX 3600
    DEL "lock:product:42"
    return data
else:
    # Someone else is rebuilding — wait and retry
    SLEEP 50ms
    value = GET "product:42"
    if value != nil:
        return deserialize(value)
    # Still not ready — retry (with backoff)
```
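And a minimal sketch of the probabilistic early expiry check described above; the function name and parameters are illustrative, and `recompute_seconds` (delta) would be measured in practice:

```python
import math
import random
import time

def should_refresh_early(expiry_ts, recompute_seconds, beta=1.0):
    """Probabilistic early expiry: occasionally refresh before the TTL
    ends so rebuilds are spread out instead of synchronized.

    expiry_ts          -- absolute Unix time when the key expires
    recompute_seconds  -- how long rebuilding the value takes (delta)
    beta               -- >1 refreshes earlier, <1 later
    """
    # -log(random()) is exponentially distributed, so a small fraction
    # of requests trigger a refresh, increasingly often near expiry.
    return time.time() - recompute_seconds * beta * math.log(random.random()) >= expiry_ts
```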
Cache Warming Strategies
When and How to Warm the Cache
- ✅On deploy: pre-load the top N most-accessed keys from the database before accepting traffic (see the sketch after this list)
- ✅On cache restart: use a readiness probe — the server isn't 'ready' until hot keys are loaded
- ✅Predictive warming: if you know traffic spikes at 9 AM, warm the cache at 8:55 AM
- ✅Lazy warming with protection: accept traffic immediately but use mutex locks to prevent stampede during cold start
- ✅Never warm everything: only warm the hot set (top 1-5% of keys that serve 80%+ of traffic)
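A minimal on-deploy warming sketch with redis-py; `db.top_products` is a hypothetical query returning the most-accessed rows (e.g., by a popularity counter):

```python
import json
import redis  # assumes redis-py

r = redis.Redis()

def warm_cache(db, top_n=1000):
    """On-deploy warming: pre-load the hottest keys before taking traffic.

    Only the hot set is warmed -- never the whole table.
    """
    pipe = r.pipeline()
    for product in db.top_products(limit=top_n):  # hypothetical DB call
        pipe.set(f"product:{product['id']}", json.dumps(product), ex=3600)
    pipe.execute()  # one round trip for all warm-up writes
```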
Session Storage
HTTP is stateless — every request is independent. Sessions bridge this gap by storing user state (login status, cart contents, preferences) between requests. Redis is the most popular session store for web applications because it's fast, supports TTL natively, and is shared across all application servers.
The Coat Check
A session is like a coat check at a restaurant. You hand over your coat (user state) and get a ticket (session ID cookie). On every visit, you show the ticket and get your coat back. If the coat check is local to one restaurant (sticky sessions), you can only go back to that specific location. If it's a centralized warehouse (Redis), any location can retrieve your coat with the same ticket.
Hashes vs Serialized Strings
```
# Approach 1: Hash (individual fields)
HSET session:abc123 user_id 42
HSET session:abc123 email "alice@example.com"
HSET session:abc123 role "admin"
HSET session:abc123 cart_count 3
EXPIRE session:abc123 1800   # 30-minute TTL

# Read one field:
HGET session:abc123 role
# → "admin"

# Read all fields:
HGETALL session:abc123
# → user_id: 42, email: alice@example.com, role: admin, cart_count: 3

# Update one field without reading/writing the whole session:
HINCRBY session:abc123 cart_count 1

# ---

# Approach 2: Serialized string (JSON blob)
SET session:abc123 '{"user_id":42,"email":"alice@example.com","role":"admin","cart_count":3}' EX 1800

# Read: must deserialize the entire blob
GET session:abc123
# → parse JSON, extract what you need

# Update: must read, deserialize, modify, serialize, write back
# → NOT atomic without Lua
```
| Approach | Partial Read | Partial Update | Memory | Complexity |
|---|---|---|---|---|
| Hash (HSET/HGET) | Yes — HGET one field | Yes — HSET one field, HINCRBY | Slightly more (field overhead) | Simple Redis commands |
| Serialized String | No — must GET and parse entire blob | No — read-modify-write cycle | Slightly less (compact JSON) | Requires serialization logic |
🎯 Use Hashes for Sessions
Hashes are almost always the better choice for sessions. You can read or update individual fields without touching the rest. HINCRBY lets you atomically increment counters (cart count, page views). The memory overhead is negligible for typical session sizes.
TTL-Based Session Expiry
```
# Create session with 30-minute TTL
HSET session:abc123 user_id 42 email "alice@example.com"
EXPIRE session:abc123 1800

# On every request, refresh the TTL (sliding expiration)
EXPIRE session:abc123 1800
# → Session stays alive as long as the user is active
# → Expires 30 minutes after the LAST request

# Check remaining TTL
TTL session:abc123
# → 1742 (seconds remaining)

# Absolute expiration (session dies at a fixed time regardless)
EXPIREAT session:abc123 1705363200
# → Expires at this Unix timestamp, no matter what
```
Sticky Sessions vs Centralized Session Store
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Sticky Sessions | Load balancer routes a user to the same server every time. Session stored in server memory. | No external dependency, very fast reads | Server failure loses all sessions. Can't scale horizontally. Uneven load distribution. |
| Centralized (Redis) | All servers read/write sessions from a shared Redis instance. Any server can handle any request. | Survives server failures. Horizontal scaling. Even load distribution. | Network hop for every session read. Redis is a dependency. |
💡 Production Standard
Centralized session storage with Redis is the industry standard for any system with more than one application server. Sticky sessions are a legacy pattern that breaks auto-scaling, rolling deploys, and fault tolerance.
Leaderboards
Redis Sorted Sets are purpose-built for leaderboards. Each member has a score, and Redis maintains them in sorted order automatically. Rank lookups, score updates, and range queries are all O(log N) — fast enough for millions of entries in real time.
The Scoreboard
A Redis Sorted Set is like a digital scoreboard that automatically re-sorts itself every time a score changes. You don't need to sort manually — just update a player's score and the board instantly reflects the new ranking. You can ask 'What's the top 10?' or 'What rank is player X?' and get answers in microseconds, even with millions of players.
Building a Real-Time Leaderboard
```
# Add players with scores
ZADD leaderboard 1500 "alice"
ZADD leaderboard 2300 "bob"
ZADD leaderboard 1800 "charlie"
ZADD leaderboard 3100 "diana"
ZADD leaderboard 2750 "eve"

# Get top 5 players (highest scores first)
ZREVRANGE leaderboard 0 4 WITHSCORES
# → diana: 3100, eve: 2750, bob: 2300, charlie: 1800, alice: 1500

# Get a specific player's rank (0-indexed, highest first)
ZREVRANK leaderboard "bob"
# → 2 (3rd place, 0-indexed)

# Get a specific player's score
ZSCORE leaderboard "bob"
# → 2300

# Get players ranked 10th to 20th (paginated)
ZREVRANGE leaderboard 9 19 WITHSCORES

# Count total players
ZCARD leaderboard
# → 5
```
Updating Scores Atomically
# Player "alice" scores 200 more points ZINCRBY leaderboard 200 "alice" # → 1700 (returns new score) # The sorted set automatically re-ranks alice # This is atomic — no read-modify-write race condition # Even with 1000 concurrent score updates, every increment is applied # Replace score entirely (ZADD overwrites) ZADD leaderboard 5000 "alice" # → alice now has score 5000, re-ranked to #1 # Conditional update: only if new score is higher (GT flag, Redis 6.2+) ZADD leaderboard GT 4000 "alice" # → Ignored because 4000 < 5000 (current score) ZADD leaderboard GT 6000 "alice" # → Updated to 6000 because 6000 > 5000
Paginated Leaderboards
```
# Page size: 10 players per page

# Page 1 (ranks 1-10)
ZREVRANGE leaderboard 0 9 WITHSCORES

# Page 2 (ranks 11-20)
ZREVRANGE leaderboard 10 19 WITHSCORES

# Page 3 (ranks 21-30)
ZREVRANGE leaderboard 20 29 WITHSCORES

# Generic formula for page N (1-indexed):
# ZREVRANGE leaderboard (N-1)*pageSize N*pageSize-1

# "Show me my neighborhood" — players around a specific rank
rank = ZREVRANK leaderboard "alice"   # e.g., 42
ZREVRANGE leaderboard max(0, rank - 5) (rank + 5) WITHSCORES
# → Shows up to 11 players: 5 above alice, alice, 5 below alice
# (clamp the start at 0 — a negative start index counts from the end)
```
🎯 Interview Insight
Leaderboards are a classic Redis interview question. The key insight: Sorted Sets give you O(log N) for add, update, rank lookup, and range queries. For a leaderboard with 10 million players, that's about 23 operations — microseconds. No SQL ORDER BY can compete at this scale.
Pub/Sub
Redis Pub/Sub is a messaging system where publishers send messages to channels and subscribers receive them in real time. It's fire-and-forget — messages are delivered to connected subscribers instantly but are not persisted.
The Radio Station
Pub/Sub is like a live radio broadcast. The station (publisher) broadcasts a message. Anyone tuned in (subscriber) hears it immediately. But if your radio is off (disconnected), you miss the broadcast entirely — there's no recording, no replay. This is fundamentally different from a message queue (like a voicemail box) where messages wait for you.
Core Commands
```
# Terminal 1 — Subscribe to a channel
SUBSCRIBE notifications
# Waiting for messages...

# Terminal 2 — Subscribe to the same channel
SUBSCRIBE notifications
# Waiting for messages...

# Terminal 3 — Publish a message
PUBLISH notifications "New order received: #12345"
# → (integer) 2 (delivered to 2 subscribers)

# Both Terminal 1 and Terminal 2 receive:
# 1) "message"
# 2) "notifications"
# 3) "New order received: #12345"

# Pattern-based subscription (PSUBSCRIBE)
PSUBSCRIBE orders.*
# Matches: orders.created, orders.shipped, orders.cancelled

PUBLISH orders.created '{"id": 123, "total": 59.99}'
PUBLISH orders.shipped '{"id": 120, "carrier": "FedEx"}'
# Both messages delivered to the pattern subscriber
```
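The same flow from application code might look like this with redis-py; in practice the subscriber loop runs in its own thread or process, separate from the publisher:

```python
import redis  # assumes redis-py

r = redis.Redis()

# Subscriber -- typically its own thread, process, or service
pubsub = r.pubsub()
pubsub.subscribe("notifications")
for message in pubsub.listen():
    # listen() also yields subscribe confirmations; filter real messages
    if message["type"] == "message":
        print("received:", message["data"])
        break  # illustrative; real subscribers loop forever

# Publisher -- from another process/connection
r.publish("notifications", "New order received: #12345")
```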
Pub/Sub vs Streams — The Critical Difference
| Feature | Pub/Sub | Streams |
|---|---|---|
| Persistence | None — fire and forget | Yes — messages stored on disk |
| Missed messages | Lost forever if subscriber is disconnected | Can read from any point in history |
| Consumer groups | No — all subscribers get all messages | Yes — messages distributed across group members |
| Acknowledgment | No — no delivery guarantee | Yes — XACK confirms processing |
| Backpressure | None — slow subscribers get overwhelmed | Built-in — consumers read at their own pace |
| Use case | Real-time notifications, live dashboards | Task queues, event sourcing, reliable messaging |
🚨 Pub/Sub Is NOT a Message Queue
This is the most common misconception. Pub/Sub has no persistence, no acknowledgment, and no replay. If a subscriber disconnects for even a second, it misses every message published during that time. For reliable messaging, use Redis Streams or a dedicated message broker (Kafka, RabbitMQ).
Use Cases
Good Use Cases for Pub/Sub
- ✅Live notifications: push alerts to connected users in real time
- ✅Real-time dashboards: broadcast metric updates to all dashboard clients
- ✅Cache invalidation: publish invalidation events so all app servers clear their local caches
- ✅Chat presence: broadcast 'user online/offline' status to connected clients
- ✅Configuration updates: notify all servers when a feature flag changes
When NOT to Use Pub/Sub
- ❌Task queues: use Streams or a dedicated queue — you need acknowledgment and retry
- ❌Event sourcing: use Streams or Kafka — you need message history
- ❌Critical notifications: if missing a message is unacceptable, Pub/Sub is the wrong tool
- ❌High-volume processing: slow subscribers will drop messages with no backpressure
Lua Scripting & Transactions
Redis is single-threaded — each command is atomic. But what about multi-step operations? If you need to read a value, compute something, and write back, another client could modify the value between your read and write. Redis provides two mechanisms for atomic multi-command execution: Lua scripts and MULTI/EXEC transactions.
Why Lua: Atomic Multi-Command Execution
A Lua script runs entirely on the Redis server. While the script executes, no other command can run — the entire script is atomic. This eliminates race conditions between read and write operations.
```
# Goal: increment a counter only if it's below a limit

# Without Lua — race condition:
#   Client A                    Client B
#   GET counter → 99            GET counter → 99
#   # 99 < 100, proceed         # 99 < 100, proceed
#   INCR counter → 100          INCR counter → 101  ← EXCEEDS LIMIT!

# Both clients read 99, both decide to increment
# Result: counter = 101, limit violated
```
```lua
-- Atomic check-and-increment (Lua script)
-- KEYS[1] = counter key
-- ARGV[1] = limit
local current = tonumber(redis.call('GET', KEYS[1]) or 0)
if current < tonumber(ARGV[1]) then
  return redis.call('INCR', KEYS[1])
else
  return -1  -- limit reached
end

-- Usage:
-- EVAL "..." 1 counter 100
-- This entire script runs atomically — no interleaving possible
```
EVAL and EVALSHA
```
# EVAL — send the full script every time
EVAL "return redis.call('GET', KEYS[1])" 1 mykey
# Works but sends the entire script text on every call

# EVALSHA — send only the script's SHA1 hash
# Step 1: Load the script (returns SHA1 hash)
SCRIPT LOAD "return redis.call('GET', KEYS[1])"
# → "a42059b356c875f0717db19a51f6aaa9161571a2"

# Step 2: Call by hash (much less bandwidth)
EVALSHA "a42059b356c875f0717db19a51f6aaa9161571a2" 1 mykey

# Redis caches loaded scripts in memory
# EVALSHA is preferred in production — less network overhead
# If the script isn't cached (server restart), fall back to EVAL
```
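Client libraries usually automate this fallback dance. For example, redis-py's register_script returns a callable that tries EVALSHA first and transparently falls back to EVAL (re-caching the script) on a NOSCRIPT error:

```python
import redis  # assumes redis-py

r = redis.Redis()

# register_script computes the SHA1 locally and returns a callable.
# Calling it tries EVALSHA; if the server's script cache was flushed
# (e.g. after a restart), it falls back to EVAL automatically.
get_key = r.register_script("return redis.call('GET', KEYS[1])")

value = get_key(keys=["mykey"])
```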
MULTI/EXEC Transactions
MULTI/EXEC groups commands into a transaction. All commands between MULTI and EXEC are queued and executed atomically. But unlike Lua, you cannot read a value and use it in a later command within the same transaction — commands are queued blindly.
```
# Transfer 100 points from alice to bob — atomically
MULTI
DECRBY wallet:alice 100
INCRBY wallet:bob 100
EXEC
# Both commands execute back-to-back — no other client's commands
# interleave, so no partial state is visible
# (Note: Redis does NOT roll back if a command fails at runtime)

# IMPORTANT: You CANNOT do this with MULTI/EXEC:
MULTI
GET wallet:alice          # Returns "QUEUED", not the actual value!
                          # Can't check if alice has enough balance here
DECRBY wallet:alice 100   # Runs blindly
INCRBY wallet:bob 100
EXEC
# → No conditional logic possible inside MULTI/EXEC
```
WATCH for Optimistic Locking
```
# WATCH monitors keys for changes before EXEC
# If any watched key is modified by another client,
# EXEC returns nil (transaction aborted)

WATCH wallet:alice
balance = GET wallet:alice   # → 500

if balance >= 100:
    MULTI
    DECRBY wallet:alice 100
    INCRBY wallet:bob 100
    EXEC
    # If another client modified wallet:alice after WATCH,
    # EXEC returns nil — transaction aborted, retry
else:
    UNWATCH   # Insufficient balance

# This is optimistic locking:
# → Assume no conflict, proceed optimistically
# → If conflict detected at EXEC time, abort and retry
# → Works well when conflicts are rare
```
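A minimal retry loop around WATCH in Python, assuming redis-py, where a pipeline in watch mode raises WatchError if the watched key changed before EXEC; the function name and retry count are illustrative:

```python
import redis  # assumes redis-py

r = redis.Redis()

def transfer(src, dst, amount, retries=5):
    """Optimistic-locking transfer: retry if the watched key changes."""
    for _ in range(retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(src)                    # WATCH src key
                balance = int(pipe.get(src) or 0)  # read runs immediately
                if balance < amount:
                    pipe.unwatch()
                    return False                   # insufficient funds
                pipe.multi()                       # start queuing commands
                pipe.decrby(src, amount)
                pipe.incrby(dst, amount)
                pipe.execute()                     # aborts if src changed
                return True
            except redis.WatchError:
                continue                           # conflict -- retry
    return False
```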
When to Prefer Lua Over MULTI/EXEC
| Feature | Lua Scripts | MULTI/EXEC |
|---|---|---|
| Conditional logic | Yes — full if/else, loops, math | No — commands are queued blindly |
| Read-then-write | Yes — read a value and use it in the same script | No — GET returns 'QUEUED', not the value |
| Atomicity | Full — entire script is atomic | Full — all commands execute together |
| Performance | One round trip (script runs server-side) | Multiple round trips (MULTI, commands, EXEC) |
| Complexity | Requires Lua knowledge | Simple Redis commands |
| Use case | Complex logic: rate limiting, conditional updates, CAS operations | Simple batching: transfer funds, update multiple keys together |
🎯 Rule of Thumb
If you need to read a value and make a decision based on it within the same atomic operation, use Lua. If you just need to batch multiple writes together, MULTI/EXEC is simpler. In practice, Lua scripts are used far more often in production because most real operations involve conditional logic.
Interview Questions
These questions test whether you understand how Redis patterns work in real systems and can reason about their trade-offs.
Q:How would you implement a distributed lock with Redis? What are the pitfalls?
A: Use the atomic SET command: SET lock-key uuid NX EX 30. NX ensures only one client acquires the lock. EX sets an auto-expiry so the lock is released if the holder crashes. The value must be a unique identifier (UUID) so only the holder can release it — release via a Lua script that checks the value before DEL. Pitfalls: (1) SETNX + EXPIRE as two separate commands has a race condition — if the process dies between them, the lock is held forever. (2) Lock expiry while work is still in progress — another client acquires the lock, leading to two holders. Mitigate with a watchdog that extends the TTL. (3) Single Redis instance is a single point of failure — Redlock uses 5 independent instances with majority quorum. (4) Kleppmann's critique: GC pauses or network delays can cause the lock to expire without the holder knowing. Fencing tokens provide an additional safety layer.
Q:Design a rate limiter using Redis. Which algorithm would you choose and why?
A: For most use cases, I'd use the sliding window with Sorted Sets. For each request: (1) ZREMRANGEBYSCORE to remove entries older than the window, (2) ZCOUNT to check the current count, (3) ZADD to record the new request if under the limit. This eliminates the boundary spike problem of fixed window counters. For APIs that need to allow controlled bursts (e.g., 100 requests/minute but allow a burst of 20 in one second), I'd use a token bucket implemented as a Lua script — it tracks tokens and refill rate in a Hash, and the Lua script atomically computes available tokens and consumes them. The Lua script is critical because the read-compute-write cycle must be atomic to prevent race conditions under concurrent requests.
Q:What is cache stampede and how do you prevent it?
A: Cache stampede (thundering herd) occurs when a popular cache key expires and hundreds of concurrent requests all try to rebuild it simultaneously, overwhelming the database with identical queries. Prevention: (1) Mutex lock — the first request acquires a lock (SET lock:key uuid NX EX 5) and rebuilds the cache. Other requests wait or serve stale data. This turns N database queries into 1. (2) Probabilistic early expiry — each request has a small chance of refreshing the cache before the TTL actually expires, spreading the refresh load over time. (3) Background refresh — a worker proactively refreshes hot keys before they expire, so the cache never goes empty. In production, combine mutex locks with background refresh for hot keys.
Q:When would you use Redis Pub/Sub vs Redis Streams?
A: Pub/Sub is fire-and-forget: messages are delivered to connected subscribers instantly but not persisted. If a subscriber disconnects, it misses all messages published during the disconnection. Use Pub/Sub for real-time notifications, live dashboards, cache invalidation broadcasts, and chat presence — scenarios where missing a message is acceptable. Streams are persistent: messages are stored and can be read from any point in history. They support consumer groups (messages distributed across workers), acknowledgment (XACK), and backpressure. Use Streams for task queues, event sourcing, reliable messaging, and any scenario where every message must be processed exactly once. The critical difference: Pub/Sub is a broadcast channel, Streams is a durable log.
Q:How do Redis Sorted Sets power a real-time leaderboard at scale?
A: Sorted Sets maintain members in score-sorted order with O(log N) operations. ZADD adds or updates a player's score. ZINCRBY atomically increments a score (no read-modify-write race). ZREVRANGE returns the top N players by score. ZREVRANK returns a specific player's rank. For a leaderboard with 10 million players: adding a score is O(log 10M) ≈ 23 operations — microseconds. Pagination uses ZREVRANGE with offset: page 2 is ZREVRANGE key 10 19. For 'show my neighborhood,' get the player's rank with ZREVRANK, then ZREVRANGE from rank-5 to rank+5. This is impossible to match with SQL ORDER BY at this scale — a database would need to sort millions of rows on every query.
Common Mistakes
These mistakes are common in production Redis deployments and have caused real outages.
Using SETNX + EXPIRE for distributed locks
Two separate commands to acquire a lock and set its expiry. If the process crashes between SETNX and EXPIRE, the lock is held forever — a deadlock that requires manual intervention. This race condition has caused production outages at companies of every size.
✅Always use the atomic SET command: SET key value NX EX seconds. This sets the value and expiry in a single atomic operation — no race condition possible. For release, use a Lua script that checks the value (UUID) before deleting to prevent releasing someone else's lock.
Using Pub/Sub as a message queue
Teams use Pub/Sub expecting reliable message delivery. A subscriber disconnects for 5 seconds during a deploy, and misses every message published during that window. No retry, no replay, no acknowledgment. Critical events are silently lost.
✅Use Redis Streams for any scenario where messages must be reliably processed. Streams persist messages, support consumer groups for load distribution, and provide XACK for acknowledgment. Pub/Sub is only appropriate for fire-and-forget broadcasts where missing messages is acceptable.
Setting lock TTL too short or too long
TTL too short: the lock expires while the holder is still processing. Another client acquires the lock, and two processes operate on the shared resource simultaneously — data corruption. TTL too long: if the holder crashes, the resource is blocked for the entire TTL duration. Other processes wait minutes for a lock that will never be released.
✅Set the TTL to 2-3x the expected operation duration. Implement a watchdog pattern: a background thread extends the TTL periodically while the holder is still alive. If the holder crashes, the watchdog stops, and the lock expires naturally. Libraries like Redisson implement this automatically.
Not using Lua for multi-step atomic operations
Implementing rate limiting or conditional updates with separate GET and SET commands. Under concurrency, multiple clients read the same stale value and all proceed — the rate limit is violated, the counter overflows, or the conditional check is bypassed.
✅Any operation that reads a value and makes a decision based on it must be atomic. Use a Lua script: the entire read-compute-write cycle runs as a single atomic operation on the Redis server. No other command can interleave. For simple batching without conditional logic, MULTI/EXEC is sufficient.