
Redis Data Structures

Redis's power comes from its data structures — Strings, Lists, Hashes, Sets, Sorted Sets, Bitmaps, HyperLogLog, and Streams. Learn when and why to use each.

01

Why Data Structures Matter

Most developers first encounter Redis as a "fast key-value cache." You store a string, you get a string back, it's faster than your database. That mental model works for simple caching — but it misses what makes Redis genuinely powerful.

Redis is not a key-value store. It's a remote data structure server. Every key in Redis maps to a typed data structure — a list, a hash map, a sorted set, a stream — and Redis gives you atomic operations on those structures. This is the mental model shift that unlocks Redis's real capabilities.

🧰

The Toolbox vs The Hammer

Thinking of Redis as a key-value cache is like owning a full toolbox but only using the hammer. Yes, the hammer (Strings) works for many jobs. But when you need to maintain a leaderboard, a sorted set is a precision tool that does it in O(log N) with zero application code. When you need a job queue, a list with blocking pops gives you a reliable queue without deploying RabbitMQ. The data structures ARE the features — Redis just happens to serve them over the network at microsecond latency.

🔑 The Key Insight

In a traditional cache, you serialize your data to a string, store it, retrieve it, deserialize it, modify it in your application, re-serialize it, and store it again. With Redis data structures, you modify the data in-place on the server. HINCRBY increments a field in a hash without reading the whole object. ZADD inserts into a sorted set without fetching and re-sorting. The server does the work — your application just sends commands.

Approach | How It Works | Network Calls | Atomicity
Key-Value (serialize/deserialize) | GET → deserialize → modify → serialize → SET | 2 (GET + SET) | Not atomic — race conditions possible
Redis Data Structures | Single command: HINCRBY, ZADD, LPUSH | 1 | Atomic — Redis is single-threaded per command

Quick Reference — Which Structure for Which Job

⚡ Simple Values & Counters

  • Strings — counters, locks, session tokens, caching
  • Bitmaps — boolean flags for millions of entities
  • HyperLogLog — approximate unique counts (12 KB fixed)

🗂️ Structured Data

  • Hashes — objects with partial read/write
  • Lists — queues, stacks, recent activity
  • Sets — unique collections, intersections
  • Sorted Sets — leaderboards, priority queues
  • Streams — event logs, reliable messaging
Mental Model — Cache vs Data Structure Server
Traditional cache (Memcached-style):
  Application: user = JSON.parse(cache.get("user:42"))
  Application: user.loginCount += 1
  Application: cache.set("user:42", JSON.stringify(user))
  Problem: if two servers do this simultaneously, one increment is lost.

Redis data structure server:
  Application: cache.hincrby("user:42", "loginCount", 1)
  Done. Atomic. No read-modify-write cycle. No race condition.
  Redis increments the field in-place on the server.

This pattern repeats for every data structure:
  Leaderboard?  ZADD / ZINCRBY → sorted set handles ranking
  Job queue?    LPUSH / BRPOP → list handles FIFO ordering
  Unique count? PFADD / PFCOUNT → HyperLogLog handles cardinality
  Feature flag? SET with NX/EX → string handles atomic check-and-set
02

Strings

Strings are the simplest and most versatile Redis data type. Despite the name, a Redis string is binary safe — it can hold any sequence of bytes up to 512 MB. That means it can store serialized JSON, integers, floats, raw binary data, or even images (though you probably shouldn't).

When the value is an integer or float, Redis provides atomic increment and decrement operations. This makes strings the foundation for counters, rate limiters, and feature flags.

Key Commands

Strings — Core Commands
# Basic get/set
SET user:42:name "Alice"          # Store a string
GET user:42:name                  # → "Alice"

# Integer operations (atomic)
SET page:views 0                  # Initialize counter
INCR page:views                   # → 1 (atomic increment by 1)
INCRBY page:views 10              # → 11 (atomic increment by N)
DECR page:views                   # → 10 (atomic decrement by 1)
DECRBY page:views 3               # → 7

# Float operations
SET temperature 36.6
INCRBYFLOAT temperature 0.5       # → "37.1"

# String manipulation
APPEND user:42:log " login"       # Append to existing value
GETRANGE user:42:name 0 2         # → "Ali" (substring)
STRLEN user:42:name               # → 5

# Conditional set
SET lock:order:99 "server-1" NX   # Set only if key does NOT exist
SET lock:order:99 "server-1" XX   # Set only if key DOES exist

# Set with expiration (atomic)
SET session:abc123 "user:42" EX 3600   # Expires in 3600 seconds
SET session:abc123 "user:42" PX 5000   # Expires in 5000 milliseconds

TTL Management

Strings — TTL Commands
# Set expiration on existing key
SET mykey "hello"
EXPIRE mykey 60                   # Expire in 60 seconds
EXPIREAT mykey 1735689600         # Expire at Unix timestamp

# Check remaining TTL
TTL mykey                         # → seconds remaining (-1 = no expiry, -2 = key gone)
PTTL mykey                        # → milliseconds remaining

# Remove expiration (make persistent)
PERSIST mykey                     # Remove TTL → key lives forever

# Set with expiration atomically (preferred)
SET mykey "hello" EX 60           # Always prefer this over SET + EXPIRE

🎯 Always Use SET with EX/PX

Never use SET followed by a separate EXPIRE command. If your application crashes between the two commands, you'll have a key with no expiration — a memory leak. SET with EX/PX is atomic: either both happen or neither does.
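
Where the TTL matters, the same guarantee is easy to get from application code. Below is a minimal sketch using the redis-py client (an assumption, not something this guide prescribes); the key names and lock value are illustrative.

Atomic SET with EX/NX (Python sketch)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Session token and its TTL in one atomic command (SET ... EX 3600)
r.set("session:abc123", "user:42", ex=3600)

# Simple lock: set only if the key does not exist, with a 30-second timeout
acquired = r.set("lock:order:99", "server-1", nx=True, ex=30)
if acquired:
    try:
        pass  # do the work that needs exclusive access
    finally:
        r.delete("lock:order:99")  # naive release; a production lock would verify ownership first
else:
    print("lock held by another worker")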

Use Cases

When to Use Strings

  • Counters — page views, API rate limits, login attempts (INCR/DECR are atomic)
  • Session tokens — store session ID → user ID with TTL for auto-expiration
  • Feature flags — SET flag:dark-mode 1 with NX for atomic check-and-set
  • Simple caching — serialized JSON responses with TTL
  • Distributed locks — SET lock:resource NX EX 30 (set-if-not-exists with timeout)
Rate Limiter with Strings
# Simple rate limiter: max 100 requests per minute per user
# Key: ratelimit:{user_id}:{current_minute}

function checkRateLimit(userId):
  key = "ratelimit:" + userId + ":" + currentMinute()
  count = INCR key                # Atomic increment
  if count == 1:
    EXPIRE key 60                 # First request → set 60s TTL
  if count > 100:
    return 429 Too Many Requests  # Rate limit exceeded
  return OK                       # Request allowed
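
For completeness, here is the same fixed-window limiter as runnable Python, a sketch assuming the redis-py client; the limit and key prefix mirror the pseudocode above.

Fixed-Window Rate Limiter (Python sketch)
import time
import redis

r = redis.Redis(decode_responses=True)

def check_rate_limit(user_id: str, limit: int = 100) -> bool:
    """Allow at most `limit` requests per user per calendar minute."""
    current_minute = int(time.time() // 60)
    key = f"ratelimit:{user_id}:{current_minute}"
    count = r.incr(key)          # atomic; creates the key at 1 if missing
    if count == 1:
        r.expire(key, 60)        # first request in this window: attach the TTL
    return count <= limit        # False -> respond with 429 Too Many Requests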
03

Lists

Redis Lists are doubly linked lists of strings. They support push and pop operations from both ends in O(1), making them ideal for queues, stacks, and activity feeds. Accessing elements by index is O(N) — this is a linked list, not an array.

📋

The Restaurant Order Queue

Think of a Redis List as the order queue in a restaurant kitchen. New orders are pushed to the back (RPUSH). The chef picks up orders from the front (LPOP). If no orders are waiting, the chef can block and wait (BLPOP) instead of constantly checking. Multiple chefs (consumers) can pull from the same queue — each order is processed exactly once. This is exactly how Redis Lists power job queues in production.

Key Commands

Lists — Core Commands
# Push elements
LPUSH queue:jobs "job-1"          # Push to head (left)
RPUSH queue:jobs "job-2"          # Push to tail (right)
RPUSH queue:jobs "job-3" "job-4"  # Push multiple at once

# Pop elements
LPOP queue:jobs                   # Pop from head → "job-1"
RPOP queue:jobs                   # Pop from tail → "job-4"

# Blocking pop (wait for data — key for job queues)
BLPOP queue:jobs 30               # Block up to 30s until data arrives
BRPOP queue:jobs 0                # Block indefinitely until data arrives

# Read without removing
LRANGE queue:jobs 0 -1            # Get ALL elements (0 to last)
LRANGE queue:jobs 0 9             # Get first 10 elements
LINDEX queue:jobs 0               # Get element at index 0

# List metadata
LLEN queue:jobs                   # Number of elements

# Insert relative to existing element
LINSERT queue:jobs BEFORE "job-3" "job-2.5"

# Trim (keep only a range — useful for capped lists)
LTRIM activity:user:42 0 99      # Keep only the 100 most recent items

Use Cases

When to Use Lists

  • Job queues — RPUSH to enqueue, BLPOP to dequeue (blocking wait for workers)
  • Activity feeds — LPUSH new events, LRANGE 0 19 to get the latest 20
  • Message queues — lightweight alternative to RabbitMQ for simple producer/consumer
  • Recent items — LPUSH + LTRIM to maintain a fixed-size list of recent actions
  • Undo stacks — LPUSH actions, LPOP to undo the most recent

When NOT to Use Lists

  • Random access by index — LINDEX is O(N), use Hashes or Sorted Sets instead
  • Large lists with frequent mid-list operations — LINSERT is O(N)
  • When you need deduplication — Lists allow duplicates, use Sets instead
  • When you need ordering by score — use Sorted Sets instead
Job Queue Pattern with Lists
# Producer: add jobs to the queue
RPUSH queue:emails '{"to":"alice@example.com","subject":"Welcome"}'
RPUSH queue:emails '{"to":"bob@example.com","subject":"Invoice"}'

# Consumer (worker): block-wait for jobs
# BLPOP returns the queue name and the value
# If no job is available, the worker sleeps (no CPU waste)

loop:
  result = BLPOP queue:emails 30     # Wait up to 30 seconds
  if result is null:
  continue                          # Timeout — check again
  job = JSON.parse(result[1])
  sendEmail(job.to, job.subject)      # Process the job

# Multiple workers can BLPOP the same queue
# Redis guarantees each job goes to exactly one worker

# Reliable queue pattern (RPOPLPUSH / LMOVE):
# Pop from main queue, push to processing queue
LMOVE queue:emails queue:processing LEFT RIGHT
# If worker crashes, items in queue:processing can be recovered

💡 BLPOP vs Polling

Without BLPOP, workers would need to poll the queue in a tight loop: "Any jobs? No. Any jobs? No. Any jobs?..." This wastes CPU and network. BLPOP makes the worker sleep until data arrives — zero CPU usage while waiting. This is why Redis Lists are a legitimate lightweight alternative to dedicated message brokers for simple queue patterns.
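
A worker built on this pattern can be very small. The sketch below assumes the redis-py client; the queue name matches the example above, and send_email is a placeholder for real delivery code.

Blocking Worker Loop (Python sketch)
import json
import redis

r = redis.Redis(decode_responses=True)

def send_email(to: str, subject: str) -> None:
    print(f"sending '{subject}' to {to}")   # placeholder

while True:
    result = r.brpop("queue:emails", timeout=30)  # worker sleeps; None on timeout
    if result is None:
        continue                                  # timeout, wait again
    _queue, raw = result                          # BRPOP returns (key, value)
    job = json.loads(raw)
    send_email(job["to"], job["subject"])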

04

Hashes

Redis Hashes are maps of field-value pairs stored under a single key — essentially a mini key-value store inside a key. They're the natural way to represent objects: a user profile, a session, a product. Instead of serializing the entire object to a JSON string, you store each field separately and can read or update individual fields without touching the rest.

📇

The Index Card

A Redis Hash is like an index card in a filing cabinet. The key is the card's label ('user:42'). Each field on the card is a property: name, email, loginCount. You can read just the email without pulling the entire card. You can increment loginCount without rewriting the whole card. And if the card is small enough (few fields, short values), Redis stores it in a compressed format (listpack) that uses less memory than separate keys.

Key Commands

Hashes — Core Commands
# Set fields
HSET user:42 name "Alice" email "alice@example.com" age 30
HSET user:42 loginCount 0        # Add/update a single field

# Get fields
HGET user:42 name                # → "Alice" (single field)
HMGET user:42 name email         # → ["Alice", "alice@example.com"] (multiple)
HGETALL user:42                  # → all fields and values (use carefully on large hashes)

# Check existence
HEXISTS user:42 email            # → 1 (exists) or 0

# Atomic increment (no read-modify-write needed)
HINCRBY user:42 loginCount 1     # → 1 (atomic increment)
HINCRBY user:42 age 1            # → 31
HINCRBYFLOAT user:42 balance 9.99

# Delete fields
HDEL user:42 temporaryField      # Remove a single field

# Metadata
HLEN user:42                     # Number of fields
HKEYS user:42                    # All field names
HVALS user:42                    # All values

Hash vs JSON String

Aspect | Hash (HSET/HGET) | JSON String (SET/GET)
Partial read | HGET user:42 email — reads one field | GET user:42 → deserialize → access .email
Partial update | HSET user:42 email new@email.com | GET → deserialize → modify → serialize → SET
Atomic increment | HINCRBY user:42 views 1 — single command | GET → parse → increment → stringify → SET (race condition)
Memory (small objects) | Listpack encoding — very compact | JSON overhead (quotes, braces, colons)
Memory (large objects) | Hashtable encoding — more overhead per field | Single allocation — more compact for 100+ fields
Nested objects | Not supported — flat field-value pairs only | Full JSON nesting supported
TTL granularity | TTL on the whole hash only, not per field | TTL on the whole key
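
In application code the difference is a single call per field instead of a full round trip of the object. Here is a minimal sketch with the redis-py client (assuming version 3.5+ for the mapping argument); the field names are illustrative.

Partial Reads and Updates with a Hash (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

# One field per attribute
r.hset("user:42", mapping={"name": "Alice", "email": "alice@example.com", "loginCount": 0})

email = r.hget("user:42", "email")              # partial read: one field, not the whole object
r.hset("user:42", "email", "alice@new.io")      # partial update: other fields untouched
logins = r.hincrby("user:42", "loginCount", 1)  # atomic counter, no read-modify-write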

Memory Efficiency: Listpack Encoding

When a hash has fewer than 128 fields and all values are shorter than 64 bytes, Redis stores it as a listpack — a compact, sequential data structure that uses significantly less memory than a full hashtable. This is why storing user objects as hashes is often more memory-efficient than individual string keys.

Memory Comparison
# Approach 1: Separate string keys (wasteful)
SET user:42:name "Alice"          # Each key has ~90 bytes of overhead
SET user:42:email "alice@ex.com"  # (key metadata, expiry pointer, etc.)
SET user:42:age "30"
# 3 keys × ~90 bytes overhead = ~270 bytes of overhead

# Approach 2: Single hash (efficient)
HSET user:42 name "Alice" email "alice@ex.com" age 30
# 1 key × ~90 bytes overhead = ~90 bytes of overhead
# Fields stored in listpack — compact sequential memory
# Savings: ~60% less memory for small objects

# Approach 3: JSON string
SET user:42 '{"name":"Alice","email":"alice@ex.com","age":30}'
# 1 key, but JSON syntax adds ~20 bytes of overhead
# Cannot update individual fields atomically

# Rule of thumb:
#   < 100 fields → Hash (listpack encoding, partial updates)
#   > 100 fields or nested data → JSON string
#   Individual fields needed independently → separate string keys

When to Use Hashes

  • User profiles — store name, email, preferences as fields; update individually
  • Session data — store session fields; increment request counts atomically
  • Configuration — store feature flags as fields in a single hash
  • Object caching — when you need partial reads/updates (not the whole object every time)
  • Counters per entity — HINCRBY for multiple counters on one object (views, likes, shares)
05

Sets & Sorted Sets

Sets — Unordered Unique Collections

A Redis Set is an unordered collection of unique strings. Adding the same element twice has no effect. Sets support powerful operations like union, intersection, and difference — making them ideal for tags, friend lists, and unique visitor tracking.

Sets — Core Commands
# Add members
SADD tags:article:1 "redis" "database" "caching"
SADD tags:article:2 "redis" "performance" "scaling"

# Remove members
SREM tags:article:1 "caching"

# Check membership
SISMEMBER tags:article:1 "redis"  # → 1 (yes)
SISMEMBER tags:article:1 "java"   # → 0 (no)

# Get all members
SMEMBERS tags:article:1           # → ["redis", "database"]

# Set size
SCARD tags:article:1              # → 2

# Set operations (the real power)
SUNION tags:article:1 tags:article:2
  # → ["redis", "database", "performance", "scaling"]

SINTER tags:article:1 tags:article:2
  # → ["redis"] (common tags)

SDIFF tags:article:1 tags:article:2
  # → ["database"] (in article:1 but not article:2)

# Random member (useful for sampling)
SRANDMEMBER tags:article:1        # → random tag
SPOP tags:article:1               # → random tag AND removes it

When to Use Sets

  • Unique visitors — SADD visitors:2024-01-15 userId (automatic dedup)
  • Tags and categories — SADD + SINTER to find articles with common tags
  • Friend lists — SADD friends:alice bob; SINTER friends:alice friends:bob for mutual friends
  • Online users — SADD online userId; SREM online userId; SCARD online for count
  • Deduplication — check SISMEMBER before processing to avoid duplicate work
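
The membership and intersection operations translate directly to client code. A small sketch with redis-py (an assumed client); the key names follow the use cases above.

Unique Visitors and Mutual Friends (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

# Unique visitors: duplicates are ignored automatically
r.sadd("visitors:2024-01-15", "user:42", "user:99", "user:42")
print(r.scard("visitors:2024-01-15"))            # -> 2

# Mutual friends: intersection of two sets
r.sadd("friends:alice", "bob", "carol", "dave")
r.sadd("friends:bob", "carol", "eve")
print(r.sinter("friends:alice", "friends:bob"))  # -> {'carol'}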

Sorted Sets — Ordered by Score

Sorted Sets (ZSets) are like Sets, but every member has an associated float score. Members are unique, but scores can repeat. Redis keeps the set ordered by score at all times, giving you O(log N) inserts and range queries. Under the hood, Sorted Sets use a skip list — a probabilistic data structure that provides balanced-tree-like performance with simpler implementation.

🏆

The Leaderboard

A Sorted Set is a live leaderboard. Each player (member) has a score. When a player scores points, ZINCRBY updates their score and Redis automatically re-ranks them. ZREVRANGE 0 9 gives you the top 10 instantly. ZRANK tells you any player's current position. No sorting needed — Redis maintains the order continuously. This is why every gaming leaderboard, trending list, and priority queue in production uses Sorted Sets.

Sorted Sets — Core Commands
# Add members with scores
ZADD leaderboard 1500 "alice"
ZADD leaderboard 2200 "bob"
ZADD leaderboard 1800 "charlie"
ZADD leaderboard 2200 "diana"     # Same score as bob — allowed

# Increment score (atomic)
ZINCRBY leaderboard 300 "alice"   # alice: 1500 → 1800

# Get by rank (ascending — lowest score first)
ZRANGE leaderboard 0 -1           # → ["alice", "charlie", "bob", "diana"]
ZRANGE leaderboard 0 -1 WITHSCORES
  # → ["alice", "1800", "charlie", "1800", "bob", "2200", "diana", "2200"]

# Get by rank (descending — highest score first)
ZREVRANGE leaderboard 0 2         # Top 3 → ["diana", "bob", "charlie"]

# Get rank of a member
ZRANK leaderboard "alice"         # → 0 (ascending rank, 0-indexed)
ZREVRANK leaderboard "alice"      # → 3 (descending rank)

# Get score
ZSCORE leaderboard "bob"          # → "2200"

# Range by score
ZRANGEBYSCORE leaderboard 1800 2200
  # → ["alice", "charlie", "bob", "diana"] (score between 1800-2200)

# Count members in score range
ZCOUNT leaderboard 1800 2200      # → 4

# Remove members
ZREM leaderboard "diana"

# Remove by rank (trim to top N)
ZREMRANGEBYRANK leaderboard 0 -11 # Keep only top 10

Skip List Internals

Skip List — How Sorted Sets Work
# A skip list is a layered linked list with "express lanes"
# Think of it like a subway system with local and express trains

Level 3:  HEAD ────────────────────────── 2200 ──── NIL
Level 2:  HEAD ──────── 1500 ──────────── 2200 ──── NIL
Level 1:  HEAD ── 1000 ── 1500 ── 1800 ── 2200 ──── NIL
Level 0:  HEAD ── 1000 ── 1500 ── 1800 ── 2000 ── 2200 ── NIL

# To find score 1800:
#   Start at Level 3: HEAD → 2200 (too far) → drop down
#   Level 2: HEAD → 1500 (ok) → 2200 (too far) → drop down
#   Level 1: 1500 → 1800 → Found!

# Average complexity: O(log N) for insert, delete, and lookup
# Worst case: O(N) — but probabilistically very unlikely

# Why skip list instead of balanced tree (AVL/Red-Black)?
#   1. Simpler implementation — no rotations
#   2. Better cache locality for range queries
#   3. Easier to make concurrent (though Redis is single-threaded)
#   4. ZRANGE is a simple linked list traversal at the bottom level

Sorted Set Use Cases

When to Use Sorted Sets

  • Leaderboards — ZADD/ZINCRBY for scores, ZREVRANGE for top N, ZREVRANK for player position
  • Rate limiting (sliding window) — ZADD with timestamp as score, ZREMRANGEBYSCORE to expire old entries
  • Priority queues — score = priority, ZPOPMIN to dequeue highest priority
  • Trending/hot content — score = engagement metric, ZREVRANGE for trending items
  • Delayed job scheduling — score = execution timestamp, poll ZRANGEBYSCORE for due jobs
  • Time-series data — score = timestamp, ZRANGEBYSCORE for time range queries
Sliding Window Rate Limiter with Sorted Sets
# Rate limit: max 100 requests per 60-second sliding window
# Key: ratelimit:{user_id}

function checkRateLimit(userId):
  key = "ratelimit:" + userId
  now = currentTimestampMs()
  windowStart = now - 60000          # 60 seconds ago

  # Start a pipeline (atomic)
  MULTI
    ZREMRANGEBYSCORE key 0 windowStart   # Remove expired entries
    ZADD key now now                      # Add current request (score = timestamp)
    ZCARD key                             # Count requests in window
    EXPIRE key 60                         # Auto-cleanup
  EXEC

  count = result[2]                  # ZCARD result
  if count > 100:
    return 429 Too Many Requests
  return OK

# Why this is better than the string counter approach:
#   String counter: fixed 60s window, resets at boundary (allows burst)
#   Sorted set: true sliding window, no boundary burst problem
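
The same window, expressed as runnable Python. This is a sketch assuming the redis-py client, whose pipeline() wraps the commands in MULTI/EXEC; note that using the timestamp as the member (as the pseudocode does) can collide if one user sends two requests in the same millisecond.

Sliding-Window Limiter (Python sketch)
import time
import redis

r = redis.Redis(decode_responses=True)

def allow_request(user_id: str, limit: int = 100, window_ms: int = 60_000) -> bool:
    key = f"ratelimit:{user_id}"
    now = int(time.time() * 1000)
    pipe = r.pipeline()                               # MULTI ... EXEC
    pipe.zremrangebyscore(key, 0, now - window_ms)    # drop entries outside the window
    pipe.zadd(key, {str(now): now})                   # record this request (score = timestamp)
    pipe.zcard(key)                                   # requests remaining in the window
    pipe.expire(key, 60)                              # auto-clean idle keys
    results = pipe.execute()
    return results[2] <= limit                        # index 2 is the ZCARD result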

🎯 Interview Insight

Sorted Sets are the most interview-relevant Redis data structure. "Design a leaderboard" and "implement rate limiting" are both solved elegantly with Sorted Sets. Know ZADD, ZINCRBY, ZREVRANGE, ZRANK, and ZRANGEBYSCORE by heart.

06

Bitmaps & HyperLogLog

Bitmaps — Bit-Level Operations

Bitmaps aren't a separate data type — they're strings that Redis lets you manipulate at the bit level. Each bit position can be 0 or 1, and you can set, get, and count bits efficiently. The killer use case: tracking boolean states for millions of entities using almost no memory.

📊

The Attendance Sheet

Imagine a school attendance sheet where each student has a number (0 to 999). Instead of writing each student's name and 'present/absent', you have a row of 1,000 checkboxes. Check box #42 means student 42 is present. The entire attendance for 1,000 students fits in 125 bytes (1,000 bits). For 100 million users, daily active user tracking takes just 12.5 MB. That's the power of bitmaps — one bit per entity.

Bitmaps — Core Commands
# Set individual bits
SETBIT daily:active:2024-01-15 42 1      # User 42 was active today
SETBIT daily:active:2024-01-15 99 1      # User 99 was active today
SETBIT daily:active:2024-01-15 1000000 1 # User 1M was active today

# Get individual bits
GETBIT daily:active:2024-01-15 42        # → 1 (active)
GETBIT daily:active:2024-01-15 50        # → 0 (not active)

# Count set bits (population count)
BITCOUNT daily:active:2024-01-15         # → 3 (total active users)

# Bitwise operations across multiple bitmaps
# Users active on BOTH Jan 15 AND Jan 16:
BITOP AND active:both daily:active:2024-01-15 daily:active:2024-01-16

# Users active on Jan 15 OR Jan 16:
BITOP OR active:either daily:active:2024-01-15 daily:active:2024-01-16

# Count users active on both days:
BITCOUNT active:both
Bitmap Memory — The Math
# Memory usage: ceil(max_user_id / 8) bytes

Users: 1,000         → 125 bytes        (negligible)
Users: 1,000,000     → 125 KB           (tiny)
Users: 100,000,000   → 12.5 MB          (one bitmap per day)
Users: 1,000,000,000 → 125 MB           (still manageable)

# 365 days of DAU tracking for 100M users:
365 × 12.5 MB = ~4.5 GB

# Compare to a Set approach:
# SADD active:2024-01-15 "user:42" "user:99" ...
# Each member: ~50 bytes overhead
# 10M active users × 50 bytes = 500 MB per day
# 365 days = 182 GB — 40x more memory!

# Caveat: bitmaps waste space when user IDs are sparse
# If you have 1,000 users but max ID is 100,000,000:
#   Bitmap: 12.5 MB (mostly zeros — wasteful)
#   Set: 50 KB (only stores actual members — efficient)
# Rule: use bitmaps when user IDs are dense and sequential
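
Translated to client code, daily-active tracking is a handful of calls. A sketch with redis-py (assumed); the user IDs and key names are illustrative.

Daily Active Users with Bitmaps (Python sketch)
import redis

r = redis.Redis()

# One bit per (dense, numeric) user ID
r.setbit("daily:active:2024-01-15", 42, 1)
r.setbit("daily:active:2024-01-15", 99, 1)
r.setbit("daily:active:2024-01-16", 42, 1)

print(r.getbit("daily:active:2024-01-15", 42))  # -> 1 (active)
print(r.bitcount("daily:active:2024-01-15"))    # -> 2 active users

# Retention: users active on BOTH days, via bitwise AND into a new key
r.bitop("AND", "active:both", "daily:active:2024-01-15", "daily:active:2024-01-16")
print(r.bitcount("active:both"))                # -> 1 (user 42)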

HyperLogLog — Approximate Cardinality

HyperLogLog (HLL) is a probabilistic data structure that estimates the number of unique elements in a set. It uses a fixed 12 KB of memory regardless of how many elements you add — whether it's 1,000 or 1 billion. The trade-off: the count is approximate, with a standard error of 0.81%.

🎲

The Coin Flip Estimator

Imagine estimating how many people visited a website by asking each visitor to flip a coin repeatedly and report their longest streak of heads. If the longest streak anyone reports is 20 heads in a row, you can estimate roughly 2^20 ≈ 1 million visitors — because getting 20 heads in a row is a 1-in-a-million event. HyperLogLog works on this principle: it hashes each element and tracks the longest run of leading zeros. More unique elements → longer maximum run → higher cardinality estimate. The math is more sophisticated, but the intuition is the same.

HyperLogLog — Core Commands
# Add elements (like SADD, but probabilistic)
PFADD visitors:2024-01-15 "user:42" "user:99" "user:42"
  # → 1 (at least one new element added)
  # Note: "user:42" added twice — HLL handles dedup automatically

PFADD visitors:2024-01-15 "user:100" "user:200"

# Count unique elements (approximate)
PFCOUNT visitors:2024-01-15          # → ~4 (standard error: 0.81%)

# Merge multiple HLLs (union of unique visitors)
PFMERGE visitors:week visitors:2024-01-15 visitors:2024-01-16 visitors:2024-01-17
PFCOUNT visitors:week                # Unique visitors across all 3 days

# Memory: ALWAYS 12 KB per HLL, regardless of cardinality
# 365 HLLs (one per day) = 365 × 12 KB = 4.3 MB for a year of data

Approach | Memory (100M unique elements) | Accuracy | Operations
Set (SADD/SCARD) | ~6.4 GB | 100% exact | Add, remove, membership check, intersection
Bitmap (SETBIT/BITCOUNT) | 12.5 MB | 100% exact | Set/get bit, count, bitwise AND/OR
HyperLogLog (PFADD/PFCOUNT) | 12 KB (fixed) | ~99.19% (0.81% error) | Add, count, merge — no membership check

When to Use HyperLogLog

  • Unique page views — count distinct visitors without storing every visitor ID
  • Distinct IP addresses — monitor unique IPs hitting your API
  • Unique search queries — count how many different queries users make per day
  • Cardinality estimation — any 'count distinct' where 0.81% error is acceptable
  • Merging counts — PFMERGE lets you combine daily counts into weekly/monthly

When NOT to Use HyperLogLog

  • When you need exact counts — billing, financial transactions, inventory
  • When you need membership checks — HLL cannot tell you IF a specific element was added
  • When you need to list the elements — HLL only gives you the count, not the members
  • When cardinality is small (< 10,000) — a Set uses less memory and gives exact results

💡 The 12 KB Magic Number

HyperLogLog uses 16,384 registers (2^14), each 6 bits wide. That's 16,384 × 6 / 8 = 12,288 bytes ≈ 12 KB. This is fixed regardless of how many elements you add. You could add every IPv4 address (4.3 billion) and it would still use 12 KB with ~0.81% error. This makes HLL one of the most memory-efficient data structures in all of computer science.
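
Using HLL from a client is just as compact. A sketch with redis-py (assumed); the counts in the comments are what you would expect, give or take the 0.81% error.

Unique Counts with HyperLogLog (Python sketch)
import redis

r = redis.Redis()

r.pfadd("visitors:2024-01-15", "user:42", "user:99", "user:42")  # duplicate ignored
r.pfadd("visitors:2024-01-16", "user:42", "user:100")

print(r.pfcount("visitors:2024-01-15"))   # ~2 unique visitors

# Weekly rollup: union of the daily HLLs, still ~12 KB
r.pfmerge("visitors:week", "visitors:2024-01-15", "visitors:2024-01-16")
print(r.pfcount("visitors:week"))         # ~3 unique across both days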

07

Streams

Redis Streams are an append-only log data structure — Redis's answer to Kafka for simpler use cases. Each entry in a stream has a unique auto-generated ID (timestamp-based), contains field-value pairs, and is immutable once written. Streams support consumer groups, message acknowledgement, and persistent storage — making them a legitimate lightweight event streaming solution.

📜

The Append-Only Ledger

A Redis Stream is like a shared ledger in an office. Anyone can write a new entry at the bottom (XADD) — you can never modify or insert into the middle. Multiple teams (consumer groups) can read the ledger independently, each tracking where they left off. If a team member reads an entry but crashes before processing it, the entry stays 'pending' until someone acknowledges it (XACK). Unlike a List (where LPOP removes the item), Stream entries persist — you can re-read them, audit them, or have new consumers start from the beginning.

Key Commands

Streams — Core Commands
# Add entries to a stream
XADD events * user "alice" action "login" ip "192.168.1.1"
  # → "1705334400000-0" (auto-generated ID: timestamp-sequence)

XADD events * user "bob" action "purchase" item "widget" amount "29.99"
  # → "1705334400001-0"

# The * means auto-generate the ID. You can also specify:
XADD events 1705334400002-0 user "charlie" action "logout"
  # Explicit IDs must be greater than the stream's last ID

# Read entries
XRANGE events - +                 # All entries (- = min, + = max)
XRANGE events - + COUNT 10       # First 10 entries
XRANGE events 1705334400000 +    # Entries from timestamp onward

XREVRANGE events + -              # All entries, newest first

# Stream metadata
XLEN events                       # Number of entries
XINFO STREAM events               # Detailed stream info

# Read new entries (blocking — like BLPOP for streams)
XREAD COUNT 10 BLOCK 5000 STREAMS events $
  # Block up to 5 seconds, return new entries after last seen
  # $ means "only new entries from now on"
  # Use a specific ID to resume from a position

Consumer Groups

Consumer groups are what make Streams production-ready. They allow multiple consumers to cooperatively process a stream, with each message delivered to exactly one consumer in the group. If a consumer crashes, pending messages can be reclaimed by another consumer.

Streams — Consumer Groups
# Create a consumer group
XGROUP CREATE events email-workers $ MKSTREAM
  # Group "email-workers" starts reading from now ($)
  # MKSTREAM creates the stream if it doesn't exist

XGROUP CREATE events analytics-workers 0
  # Group "analytics-workers" starts from the beginning (0)

# Consumer reads from group (each message goes to ONE consumer)
XREADGROUP GROUP email-workers worker-1 COUNT 5 BLOCK 2000 STREAMS events >
  # worker-1 reads up to 5 new messages (> means undelivered only)

XREADGROUP GROUP email-workers worker-2 COUNT 5 BLOCK 2000 STREAMS events >
  # worker-2 gets DIFFERENT messages (load balanced)

# Acknowledge processing (remove from pending list)
XACK events email-workers 1705334400000-0 1705334400001-0
  # "I've processed these entries — don't redeliver them"

# Check pending messages (unacknowledged)
XPENDING events email-workers
  # Shows: total pending, min/max ID, consumers with pending counts

# Claim abandoned messages (consumer crashed)
XAUTOCLAIM events email-workers worker-3 3600000 0
  # Claim messages pending for > 1 hour (3600000ms), assign to worker-3
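
A consumer written against this group looks roughly like the sketch below, which assumes the redis-py client; process() is a placeholder for real work, and the stream, group, and consumer names mirror the commands above.

Consumer Group Worker (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

def process(fields: dict) -> None:
    print("handling", fields)   # placeholder

# Create the group once; ignore the error if it already exists
try:
    r.xgroup_create("events", "email-workers", id="$", mkstream=True)
except redis.exceptions.ResponseError:
    pass

while True:
    # '>' asks for entries never delivered to any consumer in this group
    batches = r.xreadgroup("email-workers", "worker-1", {"events": ">"}, count=5, block=2000)
    for _stream, messages in batches:
        for msg_id, fields in messages:
            process(fields)
            r.xack("events", "email-workers", msg_id)   # done: drop from the pending list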

Message Flow with Consumer Groups

1

Producer adds events

XADD events * user 'alice' action 'purchase'. The entry gets an auto-generated ID and is appended to the stream. Multiple producers can write concurrently.

2

Consumer group distributes messages

Consumer group 'order-processors' has 3 workers. Each XREADGROUP call delivers unprocessed messages to one worker. Redis tracks which messages each consumer has received.

3

Consumer processes and acknowledges

Worker processes the message (e.g., sends confirmation email). Then calls XACK to mark it done. The message is removed from the pending entries list (PEL).

4

Failed messages are reclaimed

If a worker crashes without XACK, the message stays in the PEL. After a timeout, XAUTOCLAIM reassigns it to a healthy worker. No messages are lost.

Streams vs Lists vs Pub/Sub

Feature | Streams | Lists | Pub/Sub
Persistence | Yes — entries persist after reading | LPOP removes the item | No — fire and forget
Consumer groups | Yes — built-in load balancing | No — manual coordination | No — all subscribers get all messages
Message acknowledgement | Yes — XACK with pending list | No — once popped, it's gone | No — no delivery tracking
Replay / rewind | Yes — read from any position | No — consumed items are removed | No — missed messages are lost
Ordering | Strict — by auto-generated ID | Strict — FIFO | No ordering guarantees
Use case | Event sourcing, audit logs, reliable queues | Simple job queues, stacks | Real-time notifications, chat
Complexity | High — consumer groups, PEL, claiming | Low — push/pop | Low — publish/subscribe

When to Use Streams

  • Event sourcing lite — append-only log of domain events with replay capability
  • Audit logs — immutable record of actions with timestamps
  • Reliable message processing — consumer groups with acknowledgement prevent message loss
  • IoT data ingestion — high-throughput append with multiple consumer groups for different processing pipelines
  • Activity feeds — multiple services consume the same event stream independently

When to Use Kafka Instead

  • Multi-datacenter replication — Kafka has built-in cross-DC replication
  • Very high throughput (> 100K messages/sec) — Kafka is designed for this scale
  • Long-term storage (weeks/months) — Redis Streams are memory-bound
  • Complex stream processing — Kafka Streams / KSQL provide richer processing
  • Strict ordering across partitions — Kafka's partition model is more mature

🎯 Interview Insight

When asked "how would you implement an event log?" or "how would you handle reliable message processing?", Redis Streams are a strong answer for moderate scale. Mention consumer groups, XACK for at-least-once delivery, and XAUTOCLAIM for handling consumer failures. Then note that you'd graduate to Kafka if you need cross-DC replication or sustained throughput above 100K messages/sec.

08

Interview Questions

These questions test whether you understand Redis data structures beyond surface-level commands — the trade-offs, the internals, and the design decisions.

Q: How would you design a real-time leaderboard for a game with 10 million players?

A: Use a Redis Sorted Set. Each player is a member, their score is the sorted set score. ZADD/ZINCRBY to update scores (O(log N)). ZREVRANGE 0 9 for the top 10 (O(log N + 10)). ZREVRANK for any player's rank (O(log N)). For 10M players, this uses ~500 MB of memory and every operation completes in microseconds. The skip list internals keep the set sorted at all times — no batch re-sorting needed. For sharding across multiple Redis instances, partition by score ranges or use Redis Cluster with hash tags to keep the full leaderboard on one shard.
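
A sketch of that design in Python with redis-py (assumed); the key and member names are illustrative.

Leaderboard Operations (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

r.zadd("leaderboard", {"bob": 2200, "charlie": 1800})
r.zincrby("leaderboard", 300, "alice")    # creates alice if missing, then adds 300

top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)   # highest scores first
rank = r.zrevrank("leaderboard", "alice")                    # 0-based position from the top
window = r.zrevrange("leaderboard", max(rank - 2, 0), rank + 2, withscores=True)  # players around alice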

Q: When would you use a Hash vs a JSON string in Redis?

A: Use a Hash when you need partial reads or updates — HGET reads one field without deserializing the whole object, HINCRBY atomically increments a counter field. Hashes with fewer than 128 fields use listpack encoding, which is very memory-efficient. Use a JSON string when you have deeply nested data (Hashes are flat), when you always read/write the entire object, or when you have 100+ fields (hashtable encoding overhead per field exceeds JSON overhead). The key question: do you ever need to read or update a single field? If yes, use a Hash.

Q: How would you implement a sliding window rate limiter with Redis?

A: Use a Sorted Set per user. For each request: (1) ZREMRANGEBYSCORE to remove entries older than the window (e.g., 60 seconds ago). (2) ZADD with the current timestamp as both score and member. (3) ZCARD to count entries in the window. (4) EXPIRE the key for auto-cleanup. If ZCARD exceeds the limit, reject the request. This gives a true sliding window — unlike the fixed-window string counter approach, there's no boundary burst problem. Wrap the commands in MULTI/EXEC for atomicity. Memory: each request stores a timestamp (~20 bytes), so 100 requests/minute/user = ~2 KB per user.

Q: You need to track daily active users for 500 million users. What Redis data structure would you use?

A: It depends on what queries you need. For just the count: HyperLogLog — 12 KB per day, 0.81% error, PFMERGE for weekly/monthly rollups. For individual membership checks ('was user X active today?'): Bitmap — SETBIT user_id 1, GETBIT for lookup, BITCOUNT for total. 500M users = 62.5 MB per day. BITOP AND across days for retention analysis. For exact count AND membership AND listing: Set — but at ~50 bytes per member, 500M users = 25 GB per day. In practice, most teams use Bitmaps for the daily tracking (membership + count) and HyperLogLog for cross-day aggregations (weekly/monthly uniques).

Q: What's the difference between Redis Streams and Redis Lists for a message queue?

A: Lists are simpler: RPUSH to enqueue, BLPOP to dequeue. But BLPOP removes the message — if the consumer crashes after popping but before processing, the message is lost. Lists have no consumer groups, no acknowledgement, and no replay. Streams solve all of these: entries persist after reading, consumer groups distribute messages across workers, XACK provides explicit acknowledgement, and XAUTOCLAIM recovers messages from crashed consumers. Streams also support multiple independent consumer groups reading the same data. Use Lists for simple, non-critical queues. Use Streams when you need at-least-once delivery, consumer groups, or audit trails.

09

Common Mistakes

These mistakes come from treating Redis as a simple cache instead of understanding its data structures.

🔨

Using Strings for everything

The most common Redis anti-pattern: serializing every object to JSON, storing it as a string, and doing all manipulation in application code. Need to increment a counter? GET → parse → increment → stringify → SET. Need to update one field of a user? GET the whole object → parse → modify → stringify → SET. This wastes network bandwidth, creates race conditions, and ignores the data structures Redis was built around.

Match the data structure to the access pattern. Counters → Strings with INCR. Objects with partial updates → Hashes with HSET/HGET. Ranked data → Sorted Sets. Queues → Lists or Streams. Let Redis do the work on the server side instead of round-tripping data to your application.

💾

Storing large objects in a single key

Storing a 10 MB JSON blob as a single Redis string. Every read transfers 10 MB over the network, even if you only need one field. Every write requires sending the full 10 MB back. Redis is single-threaded — while it's serving your 10 MB value, every other client waits. Large keys also cause memory fragmentation and slow down persistence (RDB/AOF).

Break large objects into smaller pieces. Use Hashes for flat objects (one field per attribute). Use multiple keys with a naming convention (user:42:profile, user:42:preferences). Keep individual values under 100 KB as a rule of thumb. If you must store large blobs, consider whether Redis is the right tool — S3 or a database might be better.

⚠️

Using KEYS in production

KEYS * scans every key in the database. On a Redis instance with 10 million keys, this blocks the server for seconds. Since Redis is single-threaded, every other client is blocked during the scan. This has caused production outages at multiple companies. Even KEYS user:* is dangerous — it's still a full scan with pattern matching.

Use SCAN instead of KEYS. SCAN is cursor-based and non-blocking — it returns a small batch of keys per call and lets other commands execute between batches. For type-specific scanning, use HSCAN, SSCAN, and ZSCAN. Never use KEYS in production code, even for debugging — use SCAN with a COUNT hint.
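
In client code, the cursor management is usually wrapped for you. A sketch with redis-py (assumed), whose scan_iter and hscan_iter helpers issue SCAN/HSCAN under the hood.

Non-Blocking Key Iteration (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

# Iterate keys in small batches; other clients keep running between round trips
for key in r.scan_iter(match="user:*", count=1000):
    print(key)

# Iterate the fields of one large hash the same way
for field, value in r.hscan_iter("user:42"):
    print(field, value)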

📏

Ignoring memory encoding thresholds

Redis uses compact encodings (listpack) for small data structures — but silently switches to full encodings (hashtable, skiplist) when thresholds are exceeded. A Hash with 129 fields uses 3-5x more memory than one with 128 fields. A Sorted Set with 129 members jumps from listpack to skiplist encoding. Teams add 'just one more field' and wonder why memory usage doubled overnight.

Know the thresholds: hash-max-listpack-entries (default 128), zset-max-listpack-entries (default 128), list-max-listpack-size. Monitor memory usage with MEMORY USAGE key. If you're near a threshold, consider splitting into multiple keys. Use OBJECT ENCODING key to check what encoding Redis is using for a specific key.
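
Checking the encoding from a client is a quick sanity test. A sketch with redis-py (assumed, using its object(), memory_usage(), and config_get() helpers); note that servers older than Redis 7 report 'ziplist' instead of 'listpack'.

Inspecting Encodings and Memory (Python sketch)
import redis

r = redis.Redis(decode_responses=True)

r.hset("user:42", mapping={"name": "Alice", "email": "alice@example.com"})

print(r.object("encoding", "user:42"))              # 'listpack' while small, 'hashtable' past the threshold
print(r.memory_usage("user:42"))                    # bytes used by this key
print(r.config_get("hash-max-listpack-entries"))    # switch-over threshold (default 128)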