Redis Production & Operations
Running Redis in production — performance benchmarks, slow command pitfalls, memory optimization, monitoring, and knowing when Redis is NOT the right choice.
Benchmarking & Limits
Redis is single-threaded for command execution. One core processes all commands sequentially. Despite this, Redis achieves remarkable throughput because it operates entirely in memory and uses an efficient event loop (epoll/kqueue). Understanding the theoretical limits helps you plan capacity and identify bottlenecks before they hit production.
The Single-Lane Highway
Redis is like a single-lane highway with no speed limit. Only one car passes at a time, but each car moves at 300 mph. You can push ~100,000 cars per second through that lane. The bottleneck isn't the lane speed — it's what happens before and after: the on-ramp (network latency), the car size (value serialization), and slow drivers (O(N) commands) that block everyone behind them.
```shell
# Basic benchmark: 100K requests, 50 parallel connections, SET/GET
redis-benchmark -h 127.0.0.1 -p 6379 -n 100000 -c 50

# Benchmark specific commands
redis-benchmark -t set,get,incr,lpush,rpush -n 100000 -q

# Benchmark with pipelining (batching 16 commands per round-trip)
redis-benchmark -t set -n 100000 -P 16 -q

# Benchmark with specific data size (1KB values)
redis-benchmark -t set -n 100000 -d 1024 -q

# Sample output:
# SET: 112359.55 requests per second
# GET: 118483.41 requests per second
# INCR: 115606.94 requests per second
# LPUSH: 114025.09 requests per second
# With pipelining (P=16): ~800,000 requests per second
```
Throughput Expectations
| Scenario | Throughput | Notes |
|---|---|---|
| Simple GET/SET (small values) | ~100K–120K ops/sec | Single core, no pipelining, <1KB values |
| With pipelining (P=16) | ~500K–800K ops/sec | Batches 16 commands per network round-trip |
| Large values (10KB+) | ~30K–50K ops/sec | Serialization and network become bottlenecks |
| Complex commands (ZRANGEBYSCORE) | ~20K–60K ops/sec | Depends on set size and result count |
| Lua scripts (simple) | ~80K–100K ops/sec | Atomic execution, avoids round-trips |
Latency Sources
Network Round-Trip
The biggest latency contributor in most setups. A local Redis call takes ~0.1ms. Over a network, it's 0.5–2ms. Cross-region calls can be 50–100ms. Solution: co-locate Redis with your application servers. Use pipelining to batch multiple commands into a single round-trip.
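To see why pipelining helps so much, it's worth doing the arithmetic. The sketch below is a simplified latency model, not a measurement; the numbers (0.5 ms round-trip, ~10 µs of server work per command) are assumptions chosen to match a typical same-datacenter setup:

```python
# Simplified model: time per batch is one network round-trip plus the
# server-side processing time for every command in the batch.
def throughput(rtt_ms, server_us_per_cmd, batch):
    """Commands/sec for one connection when `batch` commands share one round-trip."""
    batch_time_ms = rtt_ms + batch * (server_us_per_cmd / 1000)
    return batch * 1000 / batch_time_ms

# Assumed numbers: 0.5ms round-trip, ~10us of server work per command
no_pipeline = throughput(rtt_ms=0.5, server_us_per_cmd=10, batch=1)
pipelined = throughput(rtt_ms=0.5, server_us_per_cmd=10, batch=16)

print(f"no pipelining: {no_pipeline:,.0f} ops/sec")  # no pipelining: 1,961 ops/sec
print(f"P=16 pipeline: {pipelined:,.0f} ops/sec")    # P=16 pipeline: 24,242 ops/sec
```

Under these assumptions a single connection is capped near 2K ops/sec by the network alone, and batching 16 commands multiplies that by roughly 12x. Benchmarks reach six-figure throughput by combining pipelining with many parallel connections.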
Value Serialization
Large values (10KB+) take time to serialize, transmit, and deserialize. A 1MB value takes ~10x longer than a 1KB value. Solution: keep values small. Compress large values with LZ4 or Snappy before storing. Split large objects into smaller keys.
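LZ4 and Snappy need third-party bindings (e.g. the `lz4` or `python-snappy` packages), so as an illustration of the same idea with only the standard library, here is a sketch using zlib at its fastest compression level; the value here is invented for the example:

```python
import json
import zlib

# A large-ish JSON value of the kind you might cache
value = json.dumps(
    {"items": [{"id": i, "name": f"product-{i}"} for i in range(500)]}
).encode()

# level=1 trades a little ratio for speed; repetitive JSON still shrinks a lot
compressed = zlib.compress(value, level=1)
print(len(value), "->", len(compressed), "bytes")

# Store `compressed` with SET; on read, reverse the transformation:
restored = zlib.decompress(compressed)
assert restored == value
```

The same wrap/unwrap pattern applies regardless of codec; LZ4 and Snappy are usually preferred in production because they decompress several times faster than zlib.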
Slow Commands
O(N) commands like KEYS *, SMEMBERS on a 1M-element set, or HGETALL on a huge hash block the entire server. While that command runs, every other client waits. Solution: avoid O(N) commands in production. Use SCAN for iteration. Set slowlog thresholds to catch offenders.
Persistence I/O
RDB snapshots fork the process, which can cause latency spikes on large datasets. AOF fsync=always adds disk I/O to every write. Solution: use fsync=everysec for AOF. Schedule RDB snapshots during low-traffic periods. Monitor fork time with INFO stats.
🎯 Interview Insight
When asked "how fast is Redis?" — don't just say "it's fast." Say: "~100K ops/sec on a single core for simple GET/SET with small values. With pipelining, 500K+. The real bottlenecks are network latency, large values, and O(N) commands — not Redis itself."
Slow Commands to Avoid
Because Redis is single-threaded, one slow command blocks every other client. A single KEYS * on a database with 10 million keys can freeze your entire Redis instance for seconds. In production, this is an outage.
⚠️ KEYS * Is a Production Killer
Never run KEYS * in production. It scans every key in the database — O(N) where N is the total number of keys. On a Redis instance with 10M keys, this blocks the server for several seconds. Every other client times out. Use SCAN instead.
Dangerous O(N) Commands
| Command | Time Complexity | Why It's Dangerous | Safe Alternative |
|---|---|---|---|
| KEYS * | O(N) — all keys | Scans entire keyspace, blocks server | SCAN with cursor-based iteration |
| SMEMBERS | O(N) — set size | Returns all members of a large set at once | SSCAN for iteration, or SRANDMEMBER for sampling |
| HGETALL | O(N) — hash fields | Returns all fields of a large hash at once | HSCAN for iteration, or HMGET for specific fields |
| LRANGE 0 -1 | O(N) — list length | Returns entire list, blocks on large lists | LRANGE with bounded offsets (paginate) |
| SORT | O(N+M*log(M)) | Sorts in-place, expensive on large collections | Use Sorted Sets (ZRANGEBYSCORE) instead |
| FLUSHDB / FLUSHALL | O(N) — all keys | Deletes everything, blocks until complete | FLUSHDB ASYNC (Redis 4.0+) for non-blocking |
SCAN vs KEYS — Cursor-Based Iteration
```
# ❌ DANGEROUS: blocks the server until all keys are scanned
KEYS user:*

# ✅ SAFE: cursor-based iteration, returns a batch of keys per call
# Start with cursor 0
SCAN 0 MATCH user:* COUNT 100
# Returns: next_cursor + batch of matching keys
# 1) "17920"          ← next cursor (0 means done)
# 2) 1) "user:42"
#    2) "user:108"
#    3) "user:7"

# Continue with the returned cursor
SCAN 17920 MATCH user:* COUNT 100
# Keep going until the cursor returns "0"
# Each call takes O(COUNT) time — doesn't block the server

# Same pattern for other data structures:
SSCAN myset 0 COUNT 100    # iterate set members
HSCAN myhash 0 COUNT 100   # iterate hash fields
ZSCAN myzset 0 COUNT 100   # iterate sorted set members
```
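The client-side loop looks the same in any language: keep feeding the returned cursor back in until it comes back as 0. The Python sketch below stubs the server side with an in-memory key list purely to illustrate the protocol; a real application would call `scan()` on a client such as redis-py instead:

```python
import fnmatch

def make_fake_scan(keys, count=100):
    """Stub of the server's SCAN: returns (next_cursor, batch).
    Stands in for redis-py's r.scan() so the loop below is runnable."""
    ordered = sorted(keys)
    def scan(cursor, match="*"):
        batch = ordered[cursor:cursor + count]
        next_cursor = cursor + count if cursor + count < len(ordered) else 0
        return next_cursor, [k for k in batch if fnmatch.fnmatch(k, match)]
    return scan

def scan_all(scan, match="*"):
    """The client-side loop: call with the returned cursor until it is 0."""
    cursor, found = 0, []
    while True:
        cursor, batch = scan(cursor, match)
        found.extend(batch)
        if cursor == 0:   # cursor 0 means iteration is complete
            return found

keys = [f"user:{i}" for i in range(250)] + ["order:1", "order:2"]
scan = make_fake_scan(keys, count=100)
print(len(scan_all(scan, "user:*")))  # 250, gathered in 3 short non-blocking calls
```

Note the trade the real SCAN makes: each call is cheap, but keys can be returned more than once, so consumers should treat the results as a set.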
Auditing with SLOWLOG
```
# Configure slowlog threshold (in microseconds)
# Log any command that takes longer than 10ms
CONFIG SET slowlog-log-slower-than 10000

# Keep the last 128 slow commands
CONFIG SET slowlog-max-len 128

# View the slowlog
SLOWLOG GET 10
# Sample output:
# 1) 1) (integer) 14            ← entry ID
#    2) (integer) 1693420800    ← Unix timestamp
#    3) (integer) 38102         ← execution time (μs) = 38ms
#    4) 1) "KEYS"               ← the command
#       2) "session:*"          ← the argument
#    5) "10.0.1.42:52340"       ← client address

# Reset the slowlog
SLOWLOG RESET

# Check how many entries are in the slowlog
SLOWLOG LEN
```
💡 Production Tip
Set slowlog-log-slower-than to 5000 (5ms) in production. Review the slowlog weekly. Common offenders: KEYS commands from admin scripts, HGETALL on growing hashes, and LRANGE on unbounded lists. Fix them before they cause outages.
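During that weekly review it helps to aggregate entries by command rather than reading them one at a time. A small sketch, operating on hand-written tuples shaped like the SLOWLOG GET output above (the data here is invented for illustration):

```python
# Each SLOWLOG GET entry: (id, unix_ts, exec_time_us, [command, *args], client)
entries = [
    (14, 1693420800, 38102, ["KEYS", "session:*"], "10.0.1.42:52340"),
    (13, 1693420712, 9200, ["HGETALL", "user:42:prefs"], "10.0.1.7:41001"),
    (12, 1693420650, 61044, ["KEYS", "user:*"], "10.0.1.42:52340"),
]

def worst_commands(entries, top=3):
    """Aggregate slowlog entries by command name, worst total time first."""
    totals = {}
    for _id, _ts, usec, args, _client in entries:
        totals[args[0]] = totals.get(args[0], 0) + usec
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

for cmd, usec in worst_commands(entries):
    print(f"{cmd:<10} {usec / 1000:.1f} ms total")
# KEYS       99.1 ms total
# HGETALL    9.2 ms total
```

With a real client you would feed this the return value of `SLOWLOG GET`; the report immediately shows which command family to attack first.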
Memory Optimization
Redis stores everything in memory, so every byte counts. Understanding how Redis encodes data internally — and when it switches between compact and full encodings — is the key to fitting more data into less RAM.
Packing a Suitcase
Small items (socks, underwear) can be rolled up tightly and stuffed into corners — this is ziplist encoding. But once you add too many items or something bulky (a winter coat), you need to switch to a bigger suitcase with compartments — this is hashtable encoding. Redis does the same: small hashes, lists, and sets use compact ziplist encoding. Once they grow past a threshold, Redis switches to a full data structure that uses more memory but handles large sizes efficiently.
Memory Encoding Thresholds
| Data Type | Compact Encoding | Switches To | Threshold |
|---|---|---|---|
| Hash | ziplist (listpack in 7.0+) | hashtable | >128 fields OR any value >64 bytes |
| List | ziplist / quicklist | quicklist with larger nodes | >128 elements OR any element >64 bytes |
| Set | intset (integers only) | hashtable | >128 elements OR any non-integer member |
| Sorted Set | ziplist (listpack) | skiplist + hashtable | >128 elements OR any member >64 bytes |
```
# Check current thresholds
CONFIG GET hash-max-ziplist-entries
CONFIG GET hash-max-ziplist-value
CONFIG GET list-max-ziplist-size
CONFIG GET set-max-intset-entries
CONFIG GET zset-max-ziplist-entries
CONFIG GET zset-max-ziplist-value

# Increase hash ziplist threshold (default: 128 entries, 64 bytes)
# If your hashes have 200 small fields, raising this saves memory
CONFIG SET hash-max-ziplist-entries 256
CONFIG SET hash-max-ziplist-value 128

# Trade-off: higher thresholds = less memory, but slower O(N) scans
# on the ziplist. Sweet spot is usually 128-512 entries.

# Redis 7.0+ uses listpack instead of ziplist (same concept, better impl)
CONFIG GET hash-max-listpack-entries
CONFIG GET hash-max-listpack-value
```
Memory Analysis Tools
```
# Check memory usage of a specific key (Redis 4.0+)
MEMORY USAGE user:42
# (integer) 72   ← 72 bytes including key overhead

# Check memory usage with samples for aggregate types
MEMORY USAGE myhash SAMPLES 5
# Samples 5 random fields to estimate total memory

# Overall memory stats
INFO memory
# used_memory:1234567890            ← total bytes used
# used_memory_human:1.15G           ← human-readable
# used_memory_rss:1400000000        ← RSS (actual OS allocation)
# mem_fragmentation_ratio:1.13      ← RSS / used_memory
# used_memory_peak:2000000000       ← historical peak

# Memory fragmentation ratio:
#   < 1.0   → Redis is swapping (BAD — add more RAM)
#   1.0-1.5 → healthy
#   > 1.5   → significant fragmentation (consider restart)

# Memory doctor (Redis 4.0+)
MEMORY DOCTOR
# Returns advice about memory issues

# External tool: redis-rdb-tools (analyze RDB dump offline)
#   pip install rdbtools
#   rdb --command memory dump.rdb --bytes 128 -f memory.csv
# Generates CSV with key, type, encoding, size, num_elements
```
Key Naming Conventions & Memory Impact
```
# Every key name is stored in memory. Shorter = less RAM.

# ❌ Verbose keys (wastes memory at scale)
user:profile:details:john.doe@example.com:settings:notifications
# ~60 bytes just for the key name × 10M users = 600MB wasted

# ✅ Compact keys (saves memory)
u:42:s:n
# ~8 bytes × 10M users = 80MB

# ✅ Balanced approach (readable + compact)
u:{id}:profile   instead of   user:profile:details:{email}
s:{id}:cart      instead of   shopping:cart:items:user:{id}
sess:{token}     instead of   session:auth:token:{full-uuid}

# Use hashes to group related data (1 key instead of 5)
# ❌ 5 separate keys per user:
SET u:42:name "John"
SET u:42:email "john@example.com"
SET u:42:age "30"
SET u:42:city "NYC"
SET u:42:role "admin"
# Overhead: 5 keys × ~50 bytes overhead each = 250 bytes

# ✅ 1 hash with 5 fields:
HSET u:42 name "John" email "john@example.com" age 30 city "NYC" role "admin"
# Overhead: 1 key × ~50 bytes + ziplist encoding = ~120 bytes
# Saves ~50% memory for small field counts
```
🎯 Memory Rule of Thumb
Each Redis key has ~50-70 bytes of overhead (for the dict entry, SDS string, expiry pointer, etc.) regardless of the value size. If you have millions of tiny values, the key overhead dominates. Group related small values into hashes to reduce the number of keys.
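A quick back-of-envelope calculation makes the overhead concrete. Assuming ~60 bytes of per-key overhead (the midpoint of the range above) and ~20-byte payloads per field; both numbers are assumptions for illustration:

```python
def total_bytes(n_users, keys_per_user, per_key_overhead=60, payload=20):
    """Rough total: every key pays ~60B of dict/SDS/expiry overhead
    (assumed midpoint of the 50-70B range) plus its payload."""
    return n_users * keys_per_user * (per_key_overhead + payload)

# 5 separate string keys per user vs 1 hash holding 5 small fields
separate = total_bytes(10_000_000, keys_per_user=5)
grouped = total_bytes(10_000_000, keys_per_user=1,
                      payload=5 * 20 + 10)  # 5 fields in one compact listpack
print(f"{separate / 1e9:.1f} GB vs {grouped / 1e9:.1f} GB")  # 4.0 GB vs 1.7 GB
```

At 10M users the per-key overhead alone accounts for most of the difference, which is why consolidating tiny values into hashes is usually the first memory optimization worth trying.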
Monitoring & Metrics
You can't fix what you can't see. Redis exposes a rich set of metrics through the INFO command. Knowing which metrics matter — and what their values mean — is the difference between catching problems early and debugging outages at 3 AM.
```
# Full info dump (all sections)
INFO

# Specific sections
INFO memory        # memory usage, fragmentation, peak
INFO stats         # ops/sec, hits/misses, connections
INFO replication   # master/replica status, lag
INFO clients       # connected clients, blocked clients
INFO keyspace      # keys per database, expires
INFO server        # version, uptime, config file
INFO persistence   # RDB/AOF status, last save time

# Single stat shortcut (run from the shell, since redis-cli itself can't pipe)
redis-cli INFO stats | grep instantaneous_ops_per_sec
```
Key Metrics to Monitor
| Metric | What It Tells You | Healthy Range | Red Flag |
|---|---|---|---|
| used_memory | Total bytes allocated by Redis | Below maxmemory | >90% of maxmemory (evictions imminent) |
| mem_fragmentation_ratio | RSS / used_memory — OS vs Redis view | 1.0–1.5 | <1.0 (swapping) or >2.0 (heavy fragmentation) |
| connected_clients | Number of active client connections | Stable, within maxclients | Sudden spikes or approaching maxclients |
| instantaneous_ops_per_sec | Current throughput | Consistent with baseline | Sudden drops (server blocked) or spikes |
| keyspace_hits / keyspace_misses | Cache hit rate | Hit rate >95% | Hit rate <80% (cache not effective) |
| rdb_last_bgsave_status | Last RDB snapshot result | ok | err (persistence broken, data loss risk) |
| latest_fork_usec | Time to fork for RDB/AOF rewrite | <500ms | >1s (large dataset, causes latency spike) |
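These checks are easy to automate because INFO output is just `field:value` lines. A minimal sketch that parses a dump and applies two of the thresholds above; the sample values echo the INFO memory example earlier in this section, and the 90% cutoff is one reasonable choice, not a Redis default:

```python
def parse_info(raw):
    """Parse the `field:value` lines of an INFO dump into a dict."""
    out = {}
    for line in raw.splitlines():
        if ":" in line and not line.startswith("#"):  # skip section headers
            key, _, value = line.partition(":")
            out[key] = value.strip()
    return out

def health_flags(info, maxmemory):
    """Return a list of red flags based on the thresholds in the table."""
    flags = []
    if int(info["used_memory"]) > 0.9 * maxmemory:
        flags.append("memory >90% of maxmemory")
    frag = int(info["used_memory_rss"]) / int(info["used_memory"])
    if frag < 1.0:
        flags.append("fragmentation <1.0: likely swapping")
    elif frag > 2.0:
        flags.append("fragmentation >2.0: heavy fragmentation")
    return flags

raw = """# Memory
used_memory:1234567890
used_memory_rss:1400000000"""

info = parse_info(raw)
print(health_flags(info, maxmemory=12 * 1024**3))  # []  (both checks pass)
```

In practice redis_exporter does this parsing for you, but the same threshold logic ends up encoded in your alerting rules.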
Hit Rate Calculation
```
# Formula: hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses) × 100

# Example from INFO stats:
#   keyspace_hits:   9500000
#   keyspace_misses:  500000
#   hit_rate = 9500000 / 10000000 = 95%   ✅ Healthy

# What hit rates mean:
#   >99%   → Excellent. Cache is highly effective.
#   95-99% → Good. Normal for most workloads.
#   80-95% → Concerning. Review TTLs, key patterns, eviction.
#   <80%   → Bad. Cache is not protecting the database.
# Possible causes of a low hit rate:
#   - TTLs too short (keys expire before reuse)
#   - Working set larger than maxmemory (evictions)
#   - Cold start (cache not yet warmed)
#   - Wrong data being cached (cache what's read often)

# Monitor hit rate over time, not as a snapshot.
# A drop from 98% to 85% over an hour = investigate immediately.
```
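As a worked version of the formula, a small helper that computes the rate and maps it onto the bands above:

```python
def hit_rate(hits, misses):
    """Cache hit rate as a percentage; 0 when there's no traffic yet."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

def grade(rate):
    """Map a hit rate onto the health bands described above."""
    if rate > 99:
        return "excellent"
    if rate >= 95:
        return "good"
    if rate >= 80:
        return "concerning"
    return "bad"

rate = hit_rate(9_500_000, 500_000)
print(f"{rate:.1f}% -> {grade(rate)}")  # 95.0% -> good
```

Feed it successive samples of keyspace_hits/keyspace_misses (which are cumulative counters, so diff consecutive readings) to track the trend rather than the snapshot.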
Latency Monitoring
```
# Enable latency monitoring (threshold in ms)
CONFIG SET latency-monitor-threshold 5

# View latency history for specific events
LATENCY HISTORY command
# Returns timestamped latency samples:
# 1) 1) (integer) 1693420800   ← timestamp
#    2) (integer) 12           ← latency in ms

# View latest latency for all event types
LATENCY LATEST
# 1) 1) "command"              ← event type
#    2) (integer) 1693420800   ← last occurrence
#    3) (integer) 12           ← latest latency (ms)
#    4) (integer) 38           ← all-time max latency (ms)

# Reset latency data
LATENCY RESET

# Built-in latency diagnostics (run from the shell)
redis-cli --latency           # continuous ping test
redis-cli --latency-history   # latency over time (15s intervals)
redis-cli --latency-dist      # latency distribution (spectrum)

# Intrinsic latency test (measures the system, not Redis)
redis-cli --intrinsic-latency 5   # test for 5 seconds
```
💡 Monitoring Stack
In production, export Redis metrics to Prometheus using redis_exporter and visualize with Grafana. Set alerts on: memory usage >85%, hit rate <90%, connected clients spike, and replication lag >1s. The INFO command is for debugging — automated monitoring catches problems while you sleep.
Redis vs Alternatives
Redis is powerful, but it's not the right tool for every job. Understanding when to use Redis — and when to reach for something else — is a critical production skill and a common interview topic.
| Feature | Redis | Memcached | MongoDB | etcd | Kafka |
|---|---|---|---|---|---|
| Primary Use | Cache, sessions, real-time data | Simple key-value cache | Document database | Distributed config/coordination | Event streaming / message queue |
| Data Model | Strings, Hashes, Lists, Sets, Sorted Sets, Streams | Strings only (key → blob) | JSON documents (BSON) | Key-value (small values) | Append-only log (topics/partitions) |
| Persistence | Optional (RDB, AOF) | None (pure cache) | Yes (disk-based) | Yes (Raft consensus) | Yes (disk-based log) |
| Throughput | ~100K ops/sec (single node) | ~200K ops/sec (multi-threaded) | ~20K-50K ops/sec | ~10K ops/sec | ~100K msgs/sec per partition |
| Data Size | Must fit in RAM | Must fit in RAM | Disk-based (TBs) | Small (few GB max) | Disk-based (TBs of logs) |
| Clustering | Redis Cluster (hash slots) | Client-side sharding | Built-in sharding (mongos) | Raft consensus (3-5 nodes) | Partitions across brokers |
| Best For | Caching, leaderboards, rate limiting, pub/sub | Simple caching at massive scale | Flexible schema, complex queries | Service discovery, leader election, config | Event sourcing, log aggregation, streaming |
When Redis Is NOT the Right Choice
Data Larger Than RAM
Redis stores everything in memory. If your dataset is 500GB and your server has 64GB RAM, Redis won't work. Use a disk-based database (PostgreSQL, MongoDB) or a tiered solution (Redis on Flash). Redis is a cache or working-set store, not a primary database for large datasets.
Complex Queries and Joins
Redis has no query language, no joins, no aggregations. If you need `SELECT * FROM users WHERE age > 25 AND city = 'NYC' ORDER BY signup_date`, use PostgreSQL or MongoDB. Redis is for key-based lookups, not ad-hoc queries.
Strong Consistency Requirements
Redis replication is asynchronous by default. A write acknowledged by the master may not yet be on the replica. If the master crashes, that write is lost. For strong consistency (banking, inventory counts), use a database with synchronous replication or a consensus system like etcd/ZooKeeper.
Durable Message Queuing
Redis Streams and Pub/Sub work for lightweight messaging, but they lack the durability guarantees of Kafka or RabbitMQ. If you need guaranteed delivery, message replay, and consumer group management at scale, use a dedicated message broker.
Multi-Key ACID Transactions
Redis MULTI/EXEC provides atomicity for a single connection, but not isolation across clients. There's no rollback on partial failure. For true ACID transactions across multiple entities, use a relational database.
Decision Framework
```
Q: Does the data fit in RAM?
   NO  → Use a disk-based database (PostgreSQL, MongoDB)
   YES ↓

Q: Do you need complex queries (joins, aggregations, filters)?
   YES → Use PostgreSQL/MongoDB. Cache hot results in Redis.
   NO  ↓

Q: Do you need strong consistency (zero data loss)?
   YES → Use PostgreSQL with synchronous replication, or etcd.
         Redis can still be a cache layer in front.
   NO  ↓

Q: What's the access pattern?
   Key-value lookups    → Redis (Strings, Hashes)
   Ranked data          → Redis (Sorted Sets)
   Queue / task system  → Redis (Lists, Streams) or Kafka for scale
   Pub/Sub messaging    → Redis Pub/Sub (small scale) or Kafka (large scale)
   Session storage      → Redis (Strings with TTL)
   Rate limiting        → Redis (INCR + EXPIRE)
   Real-time analytics  → Redis (HyperLogLog, Bitmaps)

Rule of thumb: Redis is best as a cache, session store, or real-time
data layer — not as a primary database.
```
🎯 Interview Insight
Interviewers love asking "when would you NOT use Redis?" The answer shows maturity: "Redis is wrong when data exceeds RAM, when you need complex queries or joins, when you need strong consistency, or when you need durable message queuing at scale. Redis excels as a cache, session store, and real-time data layer."
Production Checklist
Before deploying Redis to production, every item on this checklist should be configured and verified. Skipping any of these is a ticking time bomb.
Memory & Eviction
- ✅ Set maxmemory to 70-80% of available RAM (leave room for fork overhead and the OS)
- ✅ Configure an eviction policy — allkeys-lru for caches, noeviction for primary data stores
- ✅ Monitor used_memory and set alerts at 85% and 95% of maxmemory
- ✅ Test what happens when maxmemory is reached — does your app handle eviction/rejection gracefully?
Persistence & Durability
- ✅ Choose persistence mode: RDB for snapshots, AOF for durability, both for safety
- ✅ For AOF, use appendfsync everysec (balances durability and performance)
- ✅ Schedule RDB snapshots during low-traffic windows
- ✅ Test recovery: stop Redis, restart from RDB/AOF, verify data integrity
- ✅ Store backups off-server (S3, GCS) — a backup on the same disk is not a backup
Security
- ✅ Set requirepass with a strong password (Redis has no auth by default)
- ✅ Bind to specific interfaces: bind 127.0.0.1 or your private network IP
- ✅ Never expose Redis to the public internet (port 6379 is actively scanned)
- ✅ Disable dangerous commands in production: rename-command FLUSHALL "" and rename-command KEYS ""
- ✅ Use TLS for connections if Redis is accessed over a network
- ✅ Enable ACLs (Redis 6.0+) for fine-grained user permissions
Monitoring & Alerting
- ✅ Export metrics to Prometheus/Grafana using redis_exporter
- ✅ Alert on: used_memory >85%, hit rate <90%, connected_clients spikes, replication lag >1s
- ✅ Configure SLOWLOG with a 5ms threshold and review it weekly
- ✅ Enable latency monitoring with a 5ms threshold
- ✅ Monitor rdb_last_bgsave_status — a failed save means persistence is broken
High Availability
- ✅ Deploy at least one replica for read scaling and failover
- ✅ Use Redis Sentinel (3+ nodes) for automatic failover, or Redis Cluster for sharding
- ✅ Test failover: kill the master, verify Sentinel promotes a replica within seconds
- ✅ Set min-replicas-to-write 1 to prevent writes when no replicas are connected
- ✅ Configure client libraries with Sentinel/Cluster awareness for automatic reconnection
Connection Management
- ✅ Use connection pooling in your application (don't create a new connection per request)
- ✅ Set timeout to close idle connections (default 0 = never; 300 seconds is a sensible value)
- ✅ Set maxclients appropriately (default 10,000 — lower it if your server has limited file descriptors)
- ✅ Set tcp-keepalive to 60 seconds to detect dead connections
A sample redis.conf putting the checklist into practice:

```
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru

# Persistence (both RDB + AOF for safety)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Security
requirepass your-strong-password-here
bind 127.0.0.1 10.0.1.0
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command KEYS ""
rename-command DEBUG ""

# Connections
timeout 300
tcp-keepalive 60
maxclients 5000

# Slow log
slowlog-log-slower-than 5000
slowlog-max-len 256

# Latency monitoring
latency-monitor-threshold 5

# Replication safety
min-replicas-to-write 1
min-replicas-max-lag 10
```
Interview Questions
These questions test whether you can operate Redis in production — not just use it as a cache.
Q: How would you diagnose a sudden latency spike in Redis?
A: Start with SLOWLOG GET 10 to see if any slow commands ran recently. Check INFO stats for instantaneous_ops_per_sec — a drop means the server was blocked. Check INFO memory for mem_fragmentation_ratio — if it's <1.0, Redis is swapping to disk. Check latest_fork_usec — a large value means an RDB save or AOF rewrite caused a fork spike. Run LATENCY LATEST to see timestamped latency events. Common culprits: O(N) commands (KEYS, HGETALL on large hashes), RDB fork on a large dataset, memory swapping, or network issues. Fix: identify the slow command via SLOWLOG, replace it with a SCAN-based alternative, and set slowlog-log-slower-than to catch future offenders.
Q: Your Redis instance is using 95% of maxmemory. What do you do?
A: Immediate: check the eviction policy — if it's noeviction, writes are being rejected. Switch to allkeys-lru if this is a cache. Short-term: use MEMORY USAGE on suspect keys to find memory hogs. Run redis-rdb-tools on an RDB dump to get a full memory breakdown by key pattern. Look for: large hashes/sets that grew unbounded, keys with no TTL that should have one, and duplicate data stored under different key patterns. Medium-term: add TTLs to keys that don't have them, compress large values, switch from individual keys to hashes (saves per-key overhead), and consider Redis Cluster to shard across multiple nodes. Long-term: review your data model — are you storing data in Redis that belongs in a database?
Q: Why is KEYS * dangerous and what should you use instead?
A: KEYS * is O(N) where N is the total number of keys in the database. It scans every single key and blocks the Redis server until complete. On an instance with 10M keys, this can take several seconds — during which every other client is blocked. In production, this causes timeouts and cascading failures. Use SCAN instead: it's cursor-based, returning ~COUNT keys per call without blocking the server. SCAN 0 MATCH user:* COUNT 100 returns a batch of matching keys and a cursor for the next batch. Each call is O(COUNT), not O(N). The trade-off: SCAN may return duplicates and doesn't guarantee consistency if keys are added/removed during iteration — but it won't crash your production server.
Q: How do you decide between Redis and Memcached for caching?
A: Use Memcached when: you need a simple key-value cache with string values only, you want multi-threaded performance (Memcached uses multiple cores, Redis is single-threaded), and you don't need persistence or data structures. Use Redis when: you need data structures beyond strings (hashes, sorted sets, lists), you need persistence (survive restarts), you need pub/sub or Lua scripting, you need atomic operations on complex types (ZINCRBY, LPUSH), or you need features like TTL per key, transactions, or Streams. In practice, Redis is the default choice for most teams because its feature set covers Memcached's use case plus much more. Memcached wins only on raw multi-threaded throughput for simple string caching.
Q: What eviction policy would you choose for a Redis cache vs. a Redis session store?
A: For a cache (expendable data, can be re-fetched): use allkeys-lru. When memory is full, Redis evicts the least recently used key across all keys. This is ideal because any cached value can be regenerated from the source. For a session store (user sessions that shouldn't be randomly dropped): use volatile-lru — only evict keys that have a TTL set. Sessions naturally have TTLs (e.g., 30 minutes), so expired sessions get evicted first. Active sessions without expired TTLs are preserved. Alternative: volatile-ttl evicts keys closest to expiration first, which is also good for sessions. Never use noeviction for a cache — it causes write errors when memory is full. Never use allkeys-random — it evicts active hot keys as readily as cold ones.
Common Mistakes
These mistakes have caused real production incidents. Each one is preventable with the right configuration and awareness.
Running Redis without authentication on a network
Redis ships with no password by default. If it's bound to 0.0.0.0 (all interfaces), anyone on the network — or the internet — can connect, read all data, and run FLUSHALL. Attackers actively scan port 6379. There have been widespread attacks where exposed Redis instances were used to write SSH keys and gain server access.
✅Always set requirepass in redis.conf. Bind to 127.0.0.1 or your private network interface. Never expose port 6379 to the public internet. Use firewall rules as a second layer. On Redis 6.0+, use ACLs for per-user permissions.
Not setting maxmemory on a cache
Without maxmemory, Redis grows until it consumes all available RAM. The OS starts swapping Redis memory to disk, which makes Redis 100x slower. Eventually the OOM killer terminates the Redis process — or worse, another critical process. All cached data is lost, and the database gets hammered.
✅Always set maxmemory to 70-80% of available RAM (leave room for fork overhead during RDB saves). Set an eviction policy — allkeys-lru for caches. Monitor used_memory and alert at 85%. Test what happens when maxmemory is reached: does your application handle it gracefully?
Using KEYS * in application code
A developer writes KEYS user:* to find all user keys. It works in development with 100 keys. In production with 10 million keys, it blocks the Redis server for 5 seconds. Every other client times out. The monitoring system detects Redis as 'down' and triggers alerts. The developer doesn't realize KEYS is O(N) because it worked fine locally.
✅Ban KEYS from application code entirely. Use rename-command KEYS '' in redis.conf to disable it. Use SCAN for iteration — it's cursor-based and non-blocking. For finding keys by pattern, maintain a secondary index (a Set containing all keys of a type) instead of scanning the keyspace.
Ignoring memory fragmentation
Redis allocates and frees memory frequently. Over time, the allocator (jemalloc) can't reuse freed blocks efficiently, leading to fragmentation. INFO memory shows used_memory at 8GB but used_memory_rss at 14GB — the OS allocated 14GB but Redis only uses 8GB. The extra 6GB is wasted. If maxmemory is set to 12GB, Redis thinks it has 4GB free, but the OS is already at 14GB.
✅Monitor mem_fragmentation_ratio (RSS / used_memory). Healthy is 1.0-1.5. Above 2.0, consider restarting Redis during a maintenance window to reset fragmentation. Redis 4.0+ has activedefrag yes which defragments memory online without restart. For write-heavy workloads with variable-size values, fragmentation is expected — plan your maxmemory accordingly.