
Redis Production & Operations

Running Redis in production — performance benchmarks, slow command pitfalls, memory optimization, monitoring, and knowing when Redis is NOT the right choice.

01

Benchmarking & Limits

Redis is single-threaded for command execution. One core processes all commands sequentially. Despite this, Redis achieves remarkable throughput because it operates entirely in memory and uses an efficient event loop (epoll/kqueue). Understanding the theoretical limits helps you plan capacity and identify bottlenecks before they hit production.

🏎️

The Single-Lane Highway

Redis is like a single-lane highway with no speed limit. Only one car passes at a time, but each car moves at 300 mph. You can push ~100,000 cars per second through that lane. The bottleneck isn't the lane speed — it's what happens before and after: the on-ramp (network latency), the car size (value serialization), and slow drivers (O(N) commands) that block everyone behind them.

redis-benchmark — Quick Performance Test (bash)
# Basic benchmark: 100K requests, 50 parallel connections, SET/GET
redis-benchmark -h 127.0.0.1 -p 6379 -n 100000 -c 50

# Benchmark specific commands
redis-benchmark -t set,get,incr,lpush,rpush -n 100000 -q

# Benchmark with pipelining (batching 16 commands per round-trip)
redis-benchmark -t set -n 100000 -P 16 -q

# Benchmark with specific data size (1KB values)
redis-benchmark -t set -n 100000 -d 1024 -q

# Sample output:
# SET: 112,359.55 requests per second
# GET: 118,483.41 requests per second
# INCR: 115,606.94 requests per second
# LPUSH: 114,025.09 requests per second
# With pipelining (P=16): ~800,000 requests per second

Throughput Expectations

Scenario | Throughput | Notes
Simple GET/SET (small values) | ~100K–120K ops/sec | Single core, no pipelining, <1KB values
With pipelining (P=16) | ~500K–800K ops/sec | Batches 16 commands per network round-trip
Large values (10KB+) | ~30K–50K ops/sec | Serialization and network become bottlenecks
Complex commands (ZRANGEBYSCORE) | ~20K–60K ops/sec | Depends on set size and result count
Lua scripts (simple) | ~80K–100K ops/sec | Atomic execution, avoids round-trips
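The pipelining numbers in the table fall out of simple round-trip arithmetic. A rough single-connection model (the RTT and per-command service times below are illustrative assumptions, not measurements):

```python
# Rough model of why pipelining multiplies throughput: each network
# round-trip has a fixed cost, and pipelining amortizes it over a batch.
# Numbers here are illustrative assumptions, not benchmark results.

def ops_per_sec(rtt_ms: float, server_us_per_cmd: float, pipeline: int = 1) -> int:
    """Commands per second for one client connection.

    Each round-trip carries `pipeline` commands, paying the network
    RTT once plus per-command server processing time.
    """
    trip_seconds = rtt_ms / 1000 + pipeline * (server_us_per_cmd / 1_000_000)
    return int(pipeline / trip_seconds)

# Assume 0.05ms loopback RTT and ~5us of server work per simple SET:
no_pipe = ops_per_sec(rtt_ms=0.05, server_us_per_cmd=5, pipeline=1)
piped = ops_per_sec(rtt_ms=0.05, server_us_per_cmd=5, pipeline=16)

print(no_pipe)  # ~18K ops/sec on one connection
print(piped)    # ~123K ops/sec: the RTT is paid once per 16 commands
```

This is why redis-benchmark needs 50 parallel connections (or -P) to reach the 100K+ numbers above: a single unpipelined connection is dominated by round-trip cost, not by Redis.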

Latency Sources

1

Network Round-Trip

The biggest latency contributor in most setups. A local Redis call takes ~0.1ms. Over a network, it's 0.5–2ms. Cross-region calls can be 50–100ms. Solution: co-locate Redis with your application servers. Use pipelining to batch multiple commands into a single round-trip.

2

Value Serialization

Large values (10KB+) take time to serialize, transmit, and deserialize. A 1MB value takes ~10x longer than a 1KB value. Solution: keep values small. Compress large values with LZ4 or Snappy before storing. Split large objects into smaller keys.

3

Slow Commands

O(N) commands like KEYS *, SMEMBERS on a 1M-element set, or HGETALL on a huge hash block the entire server. While that command runs, every other client waits. Solution: avoid O(N) commands in production. Use SCAN for iteration. Set slowlog thresholds to catch offenders.

4

Persistence I/O

RDB snapshots fork the process, which can cause latency spikes on large datasets. AOF fsync=always adds disk I/O to every write. Solution: use fsync=everysec for AOF. Schedule RDB snapshots during low-traffic periods. Monitor fork time with INFO stats.
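The compression fix from latency source 2 can be sketched in a few lines. The article suggests LZ4 or Snappy; the stdlib zlib module is used here as a stand-in to show the pattern (compress on write, decompress on read):

```python
# Compress large values before SET to cut serialization and network cost.
# zlib is a stdlib stand-in for LZ4/Snappy; the pattern is identical:
# pack() before writing to Redis, unpack() after reading back.
import json
import zlib

def pack(value: dict) -> bytes:
    """Serialize and compress a value before storing it in Redis."""
    return zlib.compress(json.dumps(value).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    """Decompress and deserialize a value read back from Redis."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# A repetitive 10KB-plus payload compresses well:
payload = {"events": [{"type": "click", "page": "/home"}] * 500}
blob = pack(payload)

assert unpack(blob) == payload
print(len(json.dumps(payload)), "->", len(blob), "bytes")
```

The trade-off is CPU on the client side for smaller values on the wire and in RAM; for values under ~1KB the overhead usually isn't worth it.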

🎯 Interview Insight

When asked "how fast is Redis?" — don't just say "it's fast." Say: "~100K ops/sec on a single core for simple GET/SET with small values. With pipelining, 500K+. The real bottlenecks are network latency, large values, and O(N) commands — not Redis itself."

02

Slow Commands to Avoid

Because Redis is single-threaded, one slow command blocks every other client. A single KEYS * on a database with 10 million keys can freeze your entire Redis instance for seconds. In production, this is an outage.

⚠️ KEYS * Is a Production Killer

Never run KEYS * in production. It scans every key in the database — O(N) where N is the total number of keys. On a Redis instance with 10M keys, this blocks the server for several seconds. Every other client times out. Use SCAN instead.

Dangerous O(N) Commands

Command | Time Complexity | Why It's Dangerous | Safe Alternative
KEYS * | O(N) — all keys | Scans entire keyspace, blocks server | SCAN with cursor-based iteration
SMEMBERS | O(N) — set size | Returns all members of a large set at once | SSCAN for iteration, or SRANDMEMBER for sampling
HGETALL | O(N) — hash fields | Returns all fields of a large hash at once | HSCAN for iteration, or HMGET for specific fields
LRANGE 0 -1 | O(N) — list length | Returns entire list, blocks on large lists | LRANGE with bounded offsets (paginate)
SORT | O(N+M*log(M)) | Sorts in place, expensive on large collections | Use Sorted Sets (ZRANGEBYSCORE) instead
FLUSHDB / FLUSHALL | O(N) — all keys | Deletes everything, blocks until complete | FLUSHDB ASYNC (Redis 4.0+) for non-blocking

SCAN vs KEYS — Cursor-Based Iteration

SCAN — Safe Key Iteration (bash)
# ❌ DANGEROUS: blocks the server until all keys are scanned
KEYS user:*

# ✅ SAFE: cursor-based iteration, returns ~10 keys per call
# Start with cursor 0
SCAN 0 MATCH user:* COUNT 100

# Returns: next_cursor + batch of matching keys
# 1) "17920"          ← next cursor (0 means done)
# 2) 1) "user:42"
#    2) "user:108"
#    3) "user:7"

# Continue with returned cursor
SCAN 17920 MATCH user:* COUNT 100

# Keep going until cursor returns "0"
# Each call takes O(COUNT) time and doesn't block the server

# Same pattern for other data structures:
SSCAN myset 0 COUNT 100      # iterate set members
HSCAN myhash 0 COUNT 100     # iterate hash fields
ZSCAN myzset 0 COUNT 100     # iterate sorted set members
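The client-side loop is the same in any language: keep calling SCAN with the returned cursor until it comes back as 0. The sketch below mirrors the redis-py `scan(cursor, match, count)` signature; `FakeRedis` is an in-memory stand-in (an assumption, not a real client) so the loop is runnable without a server. Note that real Redis cursors are opaque tokens, not list offsets as in this fake:

```python
# Client-side SCAN loop: call SCAN repeatedly with the returned cursor
# until it is 0. FakeRedis is an in-memory stand-in so this runs without
# a server; real cursors are opaque, not list offsets.
from fnmatch import fnmatch

class FakeRedis:
    def __init__(self, keys):
        self._keys = list(keys)

    def scan(self, cursor=0, match="*", count=10):
        """Return (next_cursor, batch); next_cursor == 0 means done."""
        batch = [k for k in self._keys[cursor:cursor + count] if fnmatch(k, match)]
        next_cursor = cursor + count
        return (0 if next_cursor >= len(self._keys) else next_cursor), batch

def scan_keys(client, pattern="*", count=100):
    """Iterate all matching keys without ever blocking the server."""
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor, match=pattern, count=count)
        yield from batch
        if cursor == 0:
            break

client = FakeRedis([f"user:{i}" for i in range(250)] + ["session:abc"])
users = list(scan_keys(client, pattern="user:*", count=100))
print(len(users))  # 250
```

With a real client the loop is identical; redis-py even ships it as `scan_iter`. The key property: each call is bounded, so other clients keep getting served between calls.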

Auditing with SLOWLOG

SLOWLOG — Find Slow Commands (bash)
# Configure slowlog threshold (in microseconds)
# Log any command that takes longer than 10ms
CONFIG SET slowlog-log-slower-than 10000

# Keep the last 128 slow commands
CONFIG SET slowlog-max-len 128

# View the slowlog
SLOWLOG GET 10

# Sample output:
# 1) 1) (integer) 14              ← entry ID
#    2) (integer) 1693420800      ← Unix timestamp
#    3) (integer) 38102           ← execution time (μs) = 38ms
#    4) 1) "KEYS"                 ← the command
#       2) "session:*"            ← the argument
#    5) "10.0.1.42:52340"         ← client address

# Reset the slowlog
SLOWLOG RESET

# Check how many entries are in the slowlog
SLOWLOG LEN

💡 Production Tip

Set slowlog-log-slower-than to 5000 (5ms) in production. Review the slowlog weekly. Common offenders: KEYS commands from admin scripts, HGETALL on growing hashes, and LRANGE on unbounded lists. Fix them before they cause outages.

03

Memory Optimization

Redis stores everything in memory, so every byte counts. Understanding how Redis encodes data internally — and when it switches between compact and full encodings — is the key to fitting more data into less RAM.

📦

Packing a Suitcase

Small items (socks, underwear) can be rolled up tightly and stuffed into corners — this is ziplist encoding. But once you add too many items or something bulky (a winter coat), you need to switch to a bigger suitcase with compartments — this is hashtable encoding. Redis does the same: small hashes, lists, and sets use compact ziplist encoding. Once they grow past a threshold, Redis switches to a full data structure that uses more memory but handles large sizes efficiently.

Memory Encoding Thresholds

Data Type | Compact Encoding | Switches To | Threshold
Hash | ziplist (listpack in 7.0+) | hashtable | >128 fields OR any value >64 bytes
List | ziplist / quicklist | quicklist with larger nodes | >128 elements OR any element >64 bytes
Set | intset (integers only) | hashtable | >128 elements OR any non-integer member
Sorted Set | ziplist (listpack) | skiplist + hashtable | >128 elements OR any member >64 bytes

Tuning Encoding Thresholds (bash)
# Check current thresholds
CONFIG GET hash-max-ziplist-entries
CONFIG GET hash-max-ziplist-value
CONFIG GET list-max-ziplist-size
CONFIG GET set-max-intset-entries
CONFIG GET zset-max-ziplist-entries
CONFIG GET zset-max-ziplist-value

# Increase hash ziplist threshold (default: 128 entries, 64 bytes)
# If your hashes have 200 small fields, raising this saves memory
CONFIG SET hash-max-ziplist-entries 256
CONFIG SET hash-max-ziplist-value 128

# Trade-off: higher thresholds = less memory, but slower O(N) scans
# on the ziplist. Sweet spot is usually 128-512 entries.

# Redis 7.0+ uses listpack instead of ziplist (same concept, better impl)
CONFIG GET hash-max-listpack-entries
CONFIG GET hash-max-listpack-value
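The encoding rule is a simple conjunction: a hash stays compact only while both limits hold. A small helper that encodes the rule (a sketch of the decision, not Redis internals):

```python
# A hash keeps the compact listpack/ziplist encoding only while BOTH
# conditions hold: field count <= max_entries AND every field and value
# <= max_value bytes. Sketch of that rule with the default thresholds.

def stays_compact(fields: dict, max_entries: int = 128, max_value: int = 64) -> bool:
    if len(fields) > max_entries:
        return False
    return all(
        len(str(k).encode()) <= max_value and len(str(v).encode()) <= max_value
        for k, v in fields.items()
    )

small = {f"f{i}": "x" * 10 for i in range(100)}
big_value = {"bio": "x" * 500}

print(stays_compact(small))      # True  -> listpack, compact memory
print(stays_compact(big_value))  # False -> converted to hashtable
```

The conversion is one-way: once a hash is promoted to hashtable encoding, it stays there even if fields are later removed.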

Memory Analysis Tools

Analyzing Memory Usage (bash)
# Check memory usage of a specific key (Redis 4.0+)
MEMORY USAGE user:42
# (integer) 7272          ← bytes, including key overhead

# Check memory usage with samples for aggregate types
MEMORY USAGE myhash SAMPLES 5
# Samples 5 random fields to estimate total memory

# Overall memory stats
INFO memory
# used_memory: 1,234,567,890        ← total bytes used
# used_memory_human: 1.15G          ← human-readable
# used_memory_rss: 1,400,000,000    ← RSS (actual OS allocation)
# mem_fragmentation_ratio: 1.13     ← RSS / used_memory
# used_memory_peak: 2,000,000,000   ← historical peak

# Memory fragmentation ratio:
#   < 1.0    → Redis is swapping (bad: add more RAM)
#   1.0-1.5  → healthy
#   > 1.5    → significant fragmentation (consider restart)

# Memory doctor (Redis 4.0+)
MEMORY DOCTOR
# Returns advice about memory issues

# External tool: redis-rdb-tools (analyze RDB dump offline)
# pip install rdbtools
# rdb --command memory dump.rdb --bytes 128 -f memory.csv
# Generates CSV with key, type, encoding, size, num_elements

Key Naming Conventions & Memory Impact

Key Naming — Memory Matters (text)
# Every key name is stored in memory. Shorter = less RAM.

# ❌ Verbose keys (wastes memory at scale)
user:profile:details:john.doe@example.com:settings:notifications
# ~60 bytes just for the key name × 10M users = 600MB wasted

# ✅ Compact keys (saves memory)
u:42:s:n
# ~8 bytes × 10M users = 80MB

# ✅ Balanced approach (readable + compact)
u:{id}:profile     instead of  user:profile:details:{email}
s:{id}:cart        instead of  shopping:cart:items:user:{id}
sess:{token}       instead of  session:auth:token:{full-uuid}

# Use hashes to group related data (1 key instead of 5)
# ❌ 5 separate keys per user:
SET u:42:name "John"
SET u:42:email "john@example.com"
SET u:42:age "30"
SET u:42:city "NYC"
SET u:42:role "admin"
# Overhead: 5 keys × ~50 bytes overhead each = 250 bytes

# ✅ 1 hash with 5 fields:
HSET u:42 name "John" email "john@example.com" age 30 city "NYC" role "admin"
# Overhead: 1 key × ~50 bytes + ziplist encoding = ~120 bytes
# Saves ~50% memory for small field counts

🎯 Memory Rule of Thumb

Each Redis key has ~50-70 bytes of overhead (for the dict entry, SDS string, expiry pointer, etc.) regardless of the value size. If you have millions of tiny values, the key overhead dominates. Group related small values into hashes to reduce the number of keys.
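The rule of thumb turns into back-of-envelope arithmetic. A sketch comparing N string keys against one hash (the 60-byte key overhead and 12-byte listpack entry cost are illustrative assumptions):

```python
# Back-of-envelope memory comparison from the rule of thumb above:
# ~50-70 bytes of overhead per top-level key. Grouping N small values
# into one hash pays that overhead once instead of N times.
# The 60- and 12-byte figures are assumptions for illustration.

KEY_OVERHEAD = 60   # bytes per top-level key (assumed midpoint)

def as_strings(n_fields: int, payload: int) -> int:
    """N separate string keys: overhead paid per key."""
    return n_fields * (KEY_OVERHEAD + payload)

def as_hash(n_fields: int, payload: int, field_cost: int = 12) -> int:
    """One hash key: overhead paid once, plus cheap listpack entries."""
    return KEY_OVERHEAD + n_fields * (field_cost + payload)

# 5 fields of ~20 bytes each, per user:
per_user_strings = as_strings(5, 20)   # 400 bytes
per_user_hash = as_hash(5, 20)         # 220 bytes
print(per_user_strings, per_user_hash)
print(f"saved ~{(1 - per_user_hash / per_user_strings):.0%} per user")
```

At 10M users that difference is gigabytes, which is why the hash-grouping pattern matters at scale.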

04

Monitoring & Metrics

You can't fix what you can't see. Redis exposes a rich set of metrics through the INFO command. Knowing which metrics matter — and what their values mean — is the difference between catching problems early and debugging outages at 3 AM.

INFO Command — Overview (bash)
# Full info dump (all sections)
INFO

# Specific sections
INFO memory          # memory usage, fragmentation, peak
INFO stats           # ops/sec, hits/misses, connections
INFO replication     # master/replica status, lag
INFO clients         # connected clients, blocked clients
INFO keyspace        # keys per database, expires
INFO server          # version, uptime, config file
INFO persistence     # RDB/AOF status, last save time

# Single stat shortcut (from the shell, since grep isn't available inside redis-cli)
redis-cli INFO stats | grep instantaneous_ops_per_sec

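INFO output is plain "key:value" text under "# Section" headers, so it's easy to parse into a dict for ad-hoc scripting. A minimal parser (a sketch; production deployments should use redis_exporter rather than hand-parsing):

```python
# INFO returns lines of "key:value" text grouped under "# Section"
# headers. Minimal parser turning that text into a dict; values are
# left as strings since INFO mixes ints, floats, and labels.

def parse_info(raw: str) -> dict:
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# Section" headers
        key, _, value = line.partition(":")
        stats[key] = value
    return stats

sample = """# Stats
instantaneous_ops_per_sec:10432
keyspace_hits:9500000
keyspace_misses:500000
"""

info = parse_info(sample)
print(info["instantaneous_ops_per_sec"])  # 10432
```

With a real client you would feed it the raw INFO reply (redis-py already does this for you and returns a dict from `client.info()`).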
Key Metrics to Monitor

Metric | What It Tells You | Healthy Range | Red Flag
used_memory | Total bytes allocated by Redis | Below maxmemory | >90% of maxmemory (evictions imminent)
mem_fragmentation_ratio | RSS / used_memory — OS vs Redis view | 1.0–1.5 | <1.0 (swapping) or >2.0 (heavy fragmentation)
connected_clients | Number of active client connections | Stable, within maxclients | Sudden spikes or approaching maxclients
instantaneous_ops_per_sec | Current throughput | Consistent with baseline | Sudden drops (server blocked) or spikes
keyspace_hits / keyspace_misses | Cache hit rate | Hit rate >95% | Hit rate <80% (cache not effective)
rdb_last_bgsave_status | Last RDB snapshot result | ok | err (persistence broken, data loss risk)
latest_fork_usec | Time to fork for RDB/AOF rewrite | <500ms | >1s (large dataset, causes latency spike)

Hit Rate Calculation

Cache Hit Rate — The Most Important Metric (text)
# Formula:
hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses) × 100

# Example from INFO stats:
keyspace_hits:   9,500,000
keyspace_misses:   500,000
hit_rate = 9,500,000 / 10,000,000 = 95%  ✅ Healthy

# What hit rates mean:
# >99%  → Excellent. Cache is highly effective.
# 95-99% → Good. Normal for most workloads.
# 80-95% → Concerning. Review TTLs, key patterns, eviction.
# <80%  → Bad. Cache is not protecting the database.
#          Possible causes:
#          - TTLs too short (keys expire before reuse)
#          - Working set larger than maxmemory (evictions)
#          - Cold start (cache not yet warmed)
#          - Wrong data being cached (cache what's read often)

# Monitor hit rate over time, not as a snapshot.
# A drop from 98% to 85% over an hour = investigate immediately.
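The formula above, with the one guard production code needs (right after a restart both counters are zero and the naive division fails):

```python
# Hit-rate formula from above, guarded against the cold-start case
# where keyspace_hits and keyspace_misses are both still 0.

def hit_rate(hits: int, misses: int) -> float:
    """Cache hit rate as a percentage; 0.0 when no lookups yet."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total

def verdict(rate: float) -> str:
    """Map a rate to the bands described above."""
    if rate > 99:
        return "excellent"
    if rate >= 95:
        return "good"
    if rate >= 80:
        return "concerning"
    return "bad"

assert hit_rate(9_500_000, 500_000) == 95.0   # the worked example above
assert hit_rate(0, 0) == 0.0                  # cold start, no crash
print(verdict(hit_rate(9_500_000, 500_000)))  # good
```

When alerting, compute this over a window (delta of hits/misses between scrapes), not on the lifetime counters, or a long-healthy instance will mask a recent drop.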

Latency Monitoring

LATENCY — Tracking Latency Events (bash)
# Enable latency monitoring (threshold in ms)
CONFIG SET latency-monitor-threshold 5

# View latency history for specific events
LATENCY HISTORY command
# Returns timestamped latency samples:
# 1) 1) (integer) 1693420800      ← timestamp
#    2) (integer) 12              ← latency in ms

# View latest latency for all event types
LATENCY LATEST
# 1) 1) "command"                 ← event type
#    2) (integer) 1693420800      ← last occurrence
#    3) (integer) 12              ← latest latency (ms)
#    4) (integer) 38              ← all-time max latency (ms)

# Reset latency data
LATENCY RESET

# Built-in latency diagnostic
redis-cli --latency           # continuous ping test
redis-cli --latency-history   # latency over time (15s intervals)
redis-cli --latency-dist      # latency distribution (spectrum)

# Intrinsic latency test (measures system, not Redis)
redis-cli --intrinsic-latency 5   # test for 5 seconds

💡 Monitoring Stack

In production, export Redis metrics to Prometheus using redis_exporter and visualize with Grafana. Set alerts on: memory usage >85%, hit rate <90%, connected clients spike, and replication lag >1s. The INFO command is for debugging — automated monitoring catches problems while you sleep.

05

Redis vs Alternatives

Redis is powerful, but it's not the right tool for every job. Understanding when to use Redis — and when to reach for something else — is a critical production skill and a common interview topic.

Feature | Redis | Memcached | MongoDB | etcd | Kafka
Primary Use | Cache, sessions, real-time data | Simple key-value cache | Document database | Distributed config/coordination | Event streaming / message queue
Data Model | Strings, Hashes, Lists, Sets, Sorted Sets, Streams | Strings only (key → blob) | JSON documents (BSON) | Key-value (small values) | Append-only log (topics/partitions)
Persistence | Optional (RDB, AOF) | None (pure cache) | Yes (disk-based) | Yes (Raft consensus) | Yes (disk-based log)
Throughput | ~100K ops/sec (single node) | ~200K ops/sec (multi-threaded) | ~20K–50K ops/sec | ~10K ops/sec | ~100K msgs/sec per partition
Data Size | Must fit in RAM | Must fit in RAM | Disk-based (TBs) | Small (few GB max) | Disk-based (TBs of logs)
Clustering | Redis Cluster (hash slots) | Client-side sharding | Built-in sharding (mongos) | Raft consensus (3–5 nodes) | Partitions across brokers
Best For | Caching, leaderboards, rate limiting, pub/sub | Simple caching at massive scale | Flexible schema, complex queries | Service discovery, leader election, config | Event sourcing, log aggregation, streaming

When Redis Is NOT the Right Choice

1

Data Larger Than RAM

Redis stores everything in memory. If your dataset is 500GB and your server has 64GB RAM, Redis won't work. Use a disk-based database (PostgreSQL, MongoDB) or a tiered solution (Redis on Flash). Redis is a cache or working-set store, not a primary database for large datasets.

2

Complex Queries and Joins

Redis has no query language, no joins, no aggregations. If you need 'SELECT users WHERE age > 25 AND city = NYC ORDER BY signup_date', use PostgreSQL or MongoDB. Redis is for key-based lookups, not ad-hoc queries.

3

Strong Consistency Requirements

Redis replication is asynchronous by default. A write acknowledged by the master may not yet be on the replica. If the master crashes, that write is lost. For strong consistency (banking, inventory counts), use a database with synchronous replication or a consensus system like etcd/ZooKeeper.

4

Durable Message Queuing

Redis Streams and Pub/Sub work for lightweight messaging, but they lack the durability guarantees of Kafka or RabbitMQ. If you need guaranteed delivery, message replay, and consumer group management at scale, use a dedicated message broker.

5

Multi-Key ACID Transactions

Redis MULTI/EXEC provides atomicity for a single connection, but not isolation across clients. There's no rollback on partial failure. For true ACID transactions across multiple entities, use a relational database.

Decision Framework

Should You Use Redis? — Decision Tree (text)
Q: Does the data fit in RAM?
  NO  → Use a disk-based database (PostgreSQL, MongoDB)
  YES ↓

Q: Do you need complex queries (joins, aggregations, filters)?
  YES → Use PostgreSQL/MongoDB. Cache hot results in Redis.
  NO  ↓

Q: Do you need strong consistency (zero data loss)?
  YES → Use PostgreSQL with synchronous replication, or etcd.
        Redis can still be a cache layer in front.
  NO  ↓

Q: What's the access pattern?
  Key-value lookups    → Redis (Strings, Hashes)
  Ranked data          → Redis (Sorted Sets)
  Queue / task system  → Redis (Lists, Streams) or Kafka for scale
  Pub/Sub messaging    → Redis Pub/Sub (small scale) or Kafka (large scale)
  Session storage      → Redis (Strings with TTL)
  Rate limiting        → Redis (INCR + EXPIRE)
  Real-time analytics  → Redis (HyperLogLog, Bitmaps)

Rule of thumb: Redis is best as a cache, session store, or
real-time data layer, not as a primary database.
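The tree can be encoded directly, which makes the recommendation testable and easy to extend. A sketch (the pattern names and answer strings are taken from the tree; the function itself is an illustration, not a prescriptive tool):

```python
# The decision tree above as code: the three disqualifying questions
# first, then a lookup on the access pattern. Pattern names mirror the
# tree; this is an illustrative sketch, not a prescriptive tool.

RECOMMENDATION = {
    "key-value lookups": "Redis (Strings, Hashes)",
    "ranked data": "Redis (Sorted Sets)",
    "queue": "Redis (Lists, Streams) or Kafka for scale",
    "pub/sub": "Redis Pub/Sub (small scale) or Kafka (large scale)",
    "session storage": "Redis (Strings with TTL)",
    "rate limiting": "Redis (INCR + EXPIRE)",
    "real-time analytics": "Redis (HyperLogLog, Bitmaps)",
}

def recommend(fits_in_ram: bool, needs_complex_queries: bool,
              needs_strong_consistency: bool, access_pattern: str) -> str:
    if not fits_in_ram:
        return "disk-based database (PostgreSQL, MongoDB)"
    if needs_complex_queries:
        return "PostgreSQL/MongoDB, with Redis caching hot results"
    if needs_strong_consistency:
        return "PostgreSQL (sync replication) or etcd; Redis as cache layer"
    return RECOMMENDATION.get(access_pattern, "Redis")

print(recommend(True, False, False, "ranked data"))  # Redis (Sorted Sets)
```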

🎯 Interview Insight

Interviewers love asking "when would you NOT use Redis?" The answer shows maturity: "Redis is wrong when data exceeds RAM, when you need complex queries or joins, when you need strong consistency, or when you need durable message queuing at scale. Redis excels as a cache, session store, and real-time data layer."

06

Production Checklist

Before deploying Redis to production, every item on this checklist should be configured and verified. Skipping any of these is a ticking time bomb.

Memory & Eviction

  • Set maxmemory to 70-80% of available RAM (leave room for fork overhead and OS)
  • Configure an eviction policy — allkeys-lru for caches, noeviction for primary data stores
  • Monitor used_memory and set alerts at 85% and 95% of maxmemory
  • Test what happens when maxmemory is reached — does your app handle eviction/rejection gracefully?

Persistence & Durability

  • Choose persistence mode: RDB for snapshots, AOF for durability, both for safety
  • For AOF, use appendfsync everysec (balance between durability and performance)
  • Schedule RDB snapshots during low-traffic windows
  • Test recovery: stop Redis, restart from RDB/AOF, verify data integrity
  • Store backups off-server (S3, GCS) — a backup on the same disk is not a backup

Security

  • Set requirepass with a strong password (Redis has no auth by default)
  • Bind to specific interfaces: bind 127.0.0.1 or your private network IP
  • Never expose Redis to the public internet (port 6379 is actively scanned)
  • Disable dangerous commands in production: rename-command FLUSHALL '' and rename-command KEYS ''
  • Use TLS for connections if Redis is accessed over a network
  • Enable ACLs (Redis 6.0+) for fine-grained user permissions

Monitoring & Alerting

  • Export metrics to Prometheus/Grafana using redis_exporter
  • Alert on: used_memory >85%, hit rate <90%, connected_clients spike, replication lag >1s
  • Configure SLOWLOG with a 5ms threshold and review weekly
  • Enable LATENCY MONITOR with a 5ms threshold
  • Monitor rdb_last_bgsave_status — a failed save means persistence is broken

High Availability

  • Deploy at least one replica for read scaling and failover
  • Use Redis Sentinel (3+ nodes) for automatic failover, or Redis Cluster for sharding
  • Test failover: kill the master, verify Sentinel promotes a replica within seconds
  • Set min-replicas-to-write 1 to prevent writes when no replicas are connected
  • Configure client libraries with Sentinel/Cluster awareness for automatic reconnection

Connection Management

  • Use connection pooling in your application (don't create a new connection per request)
  • Set timeout to close idle connections (default 0 = never, set to 300 seconds)
  • Set maxclients appropriately (default 10,000 — lower if your server has limited file descriptors)
  • Set tcp-keepalive to 60 seconds to detect dead connections
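The pooling item deserves a sketch, since per-request connections add a full TCP (and AUTH) handshake to every command. In practice use your client library's built-in pool (redis-py ships one); the minimal stand-in below only shows the check-out/check-in pattern, with `FakeConnection` as an assumed placeholder for a real connection:

```python
# Minimal connection-pool sketch: a fixed set of connections is reused
# instead of opening one per request. FakeConnection is a placeholder
# (assumption for the sketch); real apps should use their client
# library's pool rather than rolling their own.
import queue

class FakeConnection:
    """Stand-in for a real Redis connection."""
    def __init__(self, conn_id: int):
        self.conn_id = conn_id

class ConnectionPool:
    def __init__(self, size: int):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(FakeConnection(i))

    def acquire(self, timeout: float = 1.0) -> FakeConnection:
        # Blocks (up to timeout) instead of opening a new connection,
        # which also caps how many connections the app can hold open.
        return self._free.get(timeout=timeout)

    def release(self, conn: FakeConnection) -> None:
        self._free.put(conn)

pool = ConnectionPool(size=2)
a = pool.acquire()
b = pool.acquire()
pool.release(a)
c = pool.acquire()             # reuses the connection `a` returned
print(c.conn_id == a.conn_id)  # True
```

A bounded pool pairs naturally with the maxclients setting below: the sum of pool sizes across all app instances should stay well under it.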
redis.conf — Production Essentials (bash)
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru

# Persistence (both RDB + AOF for safety)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Security
requirepass your-strong-password-here
bind 127.0.0.1 10.0.1.0
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command KEYS ""
rename-command DEBUG ""

# Connections
timeout 300
tcp-keepalive 60
maxclients 5000

# Slow log
slowlog-log-slower-than 5000
slowlog-max-len 256

# Latency monitoring
latency-monitor-threshold 5

# Replication safety
min-replicas-to-write 1
min-replicas-max-lag 10
07

Interview Questions

These questions test whether you can operate Redis in production — not just use it as a cache.

Q: How would you diagnose a sudden latency spike in Redis?

A: Start with SLOWLOG GET 10 to see if any slow commands ran recently. Check INFO stats for instantaneous_ops_per_sec — a drop means the server was blocked. Check INFO memory for mem_fragmentation_ratio — if it's <1.0, Redis is swapping to disk. Check latest_fork_usec — a large value means an RDB save or AOF rewrite caused a fork spike. Run LATENCY LATEST to see timestamped latency events. Common culprits: O(N) commands (KEYS, HGETALL on large hashes), RDB fork on a large dataset, memory swapping, or network issues. Fix: identify the slow command via SLOWLOG, replace it with a SCAN-based alternative, and set slowlog-log-slower-than to catch future offenders.

Q: Your Redis instance is using 95% of maxmemory. What do you do?

A: Immediate: check the eviction policy — if it's noeviction, writes are being rejected. Switch to allkeys-lru if this is a cache. Short-term: use MEMORY USAGE on suspect keys to find memory hogs. Run redis-rdb-tools on an RDB dump to get a full memory breakdown by key pattern. Look for: large hashes/sets that grew unbounded, keys with no TTL that should have one, and duplicate data stored under different key patterns. Medium-term: add TTLs to keys that don't have them, compress large values, switch from individual keys to hashes (saves per-key overhead), and consider Redis Cluster to shard across multiple nodes. Long-term: review your data model — are you storing data in Redis that belongs in a database?

Q: Why is KEYS * dangerous and what should you use instead?

A: KEYS * is O(N) where N is the total number of keys in the database. It scans every single key and blocks the Redis server until complete. On an instance with 10M keys, this can take several seconds — during which every other client is blocked. In production, this causes timeouts and cascading failures. Use SCAN instead: it's cursor-based, returning ~COUNT keys per call without blocking the server. SCAN 0 MATCH user:* COUNT 100 returns a batch of matching keys and a cursor for the next batch. Each call is O(COUNT), not O(N). The trade-off: SCAN may return duplicates and doesn't guarantee consistency if keys are added/removed during iteration — but it won't crash your production server.

Q: How do you decide between Redis and Memcached for caching?

A: Use Memcached when: you need a simple key-value cache with string values only, you want multi-threaded performance (Memcached uses multiple cores, Redis is single-threaded), and you don't need persistence or data structures. Use Redis when: you need data structures beyond strings (hashes, sorted sets, lists), you need persistence (survive restarts), you need pub/sub or Lua scripting, you need atomic operations on complex types (ZINCRBY, LPUSH), or you need features like TTL per key, transactions, or Streams. In practice, Redis is the default choice for most teams because its feature set covers Memcached's use case plus much more. Memcached wins only on raw multi-threaded throughput for simple string caching.

Q: What eviction policy would you choose for a Redis cache vs. a Redis session store?

A: For a cache (expendable data, can be re-fetched): use allkeys-lru. When memory is full, Redis evicts the least recently used key across all keys. This is ideal because any cached value can be regenerated from the source. For a session store (user sessions that shouldn't be randomly dropped): use volatile-lru — only evict keys that have a TTL set. Sessions naturally have TTLs (e.g., 30 minutes), so expired sessions get evicted first. Active sessions without expired TTLs are preserved. Alternative: volatile-ttl evicts keys closest to expiration first, which is also good for sessions. Never use noeviction for a cache — it causes write errors when memory is full. Never use allkeys-random — it evicts active hot keys as readily as cold ones.

08

Common Mistakes

These mistakes have caused real production incidents. Each one is preventable with the right configuration and awareness.

🔓

Running Redis without authentication on a network

Redis ships with no password by default. If it's bound to 0.0.0.0 (all interfaces), anyone on the network — or the internet — can connect, read all data, and run FLUSHALL. Attackers actively scan port 6379. There have been widespread attacks where exposed Redis instances were used to write SSH keys and gain server access.

Always set requirepass in redis.conf. Bind to 127.0.0.1 or your private network interface. Never expose port 6379 to the public internet. Use firewall rules as a second layer. On Redis 6.0+, use ACLs for per-user permissions.

💾

Not setting maxmemory on a cache

Without maxmemory, Redis grows until it consumes all available RAM. The OS starts swapping Redis memory to disk, which makes Redis 100x slower. Eventually the OOM killer terminates the Redis process — or worse, another critical process. All cached data is lost, and the database gets hammered.

Always set maxmemory to 70-80% of available RAM (leave room for fork overhead during RDB saves). Set an eviction policy — allkeys-lru for caches. Monitor used_memory and alert at 85%. Test what happens when maxmemory is reached: does your application handle it gracefully?

🐌

Using KEYS * in application code

A developer writes KEYS user:* to find all user keys. It works in development with 100 keys. In production with 10 million keys, it blocks the Redis server for 5 seconds. Every other client times out. The monitoring system detects Redis as 'down' and triggers alerts. The developer doesn't realize KEYS is O(N) because it worked fine locally.

Ban KEYS from application code entirely. Use rename-command KEYS '' in redis.conf to disable it. Use SCAN for iteration — it's cursor-based and non-blocking. For finding keys by pattern, maintain a secondary index (a Set containing all keys of a type) instead of scanning the keyspace.

📈

Ignoring memory fragmentation

Redis allocates and frees memory frequently. Over time, the allocator (jemalloc) can't reuse freed blocks efficiently, leading to fragmentation. INFO memory shows used_memory at 8GB but used_memory_rss at 14GB — the OS allocated 14GB but Redis only uses 8GB. The extra 6GB is wasted. If maxmemory is set to 12GB, Redis thinks it has 4GB free, but the OS is already at 14GB.

Monitor mem_fragmentation_ratio (RSS / used_memory). Healthy is 1.0-1.5. Above 2.0, consider restarting Redis during a maintenance window to reset fragmentation. Redis 4.0+ has activedefrag yes which defragments memory online without restart. For write-heavy workloads with variable-size values, fragmentation is expected — plan your maxmemory accordingly.