Redis Production & Operations
Running Redis in production — performance benchmarks, slow command pitfalls, memory optimization, monitoring, and knowing when Redis is NOT the right choice.
Benchmarking & Limits
Redis is single-threaded for command execution. One core processes all commands sequentially. Despite this, Redis achieves remarkable throughput because it operates entirely in memory and uses an efficient event loop (epoll/kqueue). Understanding the theoretical limits helps you plan capacity and identify bottlenecks before they hit production.
The Single-Lane Highway
Redis is like a single-lane highway with no speed limit. Only one car passes at a time, but each car moves at 300 mph. You can push ~100,000 cars per second through that lane. The bottleneck isn't the lane speed — it's what happens before and after: the on-ramp (network latency), the car size (value serialization), and slow drivers (O(N) commands) that block everyone behind them.
```shell
# Basic benchmark: 100K requests, 50 parallel connections, SET/GET
redis-benchmark -h 127.0.0.1 -p 6379 -n 100000 -c 50

# Benchmark specific commands
redis-benchmark -t set,get,incr,lpush,rpush -n 100000 -q

# Benchmark with pipelining (batching 16 commands per round-trip)
redis-benchmark -t set -n 100000 -P 16 -q

# Benchmark with specific data size (1KB values)
redis-benchmark -t set -n 100000 -d 1024 -q

# Sample output:
# SET: 112359.55 requests per second
# GET: 118483.41 requests per second
# INCR: 115606.94 requests per second
# LPUSH: 114025.09 requests per second
# With pipelining (P=16): ~800,000 requests per second
```
Throughput Expectations
| Scenario | Throughput | Notes |
|---|---|---|
| Simple GET/SET (small values) | ~100K–120K ops/sec | Single core, no pipelining, <1KB values |
| With pipelining (P=16) | ~500K–800K ops/sec | Batches 16 commands per network round-trip |
| Large values (10KB+) | ~30K–50K ops/sec | Serialization and network become bottlenecks |
| Complex commands (ZRANGEBYSCORE) | ~20K–60K ops/sec | Depends on set size and result count |
| Lua scripts (simple) | ~80K–100K ops/sec | Atomic execution, avoids round-trips |
Latency Sources
Network Round-Trip
The biggest latency contributor in most setups. A local Redis call takes ~0.1ms. Over a network, it's 0.5–2ms. Cross-region calls can be 50–100ms. Solution: co-locate Redis with your application servers. Use pipelining to batch multiple commands into a single round-trip.
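To see why pipelining helps so much, it's worth doing the arithmetic. The sketch below is a simplified latency model, not a measurement; the numbers (0.5 ms round-trip, ~10 µs of server work per command) are assumptions chosen to match a typical same-datacenter setup:

```python
# Simplified model: time per batch is one network round-trip plus the
# server-side processing time for every command in the batch.
def throughput(rtt_ms, server_us_per_cmd, batch):
    """Commands/sec for one connection when `batch` commands share one round-trip."""
    batch_time_ms = rtt_ms + batch * (server_us_per_cmd / 1000)
    return batch * 1000 / batch_time_ms

# Assumed numbers: 0.5ms round-trip, ~10us of server work per command
no_pipeline = throughput(rtt_ms=0.5, server_us_per_cmd=10, batch=1)
pipelined = throughput(rtt_ms=0.5, server_us_per_cmd=10, batch=16)

print(f"no pipelining: {no_pipeline:,.0f} ops/sec")  # no pipelining: 1,961 ops/sec
print(f"P=16 pipeline: {pipelined:,.0f} ops/sec")    # P=16 pipeline: 24,242 ops/sec
```

Under these assumptions a single connection is capped near 2K ops/sec by the network alone, and batching 16 commands multiplies that by roughly 12x. Benchmarks reach six-figure throughput by combining pipelining with many parallel connections.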
Value Serialization
Large values (10KB+) take time to serialize, transmit, and deserialize. A 1MB value takes ~10x longer than a 1KB value. Solution: keep values small. Compress large values with LZ4 or Snappy before storing. Split large objects into smaller keys.
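LZ4 and Snappy need third-party bindings (e.g. the `lz4` or `python-snappy` packages), so as an illustration of the same idea with only the standard library, here is a sketch using zlib at its fastest compression level; the value here is invented for the example:

```python
import json
import zlib

# A large-ish JSON value of the kind you might cache
value = json.dumps(
    {"items": [{"id": i, "name": f"product-{i}"} for i in range(500)]}
).encode()

# level=1 trades a little ratio for speed; repetitive JSON still shrinks a lot
compressed = zlib.compress(value, level=1)
print(len(value), "->", len(compressed), "bytes")

# Store `compressed` with SET; on read, reverse the transformation:
restored = zlib.decompress(compressed)
assert restored == value
```

The same wrap/unwrap pattern applies regardless of codec; LZ4 and Snappy are usually preferred in production because they decompress several times faster than zlib.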
Slow Commands
O(N) commands like KEYS *, SMEMBERS on a 1M-element set, or HGETALL on a huge hash block the entire server. While that command runs, every other client waits. Solution: avoid O(N) commands in production. Use SCAN for iteration. Set slowlog thresholds to catch offenders.
Persistence I/O
RDB snapshots fork the process, which can cause latency spikes on large datasets. AOF fsync=always adds disk I/O to every write. Solution: use fsync=everysec for AOF. Schedule RDB snapshots during low-traffic periods. Monitor fork time with INFO stats.
🎯 Interview Insight
When asked "how fast is Redis?" — don't just say "it's fast." Say: "~100K ops/sec on a single core for simple GET/SET with small values. With pipelining, 500K+. The real bottlenecks are network latency, large values, and O(N) commands — not Redis itself."
Slow Commands to Avoid
Because Redis is single-threaded, one slow command blocks every other client. A single KEYS * on a database with 10 million keys can freeze your entire Redis instance for seconds. In production, this is an outage.
⚠️ KEYS * Is a Production Killer
Never run KEYS * in production. It scans every key in the database — O(N) where N is the total number of keys. On a Redis instance with 10M keys, this blocks the server for several seconds. Every other client times out. Use SCAN instead.
Dangerous O(N) Commands
| Command | Time Complexity | Why It's Dangerous | Safe Alternative |
|---|---|---|---|
| KEYS * | O(N) — all keys | Scans entire keyspace, blocks server | SCAN with cursor-based iteration |
| SMEMBERS | O(N) — set size | Returns all members of a large set at once | SSCAN for iteration, or SRANDMEMBER for sampling |
| HGETALL | O(N) — hash fields | Returns all fields of a large hash at once | HSCAN for iteration, or HMGET for specific fields |
| LRANGE 0 -1 | O(N) — list length | Returns entire list, blocks on large lists | LRANGE with bounded offsets (paginate) |
| SORT | O(N+M*log(M)) | Sorts in-place, expensive on large collections | Use Sorted Sets (ZRANGEBYSCORE) instead |
| FLUSHDB / FLUSHALL | O(N) — all keys | Deletes everything, blocks until complete | FLUSHDB ASYNC (Redis 4.0+) for non-blocking |
SCAN vs KEYS — Cursor-Based Iteration
```
# ❌ DANGEROUS: blocks the server until all keys are scanned
KEYS user:*

# ✅ SAFE: cursor-based iteration, returns a batch of keys per call
# Start with cursor 0
SCAN 0 MATCH user:* COUNT 100
# Returns: next_cursor + batch of matching keys
# 1) "17920"          ← next cursor (0 means done)
# 2) 1) "user:42"
#    2) "user:108"
#    3) "user:7"

# Continue with the returned cursor
SCAN 17920 MATCH user:* COUNT 100
# Keep going until the cursor returns "0"
# Each call takes O(COUNT) time — doesn't block the server

# Same pattern for other data structures:
SSCAN myset 0 COUNT 100    # iterate set members
HSCAN myhash 0 COUNT 100   # iterate hash fields
ZSCAN myzset 0 COUNT 100   # iterate sorted set members
```
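The client-side loop looks the same in any language: keep feeding the returned cursor back in until it comes back as 0. The Python sketch below stubs the server side with an in-memory key list purely to illustrate the protocol; a real application would call `scan()` on a client such as redis-py instead:

```python
import fnmatch

def make_fake_scan(keys, count=100):
    """Stub of the server's SCAN: returns (next_cursor, batch).
    Stands in for redis-py's r.scan() so the loop below is runnable."""
    ordered = sorted(keys)
    def scan(cursor, match="*"):
        batch = ordered[cursor:cursor + count]
        next_cursor = cursor + count if cursor + count < len(ordered) else 0
        return next_cursor, [k for k in batch if fnmatch.fnmatch(k, match)]
    return scan

def scan_all(scan, match="*"):
    """The client-side loop: call with the returned cursor until it is 0."""
    cursor, found = 0, []
    while True:
        cursor, batch = scan(cursor, match)
        found.extend(batch)
        if cursor == 0:   # cursor 0 means iteration is complete
            return found

keys = [f"user:{i}" for i in range(250)] + ["order:1", "order:2"]
scan = make_fake_scan(keys, count=100)
print(len(scan_all(scan, "user:*")))  # 250, gathered in 3 short non-blocking calls
```

Note the trade the real SCAN makes: each call is cheap, but keys can be returned more than once, so consumers should treat the results as a set.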
Auditing with SLOWLOG
```
# Configure slowlog threshold (in microseconds)
# Log any command that takes longer than 10ms
CONFIG SET slowlog-log-slower-than 10000

# Keep the last 128 slow commands
CONFIG SET slowlog-max-len 128

# View the slowlog
SLOWLOG GET 10
# Sample output:
# 1) 1) (integer) 14            ← entry ID
#    2) (integer) 1693420800    ← Unix timestamp
#    3) (integer) 38102         ← execution time (μs) = 38ms
#    4) 1) "KEYS"               ← the command
#       2) "session:*"          ← the argument
#    5) "10.0.1.42:52340"       ← client address

# Reset the slowlog
SLOWLOG RESET

# Check how many entries are in the slowlog
SLOWLOG LEN
```
💡 Production Tip
Set slowlog-log-slower-than to 5000 (5ms) in production. Review the slowlog weekly. Common offenders: KEYS commands from admin scripts, HGETALL on growing hashes, and LRANGE on unbounded lists. Fix them before they cause outages.
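During that weekly review it helps to aggregate entries by command rather than reading them one at a time. A small sketch, operating on hand-written tuples shaped like the SLOWLOG GET output above (the data here is invented for illustration):

```python
# Each SLOWLOG GET entry: (id, unix_ts, exec_time_us, [command, *args], client)
entries = [
    (14, 1693420800, 38102, ["KEYS", "session:*"], "10.0.1.42:52340"),
    (13, 1693420712, 9200, ["HGETALL", "user:42:prefs"], "10.0.1.7:41001"),
    (12, 1693420650, 61044, ["KEYS", "user:*"], "10.0.1.42:52340"),
]

def worst_commands(entries, top=3):
    """Aggregate slowlog entries by command name, worst total time first."""
    totals = {}
    for _id, _ts, usec, args, _client in entries:
        totals[args[0]] = totals.get(args[0], 0) + usec
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

for cmd, usec in worst_commands(entries):
    print(f"{cmd:<10} {usec / 1000:.1f} ms total")
# KEYS       99.1 ms total
# HGETALL    9.2 ms total
```

With a real client you would feed this the return value of `SLOWLOG GET`; the report immediately shows which command family to attack first.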
Memory Optimization
Redis stores everything in memory, so every byte counts. Understanding how Redis encodes data internally — and when it switches between compact and full encodings — is the key to fitting more data into less RAM.
Packing a Suitcase
Small items (socks, underwear) can be rolled up tightly and stuffed into corners — this is ziplist encoding. But once you add too many items or something bulky (a winter coat), you need to switch to a bigger suitcase with compartments — this is hashtable encoding. Redis does the same: small hashes, lists, and sets use compact ziplist encoding. Once they grow past a threshold, Redis switches to a full data structure that uses more memory but handles large sizes efficiently.
Memory Encoding Thresholds
| Data Type | Compact Encoding | Switches To | Threshold |
|---|---|---|---|
| Hash | ziplist (listpack in 7.0+) | hashtable | >128 fields OR any value >64 bytes |
| List | ziplist / quicklist | quicklist with larger nodes | >128 elements OR any element >64 bytes |
| Set | intset (integers only) | hashtable | >128 elements OR any non-integer member |
| Sorted Set | ziplist (listpack) | skiplist + hashtable | >128 elements OR any member >64 bytes |
```
# Check current thresholds
CONFIG GET hash-max-ziplist-entries
CONFIG GET hash-max-ziplist-value
CONFIG GET list-max-ziplist-size
CONFIG GET set-max-intset-entries
CONFIG GET zset-max-ziplist-entries
CONFIG GET zset-max-ziplist-value

# Increase hash ziplist threshold (default: 128 entries, 64 bytes)
# If your hashes have 200 small fields, raising this saves memory
CONFIG SET hash-max-ziplist-entries 256
CONFIG SET hash-max-ziplist-value 128

# Trade-off: higher thresholds = less memory, but slower O(N) scans
# on the ziplist. Sweet spot is usually 128-512 entries.

# Redis 7.0+ uses listpack instead of ziplist (same concept, better impl)
CONFIG GET hash-max-listpack-entries
CONFIG GET hash-max-listpack-value
```
Memory Analysis Tools
```
# Check memory usage of a specific key (Redis 4.0+)
MEMORY USAGE user:42
# (integer) 72   ← 72 bytes including key overhead

# Check memory usage with samples for aggregate types
MEMORY USAGE myhash SAMPLES 5
# Samples 5 random fields to estimate total memory

# Overall memory stats
INFO memory
# used_memory:1234567890            ← total bytes used
# used_memory_human:1.15G           ← human-readable
# used_memory_rss:1400000000        ← RSS (actual OS allocation)
# mem_fragmentation_ratio:1.13      ← RSS / used_memory
# used_memory_peak:2000000000       ← historical peak

# Memory fragmentation ratio:
#   < 1.0   → Redis is swapping (BAD — add more RAM)
#   1.0-1.5 → healthy
#   > 1.5   → significant fragmentation (consider restart)

# Memory doctor (Redis 4.0+)
MEMORY DOCTOR
# Returns advice about memory issues

# External tool: redis-rdb-tools (analyze RDB dump offline)
#   pip install rdbtools
#   rdb --command memory dump.rdb --bytes 128 -f memory.csv
# Generates CSV with key, type, encoding, size, num_elements
```
Key Naming Conventions & Memory Impact
```
# Every key name is stored in memory. Shorter = less RAM.

# ❌ Verbose keys (wastes memory at scale)
user:profile:details:john.doe@example.com:settings:notifications
# ~60 bytes just for the key name × 10M users = 600MB wasted

# ✅ Compact keys (saves memory)
u:42:s:n
# ~8 bytes × 10M users = 80MB

# ✅ Balanced approach (readable + compact)
u:{id}:profile   instead of   user:profile:details:{email}
s:{id}:cart      instead of   shopping:cart:items:user:{id}
sess:{token}     instead of   session:auth:token:{full-uuid}

# Use hashes to group related data (1 key instead of 5)
# ❌ 5 separate keys per user:
SET u:42:name "John"
SET u:42:email "john@example.com"
SET u:42:age "30"
SET u:42:city "NYC"
SET u:42:role "admin"
# Overhead: 5 keys × ~50 bytes overhead each = 250 bytes

# ✅ 1 hash with 5 fields:
HSET u:42 name "John" email "john@example.com" age 30 city "NYC" role "admin"
# Overhead: 1 key × ~50 bytes + ziplist encoding = ~120 bytes
# Saves ~50% memory for small field counts
```
🎯 Memory Rule of Thumb
Each Redis key has ~50-70 bytes of overhead (for the dict entry, SDS string, expiry pointer, etc.) regardless of the value size. If you have millions of tiny values, the key overhead dominates. Group related small values into hashes to reduce the number of keys.
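A quick back-of-envelope calculation makes the overhead concrete. Assuming ~60 bytes of per-key overhead (the midpoint of the range above) and ~20-byte payloads per field; both numbers are assumptions for illustration:

```python
def total_bytes(n_users, keys_per_user, per_key_overhead=60, payload=20):
    """Rough total: every key pays ~60B of dict/SDS/expiry overhead
    (assumed midpoint of the 50-70B range) plus its payload."""
    return n_users * keys_per_user * (per_key_overhead + payload)

# 5 separate string keys per user vs 1 hash holding 5 small fields
separate = total_bytes(10_000_000, keys_per_user=5)
grouped = total_bytes(10_000_000, keys_per_user=1,
                      payload=5 * 20 + 10)  # 5 fields in one compact listpack
print(f"{separate / 1e9:.1f} GB vs {grouped / 1e9:.1f} GB")  # 4.0 GB vs 1.7 GB
```

At 10M users the per-key overhead alone accounts for most of the difference, which is why consolidating tiny values into hashes is usually the first memory optimization worth trying.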
Monitoring & Metrics
You can't fix what you can't see. Redis exposes a rich set of metrics through the INFO command. Knowing which metrics matter — and what their values mean — is the difference between catching problems early and debugging outages at 3 AM.
```
# Full info dump (all sections)
INFO

# Specific sections
INFO memory        # memory usage, fragmentation, peak
INFO stats         # ops/sec, hits/misses, connections
INFO replication   # master/replica status, lag
INFO clients       # connected clients, blocked clients
INFO keyspace      # keys per database, expires
INFO server        # version, uptime, config file
INFO persistence   # RDB/AOF status, last save time

# Single stat shortcut (run from the shell, since redis-cli itself can't pipe)
redis-cli INFO stats | grep instantaneous_ops_per_sec
```
Key Metrics to Monitor
| Metric | What It Tells You | Healthy Range | Red Flag |
|---|---|---|---|
| used_memory | Total bytes allocated by Redis | Below maxmemory | >90% of maxmemory (evictions imminent) |
| mem_fragmentation_ratio | RSS / used_memory — OS vs Redis view | 1.0–1.5 | <1.0 (swapping) or >2.0 (heavy fragmentation) |
| connected_clients | Number of active client connections | Stable, within maxclients | Sudden spikes or approaching maxclients |
| instantaneous_ops_per_sec | Current throughput | Consistent with baseline | Sudden drops (server blocked) or spikes |
| keyspace_hits / keyspace_misses | Cache hit rate | Hit rate >95% | Hit rate <80% (cache not effective) |
| rdb_last_bgsave_status | Last RDB snapshot result | ok | err (persistence broken, data loss risk) |
| latest_fork_usec | Time to fork for RDB/AOF rewrite | <500ms | >1s (large dataset, causes latency spike) |
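These checks are easy to automate because INFO output is just `field:value` lines. A minimal sketch that parses a dump and applies two of the thresholds above; the sample values echo the INFO memory example earlier in this section, and the 90% cutoff is one reasonable choice, not a Redis default:

```python
def parse_info(raw):
    """Parse the `field:value` lines of an INFO dump into a dict."""
    out = {}
    for line in raw.splitlines():
        if ":" in line and not line.startswith("#"):  # skip section headers
            key, _, value = line.partition(":")
            out[key] = value.strip()
    return out

def health_flags(info, maxmemory):
    """Return a list of red flags based on the thresholds in the table."""
    flags = []
    if int(info["used_memory"]) > 0.9 * maxmemory:
        flags.append("memory >90% of maxmemory")
    frag = int(info["used_memory_rss"]) / int(info["used_memory"])
    if frag < 1.0:
        flags.append("fragmentation <1.0: likely swapping")
    elif frag > 2.0:
        flags.append("fragmentation >2.0: heavy fragmentation")
    return flags

raw = """# Memory
used_memory:1234567890
used_memory_rss:1400000000"""

info = parse_info(raw)
print(health_flags(info, maxmemory=12 * 1024**3))  # []  (both checks pass)
```

In practice redis_exporter does this parsing for you, but the same threshold logic ends up encoded in your alerting rules.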
Hit Rate Calculation
```
# Formula: hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses) × 100

# Example from INFO stats:
#   keyspace_hits:   9500000
#   keyspace_misses:  500000
#   hit_rate = 9500000 / 10000000 = 95%   ✅ Healthy

# What hit rates mean:
#   >99%   → Excellent. Cache is highly effective.
#   95-99% → Good. Normal for most workloads.
#   80-95% → Concerning. Review TTLs, key patterns, eviction.
#   <80%   → Bad. Cache is not protecting the database.
# Possible causes of a low hit rate:
#   - TTLs too short (keys expire before reuse)
#   - Working set larger than maxmemory (evictions)
#   - Cold start (cache not yet warmed)
#   - Wrong data being cached (cache what's read often)

# Monitor hit rate over time, not as a snapshot.
# A drop from 98% to 85% over an hour = investigate immediately.
```
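As a worked version of the formula, a small helper that computes the rate and maps it onto the bands above:

```python
def hit_rate(hits, misses):
    """Cache hit rate as a percentage; 0 when there's no traffic yet."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

def grade(rate):
    """Map a hit rate onto the health bands described above."""
    if rate > 99:
        return "excellent"
    if rate >= 95:
        return "good"
    if rate >= 80:
        return "concerning"
    return "bad"

rate = hit_rate(9_500_000, 500_000)
print(f"{rate:.1f}% -> {grade(rate)}")  # 95.0% -> good
```

Feed it successive samples of keyspace_hits/keyspace_misses (which are cumulative counters, so diff consecutive readings) to track the trend rather than the snapshot.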
Latency Monitoring
```
# Enable latency monitoring (threshold in ms)
CONFIG SET latency-monitor-threshold 5

# View latency history for specific events
LATENCY HISTORY command
# Returns timestamped latency samples:
# 1) 1) (integer) 1693420800   ← timestamp
#    2) (integer) 12           ← latency in ms

# View latest latency for all event types
LATENCY LATEST
# 1) 1) "command"              ← event type
#    2) (integer) 1693420800   ← last occurrence
#    3) (integer) 12           ← latest latency (ms)
#    4) (integer) 38           ← all-time max latency (ms)

# Reset latency data
LATENCY RESET

# Built-in latency diagnostics (run from the shell)
redis-cli --latency           # continuous ping test
redis-cli --latency-history   # latency over time (15s intervals)
redis-cli --latency-dist      # latency distribution (spectrum)

# Intrinsic latency test (measures the system, not Redis)
redis-cli --intrinsic-latency 5   # test for 5 seconds
```
💡 Monitoring Stack
In production, export Redis metrics to Prometheus using redis_exporter and visualize with Grafana. Set alerts on: memory usage >85%, hit rate <90%, connected clients spike, and replication lag >1s. The INFO command is for debugging — automated monitoring catches problems while you sleep.
Redis vs Alternatives
Redis is powerful, but it's not the right tool for every job. Understanding when to use Redis — and when to reach for something else — is a critical production skill and a common interview topic.
| Feature | Redis | Memcached | MongoDB | etcd | Kafka |
|---|---|---|---|---|---|
| Primary Use | Cache, sessions, real-time data | Simple key-value cache | Document database | Distributed config/coordination | Event streaming / message queue |
| Data Model | Strings, Hashes, Lists, Sets, Sorted Sets, Streams | Strings only (key → blob) | JSON documents (BSON) | Key-value (small values) | Append-only log (topics/partitions) |
| Persistence | Optional (RDB, AOF) | None (pure cache) | Yes (disk-based) | Yes (Raft consensus) | Yes (disk-based log) |
| Throughput | ~100K ops/sec (single node) | ~200K ops/sec (multi-threaded) | ~20K-50K ops/sec | ~10K ops/sec | ~100K msgs/sec per partition |
| Data Size | Must fit in RAM | Must fit in RAM | Disk-based (TBs) | Small (few GB max) | Disk-based (TBs of logs) |
| Clustering | Redis Cluster (hash slots) | Client-side sharding | Built-in sharding (mongos) | Raft consensus (3-5 nodes) | Partitions across brokers |
| Best For | Caching, leaderboards, rate limiting, pub/sub | Simple caching at massive scale | Flexible schema, complex queries | Service discovery, leader election, config | Event sourcing, log aggregation, streaming |
When Redis Is NOT the Right Choice
Data Larger Than RAM
Redis stores everything in memory. If your dataset is 500GB and your server has 64GB RAM, Redis won't work. Use a disk-based database (PostgreSQL, MongoDB) or a tiered solution (Redis on Flash). Redis is a cache or working-set store, not a primary database for large datasets.
Complex Queries and Joins
Redis has no query language, no joins, no aggregations. If you need `SELECT * FROM users WHERE age > 25 AND city = 'NYC' ORDER BY signup_date`, use PostgreSQL or MongoDB. Redis is for key-based lookups, not ad-hoc queries.
Strong Consistency Requirements
Redis replication is asynchronous by default. A write acknowledged by the master may not yet be on the replica. If the master crashes, that write is lost. For strong consistency (banking, inventory counts), use a database with synchronous replication or a consensus system like etcd/ZooKeeper.
Durable Message Queuing
Redis Streams and Pub/Sub work for lightweight messaging, but they lack the durability guarantees of Kafka or RabbitMQ. If you need guaranteed delivery, message replay, and consumer group management at scale, use a dedicated message broker.
Multi-Key ACID Transactions
Redis MULTI/EXEC provides atomicity for a single connection, but not isolation across clients. There's no rollback on partial failure. For true ACID transactions across multiple entities, use a relational database.
Decision Framework
```
Q: Does the data fit in RAM?
   NO  → Use a disk-based database (PostgreSQL, MongoDB)
   YES ↓

Q: Do you need complex queries (joins, aggregations, filters)?
   YES → Use PostgreSQL/MongoDB. Cache hot results in Redis.
   NO  ↓

Q: Do you need strong consistency (zero data loss)?
   YES → Use PostgreSQL with synchronous replication, or etcd.
         Redis can still be a cache layer in front.
   NO  ↓

Q: What's the access pattern?
   Key-value lookups    → Redis (Strings, Hashes)
   Ranked data          → Redis (Sorted Sets)
   Queue / task system  → Redis (Lists, Streams) or Kafka for scale
   Pub/Sub messaging    → Redis Pub/Sub (small scale) or Kafka (large scale)
   Session storage      → Redis (Strings with TTL)
   Rate limiting        → Redis (INCR + EXPIRE)
   Real-time analytics  → Redis (HyperLogLog, Bitmaps)

Rule of thumb: Redis is best as a cache, session store, or real-time
data layer — not as a primary database.
```
🎯 Interview Insight
Interviewers love asking "when would you NOT use Redis?" The answer shows maturity: "Redis is wrong when data exceeds RAM, when you need complex queries or joins, when you need strong consistency, or when you need durable message queuing at scale. Redis excels as a cache, session store, and real-time data layer."
Production Checklist
Before deploying Redis to production, every item on this checklist should be configured and verified. Skipping any of these is a ticking time bomb.
Memory & Eviction
- ✅ Set maxmemory to 70-80% of available RAM (leave room for fork overhead and the OS)
- ✅ Configure an eviction policy — allkeys-lru for caches, noeviction for primary data stores
- ✅ Monitor used_memory and set alerts at 85% and 95% of maxmemory
- ✅ Test what happens when maxmemory is reached — does your app handle eviction/rejection gracefully?
Persistence & Durability
- ✅ Choose persistence mode: RDB for snapshots, AOF for durability, both for safety
- ✅ For AOF, use appendfsync everysec (balances durability and performance)
- ✅ Schedule RDB snapshots during low-traffic windows
- ✅ Test recovery: stop Redis, restart from RDB/AOF, verify data integrity
- ✅ Store backups off-server (S3, GCS) — a backup on the same disk is not a backup
Security
- ✅ Set requirepass with a strong password (Redis has no auth by default)
- ✅ Bind to specific interfaces: bind 127.0.0.1 or your private network IP
- ✅ Never expose Redis to the public internet (port 6379 is actively scanned)
- ✅ Disable dangerous commands in production: rename-command FLUSHALL "" and rename-command KEYS ""
- ✅ Use TLS for connections if Redis is accessed over a network
- ✅ Enable ACLs (Redis 6.0+) for fine-grained user permissions
Monitoring & Alerting
- ✅ Export metrics to Prometheus/Grafana using redis_exporter
- ✅ Alert on: used_memory >85%, hit rate <90%, connected_clients spikes, replication lag >1s
- ✅ Configure SLOWLOG with a 5ms threshold and review it weekly
- ✅ Enable latency monitoring with a 5ms threshold
- ✅ Monitor rdb_last_bgsave_status — a failed save means persistence is broken
High Availability
- ✅ Deploy at least one replica for read scaling and failover
- ✅ Use Redis Sentinel (3+ nodes) for automatic failover, or Redis Cluster for sharding
- ✅ Test failover: kill the master, verify Sentinel promotes a replica within seconds
- ✅ Set min-replicas-to-write 1 to prevent writes when no replicas are connected
- ✅ Configure client libraries with Sentinel/Cluster awareness for automatic reconnection
Connection Management
- ✅ Use connection pooling in your application (don't create a new connection per request)
- ✅ Set timeout to close idle connections (default 0 = never; 300 seconds is a sensible value)
- ✅ Set maxclients appropriately (default 10,000 — lower it if your server has limited file descriptors)
- ✅ Set tcp-keepalive to 60 seconds to detect dead connections
A sample redis.conf putting the checklist into practice:

```
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru

# Persistence (both RDB + AOF for safety)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Security
requirepass your-strong-password-here
bind 127.0.0.1 10.0.1.0
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command KEYS ""
rename-command DEBUG ""

# Connections
timeout 300
tcp-keepalive 60
maxclients 5000

# Slow log
slowlog-log-slower-than 5000
slowlog-max-len 256

# Latency monitoring
latency-monitor-threshold 5

# Replication safety
min-replicas-to-write 1
min-replicas-max-lag 10
```
Interview Questions
These questions test whether you can operate Redis in production — not just use it as a cache.
Q: How would you diagnose a sudden latency spike in Redis?
A: Start with SLOWLOG GET 10 to see if any slow commands ran recently. Check INFO stats for instantaneous_ops_per_sec — a drop means the server was blocked. Check INFO memory for mem_fragmentation_ratio — if it's <1.0, Redis is swapping to disk. Check latest_fork_usec — a large value means an RDB save or AOF rewrite caused a fork spike. Run LATENCY LATEST to see timestamped latency events. Common culprits: O(N) commands (KEYS, HGETALL on large hashes), RDB fork on a large dataset, memory swapping, or network issues. Fix: identify the slow command via SLOWLOG, replace it with a SCAN-based alternative, and set slowlog-log-slower-than to catch future offenders.
Q: Your Redis instance is using 95% of maxmemory. What do you do?
A: Immediate: check the eviction policy — if it's noeviction, writes are being rejected. Switch to allkeys-lru if this is a cache. Short-term: use MEMORY USAGE on suspect keys to find memory hogs. Run redis-rdb-tools on an RDB dump to get a full memory breakdown by key pattern. Look for: large hashes/sets that grew unbounded, keys with no TTL that should have one, and duplicate data stored under different key patterns. Medium-term: add TTLs to keys that don't have them, compress large values, switch from individual keys to hashes (saves per-key overhead), and consider Redis Cluster to shard across multiple nodes. Long-term: review your data model — are you storing data in Redis that belongs in a database?
Q: Why is KEYS * dangerous and what should you use instead?
A: KEYS * is O(N) where N is the total number of keys in the database. It scans every single key and blocks the Redis server until complete. On an instance with 10M keys, this can take several seconds — during which every other client is blocked. In production, this causes timeouts and cascading failures. Use SCAN instead: it's cursor-based, returning ~COUNT keys per call without blocking the server. SCAN 0 MATCH user:* COUNT 100 returns a batch of matching keys and a cursor for the next batch. Each call is O(COUNT), not O(N). The trade-off: SCAN may return duplicates and doesn't guarantee consistency if keys are added/removed during iteration — but it won't crash your production server.
Q: How do you decide between Redis and Memcached for caching?
A: Use Memcached when: you need a simple key-value cache with string values only, you want multi-threaded performance (Memcached uses multiple cores, Redis is single-threaded), and you don't need persistence or data structures. Use Redis when: you need data structures beyond strings (hashes, sorted sets, lists), you need persistence (survive restarts), you need pub/sub or Lua scripting, you need atomic operations on complex types (ZINCRBY, LPUSH), or you need features like TTL per key, transactions, or Streams. In practice, Redis is the default choice for most teams because its feature set covers Memcached's use case plus much more. Memcached wins only on raw multi-threaded throughput for simple string caching.
Q: What eviction policy would you choose for a Redis cache vs. a Redis session store?
A: For a cache (expendable data, can be re-fetched): use allkeys-lru. When memory is full, Redis evicts the least recently used key across all keys. This is ideal because any cached value can be regenerated from the source. For a session store (user sessions that shouldn't be randomly dropped): use volatile-lru — only evict keys that have a TTL set. Sessions naturally have TTLs (e.g., 30 minutes), so expired sessions get evicted first. Active sessions without expired TTLs are preserved. Alternative: volatile-ttl evicts keys closest to expiration first, which is also good for sessions. Never use noeviction for a cache — it causes write errors when memory is full. Never use allkeys-random — it evicts active hot keys as readily as cold ones.
Common Mistakes
These mistakes have caused real production incidents. Each one is preventable with the right configuration and awareness.
Running Redis without authentication on a network
Redis ships with no password by default. If it's bound to 0.0.0.0 (all interfaces), anyone on the network — or the internet — can connect, read all data, and run FLUSHALL. Attackers actively scan port 6379. There have been widespread attacks where exposed Redis instances were used to write SSH keys and gain server access.
✅Always set requirepass in redis.conf. Bind to 127.0.0.1 or your private network interface. Never expose port 6379 to the public internet. Use firewall rules as a second layer. On Redis 6.0+, use ACLs for per-user permissions.
Not setting maxmemory on a cache
Without maxmemory, Redis grows until it consumes all available RAM. The OS starts swapping Redis memory to disk, which makes Redis 100x slower. Eventually the OOM killer terminates the Redis process — or worse, another critical process. All cached data is lost, and the database gets hammered.
✅Always set maxmemory to 70-80% of available RAM (leave room for fork overhead during RDB saves). Set an eviction policy — allkeys-lru for caches. Monitor used_memory and alert at 85%. Test what happens when maxmemory is reached: does your application handle it gracefully?
Using KEYS * in application code
A developer writes KEYS user:* to find all user keys. It works in development with 100 keys. In production with 10 million keys, it blocks the Redis server for 5 seconds. Every other client times out. The monitoring system detects Redis as 'down' and triggers alerts. The developer doesn't realize KEYS is O(N) because it worked fine locally.
✅Ban KEYS from application code entirely. Use rename-command KEYS '' in redis.conf to disable it. Use SCAN for iteration — it's cursor-based and non-blocking. For finding keys by pattern, maintain a secondary index (a Set containing all keys of a type) instead of scanning the keyspace.
Ignoring memory fragmentation
Redis allocates and frees memory frequently. Over time, the allocator (jemalloc) can't reuse freed blocks efficiently, leading to fragmentation. INFO memory shows used_memory at 8GB but used_memory_rss at 14GB — the OS allocated 14GB but Redis only uses 8GB. The extra 6GB is wasted. If maxmemory is set to 12GB, Redis thinks it has 4GB free, but the OS is already at 14GB.
✅Monitor mem_fragmentation_ratio (RSS / used_memory). Healthy is 1.0-1.5. Above 2.0, consider restarting Redis during a maintenance window to reset fragmentation. Redis 4.0+ has activedefrag yes which defragments memory online without restart. For write-heavy workloads with variable-size values, fragmentation is expected — plan your maxmemory accordingly.