Consistent Hashing
Understand consistent hashing — hash ring mechanics, virtual nodes, minimal rebalancing on node changes, and hotspot mitigation. The algorithm behind nearly every distributed cache and database.
The Big Picture — Why Traditional Hashing Fails
The simplest way to distribute data across N servers is hash(key) % N. Key "user:42" hashes to 7; 7 % 3 = 1, so it lives on server 1. Fast, simple, deterministic. But it has a fatal flaw: when N changes, almost everything moves.
```
3 servers: hash(key) % 3

key A → hash=7  → 7 % 3  = 1 → Server 1
key B → hash=12 → 12 % 3 = 0 → Server 0
key C → hash=19 → 19 % 3 = 1 → Server 1
key D → hash=25 → 25 % 3 = 1 → Server 1
key E → hash=31 → 31 % 3 = 1 → Server 1

Add 1 server → 4 servers: hash(key) % 4

key A → hash=7  → 7 % 4  = 3 → Server 3 ← MOVED
key B → hash=12 → 12 % 4 = 0 → Server 0 ✓ same
key C → hash=19 → 19 % 4 = 3 → Server 3 ← MOVED
key D → hash=25 → 25 % 4 = 1 → Server 1 ✓ same
key E → hash=31 → 31 % 4 = 3 → Server 3 ← MOVED

Result: 3 out of 5 keys moved (60%).
```

At scale: adding 1 server to a 100-server cluster moves ~99% of keys. Every moved key = cache miss = database hit = potential overload.
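You can verify the reshuffling yourself. Here's a minimal sketch (the `server_for` helper is illustrative, not from any library) that counts how many keys change servers when a 100-server cluster grows to 101:

```python
import hashlib

def server_for(key: str, n_servers: int) -> int:
    # Stable hash (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_servers

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(server_for(k, 100) != server_for(k, 101) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")   # ≈99%
```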
The Circular Table Analogy
Imagine people sitting around a circular table. Each person handles the tasks (keys) closest to them going clockwise. If someone leaves, only their tasks are redistributed — to the next person clockwise. Everyone else keeps their tasks. If a new person sits down, they only take over the tasks between them and the previous person. Minimal disruption. This is consistent hashing — a circular arrangement where adding or removing a node only affects its immediate neighbors.
🔥 The Core Problem
With hash(key) % N, changing N reshuffles almost everything. With consistent hashing, changing N only moves ~K/N keys (where K is the total number of keys). Adding 1 server to a 100-server cluster moves ~1% of keys instead of ~99%.
What Is Consistent Hashing?
Consistent hashing maps both keys and nodes onto the same circular hash space (a "ring"). Each key is assigned to the nearest node clockwise on the ring. When a node is added or removed, only the keys between it and its neighbors are affected — everything else stays put.
🎯 Goals
- Distribute data evenly across nodes
- Minimize data movement when nodes change
- Deterministic — same key always maps to same node
- No central coordinator needed
🏗️ Used In
- Amazon DynamoDB (partition routing)
- Apache Cassandra (token ring)
- Memcached / Redis cluster (key distribution)
- CDNs (request routing to edge servers)
- Load balancers (sticky session routing)
Hash Ring Mechanics
The hash ring is a circular number line from 0 to 2³² - 1 (or any large range). Both nodes and keys are hashed onto this ring. A key is assigned to the first node encountered going clockwise from the key's position.
The Clock Face
Think of a clock. Servers are placed at specific hours (e.g., Node A at 2 o'clock, Node B at 6 o'clock, Node C at 10 o'clock). When a key hashes to 4 o'clock, you walk clockwise until you hit the next server — that's Node B at 6 o'clock. Node B owns all keys between 2 o'clock and 6 o'clock.
```
Hash space: 0 ────────────────────────── 2³² - 1
Wrapped into a ring (the end of the range connects back to 0)

Step 1: Place nodes on the ring
  hash("NodeA") = 100 → position 100
  hash("NodeB") = 300 → position 300
  hash("NodeC") = 600 → position 600

Ring:      0
           │
  NodeA ● (100)
           │
  NodeB ● (300)
           │
  NodeC ● (600)
           │
           └──── wraps back to 0

Step 2: Place keys on the ring
  hash("user:42")  = 150 → walk clockwise → NodeB (300)
  hash("user:7")   = 50  → walk clockwise → NodeA (100)
  hash("user:99")  = 450 → walk clockwise → NodeC (600)
  hash("user:123") = 700 → walk clockwise → NodeA (100) ← wraps around

Key ownership:
  NodeA owns: keys in range (600, 100]  ← wraps around 0
  NodeB owns: keys in range (100, 300]
  NodeC owns: keys in range (300, 600]
```
Lookup Algorithm
Hash the key
Compute hash('user:42') = 150. This gives the key's position on the ring.
Walk clockwise
Starting from position 150, find the first node clockwise. Nodes are at 100, 300, 600. The first node after 150 is NodeB at 300.
Route to that node
Send the request for 'user:42' to NodeB. Every client independently computes the same result — no coordinator needed.
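A minimal sketch of this lookup, assuming MD5 as the ring hash (the positions 100/300/600 above are simplified for readability; a real ring uses the full hash space). The `ring_hash` and `lookup` names are illustrative:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Map any string onto the ring (here, the full 128-bit MD5 space).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

nodes = ["NodeA", "NodeB", "NodeC"]
ring = sorted((ring_hash(n), n) for n in nodes)   # (position, node), clockwise
positions = [pos for pos, _ in ring]

def lookup(key: str) -> str:
    # First node at or after the key's position; wrap past the end to index 0.
    idx = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

print(lookup("user:42"))   # every client computes the same owner
```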
🎯 Interview Insight
The ring-based approach is powerful because it's decentralized. Every client can independently compute which node owns a key — just hash the key and walk clockwise. No central routing table, no single point of failure, no coordination.
Virtual Nodes (VNodes)
With only 3 physical nodes on the ring, the distribution is often uneven — one node might own 60% of the ring while another owns 10%. Virtual nodes solve this by giving each physical node multiple positions on the ring.
Multiple Booths Instead of One
Imagine a circular food court. With 3 vendors, each gets one booth — but the booths might be unevenly spaced, so one vendor gets most of the foot traffic. Virtual nodes: each vendor sets up 5 smaller booths spread evenly around the circle. Now traffic is distributed much more evenly, even though there are still only 3 vendors. If a vendor leaves, their 5 booths are removed — and the keys from each booth go to 5 different neighbors, not all to one.
```
Without vnodes (3 physical nodes, 3 positions):

  NodeA at position 100 → owns range (600, 100] = 50% of ring
  NodeB at position 300 → owns range (100, 300] = 20% of ring
  NodeC at position 600 → owns range (300, 600] = 30% of ring

  ❌ Uneven: NodeA has 2.5x more data than NodeB

With vnodes (3 physical nodes, 5 vnodes each = 15 positions):

  NodeA: positions 50, 200, 400, 550, 800
  NodeB: positions 100, 280, 450, 650, 900
  NodeC: positions 150, 350, 500, 750, 950

  Ring has 15 evenly-ish distributed points.
  Each physical node owns ~33% of the ring.
  ✅ Much more balanced distribution.

When NodeB is removed:

  Its 5 vnodes are removed from the ring.
  Keys from each vnode go to the NEXT node clockwise.
  Load is spread across NodeA and NodeC (5 small transfers).
  ✅ No single node gets overwhelmed.

Without vnodes, removing NodeB dumps ALL its keys onto one neighbor.
With vnodes, the load is distributed across multiple neighbors.
```
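Here's the same idea as code — a small ring class with virtual nodes. The `ConsistentHashRing` class, the `#i` vnode suffix format, and the 200-vnode default are illustrative assumptions, not a specific library's API:

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes_per_node=200):
        # Each physical node is hashed onto the ring many times: "NodeA#0", ...
        self.ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Same clockwise walk as before, now over N × vnodes positions.
        idx = bisect.bisect_left(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["NodeA", "NodeB", "NodeC"])
print(Counter(ring.lookup(f"user:{i}") for i in range(30_000)))
# ≈10,000 keys per node — near-uniform with 200 vnodes each
```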
| Feature | Without VNodes | With VNodes |
|---|---|---|
| Positions per node | 1 | 50-200 (configurable) |
| Load balance | Often uneven | Near-uniform |
| Node removal impact | All keys go to 1 neighbor | Keys spread across many neighbors |
| Node addition impact | Takes keys from 1 neighbor | Takes small slices from many neighbors |
| Heterogeneous hardware | Can't handle (all nodes equal) | More vnodes for stronger machines |
| Complexity | Simple | Slightly more complex (more ring entries) |
VNode Benefits
- ✅Even data distribution regardless of node count
- ✅Graceful scaling — load spreads across many nodes
- ✅Heterogeneous clusters — give powerful nodes more vnodes
- ✅Faster rebalancing — many small transfers vs one large one
- ✅Used by Cassandra (256 vnodes default), DynamoDB, Riak
VNode Trade-offs
- ❌More metadata to maintain (ring has N × vnodes entries)
- ❌Slightly more complex lookup (more positions to search)
- ❌Rebalancing touches more nodes (coordination overhead)
- ❌Too many vnodes = excessive metadata, diminishing returns
- ❌Typical sweet spot: 100-256 vnodes per physical node
🎯 Interview Insight
Virtual nodes are the answer to "how do you handle uneven distribution in consistent hashing?" Always mention them. Explain: "Each physical node gets 100-200 virtual positions on the ring. This ensures even distribution and means adding or removing a node spreads the impact across many neighbors instead of one."
Rebalancing on Node Changes
The defining feature of consistent hashing: when a node is added or removed, only a small fraction of keys need to move. This is what makes elastic scaling possible without massive data reshuffling.
Adding a Node
```
Before (3 nodes):

  Ring: ... NodeA(100) ... NodeB(300) ... NodeC(600) ...
  NodeB owns: keys in range (100, 300]

Add NodeD at position 200:

  Ring: ... NodeA(100) ... NodeD(200) ... NodeB(300) ... NodeC(600) ...

What moves:
  Keys in range (100, 200] → move FROM NodeB TO NodeD
  Keys in range (200, 300] → stay on NodeB
  Keys on NodeA → unchanged
  Keys on NodeC → unchanged

Total movement: only the keys between NodeA and NodeD — about K/N of
all keys on average (with vnodes, spread as small slices across many nodes)

Contrast with hash(key) % N:
  Adding 1 node to 3 → hash(key) % 4
  ~75% of ALL keys across ALL nodes move. Catastrophic.
```
Removing a Node
```
Before (3 nodes):

  Ring: ... NodeA(100) ... NodeB(300) ... NodeC(600) ...
  NodeB owns: keys in range (100, 300]

Remove NodeB:

  Ring: ... NodeA(100) ... NodeC(600) ...

What moves:
  Keys in range (100, 300] → move FROM NodeB TO NodeC
  (NodeC is the next node clockwise after the gap)
  Keys on NodeA → unchanged
  Keys on NodeC (original) → unchanged

With vnodes:
  NodeB had 100 vnodes spread across the ring.
  Each vnode's keys go to the next clockwise node.
  Load is distributed across NodeA AND NodeC
  (and, in larger clusters, across many more nodes).
  No single node absorbs all of NodeB's data.
```
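Using the `ConsistentHashRing` sketch from the vnodes section, you can verify the rebalancing claim directly — adding a sixth node moves roughly 1/6 of keys, and every moved key lands on the new node:

```python
keys = [f"user:{i}" for i in range(100_000)]

before = ConsistentHashRing(["Node1", "Node2", "Node3", "Node4", "Node5"])
after  = ConsistentHashRing(["Node1", "Node2", "Node3", "Node4", "Node5", "Node6"])

moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")   # ≈1/6 ≈ 16.7%
# Existing vnode positions never change, so no key shuffles between old nodes.
```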
| Operation | hash(key) % N | Consistent Hashing | Consistent + VNodes |
|---|---|---|---|
| Add 1 node to 10 | ~90% keys move | ~10% keys move | ~10% spread across all nodes |
| Remove 1 node from 10 | ~90% keys move | ~10% keys move to 1 neighbor | ~10% spread across all nodes |
| Add 1 node to 100 | ~99% keys move | ~1% keys move | ~1% spread across all nodes |
| Scaling impact | Catastrophic | Minimal | Minimal + balanced |
🎯 Interview Insight
This is the #1 reason consistent hashing exists. When asked "why not just use modulo hashing?" — explain: "Adding a server to a 100-node cluster with modulo moves 99% of keys. With consistent hashing, it moves 1%. That's the difference between a 5-second rebalance and a 5-hour one."
Hotspot Mitigation
Even with consistent hashing and virtual nodes, hotspots can occur when certain keys receive disproportionately more traffic than others. A celebrity's profile, a viral product, or a trending hashtag can overwhelm the node that owns that key.
One Counter Getting All the Customers
Even if you distribute booths evenly around the food court, if one vendor sells the viral TikTok drink, their booth gets 100x more traffic than everyone else. The even distribution of booths doesn't help — the problem is the uneven distribution of demand. You need to either split that vendor's booth into multiple locations or redirect overflow to other vendors.
Causes of Hotspots
Skewed Key Distribution
If most keys hash to a narrow range, one node handles disproportionate load. Poor hash functions or correlated key patterns cause this.
Popular Keys
A celebrity's profile (user:beyonce) or a viral product gets millions of reads. The node owning that key is overwhelmed while others are idle.
Time-Based Patterns
All users in one timezone hit the same shard at the same time. Or a batch job writes millions of sequential keys that all hash to one node.
Solutions
Virtual Nodes (baseline)
More vnodes = more even distribution. If one physical node has 200 positions on the ring, the chance of a large contiguous range landing on one node is very low. This handles skewed key distribution.
Key Salting / Splitting
For known hot keys, append a random suffix: 'user:beyonce' becomes 'user:beyonce:0', 'user:beyonce:1', ..., 'user:beyonce:9'. These 10 keys hash to different nodes. Reads fan out to all 10 and merge. Writes go to a random suffix.
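A sketch of the pattern as a sharded counter, assuming a dict-like `store`; the bucket count and helper names are illustrative:

```python
import random

SALT_BUCKETS = 10   # illustrative; size to the hotspot you observe

def increment(store: dict, hot_key: str) -> None:
    # Writes hit one random salted shard — each shard hashes to a different node.
    shard = f"{hot_key}:{random.randrange(SALT_BUCKETS)}"
    store[shard] = store.get(shard, 0) + 1

def total(store: dict, hot_key: str) -> int:
    # Reads fan out to every shard and merge (here, by summing).
    return sum(store.get(f"{hot_key}:{i}", 0) for i in range(SALT_BUCKETS))

store: dict = {}
for _ in range(1000):
    increment(store, "user:beyonce")
print(total(store, "user:beyonce"))   # 1000
```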
Read Replicas
Replicate hot data to multiple nodes. Reads are distributed across replicas. The primary node handles writes, replicas handle reads. This is how Cassandra handles hot partitions.
Caching Layer
Put a cache (Redis) in front of the hot key. The cache absorbs 99% of reads. Only cache misses reach the node. This is the simplest and most common solution for read-heavy hotspots.
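A minimal sketch of an in-process (L1) cache with a TTL, assuming `fetch_from_cluster` is your existing ring lookup; all names here are illustrative:

```python
import time

_l1: dict = {}    # in-process cache: key -> (expires_at, value)
L1_TTL = 30.0     # seconds; short enough that staleness is acceptable

def get(key: str, fetch_from_cluster):
    entry = _l1.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                        # hot key served from local memory
    value = fetch_from_cluster(key)            # only misses reach the ring
    _l1[key] = (time.monotonic() + L1_TTL, value)
    return value
```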
🎯 Interview Insight
When an interviewer asks about hotspots, mention the layered approach: "Virtual nodes handle general distribution. For known hot keys, I'd use key salting to spread them across multiple nodes. For read-heavy hotspots, a cache layer absorbs most traffic. For write-heavy hotspots, I'd consider splitting the partition."
End-to-End Scenario
Let's design a distributed cache cluster using consistent hashing and walk through real operational scenarios.
System: Redis Cache Cluster (5 nodes)
```
Cluster: 5 Redis nodes, 200 vnodes each = 1,000 ring positions
Hash function: SHA-256 (uniform distribution)
Total keys: 10 million cached objects

Initial distribution:
  Node 1: ~2.0M keys (20.1%)
  Node 2: ~2.0M keys (19.8%)
  Node 3: ~2.0M keys (20.3%)
  Node 4: ~2.0M keys (19.7%)
  Node 5: ~2.0M keys (20.1%)

✅ Near-perfect balance with 200 vnodes
```
Traffic doubles — you need to add Node 6
What happens to the data?
Answer: Node 6 gets 200 vnodes placed on the ring. Each vnode takes over a small slice from its clockwise neighbor. Total keys moved: ~10M / 6 = ~1.67M keys (16.7%). These keys are distributed across all 5 existing nodes — no single node loses more than ~333K keys. The cluster continues serving traffic during rebalancing. Clients that request a moved key get a cache miss (one-time), fetch from the database, and cache on the new node.
Node 3 crashes unexpectedly
What happens and how does the system recover?
Answer: Node 3's 200 vnodes are removed from the ring. Its ~2M keys are now owned by the next clockwise nodes — distributed across Nodes 1, 2, 4, and 5 (thanks to vnodes). These keys are cache misses until re-fetched from the database. With 4 remaining nodes, each absorbs ~500K extra keys temporarily. The database sees a spike of ~2M cache misses spread over a few minutes. If Node 3 recovers, its vnodes rejoin the ring and keys migrate back.
A celebrity product goes viral — 100K requests/sec for one key
How do you handle the hotspot?
Answer: The key 'product:viral-item' hashes to one node, which is overwhelmed. Solutions: (1) Add a local in-process cache (L1) on each app server — 30-second TTL absorbs 99% of reads. (2) If still hot, use key salting: split into 'product:viral-item:0' through ':9', fan out reads across 10 nodes. (3) For the cache layer specifically, set a longer TTL on this key and use background refresh to prevent miss storms.
💡 This Is How Production Systems Work
DynamoDB, Cassandra, and every major distributed database use exactly this approach. Consistent hashing with vnodes for distribution, replication for durability, and caching for hotspots. The ring is the foundation — everything else builds on top of it.
Trade-offs & Decision Making
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Modulo Hashing hash(key) % N | Simplest possible, zero overhead | ~99% keys move on resize, catastrophic at scale | Fixed-size clusters that never change |
| Consistent Hashing (no vnodes) | Minimal key movement on resize (~1/N) | Uneven distribution, one neighbor absorbs all on removal | Small clusters with uniform keys |
| Consistent Hashing + VNodes | Even distribution, balanced rebalancing | More metadata, slightly complex lookup | Production distributed systems (DynamoDB, Cassandra) |
| Directory-Based Routing | Full control, can handle any distribution | Central coordinator = single point of failure, bottleneck | Systems needing custom placement (geo-aware sharding) |
VNode Count Trade-offs
| VNodes per Node | Distribution | Metadata | Rebalancing |
|---|---|---|---|
| 1 (no vnodes) | Very uneven | Minimal | All keys to 1 neighbor |
| 10-50 | Moderate | Low | Spread across a few neighbors |
| 100-256 (sweet spot) | Near-uniform | Moderate | Well-distributed |
| 1000+ | Very uniform | High (memory overhead) | Diminishing returns |
🎯 Interview Framework
When discussing data distribution, start with: "I'd use consistent hashing with virtual nodes. Each physical node gets 150-200 vnodes for even distribution. When we scale from 10 to 11 nodes, only ~9% of keys move, spread across all existing nodes." Then mention hotspot mitigation if relevant.
Interview Questions
Q: Why not use hash(key) % N for distributing data?
A: Because when N changes (adding or removing a server), almost all keys get reassigned. Adding 1 server to a 100-server cluster moves ~99% of keys. In a cache cluster, this means 99% cache misses — the database is flooded with requests. Consistent hashing solves this: adding 1 server to 100 moves only ~1% of keys. The rest stay on their current nodes. This is the difference between a smooth scale-up and a cascading failure.
Q: What are virtual nodes and why are they needed?
A: Virtual nodes give each physical node multiple positions on the hash ring (typically 100-256). Without vnodes, 3 nodes might own 60%, 25%, and 15% of the ring — very uneven. With 200 vnodes each, the 600 ring positions create near-uniform distribution (~33% each). VNodes also improve rebalancing: when a node is removed, its keys are spread across many neighbors (one per vnode) instead of all going to one neighbor. This prevents any single node from being overwhelmed during scaling.
Q: How does consistent hashing handle scaling?
A: When a new node joins: it gets vnodes placed on the ring. Each vnode takes over a small key range from its clockwise neighbor. Total keys moved: ~K/N (K=total keys, N=new node count). When a node leaves: its vnodes are removed. Each vnode's keys go to the next clockwise node. With vnodes, this load is distributed across many nodes. The key property: only keys between the new/removed node and its neighbors move. All other keys stay put. This enables elastic scaling without downtime.
You're designing a distributed cache for 1 billion keys across 50 servers
How would you distribute the data?
Answer: Consistent hashing with 200 vnodes per server = 10,000 ring positions. Hash function: SHA-256 for uniform distribution. Each server owns ~20M keys (~2% of total). Adding a 51st server moves ~20M keys (~2%), spread across all 50 existing servers. For hotspot protection: L1 in-process cache on app servers for the top 1,000 keys, key salting for known viral keys. For durability: replicate each key to the next 2 clockwise nodes (replication factor 3).
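A sketch of that replication walk, reusing the `ConsistentHashRing` class from earlier: collect the next distinct physical nodes clockwise, so two vnodes of the same machine never count as two replicas. The `preference_list` name is an assumption, echoing Dynamo's terminology:

```python
import bisect

def preference_list(ring: ConsistentHashRing, key: str, replicas: int = 3):
    # Walk clockwise from the key's position, collecting DISTINCT physical nodes.
    start = bisect.bisect_left(ring.positions, ring_hash(key))
    owners: list[str] = []
    for step in range(len(ring.ring)):
        node = ring.ring[(start + step) % len(ring.ring)][1]
        if node not in owners:
            owners.append(node)
            if len(owners) == replicas:
                break
    return owners   # owners[0] is the primary; the rest hold replicas
```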
After adding a new node, your cache hit rate drops from 99% to 85%
What happened and how do you fix it?
Answer: When the new node joined, ~1/N of keys were reassigned to it. Those keys are now cache misses on the new node (it has empty cache). The 85% hit rate is temporary — as requests come in, the new node's cache warms up. To fix: (1) Pre-warm the new node by copying data from its neighbors before it joins the ring. (2) Use a 'shadow' period where the new node receives traffic but the old node still serves as fallback. (3) Accept the temporary hit rate drop — it recovers within minutes as the cache warms.
Common Mistakes
Not using virtual nodes
Using consistent hashing with one position per node. With 5 nodes, one might own 40% of the ring while another owns 10%. When the overloaded node is removed, all its keys go to one neighbor — which then becomes overloaded. The cascading failure that consistent hashing was supposed to prevent still happens.
✅Always use virtual nodes. 100-256 vnodes per physical node is the standard. This ensures even distribution and spreads rebalancing load across all nodes. Every production system (Cassandra, DynamoDB, Riak) uses vnodes.
Ignoring hotspots
Assuming consistent hashing solves all distribution problems. It distributes keys evenly, but it can't distribute traffic evenly if some keys are 1000x more popular than others. One viral key can overwhelm a node even with perfect key distribution.
✅Monitor per-node traffic, not just key count. For read hotspots: add a cache layer or read replicas. For write hotspots: use key salting to split the hot key across multiple nodes. For predictable hotspots (celebrity accounts): pre-split them.
Poor hash function choice
Using a hash function that doesn't distribute uniformly (e.g., a simple checksum, or CRC32 over small key spaces). Keys cluster in certain ring regions, creating uneven load. Or using a hash function that's too slow (e.g., SHA-512) for a hot path.
✅Use a well-distributed hash function: MD5 (fast, good distribution, not for security), MurmurHash3 (very fast, excellent distribution), or SHA-256 (slower but cryptographically uniform). Test distribution with your actual key patterns.
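A quick way to sanity-check distribution against your own key patterns: bucket a sample of keys and compare the spread per hash function (sequential user IDs here are an illustrative stand-in for your real keys):

```python
import hashlib
from collections import Counter

def bucket(key: str, algo: str, buckets: int = 10) -> int:
    digest = hashlib.new(algo, key.encode()).hexdigest()
    return int(digest, 16) % buckets

for algo in ("md5", "sha256"):
    counts = Counter(bucket(f"user:{i}", algo) for i in range(100_000))
    print(algo, sorted(counts.values()))   # want ~10,000 per bucket, tight spread
```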
Misunderstanding rebalancing scope
Thinking that adding a node requires rehashing all keys. Or thinking that consistent hashing means zero keys move. Some keys always move — the point is that it's O(K/N) keys instead of O(K). Teams either over-engineer (trying to move zero keys) or under-prepare (not handling the keys that do move).
✅Expect ~1/N of keys to move when adding a node. Plan for temporary cache misses during rebalancing. Pre-warm new nodes when possible. Use replication so moved keys have a backup on another node.