Consistent Hashing: The Algorithm Behind Distributed Systems
When you distribute data across multiple nodes, the naive approach — node = hash(key) % N — falls apart the moment you add or remove a node. Consistent hashing solves this by minimizing data movement during topology changes.
The Problem with Modular Hashing#
3 nodes: hash(key) % 3
key "user:1" → hash = 7 → 7 % 3 = node 1
key "user:2" → hash = 12 → 12 % 3 = node 0
key "user:3" → hash = 5 → 5 % 3 = node 2
Add a 4th node: hash(key) % 4
key "user:1" → hash = 7 → 7 % 4 = node 3 ← MOVED
key "user:2" → hash = 12 → 12 % 4 = node 0 ← stayed
key "user:3" → hash = 5 → 5 % 4 = node 1 ← MOVED
Adding one node remaps ~75% of keys here (in general, growing from N to N+1 nodes moves about N/(N+1) of them). For a cache cluster, that means a massive cache-miss storm. For a database, it means migrating terabytes of data.
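The remap fraction is easy to verify empirically. A small sketch (MD5 stands in for whatever hash function the cluster actually uses; the key names are made up):

```python
import hashlib

def stable_hash(key):
    # MD5 gives a deterministic integer hash across runs
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if stable_hash(k) % 3 != stable_hash(k) % 4)
print(f"{moved / len(keys):.0%} of keys moved")  # ~75%
```

A key stays put only when hash % 3 == hash % 4, which holds for just 3 of every 12 residues, so about 75% of keys move.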
Ring-Based Consistent Hashing#
Place nodes and keys on the same circular hash space (0 to 2^32 - 1):
Hash ring (0 → 2^32):

     Node A    Node B    Node C    Node D
     (1000)    (3000)    (6000)    (9000)
 ------●---------●---------●---------●------
 0                                      2^32
Assignment rule: each key is assigned to the first node clockwise from its hash position.
hash("user:1") = 2500 → walks clockwise → Node B (3000)
hash("user:2") = 7500 → walks clockwise → Node D (9000)
hash("user:3") = 500 → walks clockwise → Node A (1000)
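The clockwise walk can be sketched with a sorted list and binary search. This uses the positions and hash values from the example above; in practice the positions come from a real hash function:

```python
import bisect

# ring positions from the example above (position -> node)
ring = {1000: "A", 3000: "B", 6000: "C", 9000: "D"}
positions = sorted(ring)

def get_node(pos):
    # first node at or clockwise past pos; wrap past 2^32 back to the start
    idx = bisect.bisect_left(positions, pos)
    return ring[positions[idx % len(positions)]]

print(get_node(2500), get_node(7500), get_node(500))  # B D A
```

A position past the last node (say 9500) wraps around to Node A, which is what makes the space circular.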
Adding a Node#
Add Node E at position 4500:
Keys between 3001-4500 move from Node C to Node E
All other keys stay where they are
Only K/N keys move on average (K = total keys, N = total nodes). With 4 nodes and 1 million keys, adding a 5th node moves ~200K keys instead of ~800K.
Removing a Node#
Remove Node B (3000):
Keys between 1001-3000 move from Node B to Node C
All other keys stay where they are
Only the removed node's keys redistribute — to the next clockwise node.
Virtual Nodes (Vnodes)#
With few physical nodes, the hash ring has poor balance. One node might own 60% of the ring while another owns 10%.
Solution: map each physical node to multiple positions on the ring:
Node A → vnode_A1 (1000), vnode_A2 (4500), vnode_A3 (8200)
Node B → vnode_B1 (2300), vnode_B2 (5800), vnode_B3 (9500)
Ring: A1--B1--A2--B2--A3--B3
Benefits of Virtual Nodes#
Load balancing: with 100-200 vnodes per physical node, the standard deviation of load drops below 5%.
Heterogeneous hardware: a powerful server gets 200 vnodes; a weaker one gets 50. Load distributes proportionally.
Smoother rebalancing: when a node joins, it takes small ranges from many nodes instead of one large range from a single node.
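A quick simulation illustrates the balance claim. Assumed setup: MD5 as the ring hash, 5 nodes, 20,000 keys; the exact numbers vary with the hash function and key set:

```python
import bisect
import hashlib
import statistics

def h32(s):
    # MD5 truncated to 32 bits stands in for a fast ring hash
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def relative_load_stddev(num_nodes, vnodes, num_keys=20_000):
    ring = {}
    for n in range(num_nodes):
        for v in range(vnodes):
            ring[h32(f"node{n}:{v}")] = n
    positions = sorted(ring)
    counts = [0] * num_nodes
    for k in range(num_keys):
        idx = bisect.bisect_left(positions, h32(f"key{k}")) % len(positions)
        counts[ring[positions[idx]]] += 1
    # standard deviation of per-node load, relative to the mean load
    return statistics.pstdev(counts) / (num_keys / num_nodes)

print(relative_load_stddev(5, 1), relative_load_stddev(5, 150))
```

With a single position per node the imbalance is typically large; at 150 vnodes per node it drops to a few percent.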
Vnode Count Trade-offs#
| Vnodes per node | Load balance | Metadata overhead | Rebalancing speed |
|---|---|---|---|
| 10 | Poor (~20%) | Low | Fast |
| 100 | Good (~5%) | Medium | Medium |
| 256 | Excellent | Higher | Slower |
| 1000 | Near-perfect | Significant | Slowest |
Cassandra historically defaulted to 256 vnodes per node (Cassandra 4.0 lowered the default num_tokens to 16). DynamoDB uses a fixed partition scheme built on similar principles.
Implementation#
A basic consistent hash ring in Python (MD5 stands in for a fast ring hash):

import bisect
import hashlib

def ring_hash(s):
    # MD5 truncated to 32 bits stands in for a fast ring hash
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, vnodes_per_node=150):
        self.vnodes_per_node = vnodes_per_node
        self.positions = []   # sorted vnode positions
        self.ring = {}        # position -> node id

    def add_node(self, node_id):
        for i in range(self.vnodes_per_node):
            pos = ring_hash(f"{node_id}:{i}")
            bisect.insort(self.positions, pos)
            self.ring[pos] = node_id

    def remove_node(self, node_id):
        for i in range(self.vnodes_per_node):
            pos = ring_hash(f"{node_id}:{i}")
            self.positions.remove(pos)
            del self.ring[pos]

    def get_node(self, key):
        # find the first ring position >= the key's position
        idx = bisect.bisect_left(self.positions, ring_hash(key))
        if idx == len(self.positions):
            idx = 0  # wrap around past 2^32
        return self.ring[self.positions[idx]]
Lookup time: O(log N) with a balanced tree, where N is total vnodes. For 100 physical nodes with 150 vnodes each, that's log(15,000) ≈ 14 comparisons.
Jump Consistent Hashing#
Google's jump consistent hash uses no ring — just a mathematical function:
int32 jump_consistent_hash(uint64 key, int32 num_buckets):
    b = -1; j = 0
    while j < num_buckets:
        b = j
        key = key * 2862933555777941757 + 1   # 64-bit LCG step
        j = floor((b + 1) * (double(1 << 31) / double((key >> 33) + 1)))
    return b
Properties#
- Zero memory — no ring, no vnodes, just computation
- Perfect balance — keys distribute uniformly across buckets
- Minimal movement — adding bucket N+1 moves exactly 1/(N+1) keys
- Limitation: only supports adding/removing the last bucket — you can't remove an arbitrary node
Best for: numbered, sequential shards where you only scale by appending.
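A Python port makes the minimal-movement property easy to check. The 64-bit wraparound is emulated with a mask, and the sequential integer keys below are purely illustrative:

```python
def jump_consistent_hash(key, num_buckets):
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

keys = range(100_000)
moved = sum(1 for k in keys
            if jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11))
print(moved / 100_000)  # ~1/11: only keys landing in the new bucket move
```

Going from 10 to 11 buckets moves only the keys that the 11-bucket function assigns to the new bucket, roughly 1/11 of them.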
Rendezvous Hashing (HRW)#
Each key computes a score for every node and picks the highest:
import hashlib

def get_node(key, nodes):
    best_node, best_score = None, -1
    for node in nodes:
        # score each (key, node) pair with a stable hash
        score = int(hashlib.md5(f"{key}:{node}".encode()).hexdigest(), 16)
        if score > best_score:
            best_score, best_node = score, node
    return best_node
Properties#
- O(N) lookup — must check all nodes (fine for small N)
- Any node removable — only keys assigned to the removed node move
- Simple implementation — no ring data structure needed
- Good for: systems with fewer than ~100 nodes where simplicity matters
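The "any node removable" property can be checked directly. A self-contained sketch; MD5 and the node names are illustrative (equal-length hex digests compare lexicographically, so they work as scores):

```python
import hashlib

def hrw_node(key, nodes):
    # highest-random-weight: pick the node with the max hash(key, node) score
    return max(nodes, key=lambda n: hashlib.md5(f"{key}:{n}".encode()).hexdigest())

keys = [f"user:{i}" for i in range(10_000)]
before = {k: hrw_node(k, ["a", "b", "c", "d"]) for k in keys}
after = {k: hrw_node(k, ["a", "b", "c"]) for k in keys}  # node "d" removed

moved = [k for k in keys if before[k] != after[k]]
print(len(moved) / len(keys))  # ~1/4: only node d's keys moved
```

Removing "d" cannot change the winner among the surviving nodes, so every moved key was previously on "d".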
Real-World Applications#
Apache Cassandra#
Uses consistent hashing as its core data distribution strategy:
Partitioner: Murmur3Partitioner
Token range: -2^63 to 2^63 - 1
Default vnodes: 256 per node
Replication: each key stored on N consecutive ring nodes
When a node joins, it takes ownership of token ranges from existing nodes. Cassandra streams data for those ranges in the background.
Amazon DynamoDB#
Uses consistent hashing with a fixed number of partitions:
Partitions are fixed at table creation
Each partition maps to a range on the hash ring
Partitions split when they exceed size/throughput limits
Split partitions assigned to potentially different nodes
CDN Request Routing#
CDNs use consistent hashing to route requests to edge cache servers:
cache_server = consistent_hash(url, edge_servers)
If an edge server goes down, only its URLs remap to other servers. The rest of the cache stays warm — critical when a cache miss means fetching from origin across the globe.
Load Balancers#
Session-sticky load balancing with consistent hashing:
backend = consistent_hash(client_ip, healthy_backends)
When a backend fails health checks, only its clients redistribute. Traditional round-robin would scramble all sessions.
Comparison#
| Algorithm | Lookup | Memory | Arbitrary removal | Balance |
|---|---|---|---|---|
| Modular hash | O(1) | O(1) | Remaps everything | Perfect |
| Ring + vnodes | O(log N) | O(N × vnodes) | Minimal remapping | Good |
| Jump hash | O(ln N) | O(1) | Last bucket only | Perfect |
| Rendezvous | O(N) | O(1) | Minimal remapping | Good |
Key Takeaways#
- Consistent hashing minimizes key movement when nodes change — only K/N keys move on average
- Virtual nodes solve load imbalance by giving each physical node multiple ring positions
- Jump consistent hashing is ideal for append-only shard scaling with zero memory overhead
- Rendezvous hashing is simpler to implement for small node counts
- Every major distributed database and CDN relies on some form of consistent hashing
- Choose your variant based on node count, removal patterns, and memory constraints
This is article #227 in the Codelit system design series. Explore all articles at codelit.io.