# Design a Distributed Lock — Redis, ZooKeeper, and Consensus
## When you need a distributed lock
In a single-server app, a mutex works fine. But when your code runs on multiple servers, you need a lock that all servers agree on.
Common use cases:
- Prevent double processing — Only one worker processes a payment
- Leader election — Only one instance runs the scheduled job
- Resource access — Only one client writes to a shared file
## The simplest approach: Redis SETNX

```
SET lock_key unique_value NX EX 30
```

- `NX` — Only set if key doesn't exist (acquire lock)
- `EX 30` — Auto-expire after 30 seconds (prevent deadlocks)
- `unique_value` — Random UUID so only the owner can release
Release: Only delete if the value matches (Lua script for atomicity):
```lua
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end
```
Why the unique value matters: Without it, Client A's lock expires, Client B acquires it, then Client A's release deletes Client B's lock.
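The acquire/release pair can be sketched in Python. Here an in-memory dict stands in for Redis, and the compare-then-delete runs as one function to mirror the atomicity the Lua script provides (all names are illustrative):

```python
import uuid

def acquire(store, key):
    """SET key value NX: succeed only if the key is absent."""
    if key in store:
        return None
    token = str(uuid.uuid4())
    store[key] = token
    return token

def release(store, key, token):
    """Delete only if the stored value matches our token,
    mirroring the Lua GET-then-DEL script."""
    if store.get(key) == token:
        del store[key]
        return True
    return False

store = {}
token_a = acquire(store, "lock")             # Client A acquires
assert acquire(store, "lock") is None        # Client B is refused
del store["lock"]                            # simulate A's lock expiring
token_b = acquire(store, "lock")             # Client B acquires
assert release(store, "lock", token_a) is False  # A's stale release rejected
assert release(store, "lock", token_b) is True   # B's release succeeds
```

Without the token comparison, `release` would be a plain delete and Client A's stale release would destroy Client B's lock — exactly the failure described above.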
## The problem with single-node Redis locks
If your Redis server crashes after granting a lock but before the write reaches a replica, a failover promotes a replica that never saw the lock. Two clients can then hold the lock simultaneously.
## Redlock: Multi-node Redis locking
Redis creator Salvatore Sanfilippo proposed Redlock:
- Get current timestamp
- Try to acquire lock on N Redis instances (e.g., 5) with short timeout
- Lock is acquired if successful on majority (N/2 + 1 = 3)
- Lock validity = initial TTL minus elapsed time
- If acquisition fails (no majority, or validity has already elapsed), release on all instances
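The steps above can be sketched against in-memory stand-ins for the N Redis instances (an illustrative sketch of the quorum logic, not a production Redlock client):

```python
import time
import uuid

def redlock_acquire(instances, key, ttl_ms):
    """Try to take the lock on a majority of instances.
    Return (token, validity_ms) on success, None on failure."""
    token = str(uuid.uuid4())
    start = time.monotonic()
    acquired = 0
    for inst in instances:              # each dict stands in for one Redis node
        if key not in inst:             # SET key token NX
            inst[key] = token
            acquired += 1
    elapsed_ms = (time.monotonic() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms   # remaining lifetime after acquisition cost
    if acquired >= len(instances) // 2 + 1 and validity_ms > 0:
        return token, validity_ms
    for inst in instances:              # failed: release everywhere we succeeded
        if inst.get(key) == token:
            del inst[key]
    return None

instances = [{} for _ in range(5)]
assert redlock_acquire(instances, "lock", 30_000) is not None  # majority reached
assert redlock_acquire(instances, "lock", 30_000) is None      # already held
```

Note that the validity calculation shrinks the usable lock lifetime by however long the acquisition round took — a slow network round makes the lock shorter, not longer.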
Controversy: Martin Kleppmann argued Redlock is fundamentally flawed because:
- Clock drift between Redis nodes can cause overlapping locks
- GC pauses can cause a client to hold an expired lock
- No fencing token mechanism
## Fencing tokens
The most robust pattern. Each lock acquisition returns a monotonically increasing token:
```
Client A acquires lock → token 33
Client A pauses (GC)
Lock expires
Client B acquires lock → token 34
Client A resumes, tries to write with token 33
Storage rejects token 33 (already seen 34) → SAFE
```
The storage system rejects any operation with a token older than the latest it has seen.
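The rule can be sketched as a storage wrapper that remembers the highest token it has seen (names are illustrative):

```python
class FencedStorage:
    """Rejects any write carrying a token older than the newest seen."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False              # stale lock holder: reject the write
        self.highest_token = token
        self.data[key] = value
        return True

storage = FencedStorage()
assert storage.write(34, "file", "from B") is True   # Client B, token 34
assert storage.write(33, "file", "from A") is False  # Client A resumes: rejected
```

The key property is that safety no longer depends on the lock service at all: even if two clients believe they hold the lock, the storage system serializes them by token order.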
## ZooKeeper distributed locks
ZooKeeper provides stronger guarantees using consensus (ZAB protocol):
### Recipe: Ephemeral sequential nodes
- Client creates `/locks/lock-` (ephemeral + sequential) → gets `/locks/lock-0000000042`
- Client lists all children of `/locks/`
- If client's node has the lowest sequence number → lock acquired
- Otherwise, watch the next-lowest node for deletion
- When watched node is deleted → re-check if now lowest → acquire
Why ephemeral? If the client crashes, ZooKeeper automatically deletes the node (heartbeat timeout), releasing the lock.
Why sequential? Avoids thundering herd — only the next waiter is notified, not all waiters.
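The "am I lowest?" decision at the heart of the recipe can be sketched as a pure function over the child list (real clients such as Kazoo ship this as a ready-made lock recipe; this illustrative version just shows the ordering logic):

```python
def lock_decision(children, my_node):
    """Given the children of /locks and our own node name, return
    ("acquired", None) or ("wait", node_to_watch)."""
    # Sort by the numeric sequence suffix ZooKeeper appended.
    ordered = sorted(children, key=lambda n: int(n.split("-")[-1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return ("acquired", None)        # lowest sequence number holds the lock
    return ("wait", ordered[idx - 1])    # watch only the next-lowest node

children = ["lock-0000000042", "lock-0000000041", "lock-0000000043"]
assert lock_decision(children, "lock-0000000041") == ("acquired", None)
assert lock_decision(children, "lock-0000000043") == ("wait", "lock-0000000042")
```

Because each waiter watches exactly one predecessor, a release wakes a single client rather than the whole queue — that is the thundering-herd avoidance described above.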
## etcd distributed locks
Similar to ZooKeeper but using the Raft consensus protocol:
```shell
# Acquire lease (TTL-based)
etcdctl lease grant 30

# Put with lease
etcdctl put /locks/my-lock "owner-id" --lease=<lease-id>

# Keep alive
etcdctl lease keep-alive <lease-id>
```
etcd's linearizable reads guarantee that if a lock is acquired, all subsequent reads see it.
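The lease semantics can be sketched with an in-memory store where a key is visible only while its lease is unexpired, and keep-alive pushes the expiry forward (an illustrative model, not the etcd client API):

```python
class LeaseStore:
    """In-memory stand-in for etcd's lease-scoped keys."""

    def __init__(self):
        self.leases = {}   # lease_id -> expiry timestamp
        self.keys = {}     # key -> lease_id

    def grant(self, lease_id, ttl_s, now):
        self.leases[lease_id] = now + ttl_s

    def put(self, key, lease_id):
        self.keys[key] = lease_id          # key lives and dies with its lease

    def keep_alive(self, lease_id, ttl_s, now):
        self.grant(lease_id, ttl_s, now)   # refresh pushes expiry forward

    def get(self, key, now):
        lease = self.keys.get(key)
        if lease is None or self.leases.get(lease, 0) <= now:
            return None                    # lease expired → lock auto-released
        return lease

store = LeaseStore()
store.grant("L1", ttl_s=30, now=0)
store.put("/locks/my-lock", "L1")
assert store.get("/locks/my-lock", now=10) == "L1"  # holder still alive
assert store.get("/locks/my-lock", now=40) is None  # heartbeats stopped: released
```

A crashed client simply stops calling `keep_alive`, so its lock disappears when the lease runs out — the same auto-release role that ephemeral nodes play in ZooKeeper.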
## Comparison
| Aspect | Redis SETNX | Redlock | ZooKeeper | etcd |
|---|---|---|---|---|
| Consistency | Weak | Debated | Strong (ZAB) | Strong (Raft) |
| Performance | Fastest | Fast | Medium | Medium |
| Complexity | Simple | Medium | High | Medium |
| Fault tolerance | None (single) | Majority | Majority | Majority |
| Auto-release | TTL expiry | TTL expiry | Ephemeral node | Lease expiry |
## When you don't need a distributed lock
- Idempotent operations — If re-processing is harmless, skip the lock
- Database constraints — Unique indexes prevent duplicates without locks
- Optimistic concurrency — Version numbers / ETags detect conflicts
- Queue-based processing — One consumer per message naturally serializes work
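Optimistic concurrency in particular is often enough: each write carries the version it read, and the store applies it only if that version is still current (a minimal sketch with illustrative names):

```python
def cas_update(record, expected_version, new_value):
    """Apply the write only if nobody updated the record since we read it."""
    if record["version"] != expected_version:
        return False                  # conflict: caller must re-read and retry
    record["value"] = new_value
    record["version"] += 1
    return True

record = {"value": "a", "version": 1}
assert cas_update(record, 1, "b") is True    # first writer wins
assert cas_update(record, 1, "c") is False   # stale version rejected
assert record == {"value": "b", "version": 2}
```

This is the same pattern HTTP exposes via ETags and `If-Match`, and databases via `UPDATE ... WHERE version = ?` — conflicts are detected at write time instead of being excluded up front by a lock.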
## Visualize your distributed system
See how locks, consensus, and coordination fit into your architecture — try Codelit to generate an interactive diagram.
## Key takeaways
- Redis SETNX is simple but not safe for critical sections
- Fencing tokens are the gold standard — reject stale operations
- ZooKeeper/etcd provide consensus-backed locks (strongest guarantees)
- Auto-expiry is essential — prevent deadlocks from crashed clients
- Consider alternatives — idempotency and DB constraints often eliminate the need
- Distributed locks are a last resort — redesign to avoid them when possible