# Distributed Locking Patterns: Redis, ZooKeeper, Database Locks & Beyond
When multiple processes across different machines need to coordinate access to a shared resource, you need a distributed lock. Getting it wrong leads to data corruption, double-processing, or deadlocks.
This guide covers the most important distributed locking patterns, from Redis single-instance locks to ZooKeeper recipes, along with the pitfalls that catch most engineers off guard.
## Why Distributed Locks?
In a single-process application, a mutex or semaphore is enough. In a distributed system you face additional challenges:
- No shared memory — processes run on different machines.
- Partial failures — a lock holder can crash without releasing the lock.
- Clock skew — nodes disagree on the current time.
- Network partitions — a node may be isolated but still believe it holds the lock.
Distributed locks provide mutual exclusion across these failure modes — when implemented correctly.
## Redis-Based Locking
### SET NX (Single Instance)
The simplest Redis lock uses a single command:
```
SET resource_name unique_value NX PX 30000
```
- NX — only set if the key does not exist (acquire).
- PX 30000 — auto-expire after 30 seconds (safety net).
- unique_value — a UUID so only the holder can release the lock.
Release with a Lua script to make the check-and-delete atomic:
```lua
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
Limitation: If the single Redis instance fails, the lock is lost.
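To make the semantics concrete, here is a minimal in-memory sketch of the same acquire/release logic. The class and its storage are stand-ins for a real Redis client: NX and PX are simulated with a dict and timestamps, and the unique-value check mirrors the Lua script above.

```python
import time
import uuid

class InMemoryLock:
    """Simulates SET NX PX plus the Lua check-and-delete release."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def acquire(self, key, ttl_ms):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return None  # key exists and has not expired: NX fails
        token = str(uuid.uuid4())  # unique value identifying this holder
        self._store[key] = (token, now + ttl_ms / 1000.0)
        return token

    def release(self, key, token):
        entry = self._store.get(key)
        if entry is not None and entry[0] == token:  # only the holder may delete
            del self._store[key]
            return True
        return False

lock = InMemoryLock()
t1 = lock.acquire("resource", 30_000)  # succeeds, returns a token
t2 = lock.acquire("resource", 30_000)  # fails while t1 holds the lock
released = lock.release("resource", t1)
```

The same shape maps directly onto a Redis client: `acquire` becomes the SET command, `release` becomes the Lua script.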
### Redlock Algorithm
Martin Kleppmann famously critiqued the safety of Redis-based distributed locks, Redlock in particular. The Redlock algorithm, proposed by Salvatore Sanfilippo, uses N independent Redis instances (typically 5):
- Acquire the lock on all N instances with the same key and unique value.
- Consider the lock acquired only if a majority (N/2 + 1) succeed within a validity window.
- If acquisition fails, release on all instances.
Trade-offs:
- More resilient than a single instance.
- Still relies on clock assumptions — Kleppmann argues this makes it unsafe under certain GC pauses or clock jumps.
- In practice, Redlock works well for efficiency locks (preventing duplicate work) but may not be suitable for correctness locks (where safety is critical).
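The quorum logic can be sketched with in-memory dicts standing in for the N instances. This is only the majority-acquire/rollback skeleton; real Redlock also subtracts elapsed acquisition time from the validity window and applies per-instance timeouts.

```python
import uuid

def redlock_acquire(instances, key, token):
    """Try to set key on every instance; succeed only on a majority."""
    acquired = []
    for inst in instances:
        if key not in inst:          # stand-in for SET NX on that instance
            inst[key] = token
            acquired.append(inst)
    if len(acquired) >= len(instances) // 2 + 1:
        return True
    for inst in acquired:            # quorum failed: release wherever we succeeded
        del inst[key]
    return False

instances = [{} for _ in range(5)]
instances[0]["job"] = "someone-else"           # one instance is already locked
token = str(uuid.uuid4())
ok = redlock_acquire(instances, "job", token)  # 4 of 5 succeed: majority reached
```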
## ZooKeeper Locks
ZooKeeper provides strong consistency guarantees via its ZAB consensus protocol, making it a popular choice for correctness-critical locks.
### Ephemeral Sequential Nodes
The standard ZooKeeper lock recipe:
- Create an ephemeral sequential znode under /locks/resource, e.g. /locks/resource/lock-0000000001.
- List all children of /locks/resource.
- If your node has the lowest sequence number, you hold the lock.
- Otherwise, set a watch on the node with the next-lower sequence number.
- When that node is deleted (lock released or holder crashed), re-check.
Advantages:
- Ephemeral nodes auto-delete when the session expires, preventing orphaned locks.
- Sequential ordering prevents the herd effect — only one waiter is notified per release.
Disadvantages:
- Higher latency than Redis (consensus round-trips).
- Operational complexity of running a ZooKeeper ensemble.
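The recipe can be simulated without a ZooKeeper ensemble. In this sketch, sequence assignment and the "lowest number holds the lock" check are modeled with a sorted list; a real client (e.g. the kazoo library's lock recipe) would create ephemeral sequential znodes and use watches instead.

```python
class LockQueue:
    """Models /locks/resource: lowest sequence number holds the lock."""

    def __init__(self):
        self._seq = 0
        self._children = []  # znode names, e.g. "lock-0000000001"

    def create_node(self):
        name = f"lock-{self._seq:010d}"  # sequential suffix assigned by the server
        self._seq += 1
        self._children.append(name)
        return name

    def holds_lock(self, name):
        return min(self._children) == name

    def watch_target(self, name):
        """The node with the next-lower sequence (what this waiter would watch)."""
        lower = sorted(n for n in self._children if n < name)
        return lower[-1] if lower else None

    def delete_node(self, name):
        self._children.remove(name)

q = LockQueue()
a = q.create_node()          # lowest sequence: holds the lock
b = q.create_node()          # waits
target = q.watch_target(b)   # b watches only a, avoiding the herd effect
q.delete_node(a)             # holder releases or crashes; b re-checks and wins
```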
## Database Advisory Locks
If you already run a relational database, advisory locks avoid introducing another system:
### PostgreSQL
```sql
-- Acquire (blocks until available)
SELECT pg_advisory_lock(12345);

-- Try to acquire (non-blocking)
SELECT pg_try_advisory_lock(12345);

-- Release
SELECT pg_advisory_unlock(12345);
```
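pg_advisory_lock takes a 64-bit integer key, so applications typically derive one from the resource name. The hash choice and truncation below are a common convention, not something PostgreSQL prescribes:

```python
import hashlib

def advisory_key(name: str) -> int:
    """Map a resource name to a signed 64-bit key for pg_advisory_lock."""
    digest = hashlib.sha256(name.encode()).digest()
    unsigned = int.from_bytes(digest[:8], "big")
    # Wrap into Postgres's signed bigint range.
    return unsigned - 2**64 if unsigned >= 2**63 else unsigned

key = advisory_key("orders:reindex")
# e.g. with psycopg2: cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
```

Deriving the key deterministically means every process computes the same integer for the same resource name.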
### MySQL
```sql
SELECT GET_LOCK('resource_name', 10); -- 10s timeout
SELECT RELEASE_LOCK('resource_name');
```
Trade-offs:
- No extra infrastructure needed.
- Tied to a database connection — if the connection drops, the lock is released.
- Not suitable for high-throughput locking (database becomes the bottleneck).
## Fencing Tokens
Even with a correct lock implementation, a process may believe it holds the lock after it has expired (due to a GC pause, for example). Fencing tokens solve this:
- Each lock acquisition returns a monotonically increasing token (e.g., ZooKeeper's zxid or a counter).
- The lock holder includes the token in every write to the shared resource.
- The resource rejects writes with a token lower than the last accepted token.
This ensures that a stale lock holder cannot corrupt data, even if two processes briefly believe they hold the lock.
Without fencing tokens, no distributed lock is truly safe for correctness.
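A resource guarded by fencing tokens can be sketched as follows. Token issuance and storage are simplified here; a real system would persist the last accepted token durably alongside the data.

```python
class FencedStore:
    """Rejects writes carrying a token older than the newest one seen."""

    def __init__(self):
        self.last_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.last_token:
            return False          # stale holder: reject the write
        self.last_token = token   # accept and raise the bar
        self.value = value
        return True

store = FencedStore()
store.write(33, "from old holder")                    # accepted
store.write(34, "from new holder")                    # accepted, bar is now 34
stale = store.write(33, "late write after GC pause")  # rejected
```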
## Lock Expiry and Renewal
Lock expiry prevents deadlocks when a holder crashes, but introduces a race condition: what if the holder is still working when the lock expires?
### Renewal (Heartbeat Extension)
The holder periodically extends the lock before it expires:
```python
# Pseudocode
while work_in_progress:
    if time_until_expiry < threshold:
        extend_lock(lock_key, new_ttl)
    do_work_chunk()
```
Rules of thumb:
- Set the initial TTL to 3-5x the expected operation duration.
- Renew at 1/3 of the TTL interval.
- If renewal fails, abort the operation — another process may have acquired the lock.
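Applying those rules of thumb numerically (the 4x and 1/3 multipliers below are taken from the guideline above, not universal constants):

```python
def lock_timings(expected_op_seconds):
    """Derive a lock TTL and renewal interval from the expected duration."""
    ttl = expected_op_seconds * 4   # within the suggested 3-5x margin
    renew_every = ttl / 3           # renew at 1/3 of the TTL
    return ttl, renew_every

ttl, renew_every = lock_timings(10)  # a 10s operation gets a 40s TTL
```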
## Deadlock Prevention
Distributed deadlocks occur when two processes each hold a lock the other needs.
### Strategies
- Lock ordering — always acquire locks in a globally consistent order (e.g., sorted by resource name).
- Timeout-based — if a lock cannot be acquired within a deadline, release all held locks and retry with back-off.
- Try-lock with rollback — attempt to acquire all required locks without blocking. If any acquisition fails, release the ones you obtained and retry.
In distributed systems, timeout-based prevention is the most common because enforcing global ordering is difficult across services.
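Lock ordering can be illustrated locally, with threading locks standing in for distributed ones; sorting by resource name is the assumed global order:

```python
import threading

# Stand-ins for distributed locks, one per shared resource.
locks = {name: threading.Lock() for name in ("accounts", "inventory", "orders")}

def acquire_in_order(names):
    """Acquire locks sorted by name so every process agrees on the order."""
    ordered = sorted(names)
    for name in ordered:
        locks[name].acquire()
    return ordered

def release_all(names):
    for name in reversed(sorted(names)):
        locks[name].release()

# Both of these requests lock "accounts" before "orders", so they
# cannot deadlock against each other regardless of arrival order.
held = acquire_in_order(["orders", "accounts"])
release_all(["orders", "accounts"])
```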
## Leader Election with Locks
Distributed locks naturally extend to leader election:
- All candidates attempt to acquire a lock on a well-known key (e.g., /election/leader).
- The one that succeeds is the leader.
- Other candidates watch for lock release.
- When the leader crashes or resigns, the lock expires and a new candidate acquires it.
ZooKeeper's ephemeral nodes make this particularly clean. In Redis, you need a renewal loop to maintain leadership.
Caution: Leader election via locks is simpler than full consensus (Raft/Paxos) but offers weaker guarantees. It works well for leader-worker patterns where brief dual-leadership is tolerable.
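The election steps above can be sketched with a try-acquire primitive; the class below is an in-memory stand-in for a lock on a well-known key, with resignation standing in for lock expiry:

```python
class Election:
    """First candidate to claim the key becomes leader; release frees it."""

    def __init__(self):
        self.leader = None

    def campaign(self, candidate):
        if self.leader is None:      # stand-in for try-acquire on /election/leader
            self.leader = candidate
            return True
        return False                 # lost: would now watch for release

    def resign(self, candidate):
        if self.leader == candidate: # stand-in for expiry / ephemeral node loss
            self.leader = None

e = Election()
won_a = e.campaign("node-a")   # node-a becomes leader
won_b = e.campaign("node-b")   # node-b loses and waits
e.resign("node-a")             # leader crashes or resigns
won_b2 = e.campaign("node-b")  # node-b takes over
```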
## Tools Comparison
| Tool | Consistency | Latency | Ops Complexity | Best For |
|---|---|---|---|---|
| Redis SET NX | Weak (single instance) | Very low | Low | Efficiency locks, dedup |
| Redlock | Moderate | Low | Medium | Distributed efficiency locks |
| ZooKeeper | Strong (ZAB) | Medium | High | Correctness locks, elections |
| etcd | Strong (Raft) | Medium | Medium | Kubernetes-native systems |
| PostgreSQL advisory | Strong (single node) | Medium | Low | Existing Postgres deployments |
| Consul | Strong (Raft) | Medium | Medium | Service-mesh environments |
## Choosing the Right Pattern
Ask yourself two questions:
1. What happens if two processes enter the critical section simultaneously?
   - Duplicate work (wasteful but harmless) → Redis SET NX is fine.
   - Data corruption → use ZooKeeper or etcd with fencing tokens.
2. What infrastructure do you already run?
   - Already have Redis → start with SET NX.
   - Already have PostgreSQL → advisory locks.
   - Need strong guarantees → ZooKeeper or etcd.
## Key Takeaways
- SET NX + TTL is the simplest distributed lock but is unsafe without fencing tokens.
- Redlock improves availability but does not eliminate clock-skew risks.
- ZooKeeper ephemeral sequential nodes provide the strongest lock semantics.
- Fencing tokens are essential for correctness — no lock algorithm alone is enough.
- Prefer timeout-based deadlock prevention in distributed environments.
- Use lock renewal to avoid premature expiry, but always handle renewal failure gracefully.
Understanding distributed locking patterns is fundamental to building reliable systems at scale.
Ready to deepen your distributed systems knowledge? Visit codelit.io for hands-on courses, system design practice, and real-world engineering content.
This is article #182 on the Codelit blog.