Consensus Algorithms — Raft, Paxos, and How Distributed Systems Agree
The problem: agreeing when things fail#
You have five database replicas. A write comes in. All five need to agree on the same value, in the same order. But any server can crash, any network link can fail, and messages can arrive out of order.
How do they agree? That's the consensus problem.
Why consensus matters#
Without consensus, distributed systems can't reliably implement:
- Leader election — who's the primary database?
- Distributed locks — who holds the lock right now?
- Configuration management — what's the current cluster state?
- Log replication — what's the correct order of operations?
Every system that claims to be "consistent" uses a consensus algorithm underneath.
Paxos: the original (and confusing)#
First described by Leslie Lamport in 1989 (the paper, "The Part-Time Parliament," wasn't published until 1998). Provably correct. Famously difficult to understand and implement.
The basic idea:
- A proposer suggests a value
- A majority of acceptors must agree
- Once a majority agrees, the value is chosen
Why it's hard: Paxos handles the edge cases (competing proposers, failures mid-vote), but the paper is abstract and the implementation details are left as an exercise. Google's Chubby lock service used Paxos, and the engineers who built it reported significant gaps between the algorithm as described and what a production system actually needs.
In practice: Almost nobody implements Paxos from scratch anymore. Use Raft instead.
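To make the proposer/acceptor flow concrete, here is a minimal sketch of a single-decree Paxos acceptor in Python. The class and method names are ours, and this omits most of what makes Paxos hard in practice, most importantly that a real acceptor must persist its promises to disk before replying:

```python
class Acceptor:
    """Sketch of one Paxos acceptor deciding a single value."""

    def __init__(self):
        self.promised = -1        # highest proposal number promised so far
        self.accepted_num = -1    # proposal number of the accepted value
        self.accepted_val = None  # accepted value, if any

    def prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n,
        # and report any value we already accepted so the proposer
        # must re-propose it instead of its own.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted_num, self.accepted_val)
        return ("reject",)

    def accept(self, n, value):
        # Phase 2: accept unless we promised a higher proposal number.
        if n >= self.promised:
            self.promised = n
            self.accepted_num = n
            self.accepted_val = value
            return ("accepted",)
        return ("reject",)
```

A value is chosen once a majority of acceptors have accepted the same proposal number; the prepare-phase rule that proposers must adopt any already-accepted value is what keeps competing proposers from overwriting a chosen value.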
Raft: the understandable consensus#
Designed specifically to be understandable. Same guarantees as Paxos, much clearer implementation.
Three roles#
- Leader — handles all client requests, replicates to followers
- Follower — passively receives log entries and heartbeats from the leader; redirects client requests to the leader
- Candidate — a follower that's trying to become the leader
Leader election#
- A follower doesn't hear from the leader (timeout)
- It becomes a candidate and requests votes
- If it gets votes from a majority → becomes the new leader
- If another candidate wins → goes back to follower
Split brain protection: A leader must have votes from a majority. Only one leader can exist per term (monotonically increasing term numbers).
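The vote-granting rule behind that protection fits in a few lines. This is a toy model (names are ours, and real Raft also rejects candidates whose log is less up to date than the voter's), but it shows why two leaders can't coexist in one term:

```python
class Node:
    """Toy model of Raft's vote-granting rule."""

    def __init__(self):
        self.current_term = 0
        self.voted_for = None

    def grant_vote(self, candidate_id, candidate_term):
        # A newer term supersedes ours and resets our vote.
        if candidate_term > self.current_term:
            self.current_term = candidate_term
            self.voted_for = None
        # At most one vote per term: since each node votes once and a
        # leader needs a majority, two leaders in the same term would
        # require two disjoint majorities — impossible.
        if candidate_term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False
```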
Log replication#
- Client sends a write to the leader
- Leader appends to its log and sends to all followers
- When a majority has the entry → it's committed
- Leader applies the entry and advertises the new commit index in subsequent messages, so followers apply it too
Safety: A committed entry is never lost, even if the leader crashes. The new leader will have it (because a majority had it, and a majority voted for the new leader — overlap is guaranteed).
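The commit decision itself is just majority arithmetic. In Raft the leader tracks, per follower, the highest log index known to be replicated there (`matchIndex`); sorting those and taking the majority-th highest gives the commit index. A sketch (function name is ours):

```python
def commit_index(leader_last_index, follower_match_indexes):
    """Highest log index replicated on a majority of the cluster.

    The cluster is the leader plus its followers; the leader always
    has its own latest entry, so it counts toward the majority.
    """
    indexes = sorted([leader_last_index] + list(follower_match_indexes),
                     reverse=True)
    # With n nodes, position n//2 (0-based) is held by a majority:
    # that index and everything below it exists on n//2 + 1 nodes.
    return indexes[len(indexes) // 2]
```

For a 5-node cluster where the leader is at index 10 and followers are at 10, 9, 4, and 3, the commit index is 9: three of five nodes hold entry 9, and the two laggards can catch up later.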
When you need consensus#
You need it when:
- Multiple nodes must agree on the same value
- Consistency is more important than availability
- You're building a distributed database, lock service, or configuration store
You don't need it when:
- You have a single primary (just use a database)
- Eventual consistency is acceptable (use CRDTs or last-writer-wins)
- You're building a stateless service (no shared state to agree on)
Systems built on consensus#
| System | Algorithm | Purpose |
|---|---|---|
| etcd | Raft | Kubernetes config store |
| ZooKeeper | ZAB (Paxos-like) | Distributed coordination |
| CockroachDB | Raft | Distributed SQL |
| TiKV | Raft | Distributed key-value |
| Consul | Raft | Service discovery |
| Google Spanner | Paxos | Global SQL |
The practical takeaway#
Most developers will never implement a consensus algorithm. But understanding how they work helps you:
- Choose the right database — if its replication is built on Raft/Paxos, it can offer strong consistency guarantees
- Understand failure modes — why a 3-node cluster survives 1 failure but not 2
- Size clusters correctly — use odd numbers (3, 5, 7), because adding a node to make an even-sized cluster raises the quorum without raising fault tolerance
- Debug split-brain issues — when consensus fails, you know where to look
See consensus in your architecture#
On Codelit, generate any distributed database or Kubernetes system and you'll see consensus components in the architecture — etcd, ZooKeeper, or Raft-based replication. Click the database nodes to understand how they maintain consistency.
Explore consensus in real architectures: search "Kubernetes" on Codelit.io and click the etcd node to see Raft in action.