Linearizability — The Strongest Consistency Guarantee in Distributed Systems
What linearizability means#
A system is linearizable if every operation appears to take effect atomically at some point between its invocation and its response. Once a write completes, every subsequent read must return that value or a newer one — no stale reads, no going back in time.
Think of it as a single-copy illusion. Even though data is replicated across multiple nodes, the system behaves as if there is exactly one copy, and every operation happens on that one copy in real time.
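The real-time rule can be made concrete with a toy history check (hypothetical operations and timestamps, a single write followed by a read):

```python
# Hypothetical operations on one register, with real-time
# (invocation, response) timestamps: (kind, value, start, end).
write = ("write", 1, 0, 5)   # W(x=1), completed at t=5
read  = ("read",  0, 6, 9)   # R(x) -> 0, invoked at t=6

def stale_read(write_op, read_op):
    """True if the read started after the write completed yet still
    returned the pre-write value -- a linearizability violation."""
    _, w_val, _, w_end = write_op
    _, r_val, r_start, _ = read_op
    return r_start > w_end and r_val != w_val

print(stale_read(write, read))  # True: this history is not linearizable
```

With only one write in play, any later read that returns the old value breaks the single-copy illusion; a linearizable system could never produce this history.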
Linearizability vs serializability#
These two terms sound similar but describe different things:
| Property | Scope | Guarantees |
|---|---|---|
| Linearizability | Single object/register | Real-time ordering of reads and writes |
| Serializability | Transactions (multiple objects) | Equivalent to some serial execution order |
| Strict serializability | Transactions + real-time | Both guarantees combined |
Serializability says transactions execute as if in some serial order, but that order does not need to match real-time. A transaction that commits at 2:00 PM could be ordered before one that committed at 1:00 PM.
Linearizability says single operations respect real-time order. If operation A completes before operation B starts, then A must appear before B in the global order.
Most databases that claim "strong consistency" provide strict serializability — both guarantees together.
The happens-before relationship#
Linearizability subsumes the happens-before ordering that defines causality in distributed systems:
- Within a single process — if A happens before B in the same thread, A precedes B
- Across processes via messages — if process 1 sends a message and process 2 receives it, the send happens before the receive
- Transitivity — if A happens before B and B happens before C, then A happens before C
Two events with no happens-before relationship are concurrent. Linearizability allows concurrent operations to be ordered either way, but the chosen order must be a single total order that respects every happens-before relationship — and, beyond that, every real-time precedence, which is what makes it stronger than causal consistency.
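The three happens-before rules above are exactly what Lamport logical clocks encode; they capture causal order, though not the additional real-time component linearizability demands. A minimal sketch:

```python
class LamportClock:
    """Logical clock: timestamps respect happens-before ordering."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: events within one process advance the local clock.
        self.time += 1
        return self.time

    def send(self):
        # Attach the current timestamp to an outgoing message.
        return self.tick()

    def receive(self, msg_time):
        # Rule 2: a receive is ordered after the matching send.
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t_send = p1.send()           # p1 sends a message (timestamp 1)
t_recv = p2.receive(t_send)  # p2 receives it (timestamp 2 > 1)
print(t_send, t_recv)        # 1 2
```

Transitivity (rule 3) falls out automatically: timestamps only ever increase along any chain of events.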
Total order broadcast#
To implement linearizability, you need total order broadcast — a protocol that delivers messages to all nodes in the same order. Every node processes operations in an identical sequence.
How it works:
- A client sends a write request to the leader
- The leader assigns a sequence number and broadcasts to all replicas
- Every replica applies operations in sequence-number order
- The leader responds to the client only after a majority acknowledge
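The replica side of these steps can be sketched as follows — a toy in-process model (names hypothetical) in which the leader is assumed to have already stamped each message with a sequence number:

```python
import heapq

class Replica:
    """Applies broadcast operations strictly in sequence-number order,
    buffering anything that arrives out of order."""
    def __init__(self):
        self.next_seq = 0
        self.pending = []   # min-heap of (seq, op)
        self.log = []       # operations applied so far, in total order

    def deliver(self, seq, op):
        heapq.heappush(self.pending, (seq, op))
        # Apply every operation whose turn has come.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.log.append(ready)
            self.next_seq += 1

r = Replica()
r.deliver(1, "set y=2")   # arrives early: buffered, not applied
r.deliver(0, "set x=1")   # fills the gap: both now apply in order
print(r.log)              # ['set x=1', 'set y=2']
```

Because every replica runs this same loop over the same sequence numbers, all of them end up with identical logs — the "identical sequence" property the protocol promises.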
Total order broadcast is equivalent to consensus. If you have one, you can build the other. This is why systems like Raft and ZAB (ZooKeeper Atomic Broadcast) are at the heart of linearizable systems.
Compare-and-swap: the linearizable primitive#
Compare-and-swap (CAS) is the fundamental operation that linearizability enables:
```
CAS(register, expected_value, new_value):
    if register == expected_value:
        register = new_value
        return success
    else:
        return failure
```
This must be atomic. If two clients try to CAS the same register simultaneously, exactly one succeeds; the other fails because the register no longer holds the expected value.
What CAS enables:
- Leader election — CAS on a "leader" register; only one node wins
- Distributed locks — CAS to acquire; the winner holds the lock
- Unique constraints — CAS to claim a username; first writer wins
- Account balances — CAS to debit; prevents double-spending
Without linearizability, CAS is meaningless: two clients could each read the old value from different replicas, and both swaps could appear to succeed.
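The leader-election pattern can be sketched with an in-process stand-in for a linearizable register (hypothetical `CasRegister` class — a real deployment would back this with etcd or ZooKeeper, not a local lock):

```python
import threading

class CasRegister:
    """In-process model of a linearizable register with CAS.
    The lock makes the read-check-write sequence atomic."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

leader = CasRegister(None)
print(leader.compare_and_swap(None, "node-a"))  # True: node-a wins
print(leader.compare_and_swap(None, "node-b"))  # False: already claimed
```

The same shape covers the other uses listed above: swap in a lock holder, a username, or an expected account balance for the "leader" value.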
Real-world linearizable systems#
etcd#
Kubernetes stores all cluster state in etcd, which uses Raft consensus. Every read and write goes through the Raft leader. etcd provides linearizable reads by default — when you read, the leader confirms it still holds leadership before responding.
Cost: Every read requires a round of Raft communication. etcd offers a serializable read option that skips this confirmation and may return stale data, but is faster.
ZooKeeper#
ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast), a protocol similar to Raft. Writes are linearizable — they go through the leader and are applied in total order.
Reads are NOT linearizable by default. ZooKeeper allows reads from any replica, which can return stale data. To get linearizable reads, you must call sync() before reading, which forces the replica to catch up with the leader.
Spanner#
Google Spanner achieves linearizability across globally distributed data centers using GPS clocks and atomic clocks (TrueTime). It assigns real-time timestamps to transactions and waits for clock uncertainty to pass before committing.
Cost: Every write incurs a wait of roughly 7ms for clock uncertainty. This is the price of global linearizability.
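The commit-wait idea can be illustrated with a toy sketch (the uncertainty bound `EPSILON` is an assumed value, not Spanner's actual TrueTime API):

```python
import time

EPSILON = 0.007  # assumed clock-uncertainty bound, ~7 ms

def commit_wait(commit_timestamp):
    """Block until commit_timestamp is guaranteed to be in the past
    on every node's clock, given the uncertainty bound."""
    while time.time() - EPSILON < commit_timestamp:
        time.sleep(0.001)

start = time.time()
commit_wait(start)              # blocks for roughly EPSILON
elapsed = time.time() - start
print(elapsed >= EPSILON)       # True
```

After the wait, no node anywhere can still observe a clock reading earlier than the commit timestamp, so the timestamp order is guaranteed to match real-time order.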
CockroachDB#
Inspired by Spanner, but relies on NTP-synchronized clocks rather than TrueTime's GPS and atomic-clock hardware. Provides linearizability within a single range (a partition of data) using Raft. Cross-range transactions use a two-phase commit protocol with linearizable ordering.
Trade-off: NTP clock uncertainty is larger than TrueTime, so CockroachDB must handle a wider uncertainty window.
The cost of linearizability#
Linearizability is expensive. Here is what you pay:
Latency#
Every operation must coordinate with a majority of nodes. In a 5-node cluster spread across regions:
- Non-linearizable read: respond from the nearest replica (~1ms)
- Linearizable read: round-trip to the leader, confirm leadership (~50-200ms cross-region)
Throughput#
All writes go through a single leader. The leader becomes the throughput bottleneck. You cannot scale writes by adding more replicas — you can only scale reads (and only if you relax to non-linearizable reads).
Availability#
The CAP theorem states that during a network partition, you must choose between consistency (linearizability) and availability. Linearizable systems choose consistency — they become unavailable to partitioned nodes.
A 5-node cluster tolerates 2 node failures. If 3 nodes become unreachable, the remaining 2 cannot form a majority and the system stops accepting writes.
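The quorum arithmetic behind that claim is simple enough to compute directly:

```python
def fault_tolerance(n):
    """A majority quorum needs n // 2 + 1 nodes, so the cluster
    tolerates the loss of the remaining n - quorum nodes."""
    quorum = n // 2 + 1
    return n - quorum

for n in (3, 5, 7):
    print(n, "nodes tolerate", fault_tolerance(n), "failures")
# 3 nodes tolerate 1, 5 tolerate 2, 7 tolerate 3
```

This is also why even cluster sizes buy nothing: 6 nodes need a quorum of 4 and still tolerate only 2 failures, same as 5 nodes.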
When you need linearizability#
Use linearizability for:
- Leader election and distributed locks
- Financial transactions and account balances
- Unique constraint enforcement (usernames, order IDs)
- Configuration that all nodes must agree on (Kubernetes API server)
Skip linearizability for:
- Read-heavy analytics workloads
- Content delivery and caching
- Event logs where ordering within a partition is sufficient
- Systems where eventual consistency is acceptable (social media feeds, product catalogs)
Techniques to reduce the cost#
- Partition your data — each partition has its own Raft group. Linearizability within a partition, but no cross-partition guarantees without extra coordination
- Lease-based reads — the leader grants time-bounded leases. During the lease, reads can be served without a Raft round-trip
- Read-your-writes at the client — track the latest write timestamp and ensure subsequent reads are at least that fresh. Weaker than linearizability but sufficient for many use cases
- Linearizable writes, non-linearizable reads — ZooKeeper's default model. Writes are strict, reads are fast but potentially stale
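The lease-based reads technique can be sketched as follows — a simplified in-process model (class and method names hypothetical; real systems such as etcd implement leases inside the Raft leader):

```python
import time

class Leader:
    """Serves reads locally only while holding a time-bounded lease.
    Renewing the lease stands in for a majority acknowledgment."""
    LEASE_SECONDS = 2.0

    def __init__(self):
        self.lease_expiry = 0.0
        self.data = {}

    def renew_lease(self):
        # In a real cluster this requires a quorum round-trip;
        # it amortizes that cost across many reads.
        self.lease_expiry = time.time() + self.LEASE_SECONDS

    def read(self, key):
        if time.time() < self.lease_expiry:
            return self.data.get(key)   # fast local read, no quorum
        raise RuntimeError("lease expired: must contact a quorum")

leader = Leader()
leader.renew_lease()
leader.data["x"] = 1
print(leader.read("x"))  # 1, served without a Raft round-trip
```

One caveat the sketch glosses over: lease safety itself depends on bounded clock drift between nodes, which is why lease durations are kept short.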
How to test for linearizability#
Jepsen is the standard tool. It runs concurrent operations against a distributed system, records the history, and checks whether the history is linearizable using a model checker.
A history is linearizable if you can find a single total order of operations that:
- Is consistent with the return values
- Respects real-time ordering (if op A finished before op B started, A appears before B)
If no such ordering exists, the system violated linearizability.
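Jepsen's checkers use far more efficient search algorithms, but for tiny histories the definition can be checked by brute force — try every total order and test both conditions (a sketch for a single register, assuming all operations are distinct):

```python
from itertools import permutations

# Each op: (kind, value, invoke_time, return_time) on one register.
history = [
    ("write", 1, 0, 3),
    ("read",  1, 4, 6),
    ("read",  0, 5, 7),   # stale read: returns the pre-write value
]

def is_linearizable(history, initial=0):
    """Brute-force check: does any total order respect real-time
    precedence AND reproduce the observed return values?"""
    for order in permutations(history):
        pos = {op: i for i, op in enumerate(order)}
        # Real-time rule: if A returned before B was invoked,
        # A must precede B in the candidate order.
        if any(pos[a] > pos[b]
               for a in history for b in history
               if a[3] < b[2]):
            continue
        # Replay a single register and check every read's result.
        value, ok = initial, True
        for kind, v, _, _ in order:
            if kind == "write":
                value = v
            elif v != value:
                ok = False
                break
        if ok:
            return True
    return False

print(is_linearizable(history))  # False: no valid order exists
```

Brute force is factorial in the history length, which is exactly why practical checkers rely on pruning and partial-order reasoning rather than enumeration.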
Summary#
- Linearizability means every operation appears atomic and respects real-time order
- Serializability is about transaction isolation; linearizability is about single-object consistency
- Total order broadcast and consensus are the building blocks
- CAS operations are what make linearizability practically useful
- etcd, ZooKeeper, Spanner, CockroachDB all provide linearizability with different trade-offs
- The cost is higher latency, lower write throughput, and reduced availability during partitions
- Use it when correctness matters more than performance — locks, elections, financial operations
Article #455 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.