Vector Clocks — Tracking Causality in Distributed Systems
The problem with physical clocks#
In a distributed system, you cannot rely on wall-clock time to order events. Clocks drift, NTP corrections introduce jumps, and two nodes may disagree on the current time by milliseconds — or seconds. When you need to know which write happened first, physical time is unreliable.
Logical clocks solve this by tracking causality — the order in which events depend on each other — without relying on synchronized clocks.
The happened-before relation#
Leslie Lamport defined the happened-before relation (denoted a → b) with three rules:
- If a and b are events in the same process and a occurs before b, then a → b
- If a is the sending of a message and b is the receipt of that message, then a → b
- If a → b and b → c, then a → c (transitivity)
If neither a → b nor b → a, the events are concurrent — the system cannot determine which happened first. This is not an error; it is a fundamental property of distributed systems.
Lamport timestamps#
A Lamport timestamp is a single integer counter that provides a partial ordering of events.
How it works#
- Each process maintains a counter, starting at 0
- Before each local event, increment the counter
- When sending a message, include the current counter value
- When receiving a message, set your counter to `max(local, received) + 1`
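The rules above fit in a few lines. Here is a minimal sketch (the `LamportClock` class name and its method names are illustrative, not from any particular library):

```python
class LamportClock:
    """Single-counter logical clock implementing Lamport's three rules."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Increment before each local event; return the event's timestamp."""
        self.counter += 1
        return self.counter

    def send(self):
        """Sending is itself an event: tick, then attach the counter."""
        return self.tick()

    def receive(self, received):
        """On receipt, jump past both clocks: max(local, received) + 1."""
        self.counter = max(self.counter, received) + 1
        return self.counter
```

After `b.receive(a.send())`, process `b`'s counter is guaranteed to exceed the timestamp of the send, so the send orders before the receive.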
Limitations#
Lamport timestamps guarantee: if a → b, then L(a) < L(b). But the converse is not true — L(a) < L(b) does not mean a → b. Two concurrent events may have different timestamps by coincidence.
This means Lamport timestamps can detect ordering when causality exists, but they cannot detect concurrency. For that, you need vector clocks.
Vector clocks#
A vector clock extends the Lamport timestamp from a single counter to a vector of counters — one per process in the system.
How vector clocks work#
For a system with N processes, each process maintains a vector of N integers:
- Process i increments `VC[i]` before each local event
- When sending a message, include the entire vector
- When receiving a message with vector `VC_msg`, update each entry: `VC[j] = max(VC[j], VC_msg[j])` for all j, then increment `VC[i]`
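These update rules can be sketched directly (a minimal illustration; `VectorClockProcess` and its API are hypothetical):

```python
def vc_merge(local, received):
    """Entry-wise maximum of two equal-length vector clocks."""
    return [max(a, b) for a, b in zip(local, received)]

class VectorClockProcess:
    """Process i in a fixed system of n processes."""

    def __init__(self, i, n):
        self.i = i
        self.vc = [0] * n

    def local_event(self):
        """Increment own entry before each local event."""
        self.vc[self.i] += 1

    def send(self):
        """Sending is an event; attach a copy of the whole vector."""
        self.local_event()
        return list(self.vc)

    def receive(self, vc_msg):
        """Merge entry-wise, then count the receive as a local event."""
        self.vc = vc_merge(self.vc, vc_msg)
        self.vc[self.i] += 1
```

With three processes, `p2.receive(p1.send())` leaves P1 at `[1,0,0]` and P2 at `[1,1,0]`, matching the worked example later in this article.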
Comparing vector clocks#
Given two vector clocks A and B:
- A < B (A happened before B) if every entry in A is less than or equal to the corresponding entry in B, and at least one is strictly less
- A = B if every entry is equal
- A and B are concurrent if neither A < B nor B < A — meaning some entries in A are greater and some entries in B are greater
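The comparison reduces to two entry-wise checks. A small sketch (function name is illustrative):

```python
def compare(a, b):
    """Compare two vector clocks of equal length.

    Returns "before" if a happened before b, "after" if b happened
    before a, "equal" if identical, and "concurrent" otherwise.
    """
    if a == b:
        return "equal"
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b:
        return "before"   # every entry <=, and at least one strictly less
    if b_le_a:
        return "after"
    return "concurrent"   # mixed: some entries favor a, some favor b
```

Note that `compare([1,1,0], [0,0,1])` returns `"concurrent"`: neither vector dominates the other.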
This is the key advantage: vector clocks can detect concurrent events where Lamport timestamps cannot.
Example#
Three processes — P1, P2, P3 — start with vectors [0,0,0]:
- P1 performs a write: `[1,0,0]`
- P1 sends to P2: P2 updates to `[1,1,0]`
- P3 performs a write independently: `[0,0,1]`
Comparing P2's `[1,1,0]` with P3's `[0,0,1]`: P2 has larger values in the first two entries, but P3 has a larger value in the third. Neither vector dominates the other, so the events are concurrent.
Version vectors#
Version vectors are closely related to vector clocks but serve a different purpose. While vector clocks track events, version vectors track data versions.
In a replicated database:
- Each replica maintains a version vector for each data item
- When a replica updates an item, it increments its own entry in the vector
- When replicas synchronize, they compare version vectors to detect conflicts
The key difference: version vectors are attached to data items rather than to processes or events. This makes them more practical for key-value stores where you need to know whether two copies of the same key are in conflict.
Conflict detection with vector clocks#
When a client reads a value from a distributed store, it receives the value along with its vector clock. When writing back:
- The client sends the new value along with the vector clock it received
- The server compares the incoming clock with the stored clock
- If the incoming clock descends from the stored clock, the write is an update — safe to overwrite
- If the clocks are concurrent, a conflict exists — the system must either merge or keep both versions (siblings)
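Steps 2 through 4 of this write path can be sketched as a single server-side check. This is an illustrative function, not any store's real API; clocks are modeled as dicts mapping node id to counter, with missing entries treated as 0:

```python
def handle_write(stored_clock, stored_value, incoming_clock, new_value):
    """Decide whether an incoming write supersedes the stored version.

    Returns a (decision, values) pair: "overwrite" when the incoming
    clock descends from the stored one, "stale" when it is older, and
    "siblings" when the clocks are concurrent.
    """
    nodes = set(stored_clock) | set(incoming_clock)
    descends = all(incoming_clock.get(n, 0) >= stored_clock.get(n, 0)
                   for n in nodes)
    older = all(incoming_clock.get(n, 0) <= stored_clock.get(n, 0)
                for n in nodes)
    if descends:
        return ("overwrite", [new_value])       # causal descendant: safe
    if older:
        return ("stale", [stored_value])        # incoming write is outdated
    return ("siblings", [stored_value, new_value])  # concurrent: keep both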
Conflict resolution strategies#
- Last-writer-wins (LWW) — discard one version based on timestamp. Simple but loses data.
- Application-level merge — return both versions to the client and let application logic merge them. Used by Riak.
- CRDTs — conflict-free replicated data types that merge automatically by mathematical design.
Dotted Version Vectors#
Standard version vectors have a problem: sibling explosion. When concurrent writes create siblings, subsequent writes that do not resolve the conflict create even more siblings.
Dotted Version Vectors (DVVs) solve this by associating each value with a single "dot" — a (node, counter) pair — rather than a full vector. This allows the system to precisely track which node wrote which version and prune obsolete siblings.
DVVs provide:
- Accurate sibling tracking without exponential growth
- Efficient garbage collection of old versions
- Correct identification of which values supersede which
Riak adopted DVVs to replace traditional vector clocks for exactly these reasons.
Causal consistency#
Vector clocks enable causal consistency — a consistency model that guarantees:
- If operation A causally precedes operation B, then every node sees A before B
- Concurrent operations may be seen in different orders by different nodes
Causal consistency is stronger than eventual consistency but weaker than linearizability. It is often the strongest consistency level achievable without coordination — making it attractive for systems that need low latency.
Implementing causal consistency#
- Attach vector clocks to all operations
- A node delays applying an operation until all causally preceding operations have been applied
- Concurrent operations can be applied in any order
This requires tracking dependencies, which adds overhead proportional to the number of nodes.
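The "delay until dependencies are applied" rule is the standard causal-delivery condition: an operation from process `j` with vector `V` is applicable once `V[j]` is exactly one ahead of what we have applied from `j`, and every other entry of `V` is already covered. A sketch under those assumptions (names are illustrative):

```python
def can_apply(op_clock, sender, applied_clock):
    """Causal delivery check for an operation from `sender`.

    op_clock: the operation's vector clock.
    applied_clock: entry-wise max over all operations applied locally.
    """
    for k, v in enumerate(op_clock):
        if k == sender:
            # Must be the very next operation from the sender.
            if v != applied_clock[k] + 1:
                return False
        elif v > applied_clock[k]:
            # Depends on an operation we have not applied yet.
            return False
    return True
```

A node that receives operations out of order buffers those for which `can_apply` is false and retries them after each successful apply.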
Scaling challenges#
Vector clocks grow linearly with the number of processes. In a system with thousands of clients, this becomes impractical:
- Space overhead — each clock entry adds bytes to every message
- Comparison cost — comparing two vectors is O(N)
- Garbage collection — pruning old entries requires careful coordination
Solutions#
- Bounded vector clocks — limit the number of entries and evict the oldest. Risks false conflicts but keeps size manageable.
- Server-side clocks — only track server replicas, not individual clients. Amazon's original Dynamo took this approach.
- Hash-based schemes — compress the vector using hash functions at the cost of occasional false positives.
- Interval tree clocks — dynamic structures that grow and shrink as nodes join and leave.
Real-world applications#
Riak#
Riak used vector clocks (later DVVs) to track object versions. Clients receive a causal context with each read and must return it with writes. Concurrent writes produce siblings that the application resolves.
Amazon DynamoDB#
The original Dynamo system, DynamoDB's predecessor, used vector clocks as described in the 2007 Dynamo paper. DynamoDB itself moved toward server-side reconciliation with last-writer-wins, trading conflict detection for operational simplicity.
CockroachDB#
CockroachDB uses a hybrid approach — Hybrid Logical Clocks (HLC) that combine physical time with logical counters. This provides the ordering guarantees of logical clocks while staying close to real time.
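A hedged sketch of the HLC update rules (following the Kulkarni et al. formulation; the class name and the injected `physical_now` clock source are assumptions for illustration):

```python
class HybridLogicalClock:
    """Sketch of Hybrid Logical Clock update rules.

    `l` tracks the largest physical timestamp observed so far;
    `c` is a logical counter that breaks ties while `l` is stalled.
    """

    def __init__(self, physical_now):
        self.physical_now = physical_now  # callable returning physical time
        self.l = 0
        self.c = 0

    def tick(self):
        """Local or send event."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0   # physical time advanced: reset counter
        else:
            self.c += 1              # stalled: advance logically
        return (self.l, self.c)

    def receive(self, l_msg, c_msg):
        """Merge the sender's (l, c) pair with local state."""
        pt = self.physical_now()
        new_l = max(self.l, l_msg, pt)
        if new_l == self.l == l_msg:
            self.c = max(self.c, c_msg) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == l_msg:
            self.c = c_msg + 1
        else:
            self.c = 0               # fresh physical time dominates
        self.l = new_l
        return (self.l, self.c)
```

Because `l` stays within clock-skew bounds of real time, HLC timestamps can double as approximate wall-clock values while still ordering causally related events.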
Vector clocks vs. other approaches#
| Mechanism | Detects causality | Detects concurrency | Overhead |
|---|---|---|---|
| Physical timestamps | No | No | Minimal |
| Lamport timestamps | Partial | No | Single integer |
| Vector clocks | Yes | Yes | O(N) integers |
| Dotted Version Vectors | Yes | Yes | O(N) with better pruning |
| Hybrid Logical Clocks | Yes | Partial | Timestamp + counter |
Visualize vector clocks#
On Codelit, generate a distributed key-value store to see how vector clocks propagate across replicas. Click on a data item to trace its version history and observe conflict detection in action.
This is article #238 in the Codelit engineering blog series.
Build and explore distributed system architectures visually at codelit.io.