Clock Synchronization in Distributed Systems — Wall Clocks, Logical Clocks & TrueTime
Why time is hard in distributed systems#
On a single machine, you call Date.now() and trust the result. In a distributed system, every node has its own clock. Those clocks drift. They disagree. And when two nodes disagree about what time it is, bad things happen — duplicate writes, lost updates, impossible orderings.
Clock synchronization is not about knowing the exact time. It is about agreeing on the order of events.
Wall clocks vs monotonic clocks#
Your operating system exposes two types of clocks:
Wall clock (time-of-day clock)#
Returns the current date and time. Can jump forward or backward when synced with NTP. Not safe for measuring durations or ordering events across nodes.
Monotonic clock#
Returns a steadily increasing counter. Never jumps backward. Safe for measuring elapsed time on a single node. Useless for comparing timestamps across nodes because each node starts its counter at a different point.
Rule of thumb: Use monotonic clocks for timeouts and latency measurements. Use wall clocks only when you need a human-readable timestamp — and never trust them for ordering.
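In Python, for example, the distinction looks like this (a minimal sketch; `time.time()` is the wall clock, `time.monotonic()` the monotonic one):

```python
import time

# Wall clock: may jump if NTP steps the clock; fine for human-readable timestamps.
wall_start = time.time()

# Monotonic clock: never jumps backward; the right choice for measuring durations.
mono_start = time.monotonic()

time.sleep(0.1)  # simulate some work

elapsed = time.monotonic() - mono_start
print(f"elapsed: {elapsed:.3f}s")  # correct even if the wall clock was adjusted mid-sleep
```

Measuring the same interval with `time.time()` would silently give a wrong answer if NTP stepped the clock during the sleep.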
NTP — Network Time Protocol#
NTP synchronizes wall clocks across machines by querying time servers. It has been around since 1985 and is the backbone of clock sync on the internet.
How NTP works#
- Client sends a request to the NTP server at `t1`, recording the send time
- Server receives at `t2`, responds at `t3`
- Client receives the response at `t4`
- Round-trip delay = `(t4 - t1) - (t3 - t2)`
- Clock offset = `((t2 - t1) + (t3 - t4)) / 2`
The client adjusts its clock by the calculated offset, accounting for network delay.
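The offset calculation can be sketched directly from the four timestamps (the numbers in the example are made up for illustration):

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Compute round-trip delay and clock offset from the four NTP timestamps.

    t1: client send time     (client clock)
    t2: server receive time  (server clock)
    t3: server send time     (server clock)
    t4: client receive time  (client clock)
    """
    delay = (t4 - t1) - (t3 - t2)          # time spent on the network, both ways
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated server-minus-client offset
    return offset, delay

# Example: client clock is 5 units behind the server, 2 units of one-way latency.
offset, delay = ntp_offset_and_delay(t1=100, t2=107, t3=108, t4=105)
print(offset, delay)  # 5.0 4
```

The formula averages the two one-way measurements, which is why asymmetric paths (different latency each direction) introduce a systematic error.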
NTP limitations#
- Typical accuracy: 1–10 ms on the public internet, sub-millisecond on a LAN
- Asymmetric network paths cause systematic errors
- NTP cannot guarantee bounded skew — it is best-effort
- A misconfigured NTP server can push clocks far off
For most web applications, NTP is good enough. For databases that need to order transactions globally, it is not.
Logical clocks — ordering without real time#
Leslie Lamport showed in 1978 that you do not need synchronized clocks to establish event ordering. You just need a counter.
Lamport timestamps#
Each node maintains a counter C. On every local event, increment C. When sending a message, attach C. When receiving a message with timestamp T, set C = max(C, T) + 1.
This gives you a happens-before relationship: if event A causally precedes event B, then C(A) < C(B). But the reverse is not true — if C(A) < C(B), A might not have caused B. They could be concurrent.
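The rules above fit in a few lines; the node and message plumbing here is hypothetical, just enough to show the counter updates:

```python
class LamportClock:
    """Minimal Lamport clock: one integer counter per node."""

    def __init__(self):
        self.counter = 0

    def local_event(self):
        self.counter += 1
        return self.counter

    def send(self):
        # attach the current counter to the outgoing message
        self.counter += 1
        return self.counter

    def receive(self, msg_timestamp):
        # merge: jump ahead of anything causally before this message
        self.counter = max(self.counter, msg_timestamp) + 1
        return self.counter

a, b = LamportClock(), LamportClock()
t_send = a.send()           # a's counter becomes 1
t_recv = b.receive(t_send)  # b's counter becomes max(0, 1) + 1 = 2
print(t_send, t_recv)       # 1 2
```

Note that `t_send < t_recv` holds because the send happens-before the receive; that is the only direction the implication runs.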
Vector clocks#
To detect concurrency, each node maintains a vector of counters — one per node. Node i increments V[i] on every event. Messages carry the full vector. The receiver merges by taking the element-wise max.
Two events are concurrent if neither vector dominates the other. This is how Amazon Dynamo detected conflicting writes.
Trade-off: Vector clocks grow with the number of nodes. At scale, you need techniques like pruning or version vectors to keep them bounded.
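A sketch of the merge and concurrency check, assuming a fixed, known set of node ids (real systems like Dynamo also prune and bound these vectors):

```python
class VectorClock:
    """Vector clock over a fixed set of node ids."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.v = {n: 0 for n in nodes}

    def tick(self):
        # local event: increment only our own slot
        self.v[self.node_id] += 1

    def merge(self, other_v):
        # receive: element-wise max, then count the receive as a local event
        for n in self.v:
            self.v[n] = max(self.v[n], other_v[n])
        self.tick()

def concurrent(v1, v2):
    # concurrent iff neither vector dominates the other
    return any(v1[n] > v2[n] for n in v1) and any(v2[n] > v1[n] for n in v1)

a = VectorClock("a", ["a", "b"])
b = VectorClock("b", ["a", "b"])
a.tick()  # a writes without seeing b: {a: 1, b: 0}
b.tick()  # b writes without seeing a: {a: 0, b: 1}
print(concurrent(a.v, b.v))  # True: conflicting writes detected
```

After `a.merge(b.v)`, `a`'s vector dominates `b`'s and the two are no longer concurrent, which is exactly the signal a conflict resolver needs.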
Google TrueTime — bounded uncertainty#
Google Spanner needed globally consistent transactions across data centers. NTP was not precise enough. Logical clocks did not give real-time ordering. So Google built TrueTime.
How TrueTime works#
TrueTime returns an interval [earliest, latest] instead of a single timestamp. The API is:
- `TT.now()` returns `TTinterval {earliest, latest}`
- `TT.after(t)` returns true if `t` is definitely in the past
- `TT.before(t)` returns true if `t` is definitely in the future
Google achieves tight intervals (typically under 7 ms) using GPS receivers and atomic clocks in every data center. The uncertainty bound stays small because these hardware clocks drift very slowly between synchronizations.
Spanner's commit-wait#
When Spanner commits a transaction at timestamp s, it waits until TT.after(s) is true before releasing the commit. This guarantees that any transaction that starts after the commit will see the committed data — even across continents.
The cost is latency: every write waits for the uncertainty interval to pass. But the guarantee is powerful — externally consistent global transactions without locking everything.
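The commit-wait idea can be illustrated with a toy interval clock. The fixed `EPSILON` below is an assumption for illustration only; real TrueTime derives its bound from GPS and atomic-clock hardware:

```python
import time

# Hypothetical uncertainty bound; TrueTime computes this dynamically.
EPSILON = 0.007  # 7 ms

def tt_now():
    """TrueTime-style interval: the true time lies within [earliest, latest]."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(t):
    earliest, _ = tt_now()
    return earliest > t  # t is definitely in the past

def commit_wait(commit_ts):
    # block until the commit timestamp is definitely in the past everywhere
    while not tt_after(commit_ts):
        time.sleep(0.001)

start = time.time()
_, commit_ts = tt_now()   # commit at the latest edge of the interval
commit_wait(commit_ts)
waited = time.time() - start
print(f"waited {waited * 1000:.1f} ms")  # roughly 2 * EPSILON
```

The wait is about twice the uncertainty bound, which is why Spanner invests in hardware to keep that bound small: it is paid on every write.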
Hybrid Logical Clocks (HLC)#
HLCs combine the best of wall clocks and logical clocks. They track physical time loosely while maintaining causal ordering strictly.
HLC structure#
An HLC timestamp has three components:
- `pt` — physical time (wall clock, loosely synced via NTP)
- `l` — logical component (extends physical time when the wall clock has not advanced)
- `c` — counter for events at the same `(pt, l)`
HLC rules#
On a local event or send:
- If wall clock `pt` has advanced past `l`, update `l = pt` and reset `c = 0`
- Otherwise, increment `c`
On receive with incoming `(l', c')`:
- Set `l = max(local_l, l', pt)`
- If the new `l` equals both the local and incoming values, set `c = max(c, c') + 1`; if it equals only one of them, increment that side's counter
- Otherwise (the wall clock won), reset `c = 0`
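Putting the rules together (the injectable `now_fn` is just for deterministic illustration; a real clock would use the system wall clock):

```python
import time

class HLC:
    """Hybrid Logical Clock sketch."""

    def __init__(self, now_fn=lambda: int(time.time() * 1000)):
        self.now_fn = now_fn
        self.l = 0  # logical component, tracks physical time
        self.c = 0  # counter for ties at the same l

    def local_or_send(self):
        pt = self.now_fn()
        if pt > self.l:
            self.l, self.c = pt, 0   # wall clock advanced: adopt it
        else:
            self.c += 1              # wall clock stalled: extend logically
        return (self.l, self.c)

    def receive(self, l_in, c_in):
        pt = self.now_fn()
        l_new = max(self.l, l_in, pt)
        if l_new == self.l and l_new == l_in:
            self.c = max(self.c, c_in) + 1
        elif l_new == self.l:
            self.c += 1
        elif l_new == l_in:
            self.c = c_in + 1
        else:
            self.c = 0               # wall clock won outright
        self.l = l_new
        return (self.l, self.c)

# Freeze the wall clock to show the logical component taking over:
clock = HLC(now_fn=lambda: 1000)
r1 = clock.local_or_send()   # (1000, 0)
r2 = clock.local_or_send()   # (1000, 1) -- counter extends a stalled clock
r3 = clock.receive(1005, 3)  # (1005, 4) -- incoming timestamp is ahead
print(r1, r2, r3)
```

Timestamps compare lexicographically as `(l, c)` pairs, so causal order is preserved even while `l` stays close to real time.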
Why HLCs matter#
HLCs give you timestamps that are close to real time (within NTP accuracy) while preserving causal ordering. CockroachDB uses HLCs for transaction ordering. They do not require specialized hardware like TrueTime.
The trade-off: HLCs cannot provide the bounded uncertainty guarantee that TrueTime offers. Clock skew beyond NTP accuracy can cause ordering anomalies.
Clock skew impact on real systems#
Clock skew is not theoretical. Here is what goes wrong:
Distributed databases#
If node A writes at t=100 and node B writes at t=99 (due to clock skew), but B's write actually happened after A's, you get a causal violation. Last-write-wins resolves to the wrong value.
Lease expiration#
Node A holds a lease that expires at t=200. Node B's clock is ahead — it thinks the lease expired at t=198 and takes over. Now both nodes think they hold the lease. This breaks mutual exclusion.
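One common mitigation is to stop trusting a lease a safety margin before its nominal expiry, sized to the maximum skew you monitor and enforce. A sketch, with the 100 ms bound as an assumption:

```python
# Assumed maximum clock skew between any two nodes, matching a monitored
# alert threshold; this is an illustration, not a universal constant.
MAX_SKEW = 0.1  # 100 ms

def lease_is_safe(now, lease_expiry, max_skew=MAX_SKEW):
    # Treat the lease as expired max_skew early, so even a node whose clock
    # runs behind by max_skew gives up before a fast-clock node takes over.
    return now < lease_expiry - max_skew

print(lease_is_safe(now=199.95, lease_expiry=200.0))  # False: inside the margin
print(lease_is_safe(now=199.5, lease_expiry=200.0))   # True: safely early
```

The margin trades a little lease utilization for safety; it only works if the skew bound is actually enforced, which is why monitoring skew matters.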
Certificate validation#
TLS certificates have notBefore and notAfter fields. A client with a clock 5 minutes behind might reject a valid certificate. A clock 5 minutes ahead might accept an expired one.
Log correlation#
When debugging a production incident across 50 services, timestamps that are off by even 100 ms make it nearly impossible to reconstruct the sequence of events.
Choosing the right approach#
| Approach | Accuracy | Causal ordering | Hardware needed | Used by |
|---|---|---|---|---|
| NTP | 1–10 ms | No | None | Everything |
| Lamport clocks | N/A | Partial | None | Many distributed algorithms |
| Vector clocks | N/A | Full | None | Dynamo, Riak |
| TrueTime | < 7 ms | Yes (with wait) | GPS + atomic | Google Spanner |
| HLC | NTP-bounded | Yes | None | CockroachDB |
Practical recommendations#
- Always use NTP — configure `chrony` or `ntpd` on every server
- Monitor clock skew — alert if any node drifts more than 100 ms
- Use monotonic clocks for timeouts — never `Date.now()` for measuring elapsed time
- Pick the right ordering primitive — Lamport for simple causality, vector clocks for conflict detection, HLC for time-correlated causality
- Do not assume clocks are synchronized — design protocols that tolerate bounded skew
- If you need global ordering without specialized hardware, HLCs are your best option
Key takeaways#
- Wall clocks lie — they jump, drift, and disagree across nodes
- NTP is best-effort — good enough for most apps, not enough for global transactions
- Lamport timestamps give partial ordering without real time
- Vector clocks detect concurrent writes but grow with cluster size
- Google TrueTime uses GPS and atomic clocks for bounded uncertainty
- Hybrid Logical Clocks combine physical and logical time without special hardware
- Clock skew breaks leases, ordering, and debugging — always monitor and bound it
This is article #425 of the Codelit engineering blog.