Eventual Consistency Patterns — Compensation, Reconciliation, and Session Guarantees
Why eventual consistency#
Strong consistency is simple to reason about: after a write, every read sees the new value. But strong consistency requires coordination — locks, consensus protocols, waiting for replicas to acknowledge. That coordination has costs:
- Higher latency — writes must wait for multiple nodes to agree
- Lower availability — if a node is unreachable, writes may block
- Reduced throughput — coordination is a bottleneck
The CAP theorem forces a choice during network partitions. When you choose availability and partition tolerance over consistency, you accept eventual consistency: all replicas will converge to the same state, but there is a window during which they may disagree.
Most real-world systems are eventually consistent. DNS, email, social media feeds, shopping carts, CDN caches — all tolerate temporary inconsistency for speed and availability.
The consistency spectrum#
Consistency is not binary. From strongest to weakest:
| Level | Guarantee | Cost |
|---|---|---|
| Linearizable | Reads always see the latest write | Highest latency |
| Sequential | All nodes see operations in the same order | High coordination |
| Causal | Causally related operations are ordered | Moderate overhead |
| Read-your-writes | You see your own writes immediately | Low overhead |
| Monotonic reads | You never see older data after seeing newer data | Low overhead |
| Eventual | All replicas converge eventually | Lowest overhead |
Most applications need something between eventual and linearizable. The patterns below help you get the right guarantees without paying for full strong consistency.
Compensation patterns#
When a distributed operation partially fails, you cannot roll back like a local transaction. Instead, you compensate — execute an action that semantically undoes the effect.
The saga pattern#
A saga is a sequence of local transactions, each with a compensating action:
# Order saga: Reserve inventory -> Charge payment -> Ship order
saga_steps = [
    {
        "action": reserve_inventory,
        "compensate": release_inventory,
    },
    {
        "action": charge_payment,
        "compensate": refund_payment,
    },
    {
        "action": create_shipment,
        "compensate": cancel_shipment,
    },
]

class SagaFailed(Exception):
    pass

def execute_saga(steps, order):
    completed = []
    for step in steps:
        try:
            step["action"](order)
            completed.append(step)
        except Exception as exc:
            # Compensate in reverse order
            for s in reversed(completed):
                s["compensate"](order)
            raise SagaFailed(
                f"Failed at {step['action'].__name__}, "
                f"compensated {len(completed)} steps"
            ) from exc
Compensating transactions in practice#
Compensation is not always a simple undo:
- Refund a payment — not an undo, it is a new credit transaction
- Cancel a shipment — if already shipped, initiate a return instead
- Release inventory — must handle the case where someone else reserved it in the meantime
Design compensations as idempotent operations. They may be retried.
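A common way to make a compensation safe to retry is an idempotency key: retries with the same key return the original result instead of applying the effect again. A minimal sketch, using a hypothetical in-memory ledger as a stand-in for a payment provider's API:

```python
import uuid

# Hypothetical in-memory ledger standing in for a payment provider
processed_refunds = {}  # idempotency_key -> refund record

def refund_payment(payment_id, amount, idempotency_key):
    """Issue a refund as a new credit transaction, safe to retry."""
    if idempotency_key in processed_refunds:
        # Retry of a refund we already applied: return the prior result
        return processed_refunds[idempotency_key]
    refund = {
        "refund_id": str(uuid.uuid4()),
        "payment_id": payment_id,
        "amount": amount,
    }
    processed_refunds[idempotency_key] = refund
    return refund

# Retrying with the same key does not double-refund
first = refund_payment("pay_1", 30, idempotency_key="order-42-refund")
second = refund_payment("pay_1", 30, idempotency_key="order-42-refund")
assert first["refund_id"] == second["refund_id"]
```

In a real system the ledger would live in durable storage, and the caller (the saga executor) would derive the key from the business operation, such as the order ID.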
Reconciliation#
Even with careful design, replicas drift. Reconciliation detects and fixes divergence.
Background reconciliation#
Periodically compare replicas and fix mismatches:
async def reconcile_inventories():
    """Compare inventory counts across regions and resolve conflicts."""
    regions = ["us-east", "us-west", "eu-west"]
    for product_id in get_all_product_ids():
        counts = {}
        for region in regions:
            counts[region] = await get_inventory(region, product_id)
        if len(set(counts.values())) > 1:
            # Conflict detected — use the source of truth
            true_count = await get_warehouse_count(product_id)
            for region in regions:
                if counts[region] != true_count:
                    await set_inventory(region, product_id, true_count)
                    log.warning(
                        f"Reconciled {product_id} in {region}: "
                        f"{counts[region]} -> {true_count}"
                    )
Event sourcing for reconciliation#
When you store events (not just current state), reconciliation becomes replaying events:
- Each replica stores its event log
- Compare event logs between replicas
- Missing events are replayed on the lagging replica
- State is rebuilt from the complete event sequence
This makes reconciliation deterministic and auditable.
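Under the simplifying assumption that a lagging replica's log is a strict prefix of the up-to-date one (no divergent branches), reconciliation by replay can be sketched as:

```python
def reconcile_event_logs(log_a, log_b):
    """Replay whichever suffix of events one replica is missing.

    Assumes each log is an ordered list of events and that the shorter
    log is a strict prefix of the longer one.
    """
    if len(log_a) == len(log_b):
        return
    longer, shorter = (log_a, log_b) if len(log_a) > len(log_b) else (log_b, log_a)
    missing = longer[len(shorter):]
    shorter.extend(missing)  # replay missing events on the lagging replica

def rebuild_state(event_log):
    """Fold events into current state; deterministic given the same log."""
    state = {}
    for event in event_log:
        state[event["key"]] = event["value"]
    return state

a = [{"key": "stock", "value": 10}, {"key": "stock", "value": 9}]
b = [{"key": "stock", "value": 10}]
reconcile_event_logs(a, b)
assert rebuild_state(a) == rebuild_state(b)  # both replicas converge
```

Handling genuinely divergent logs (both replicas appended during a partition) requires a merge policy on top of this, which is where the conflict resolution strategies below come in.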
Conflict resolution#
When two replicas accept conflicting writes during a partition, you need a resolution strategy.
Last-writer-wins (LWW)#
The write with the latest timestamp wins. Simple but lossy — one write is silently discarded.
def resolve_lww(version_a, version_b):
    if version_a.timestamp >= version_b.timestamp:
        return version_a
    return version_b
Problems with LWW: clock skew between nodes can cause the "wrong" write to win. Use hybrid logical clocks to reduce this risk.
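As a sketch of that mitigation: a hybrid logical clock (HLC) pairs physical time with a logical counter, so timestamps respect causality even when physical clocks are skewed. The structure below is illustrative, not a production HLC:

```python
import time

class HybridLogicalClock:
    """Minimal HLC sketch: timestamps are (physical_ms, logical) pairs."""

    def __init__(self):
        self.pt = 0       # highest physical time seen, in ms
        self.logical = 0  # tie-breaking logical counter

    def now(self):
        """Timestamp a local event."""
        wall = int(time.time() * 1000)
        if wall > self.pt:
            self.pt, self.logical = wall, 0
        else:
            self.logical += 1
        return (self.pt, self.logical)

    def update(self, remote_pt, remote_c):
        """Merge a timestamp received from another node."""
        wall = int(time.time() * 1000)
        old_pt = self.pt
        self.pt = max(old_pt, remote_pt, wall)
        if self.pt == old_pt and self.pt == remote_pt:
            self.logical = max(self.logical, remote_c) + 1
        elif self.pt == old_pt:
            self.logical += 1
        elif self.pt == remote_pt:
            self.logical = remote_c + 1
        else:
            self.logical = 0
        return (self.pt, self.logical)
```

Because every received timestamp ratchets the local clock forward, a node whose wall clock lags can never stamp a causally later write with an earlier HLC timestamp, which is exactly the failure mode that makes plain LWW lossy.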
Merge functions#
Instead of picking one write, merge both:
def resolve_shopping_cart(cart_a, cart_b):
    """Union merge — items added on either replica are kept."""
    merged_items = {}
    for item in cart_a.items + cart_b.items:
        if item.id in merged_items:
            merged_items[item.id].quantity = max(
                item.quantity, merged_items[item.id].quantity
            )
        else:
            merged_items[item.id] = item
    return Cart(items=list(merged_items.values()))
CRDTs (Conflict-free Replicated Data Types)#
Data structures designed to merge automatically without coordination:
- G-Counter — grow-only counter, each node increments its own slot
- PN-Counter — supports increment and decrement with two G-Counters
- OR-Set — observed-remove set, handles concurrent add/remove
- LWW-Register — last-writer-wins for single values
CRDTs guarantee convergence regardless of message ordering or duplication.
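A G-Counter makes the convergence guarantee concrete: merge is a pointwise max, which is commutative, associative, and idempotent, so replicas reach the same value regardless of the order or duplication of merges. A minimal sketch:

```python
class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}  # node_id -> count

    def increment(self, amount=1):
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.slots.values())

    def merge(self, other):
        """Pointwise max over slots: safe to apply in any order, any number of times."""
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # replicas converge
```

A PN-Counter follows the same shape with a second G-Counter for decrements; the value is the difference of the two.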
Anti-entropy protocols#
Anti-entropy is the process of actively spreading updates between replicas to reduce inconsistency windows.
Gossip protocol#
Each node periodically exchanges state with random peers:
class GossipNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}  # key -> (value, vector_clock)
        self.peers = []

    def gossip_round(self):
        peer = random.choice(self.peers)
        # Send our state digest
        digest = {k: v[1] for k, v in self.data.items()}  # key -> clock
        peer_updates = peer.receive_digest(self.node_id, digest)
        # Apply updates from peer
        for key, (value, clock) in peer_updates.items():
            if key not in self.data or clock > self.data[key][1]:
                self.data[key] = (value, clock)
Gossip spreads updates in O(log N) rounds for N nodes. It is robust to node failures — if one peer is down, the next gossip round picks a different peer.
Merkle trees#
Hash tree structures that efficiently detect which data ranges differ between replicas:
- Each leaf represents a data partition hash
- Parent nodes are hashes of their children
- Compare root hashes — if equal, replicas are in sync
- If different, recurse into children to find divergent partitions
- Only transfer data from divergent partitions
Cassandra and DynamoDB use Merkle trees for anti-entropy repair.
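A minimal sketch of the comparison, flattened to a single level of parents for brevity (the partitioning and hashing choices here are illustrative):

```python
import hashlib

def partition_hashes(data, num_partitions=4):
    """Hash each key-range partition of a replica's data (the leaves)."""
    buckets = [[] for _ in range(num_partitions)]
    for key in sorted(data):
        idx = int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_partitions
        buckets[idx].append((key, data[key]))
    return [hashlib.sha256(repr(b).encode()).hexdigest() for b in buckets]

def root_hash(leaf_hashes):
    """Parent = hash of concatenated children; one level here for brevity."""
    return hashlib.sha256("".join(leaf_hashes).encode()).hexdigest()

def divergent_partitions(replica_a, replica_b):
    leaves_a = partition_hashes(replica_a)
    leaves_b = partition_hashes(replica_b)
    if root_hash(leaves_a) == root_hash(leaves_b):
        return []  # roots match: replicas in sync, nothing to transfer
    # Recurse (here: one level) to find which partitions actually differ
    return [i for i, (ha, hb) in enumerate(zip(leaves_a, leaves_b)) if ha != hb]

a = {"k1": 1, "k2": 2, "k3": 3}
b = {"k1": 1, "k2": 9, "k3": 3}
assert len(divergent_partitions(a, b)) == 1  # only k2's partition differs
```

The payoff is bandwidth: replicas exchange O(log N) hashes to locate divergence instead of streaming entire datasets at each repair.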
Read-your-writes consistency#
The most commonly needed guarantee: after you write a value, you always read your own write back.
Sticky sessions#
Route all requests from the same user to the same replica:
import hashlib

def get_replica_for_user(user_id, replicas):
    # A stable hash (not Python's per-process salted hash()) so the same
    # user always routes to the same replica, even across restarts
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]
If that replica holds the user's latest write, they always see it.
Write-through with local cache#
After writing, cache the written value locally. Reads check the cache before hitting the database:
class ReadYourWritesClient:
    def __init__(self, db_client):
        self.db = db_client
        self.local_writes = {}  # key -> (value, expiry)

    def write(self, key, value):
        self.db.write(key, value)
        self.local_writes[key] = (value, time.time() + 30)  # 30s TTL

    def read(self, key):
        if key in self.local_writes:
            value, expiry = self.local_writes[key]
            if time.time() < expiry:
                return value
            del self.local_writes[key]
        return self.db.read(key)
Monotonic reads#
A monotonic read guarantee means: if you read version 5 of a record, you will never subsequently read version 4. No going backward.
This breaks when requests are load-balanced across replicas at different replication positions.
Solution: version tracking#
class MonotonicReadClient:
    def __init__(self, replicas, primary):
        self.replicas = replicas
        self.primary = primary  # authoritative node, used as a fallback
        self.seen_versions = {}  # key -> last seen version

    def read(self, key):
        min_version = self.seen_versions.get(key, 0)
        for replica in self.replicas:
            value, version = replica.read_with_version(key)
            if version >= min_version:
                self.seen_versions[key] = version
                return value
        # Fallback to primary if no replica is caught up
        return self.primary.read(key)
Session guarantees#
Combine multiple consistency guarantees into a session context:
class ConsistentSession:
    def __init__(self, db_cluster):
        self.cluster = db_cluster
        self.write_timestamp = None  # Track our latest write
        self.read_version = {}  # Track latest version seen per key

    def write(self, key, value):
        ts = self.cluster.write(key, value)
        self.write_timestamp = ts
        self.read_version[key] = ts

    def read(self, key):
        # Read-your-writes: ensure we read at least our write timestamp
        min_ts = self.read_version.get(key, self.write_timestamp)
        value, ts = self.cluster.read(key, min_timestamp=min_ts)
        # Monotonic reads: never go backward
        if ts > self.read_version.get(key, 0):
            self.read_version[key] = ts
        return value
Session guarantees give users the illusion of strong consistency for their own operations while the system remains eventually consistent globally.
When to avoid eventual consistency#
Not everything should be eventually consistent:
- Financial balances — double-spending is unacceptable
- Inventory for flash sales — overselling creates real costs
- Unique constraints — duplicate usernames or emails
- Distributed locks — must be strongly consistent by definition
For these, use strong consistency (consensus protocols, serializable transactions) and accept the latency cost.
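For example, a unique-username constraint must be enforced atomically at a single source of truth rather than reconciled after the fact. The sketch below uses an in-memory SQLite database as a stand-in for a strongly consistent store:

```python
import sqlite3

# A single authoritative store rejects the duplicate atomically;
# no eventually consistent replica can make this guarantee.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")

def register(username):
    try:
        with conn:  # one atomic transaction
            conn.execute("INSERT INTO users VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate rejected at the source of truth

assert register("alice") is True
assert register("alice") is False  # second registration is refused
```

In a replicated deployment the same effect requires routing these writes through a consensus-backed leader or a serializable transaction, paying the coordination latency the rest of this article works to avoid.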
Key takeaways#
- Eventual consistency is a spectrum — choose the weakest guarantee that your use case tolerates
- Compensation over rollback — in distributed systems, undo is a new forward action
- Reconciliation catches drift — periodically compare and fix divergent replicas
- CRDTs merge without coordination — ideal for collaborative or offline-first data
- Anti-entropy closes the window — gossip and Merkle trees actively spread updates
- Session guarantees give the illusion of strong consistency — users see their own writes without global coordination
Visualize your consistency architecture#
Map out your replication topology, consistency boundaries, and conflict resolution flows — try Codelit to generate an interactive diagram.
Article #359 of the Codelit system design series. Explore all articles at codelit.io.