Tags: two-phase commit, distributed systems, transactions, databases, system design

Two-Phase Commit: The Backbone of Distributed Transactions

March 29, 2026 · 6 min read · By Codelit Team

Two-Phase Commit (2PC)#

When a single transaction spans multiple databases or services, you need a protocol that guarantees all nodes commit or all nodes abort. Two-phase commit is the classic answer — battle-tested, widely deployed, and fundamentally important to understand.

The Core Problem#

Imagine transferring $500 from Bank A to Bank B. Two databases must update atomically:

Bank A: balance -= 500
Bank B: balance += 500

If Bank A commits but Bank B crashes before committing, the money vanishes. You need atomic commitment across independent systems.

How 2PC Works#

The protocol has two distinct phases, orchestrated by a coordinator node.

Phase 1: Prepare (Voting)#

Coordinator → Participant A: "PREPARE to commit transaction T1"
Coordinator → Participant B: "PREPARE to commit transaction T1"

Participant A: acquires locks, writes to WAL → "YES, I can commit"
Participant B: acquires locks, writes to WAL → "YES, I can commit"

Each participant does everything needed to commit except actually committing. Data is written to the write-ahead log. Locks are held. The participant promises it can commit if asked.

Phase 2: Commit (Decision)#

Coordinator receives all YES votes:
Coordinator → Participant A: "COMMIT transaction T1"
Coordinator → Participant B: "COMMIT transaction T1"

Both participants: make changes permanent, release locks → "ACK"

If any participant votes NO (or times out), the coordinator sends ABORT to everyone.
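The two phases can be sketched end to end in a few lines of Python. This is a minimal in-memory illustration under simplifying assumptions, not a production implementation: the `Participant` class, its `prepare`/`commit`/`abort` methods, and `two_phase_commit` are hypothetical names, there is no network or real write-ahead log, and nothing here can time out or crash.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Participant:
    """In-memory stand-in for one database taking part in 2PC (hypothetical)."""
    name: str
    balance: int
    can_commit: bool = True            # set to False to simulate a NO vote
    pending: Optional[int] = None      # staged change, standing in for the WAL entry
    log: List[str] = field(default_factory=list)

    def prepare(self, delta: int) -> bool:
        # Phase 1: stage the change and vote. Nothing is visible yet.
        if not self.can_commit or self.balance + delta < 0:
            self.log.append("VOTE NO")
            return False
        self.pending = delta
        self.log.append("VOTE YES")
        return True

    def commit(self) -> None:
        # Phase 2: make the staged change permanent.
        self.balance += self.pending
        self.pending = None
        self.log.append("COMMIT")

    def abort(self) -> None:
        # Phase 2: discard anything that was staged.
        self.pending = None
        self.log.append("ABORT")


def two_phase_commit(changes: List[Tuple[Participant, int]]) -> bool:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(delta) for p, delta in changes]
    decision = all(votes)
    # Phase 2: send the same decision to every participant.
    for p, _ in changes:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


bank_a = Participant("A", balance=1000)
bank_b = Participant("B", balance=200)
ok = two_phase_commit([(bank_a, -500), (bank_b, +500)])
print(ok, bank_a.balance, bank_b.balance)  # True 500 700
```

If either participant is created with `can_commit=False`, both balances stay unchanged and both logs end in `ABORT`.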

The Decision Rule#

All vote YES → COMMIT
Any vote NO or timeout → ABORT

This is the key invariant. Once the coordinator decides, the outcome is final.
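As a pure function, the rule is small enough to state directly. The sketch below is illustrative only: the `Vote` and `Decision` enums and the `decide` function are hypothetical names, a timed-out vote is modelled as `None`, and treating an empty vote list as ABORT is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional, Sequence


class Vote(Enum):
    YES = "YES"
    NO = "NO"


class Decision(Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"


def decide(votes: Sequence[Optional[Vote]]) -> Decision:
    """Apply the 2PC decision rule; None stands for a vote that timed out."""
    # Commit only when every expected vote arrived and every one is YES.
    if votes and all(v is Vote.YES for v in votes):
        return Decision.COMMIT
    return Decision.ABORT


print(decide([Vote.YES, Vote.YES]).value)  # COMMIT
print(decide([Vote.YES, None]).value)      # ABORT
```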

State Machine#

Coordinator states:
  INIT → WAITING → COMMITTED | ABORTED

Participant states:
  INIT → PREPARED → COMMITTED | ABORTED

The critical moment is when a participant enters PREPARED state — it has promised to commit but hasn't yet. It must wait for the coordinator's decision.
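One way to make the participant side of this state machine explicit is a transition table that rejects illegal moves. The names below (`PState`, `TRANSITIONS`, `advance`) are hypothetical, and the `INIT -> ABORTED` edge, which covers a NO vote, is an assumption that the diagram above leaves implicit.

```python
from enum import Enum


class PState(Enum):
    INIT = "INIT"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


# Allowed participant transitions. COMMITTED and ABORTED are terminal.
TRANSITIONS = {
    PState.INIT: {PState.PREPARED, PState.ABORTED},
    PState.PREPARED: {PState.COMMITTED, PState.ABORTED},
    PState.COMMITTED: set(),
    PState.ABORTED: set(),
}


def advance(current: PState, target: PState) -> PState:
    """Return the new state, or raise ValueError if the move is illegal."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = advance(PState.INIT, PState.PREPARED)
state = advance(state, PState.COMMITTED)
print(state.name)  # COMMITTED
```

A participant can never jump from `INIT` straight to `COMMITTED`: it has to pass through `PREPARED`, which is exactly the state in which it can get stuck.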

The Blocking Problem#

This is 2PC's Achilles heel. If the coordinator crashes after sending PREPARE but before sending COMMIT/ABORT:

Participant A: in PREPARED state, holding locks, waiting...
Participant B: in PREPARED state, holding locks, waiting...

Coordinator: crashed

Both participants are BLOCKED — they cannot safely commit or abort

Participants hold their locks until the coordinator recovers, which may be indefinitely. They cannot abort (the coordinator might have decided COMMIT). They cannot commit (the coordinator might have decided ABORT). The system is stuck.

Why Participants Can't Decide Alone#

Scenario 1: Participant A voted YES, Participant B voted NO
  → Coordinator decided ABORT
  → If A commits unilaterally → inconsistency

Scenario 2: Both voted YES
  → Coordinator decided COMMIT
  → If A aborts unilaterally → inconsistency

Without hearing from the coordinator, no safe choice exists.
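The argument can be checked by brute force: enumerate every vote the other participants might have cast and see which outcomes remain possible. This is a small illustrative sketch; `coordinator_decision` and `safe_unilateral_actions` are hypothetical helper names, and the model assumes the only thing a participant knows is its own vote.

```python
from itertools import product


def coordinator_decision(votes):
    """The 2PC decision rule: commit only if every vote is YES."""
    return "COMMIT" if all(v == "YES" for v in votes) else "ABORT"


def safe_unilateral_actions(my_vote, other_count):
    """Return the actions that match the coordinator in every possible world."""
    possible = {
        coordinator_decision((my_vote, *others))
        for others in product(("YES", "NO"), repeat=other_count)
    }
    # An action is safe only if it is the single outcome that can have occurred.
    return possible if len(possible) == 1 else set()


print(safe_unilateral_actions("YES", other_count=1))  # set()
print(safe_unilateral_actions("NO", other_count=1))   # {'ABORT'}
```

The second call shows the one asymmetry in the protocol: a participant that voted NO already knows the outcome must be ABORT, so only participants that voted YES can end up in doubt.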

Recovery and Logging#

Every 2PC node, coordinator and participant alike, writes to a durable log at critical moments:

Coordinator log:
  1. Write PREPARE record before sending PREPARE
  2. Write COMMIT/ABORT record before sending decision
  3. Write COMPLETE record after all ACKs received

Participant log:
  1. Write YES record before voting YES
  2. Write COMMIT/ABORT record after receiving decision

On recovery, nodes consult their logs. If a participant finds a YES record but no decision record, it must contact the coordinator (or other participants) to learn the outcome.
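A restarted participant's choice can be sketched as a lookup over its own log records. The function name `recovery_action`, the record strings, and the returned action labels are assumptions made for illustration; a real system would replay a structured WAL rather than a list of strings.

```python
from typing import List


def recovery_action(log: List[str]) -> str:
    """Decide what a restarted participant should do, based on its log."""
    if "COMMIT" in log:
        return "redo commit"                 # decision known: finish committing
    if "ABORT" in log:
        return "undo"                        # decision known: roll back
    if "YES" in log:
        return "in doubt: ask coordinator"   # promised to commit, outcome unknown
    return "abort unilaterally"              # never voted YES, so no promise was made


print(recovery_action(["YES"]))            # in doubt: ask coordinator
print(recovery_action(["YES", "COMMIT"]))  # redo commit
print(recovery_action([]))                 # abort unilaterally
```

Only the `YES`-without-decision case requires talking to anyone else, which is why the YES record has to be durable before the vote is sent.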

Three-Phase Commit (3PC)#

3PC adds a PRE-COMMIT phase to avoid blocking:

Phase 1: PREPARE → collect votes (same as 2PC)
Phase 2: PRE-COMMIT → tell all participants "we will commit"
Phase 3: COMMIT → finalize

The key insight: if a participant is in PRE-COMMIT state and the coordinator crashes, it knows all participants voted YES. It can safely proceed to commit after a timeout.
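The participant-side timeout rule can be written as a small table. This sketch follows the textbook description of 3PC and assumes crash failures only, with no network partitions; the state names and the `on_coordinator_timeout` function are hypothetical.

```python
# What a 3PC participant does when it stops hearing from the coordinator.
# Assumes crash failures only; a network partition breaks this reasoning.
TIMEOUT_ACTION = {
    "INIT": "ABORT",        # never voted, so aborting is always safe
    "PREPARED": "ABORT",    # voted YES, but no PRE-COMMIT seen: nobody can have committed yet
    "PRE_COMMIT": "COMMIT", # PRE-COMMIT implies every participant voted YES
}


def on_coordinator_timeout(state: str) -> str:
    """Return the action a participant takes after a coordinator timeout."""
    return TIMEOUT_ACTION[state]


print(on_coordinator_timeout("PREPARED"))    # ABORT
print(on_coordinator_timeout("PRE_COMMIT"))  # COMMIT
```

Compare this with 2PC, where the `PREPARED` row has no safe entry at all.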

3PC Trade-offs#

Solves the blocking problem under certain failure models
Does NOT work with network partitions (a partitioned node might abort while others commit)
Adds an extra round trip of latency
Rarely used in practice — most systems prefer 2PC with timeout-based recovery

XA Transactions#

XA is the industry standard interface for 2PC, defined by the X/Open group:

Application (Transaction Manager)
  ├── xa_start()    → begin transaction on resource
  ├── xa_end()      → end work on resource
  ├── xa_prepare()  → Phase 1
  ├── xa_commit()   → Phase 2
  └── xa_rollback() → abort

XA is supported by PostgreSQL, MySQL, Oracle, and many message brokers. The application server acts as the coordinator, and the databases act as participants.
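The order of calls matters more than any one of them. The sketch below drives that sequence against an in-memory fake, so it runs without a database. `FakeResource` and `run_global_transaction` are hypothetical names, and the method signatures are simplified: the real XA functions are C calls that take an XID structure, a resource manager id, and flags.

```python
class FakeResource:
    """Hypothetical in-memory resource manager that records XA-style calls."""

    def __init__(self, name, vote_ok=True):
        self.name = name
        self.vote_ok = vote_ok   # False simulates a failed prepare
        self.calls = []

    def xa_start(self, xid):
        self.calls.append("xa_start")

    def xa_end(self, xid):
        self.calls.append("xa_end")

    def xa_prepare(self, xid):
        self.calls.append("xa_prepare")
        return self.vote_ok

    def xa_commit(self, xid):
        self.calls.append("xa_commit")

    def xa_rollback(self, xid):
        self.calls.append("xa_rollback")


def run_global_transaction(xid, resources, work):
    """Play the transaction manager: do the work, then prepare, then decide."""
    for rm in resources:
        rm.xa_start(xid)
        work(rm)            # application work on this resource
        rm.xa_end(xid)
    # Phase 1: collect a vote from every resource (list, so no short-circuit).
    votes = [rm.xa_prepare(xid) for rm in resources]
    # Phase 2: commit everywhere or roll back everywhere.
    if all(votes):
        for rm in resources:
            rm.xa_commit(xid)
        return "committed"
    for rm in resources:
        rm.xa_rollback(xid)
    return "rolled back"


db = FakeResource("db")
queue = FakeResource("queue")
print(run_global_transaction("tx-1", [db, queue], work=lambda rm: None))  # committed
print(db.calls)  # ['xa_start', 'xa_end', 'xa_prepare', 'xa_commit']
```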

XA Limitations#

Performance: locks held across two network round trips
Heterogeneous systems: all participants must support XA
Coordinator is a single point of failure
Latency-sensitive: not suitable for cross-datacenter transactions

2PC in Distributed Databases#

Modern distributed databases use 2PC internally but hide the complexity:

Google Spanner: uses 2PC for cross-shard transactions, with every participant (including the one acting as coordinator) replicated as a Paxos group, so the failure of a single replica doesn't block the transaction.

CockroachDB: the parallel commits optimization writes the transaction record in a STAGING state in parallel with the transaction's writes, cutting commit latency from two rounds of consensus to one.

TiDB: uses a Percolator-inspired protocol where the "primary key" acts as the commit point, avoiding a separate coordinator.

2PC vs Saga Pattern#

Aspect	2PC	Saga
Consistency	Strong (ACID)	Eventual
Isolation	Full (locks held)	None (intermediate states visible)
Latency	Higher (lock duration)	Lower (no global locks)
Availability	Lower (blocking on coordinator)	Higher (no blocking commit protocol)
Complexity	Protocol complexity	Compensation logic complexity
Best for	Short-lived DB transactions	Long-running business processes

When to Use 2PC#

Transactions complete in milliseconds
Strong consistency is non-negotiable
All participants are within the same datacenter
You control all participating systems

When to Use Sagas#

Transactions span multiple services or datacenters
Operations take seconds or longer
Availability matters more than strict consistency
Services are independently owned/deployed
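For contrast with 2PC, here is a minimal sketch of the saga idea: each step commits locally and immediately, and a failure triggers the compensations of the steps already done, in reverse order. `run_saga` and the step names are hypothetical, and the sketch assumes compensations themselves never fail, which real systems cannot assume.

```python
from typing import Callable, List, Tuple

# Each step is a pair: (action, compensation that undoes the action).
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


events: List[str] = []


def credit_fails() -> None:
    raise RuntimeError("Bank B unavailable")


ok = run_saga([
    (lambda: events.append("debit A"), lambda: events.append("refund A")),
    (credit_fails, lambda: events.append("reverse credit B")),
])
print(ok, events)  # False ['debit A', 'refund A']
```

Between `debit A` and `refund A` the debit is visible to other readers, which is the missing isolation noted in the comparison table.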

Practical Considerations#

Timeout tuning: too short and you abort healthy transactions; too long and you hold locks unnecessarily. Start with 2-5x your p99 latency.
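As a worked example of that starting point, the sketch below derives a prepare timeout from observed latencies. The function names, the nearest-rank percentile method, and the default multiplier of 3 are assumptions chosen for illustration, and the sample data is synthetic.

```python
from typing import Sequence


def p99(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty sequence of latencies."""
    ordered = sorted(samples_ms)
    # ceil(0.99 * n) using integer arithmetic to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def prepare_timeout_ms(samples_ms: Sequence[float], multiplier: float = 3.0) -> float:
    """Starting-point timeout: a multiple (2 to 5 suggested) of p99 latency."""
    return multiplier * p99(samples_ms)


latencies = list(range(1, 101))       # synthetic data: 1 ms .. 100 ms
print(p99(latencies))                 # 99
print(prepare_timeout_ms(latencies))  # 297.0
```

Treat the result as an initial setting to revisit once real abort rates and lock hold times are being monitored.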

Coordinator availability: run the coordinator as a replicated state machine (Raft/Paxos) to avoid a single point of failure.

Heuristic decisions: some systems allow participants to make a "heuristic commit" or "heuristic rollback" after a long timeout — this risks inconsistency but unblocks the system.

Monitoring: track prepare-to-commit latency, abort rates, and in-doubt transaction counts. A rising in-doubt count signals coordinator issues.

Key Takeaways#

2PC guarantees atomic commitment across distributed nodes
The blocking problem is the fundamental limitation
3PC solves blocking but fails under network partitions
XA is the standard API for 2PC across databases
Modern distributed databases use enhanced 2PC internally
Choose 2PC for short, strongly-consistent transactions; sagas for long-running, cross-service workflows

This is article #225 in the Codelit system design series. Explore all articles at codelit.io.


