Distributed Transactions and the Saga Pattern — A Practical Guide
The problem: transactions across services
In a monolith, transactions are simple. Start a transaction, do some work, commit. If anything fails, roll back. The database handles it.
In microservices, a single business operation might span multiple services, each with its own database:
- Order Service creates the order
- Payment Service charges the card
- Inventory Service reserves the items
- Shipping Service schedules delivery
If step 3 fails (out of stock), you need to undo steps 1 and 2. But they're in different databases. There's no shared transaction.
Two-phase commit (2PC): the textbook answer
In the first phase (prepare), a coordinator asks every participant: "Can you commit?" If everyone votes yes, the second phase sends "commit." If anyone votes no, it sends "rollback."
Why it's rarely used in microservices:
- The coordinator is a single point of failure
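- Participants that vote yes must hold their locks until the coordinator announces the outcome, blocking other work on that data
- If the coordinator crashes after "prepare" but before sending "commit," prepared participants cannot safely decide on their own and stay stuck until it recovers
- Performance suffers at scale: every transaction needs two rounds of messages, plus durable log writes, before any locks are released
2PC is workable among tightly coupled resources, such as databases within a single data center. It fits poorly with services that must be independently deployable and scalable.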
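To make the two rounds concrete, here is a minimal in-memory sketch. `Participant` and `two_phase_commit` are hypothetical names, and there is no networking, timeout handling, or durable log, so treat it as an illustration of the message sequence rather than a usable coordinator. The comment in the middle marks the window where a coordinator crash blocks everyone.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager taking part in 2PC (no networking or durable log)."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed | aborted
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1 vote. Voting yes means keeping locks until the decision arrives.
        if self.can_commit:
            self.state = "prepared"
            self.log.append("prepared (locks held)")
            return True
        self.state = "aborted"
        self.log.append("voted no")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("committed (locks released)")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("aborted (locks released)")


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # A coordinator crash at this point leaves every "prepared" participant
    # holding its locks, unable to decide alone: the blocking problem.
    if all(votes):
        for p in participants:  # Phase 2: unanimous yes, so commit everywhere
            p.commit()
        return "committed"
    for p in participants:      # Phase 2: at least one no, so roll back
        if p.state == "prepared":
            p.abort()
    return "aborted"


ps = [Participant("orders"), Participant("payments", can_commit=False)]
print(two_phase_commit(ps))  # aborted
print(ps[0].log)             # ['prepared (locks held)', 'aborted (locks released)']
```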
The saga pattern: the practical answer
A saga is a sequence of local transactions. Each service performs its own transaction and publishes an event. If a step fails, compensating transactions undo the previous steps.
Choreography-based sagas
Each service listens to events and decides what to do:
- OrderCreated → Payment Service charges card
- PaymentCompleted → Inventory Service reserves items
- InventoryReserved → Shipping Service schedules delivery
If inventory fails:
- InventoryFailed → Payment Service refunds card
- PaymentRefunded → Order Service cancels order
Pros: Decoupled, no central coordinator, simple for small flows.
Cons: The full flow is hard to see because it is spread across every service's event handlers. Debugging is painful, and adding a step means modifying multiple services.
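A minimal sketch of the choreographed flow above, assuming a hypothetical in-process `EventBus` in place of a real message broker. The event names `DeliveryScheduled` and `OrderCancelled` are also assumptions; the others come from the flow above. Notice that no single function describes the saga: it only emerges from the subscriptions, which is exactly why the full flow is hard to follow.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-process stand-in for a message broker (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.queue: deque = deque()
        self.history: list[str] = []  # every event delivered, in order

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.queue.append((event, payload))

    def run(self) -> None:
        # Deliver queued events until no handler publishes anything new.
        while self.queue:
            event, payload = self.queue.popleft()
            self.history.append(event)
            for handler in self.handlers[event]:
                handler(payload)


def wire_services(bus: EventBus, in_stock: bool) -> None:
    """Each service subscribes only to the events it reacts to."""

    def payment_charge(p: dict) -> None:     # Payment Service
        bus.publish("PaymentCompleted", p)

    def inventory_reserve(p: dict) -> None:  # Inventory Service
        bus.publish("InventoryReserved" if in_stock else "InventoryFailed", p)

    def shipping_schedule(p: dict) -> None:  # Shipping Service
        bus.publish("DeliveryScheduled", p)

    def payment_refund(p: dict) -> None:     # Payment Service compensation
        bus.publish("PaymentRefunded", p)

    def order_cancel(p: dict) -> None:       # Order Service compensation
        bus.publish("OrderCancelled", p)

    bus.subscribe("OrderCreated", payment_charge)
    bus.subscribe("PaymentCompleted", inventory_reserve)
    bus.subscribe("InventoryReserved", shipping_schedule)
    bus.subscribe("InventoryFailed", payment_refund)
    bus.subscribe("PaymentRefunded", order_cancel)


bus = EventBus()
wire_services(bus, in_stock=False)
bus.publish("OrderCreated", {"order_id": "o-1"})  # Order Service created the order
bus.run()
print(bus.history)
# ['OrderCreated', 'PaymentCompleted', 'InventoryFailed', 'PaymentRefunded', 'OrderCancelled']
```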
Orchestration-based sagas
A central orchestrator tells each service what to do:
- Orchestrator → OrderService.create()
- Orchestrator → PaymentService.charge()
- Orchestrator → InventoryService.reserve()
- Orchestrator → ShippingService.schedule()
On failure (for example, inventory is out of stock):
- Orchestrator → PaymentService.refund()
- Orchestrator → OrderService.cancel()
Pros: The flow is visible in one place. Steps are easy to add. Better suited to complex workflows.
Cons: The orchestrator can become a bottleneck, and you still need to handle orchestrator failures.
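Here is the orchestration version as a sketch, assuming each service call can be wrapped in a plain function. `Step`, `Orchestrator`, and the step names are hypothetical. The orchestrator records which steps succeeded and, on the first failure, runs their compensations newest first, matching the refund-then-cancel order above. It keeps its state only in memory and does not retry compensations; the edge cases below cover both concerns.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Action = Callable[[dict], None]


@dataclass
class Step:
    name: str
    action: Action
    compensate: Optional[Action] = None  # None means the step cannot be undone


class Orchestrator:
    """Runs steps in order; on failure, compensates completed steps newest first."""

    def __init__(self, steps: list[Step]) -> None:
        self.steps = steps
        self.log: list[str] = []

    def run(self, ctx: dict) -> bool:
        completed: list[Step] = []
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                self.log.append(f"{step.name} failed: {exc}")
                # Undo only what already succeeded, in reverse order.
                for done in reversed(completed):
                    if done.compensate is not None:
                        done.compensate(ctx)
                        self.log.append(f"compensated {done.name}")
                return False
            completed.append(step)
            self.log.append(f"{step.name} ok")
        return True


def noop(ctx: dict) -> None:
    """Placeholder for a successful service call."""


def reserve_out_of_stock(ctx: dict) -> None:
    raise RuntimeError("out of stock")


saga = Orchestrator([
    Step("create_order", noop, compensate=noop),
    Step("charge_card", noop, compensate=noop),
    Step("reserve_inventory", reserve_out_of_stock, compensate=noop),
    Step("schedule_delivery", noop, compensate=noop),
])
print(saga.run({"order_id": "o-1"}))  # False
print(saga.log)
# ['create_order ok', 'charge_card ok', 'reserve_inventory failed: out of stock',
#  'compensated charge_card', 'compensated create_order']
```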
Compensating transactions
The key insight: you can't roll back across services. Instead, you compensate by performing a new action that undoes the effect of the earlier one.
| Original action | Compensation |
|---|---|
| Create order | Cancel order |
| Charge card | Refund card |
| Reserve inventory | Release inventory |
| Send email | Send cancellation email |
| Schedule delivery | Cancel delivery |
Not every action is perfectly reversible. You can refund a payment, but you can't un-send an email. Order your saga so irreversible actions happen last, once every step that might still fail has already succeeded.
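One way to enforce that rule is to check a saga definition before running it. This sketch assumes each step is described as a hypothetical `(name, reversible)` pair and reports any reversible step scheduled after an irreversible one. It checks ordering only; it does not make the final steps any less likely to fail.

```python
def misordered_steps(steps: list[tuple[str, bool]]) -> list[str]:
    """Return reversible steps scheduled after an irreversible one.

    Each entry is (step_name, reversible). If any of these later steps fails,
    the saga would need to undo an action that cannot be undone.
    """
    irreversible_seen = False
    offenders: list[str] = []
    for name, reversible in steps:
        if not reversible:
            irreversible_seen = True
        elif irreversible_seen:
            offenders.append(name)
    return offenders


plan = [
    ("create_order", True),
    ("send_email", False),  # irreversible: you can't un-send it
    ("charge_card", True),
    ("reserve_inventory", True),
]
print(misordered_steps(plan))  # ['charge_card', 'reserve_inventory']
```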
Idempotency: the non-negotiable requirement
In distributed systems, messages can be delivered more than once. If "charge card" is executed twice, the customer is charged twice. Every step and every compensation in a saga must therefore be idempotent: handling the same request twice has the same effect as handling it once.
How: Include a unique transaction ID (an idempotency key) in every request. Before processing, check whether you've already handled this ID. If yes, return the stored result instead of doing the work again.
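A minimal sketch of that check for the payment step, assuming a hypothetical `PaymentService` that keeps processed IDs in a dictionary. A real service would keep them in its database, and the check plus the write must be atomic; otherwise two concurrent deliveries of the same message could both pass the check.

```python
import uuid


class PaymentService:
    """Hypothetical payment handler made idempotent with a per-request transaction ID."""

    def __init__(self) -> None:
        self.results: dict[str, dict] = {}  # transaction_id -> result already returned
        self.charges: list[int] = []        # stands in for calls to a card processor

    def charge(self, transaction_id: str, amount_cents: int) -> dict:
        # Already handled this ID? Return the cached result; do not charge again.
        if transaction_id in self.results:
            return self.results[transaction_id]
        self.charges.append(amount_cents)
        result = {"transaction_id": transaction_id, "status": "charged",
                  "amount_cents": amount_cents}
        # In a real service the check above and this write must be atomic
        # (for example, a unique constraint on transaction_id).
        self.results[transaction_id] = result
        return result


svc = PaymentService()
tx_id = str(uuid.uuid4())            # generated once by the caller, reused on retries
first = svc.charge(tx_id, 4999)
duplicate = svc.charge(tx_id, 4999)  # redelivered message
print(first == duplicate, len(svc.charges))  # True 1
```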
Handling edge cases
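What if the orchestrator crashes mid-saga? Store the saga state in a database and update it after each step. On restart, resume from the last recorded step. A step that ran just before the crash but was never recorded will run again, which is one more reason every step must be idempotent.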
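A sketch of persisting saga progress, using Python's built-in `sqlite3` module as a stand-in for the orchestrator's database. The table layout, the `STEPS` list, and the `Crash` exception are assumptions made for the demo. An in-memory database would not survive a real crash, so the restart is simulated by calling `resume` again on the same connection. Only forward progress is tracked here; a fuller version would also record which compensations have run.

```python
import sqlite3
from typing import Callable

# Hypothetical step names for the order saga in this article.
STEPS = ["create_order", "charge_card", "reserve_inventory", "schedule_delivery"]


def init(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS saga (id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)")


def start(db: sqlite3.Connection, saga_id: str) -> None:
    with db:  # commits on success, rolls back on error
        db.execute("INSERT OR IGNORE INTO saga (id, next_step) VALUES (?, 0)", (saga_id,))


def resume(db: sqlite3.Connection, saga_id: str, execute: Callable[[str], None]) -> None:
    """Run the remaining steps, persisting progress after each one."""
    (next_step,) = db.execute("SELECT next_step FROM saga WHERE id = ?", (saga_id,)).fetchone()
    for i in range(next_step, len(STEPS)):
        # A crash between execute() and the UPDATE re-runs this step on restart.
        execute(STEPS[i])
        with db:
            db.execute("UPDATE saga SET next_step = ? WHERE id = ?", (i + 1, saga_id))


class Crash(Exception):
    """Simulates the orchestrator process dying mid-saga."""


db = sqlite3.connect(":memory:")  # a real orchestrator would use a durable database
init(db)
start(db, "saga-1")
executed: list[str] = []


def crash_on_reserve(step: str) -> None:
    if step == "reserve_inventory":
        raise Crash()
    executed.append(step)


try:
    resume(db, "saga-1", crash_on_reserve)
except Crash:
    pass  # progress through charge_card is already stored

resume(db, "saga-1", executed.append)  # "restart": continues at reserve_inventory
print(executed)
# ['create_order', 'charge_card', 'reserve_inventory', 'schedule_delivery']
```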
What if a compensation fails? Retry with exponential backoff. If it keeps failing, alert an operator. Some compensations (like refunds) are critical and must eventually succeed.
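A sketch of that retry loop, assuming the compensation is a zero-argument callable and that `alert` is whatever hook notifies your operators (it defaults to `print` here). The `sleep` parameter is injected so the demo can record the delays instead of waiting. Retrying is only safe because compensations are idempotent.

```python
import time
from typing import Callable


def retry_compensation(
    compensate: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    alert: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a compensation with exponential backoff; alert an operator if it never succeeds."""
    for attempt in range(max_attempts):
        try:
            compensate()
            return True
        except Exception as exc:
            if attempt == max_attempts - 1:
                alert(f"compensation failed after {max_attempts} attempts: {exc}")
                return False
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    return False


calls = {"n": 0}


def flaky_refund() -> None:
    """Hypothetical refund call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("payment provider unavailable")


delays: list[float] = []
print(retry_compensation(flaky_refund, sleep=delays.append))  # True
print(delays)  # [0.1, 0.2]
```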
What if two sagas conflict? Semantic locks: mark resources as "in-progress" so other sagas wait or fail fast. For example, inventory is "reserved" until the saga completes or times out.
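A sketch of a semantic lock on a single inventory record. `InventoryItem`, its status values, and the 30-second default timeout are hypothetical, and the clock is injectable so the demo can fast-forward time. In a real service the status check and update would run as one atomic database operation (for example, a conditional `UPDATE`), since two sagas could otherwise both read "available" at the same moment.

```python
import time
from typing import Callable, Optional


class InventoryItem:
    """Hypothetical stock record guarded by a semantic lock (status + holder + expiry)."""

    def __init__(self, sku: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.sku = sku
        self.clock = clock
        self.status = "available"          # available | reserved | sold
        self.holder: Optional[str] = None  # saga currently holding the reservation
        self.expires_at = 0.0

    def _expire_if_stale(self) -> None:
        # A saga that never completed loses its reservation after the timeout.
        if self.status == "reserved" and self.clock() >= self.expires_at:
            self.status, self.holder = "available", None

    def reserve(self, saga_id: str, ttl: float = 30.0) -> bool:
        self._expire_if_stale()
        if self.status != "available":
            return False  # another saga holds it (or it is sold): fail fast
        self.status, self.holder = "reserved", saga_id
        self.expires_at = self.clock() + ttl
        return True

    def confirm(self, saga_id: str) -> bool:
        """Saga completed: the reservation becomes a sale."""
        self._expire_if_stale()
        if self.status == "reserved" and self.holder == saga_id:
            self.status = "sold"
            return True
        return False

    def release(self, saga_id: str) -> None:
        """Compensation: release inventory held by this saga."""
        if self.status == "reserved" and self.holder == saga_id:
            self.status, self.holder = "available", None


now = [0.0]                                # fake clock for the demo
item = InventoryItem("sku-1", clock=lambda: now[0])
print(item.reserve("saga-A", ttl=30.0))    # True
print(item.reserve("saga-B"))              # False: saga-B fails fast
now[0] = 31.0                              # saga-A stalled; its lock expired
print(item.reserve("saga-B"))              # True
print(item.confirm("saga-A"))              # False: stale holder cannot complete
```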
When to use sagas vs when to avoid them
Use sagas when:
- Business operations span multiple services
- You can tolerate eventual consistency rather than requiring immediate consistency
- Each service owns its own data
Avoid sagas when:
- You need strict ACID transactions — use a single database
- The flow is simple enough for synchronous calls with retry
- You can restructure services to keep related data together
See it in action
On Codelit, generate any e-commerce or payment system and you'll see exactly where distributed transactions happen — the edges between Order Service, Payment Service, and Inventory Service. Click any node to audit the transaction boundaries.
Explore transaction patterns: describe your system on Codelit.io and see how services coordinate across boundaries.