The Saga Pattern: Managing Distributed Transactions Without Two-Phase Commit
In a monolith, a single database transaction can wrap multiple operations with ACID guarantees. In a microservices architecture, each service owns its own data store. There is no single transaction boundary. The saga pattern provides a way to maintain data consistency across services without distributed locks.
What Is a Saga?#
A saga is a sequence of local transactions. Each step updates one service and publishes an event or sends a command. If any step fails, the saga executes compensating transactions to undo the changes made by preceding steps.
Order Saga (happy path):
[Create Order] ──► [Reserve Inventory] ──► [Charge Payment] ──► [Ship Order]
      T1                   T2                     T3                 T4
If T3 (Charge Payment) fails:
[Ship Order] skipped
[Charge Payment] ──► FAILED
[Undo Reserve Inventory] ◄── C2 (compensating)
[Cancel Order] ◄── C1 (compensating)
Each compensating transaction (C1, C2) reverses the effect of its corresponding forward transaction. Compensations are semantic inverses — they do not roll back database state; they apply a new operation that logically negates the prior one.
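To make the idea of a semantic inverse concrete, here is a minimal sketch built on a hypothetical append-only inventory ledger. The `InventoryLedger` class and its method names are assumptions made for illustration. Duplicate detection is left out to keep the sketch short.

```javascript
// Hypothetical inventory ledger: every change is appended, nothing is deleted.
class InventoryLedger {
  constructor(initialStock) {
    this.entries = [{ type: "INITIAL", qty: initialStock }];
  }

  // Forward transaction (T2): record a reservation as a negative quantity.
  reserve(orderId, qty) {
    this.entries.push({ type: "RESERVED", orderId, qty: -qty });
  }

  // Compensating transaction (C2): append a new entry that negates the reservation.
  release(orderId) {
    const reserved = this.entries.find(
      (e) => e.type === "RESERVED" && e.orderId === orderId
    );
    if (!reserved) return; // nothing to compensate
    this.entries.push({ type: "RELEASED", orderId, qty: -reserved.qty });
  }

  available() {
    return this.entries.reduce((sum, e) => sum + e.qty, 0);
  }
}

const ledger = new InventoryLedger(10);
ledger.reserve("order-1", 3);
ledger.release("order-1");
```

After the release, the available stock is back to its initial value, yet the ledger still holds the reservation and its reversal as separate entries. The history is preserved rather than rolled back.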
Orchestration vs Choreography#
There are two coordination strategies for sagas. The choice shapes how your services communicate and where the workflow logic lives.
Choreography#
Each service listens for events and decides locally what to do next. There is no central coordinator.
OrderService         InventoryService         PaymentService
     │                       │                       │
     │── OrderCreated ──────►│                       │
     │                       │── InventoryReserved ─►│
     │                       │                       │── PaymentCharged ──►
     │◄── (listen) ──────────┼───────────────────────┘
Pros:
- Loose coupling. Services are independent.
- No single point of failure in the coordinator.
- Simple for short sagas (2–3 steps).
Cons:
- Hard to understand the overall flow — logic is scattered across services.
- Difficult to add new steps or change ordering.
- Cycle detection and debugging are painful at scale.
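As a rough illustration of the choreographed flow, the sketch below wires three handlers to a tiny in-process event bus. The `EventBus` class, the `InventoryRejected` event, and the `stock` and `orders` objects are assumptions made for the example. A real system would use a message broker and separate processes.

```javascript
// Tiny in-process event bus standing in for a message broker (an assumption).
class EventBus {
  constructor() {
    this.handlers = new Map();
  }
  on(type, handler) {
    if (!this.handlers.has(type)) this.handlers.set(type, []);
    this.handlers.get(type).push(handler);
  }
  emit(type, payload) {
    for (const handler of this.handlers.get(type) ?? []) handler(payload);
  }
}

function wireChoreography(bus, { stock, orders }) {
  // InventoryService reacts to OrderCreated.
  bus.on("OrderCreated", ({ orderId, sku }) => {
    if ((stock[sku] ?? 0) > 0) {
      stock[sku] -= 1;
      bus.emit("InventoryReserved", { orderId, sku });
    } else {
      bus.emit("InventoryRejected", { orderId }); // hypothetical failure event
    }
  });
  // PaymentService reacts to InventoryReserved.
  bus.on("InventoryReserved", ({ orderId }) => {
    bus.emit("PaymentCharged", { orderId });
  });
  // OrderService listens for outcomes and updates its own state.
  bus.on("PaymentCharged", ({ orderId }) => {
    orders[orderId] = "CONFIRMED";
  });
  bus.on("InventoryRejected", ({ orderId }) => {
    orders[orderId] = "CANCELLED";
  });
}

const bus = new EventBus();
const state = { stock: { "sku-1": 1 }, orders: {} };
wireChoreography(bus, state);
bus.emit("OrderCreated", { orderId: "o-1", sku: "sku-1" }); // stock available
bus.emit("OrderCreated", { orderId: "o-2", sku: "sku-1" }); // stock exhausted
```

No handler knows the whole workflow. The order of steps only emerges from who subscribes to what, which is why the flow becomes hard to follow as the number of events grows.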
Orchestration#
A central saga orchestrator (or saga execution coordinator) tells each service what to do and tracks the overall state.
        SagaOrchestrator
               │
     ┌──────── │ ────────┐
     ▼         ▼         ▼
 OrderSvc InventorySvc PaymentSvc
The orchestrator sends commands: "Reserve inventory for order X." Each service replies with success or failure. The orchestrator decides the next step.
Pros:
- Clear, centralised workflow definition.
- Easy to reason about, test, and modify.
- Natural place for retries, timeouts, and compensation logic.
Cons:
- The orchestrator can become a bottleneck or single point of failure (mitigate with clustering).
- Tighter coupling between the orchestrator and participating services.
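The command-and-reply loop of an orchestrator might look roughly like the sketch below. The `Orchestrator` class, the command names, and `fakeService` are hypothetical. Persistence, retries, and timeouts are deliberately omitted.

```javascript
// Hypothetical participants: each handles a command and replies { ok: true | false }.
class Orchestrator {
  constructor(services) {
    this.services = services;
    // The whole workflow lives in one place: each command and its compensating command.
    this.plan = [
      { service: "order", command: "createOrder", undo: "cancelOrder" },
      { service: "inventory", command: "reserve", undo: "release" },
      { service: "payment", command: "charge", undo: "refund" },
    ];
  }

  async run(orderId) {
    const done = [];
    for (const step of this.plan) {
      const reply = await this.services[step.service].handle(step.command, orderId);
      if (!reply.ok) {
        // The orchestrator decides: undo completed steps in reverse order.
        for (const prev of done.reverse()) {
          await this.services[prev.service].handle(prev.undo, orderId);
        }
        return { orderId, state: "FAILED", failedAt: step.command };
      }
      done.push(step);
    }
    return { orderId, state: "COMPLETED" };
  }
}

// Fake service that records every command and fails the configured ones.
function fakeService(sent, failOn = []) {
  return {
    async handle(command, orderId) {
      sent.push(command);
      return { ok: !failOn.includes(command) };
    },
  };
}

const sent = [];
const orchestrator = new Orchestrator({
  order: fakeService(sent),
  inventory: fakeService(sent),
  payment: fakeService(sent, ["charge"]), // simulate a declined payment
});
```

Calling `orchestrator.run("o-1")` with this setup sends `createOrder`, `reserve`, and `charge`, then issues `release` and `cancelOrder` once the payment reply comes back negative.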
Recommendation: Use choreography for simple, short-lived flows. Use orchestration for anything with more than three steps, complex branching, or strict ordering requirements.
Compensating Transactions#
A compensating transaction must be idempotent and retryable. The saga framework may invoke it more than once if an acknowledgement is lost.
Design guidelines:
- Make compensations safe to replay. Use unique identifiers to detect duplicates.
- Compensations must not fail permanently. If a compensation fails, retry it until it succeeds. This is a hard requirement: an abandoned compensation leaves the system in an inconsistent state.
- Some steps are not compensatable. Sending an email or charging a credit card may not be perfectly reversible. For these, use a pivot transaction — a point of no return after which only forward recovery (retries) is allowed.
Compensatable steps ──► Pivot transaction ──► Retriable-only steps
    (can undo)        (point of no return)       (must succeed)
Place the pivot transaction as late as possible to maximise the window in which the saga can safely roll back.
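One way to make a compensation safe to replay is to key it by a unique identifier and remember which identifiers have been applied. The sketch below assumes a hypothetical `PaymentService`. The in-memory `Set` stands in for what would need to be a durable deduplication table.

```javascript
// Hypothetical payment participant; `applied` stands in for a durable dedupe table.
class PaymentService {
  constructor() {
    this.balance = 0;
    this.applied = new Set(); // IDs of compensations already processed
  }

  charge(amount) {
    this.balance += amount;
  }

  // Compensation keyed by a unique ID, so a redelivered request is a no-op.
  refund(compensationId, amount) {
    if (this.applied.has(compensationId)) {
      return "duplicate";
    }
    this.balance -= amount;
    this.applied.add(compensationId);
    return "applied";
  }
}

const payments = new PaymentService();
payments.charge(50);
const first = payments.refund("saga-42:refund", 50);
const replay = payments.refund("saga-42:refund", 50); // same request delivered again
```

The second call changes nothing, so the coordinator can resend the refund whenever it is unsure whether the first attempt was received.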
Saga Execution Coordinator#
The orchestrator maintains a saga log — a durable record of which steps have completed and which compensations are pending. This log enables recovery after crashes.
State machine representation:
STARTED ──► RESERVING_INVENTORY ──► CHARGING_PAYMENT ──► SHIPPING
   │                 │                      │                │
   │           COMPENSATING           COMPENSATING           │
   │                 │                      │                │
   ▼                 ▼                      ▼                ▼
 FAILED           FAILED                 FAILED          COMPLETED
The coordinator persists state transitions atomically with each step. On restart, it reads the log and resumes from the last committed state — either continuing forward or executing pending compensations.
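The recovery decision can be sketched as a function of the last persisted state. The `SagaLog` class below is an in-memory stand-in (a real log would be a database table or similar durable store), and the action names returned by `recoveryAction` are assumptions for illustration.

```javascript
// In-memory stand-in for a durable saga log.
class SagaLog {
  constructor() {
    this.records = [];
  }
  append(sagaId, state) {
    this.records.push({ sagaId, state });
  }
  lastState(sagaId) {
    const mine = this.records.filter((r) => r.sagaId === sagaId);
    return mine.length ? mine[mine.length - 1].state : null;
  }
}

// Decide what to do after a restart, based only on the last persisted state.
function recoveryAction(log, sagaId) {
  const state = log.lastState(sagaId);
  if (state === null) return { action: "start" };
  if (state === "COMPLETED" || state === "FAILED") return { action: "none" };
  if (state === "COMPENSATING") return { action: "resume-compensation" };
  return { action: "resume-forward", from: state };
}

const sagaLog = new SagaLog();
sagaLog.append("saga-1", "STARTED");
sagaLog.append("saga-1", "RESERVING_INVENTORY");
sagaLog.append("saga-1", "CHARGING_PAYMENT"); // crash happens here
sagaLog.append("saga-2", "STARTED");
sagaLog.append("saga-2", "COMPENSATING"); // crash during rollback
```

Because the coordinator cannot know whether the in-flight step finished before the crash, resuming from `CHARGING_PAYMENT` means re-issuing that step, which is only safe if the step is idempotent.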
Failure Handling Strategies#
Backward Recovery#
The default strategy. When a step fails, execute compensating transactions for all previously completed steps in reverse order. Used when the failure occurs before the pivot transaction.
Forward Recovery#
Retry the failed step until it succeeds. Used after the pivot transaction, where compensation is not possible. Requires that the failing step is idempotent and will eventually succeed (perhaps after a transient issue resolves).
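A minimal sketch of forward recovery is a retry loop with exponential backoff. The `retryForward` helper, its option names, and the `flakyShipOrder` stub are hypothetical. The attempt limit is an assumption too, standing in for whatever escalation path (alerting, manual intervention) a real system would use.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an idempotent step with exponential backoff until it succeeds
// or the attempt budget runs out.
async function retryForward(step, { maxAttempts = 5, baseDelayMs = 10 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await step(attempt);
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // escalate, e.g. to an operator
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Hypothetical shipping call that fails twice before succeeding.
async function flakyShipOrder(attempt) {
  if (attempt < 3) throw new Error("shipping service unavailable");
  return { shipped: true, attempt };
}
```

Calling `retryForward(flakyShipOrder)` waits 10 ms and then 20 ms between attempts and resolves on the third try.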
Timeout and Deadlines#
Set a deadline for the entire saga. If the saga does not complete within the deadline, trigger compensation. This prevents sagas from hanging indefinitely when a service is down.
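A saga-wide deadline can be approximated by racing the saga against a timer, as in the sketch below. The `runWithDeadline` function and its parameters are assumptions. Note that this version only stops waiting: the underlying work is not cancelled, so a real coordinator would also need to tell participants to abandon the saga.

```javascript
// Run a saga with an overall deadline; on timeout or failure, trigger compensation.
async function runWithDeadline(saga, compensate, deadlineMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("saga deadline exceeded")),
      deadlineMs
    );
  });
  try {
    const result = await Promise.race([saga(), timeout]);
    return { status: "completed", result };
  } catch (err) {
    await compensate();
    return { status: "compensated", reason: err.message };
  } finally {
    clearTimeout(timer); // avoid a stray timer once the saga has settled
  }
}
```

For example, a saga that takes 100 ms under a 20 ms deadline ends in the `compensated` state with the reason `saga deadline exceeded`.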
Partial Compensation#
In some designs, intermediate states are acceptable temporarily. For example, an order might remain in a "PENDING" state while a payment retry is in progress. The saga coordinator tracks this and resolves it within a bounded time window.
Tools and Frameworks#
Temporal#
Temporal models workflows as code. A saga is a workflow function that calls activities (service operations) and handles failures with try/catch. The Temporal server durably persists workflow state and replays it on failure.
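// Pseudocode: Temporal-style workflow; the called functions stand in for activities
async function orderSaga(order) {
  const orderId = await createOrder(order)
  // Compensations for completed steps, most recent first
  const compensations = [() => cancelOrder(orderId)]
  try {
    await reserveInventory(orderId)
    compensations.unshift(() => compensateInventory(orderId))
    await chargePayment(orderId)
    compensations.unshift(() => refundPayment(orderId)) // hypothetical refund activity
    await shipOrder(orderId)
  } catch (err) {
    // Undo only the steps that actually completed, in reverse order
    for (const compensate of compensations) {
      await compensate()
    }
    throw err // surface the failure instead of swallowing it
  }
}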
Temporal handles retries, timeouts, and crash recovery for you. It is a strong option for complex, long-running sagas.
Axon Framework#
Axon (Java/Kotlin) provides first-class saga support with @SagaEventHandler annotations. Sagas react to events, maintain state, and trigger commands. Axon Server handles event routing and saga persistence.
MassTransit#
MassTransit (.NET) includes a state machine-based saga implementation built on Automatonymous. Define states, events, and transitions declaratively. Supports RabbitMQ, Azure Service Bus, and Amazon SQS as transports.
Eventuate Tram#
A lightweight framework for Java microservices. Provides both choreography-based and orchestration-based saga support with an outbox pattern for reliable messaging.
Saga vs Two-Phase Commit (2PC)#
| Aspect | Saga | 2PC |
|---|---|---|
| Consistency | Eventual | Strong (while lock held) |
| Availability | High — no distributed locks | Low — coordinator failure blocks all participants |
| Latency | Lower — steps run sequentially without holding locks | Higher — prepare phase locks resources |
| Scalability | Scales with services | Bottleneck at coordinator and lock contention |
| Isolation | Weak — intermediate states visible | Strong — changes invisible until commit |
| Failure handling | Compensating transactions | Rollback via abort |
| Use case | Microservices, long-running flows | Few tightly coupled resources, short transactions |
Use 2PC when you have a small number of tightly coupled services or databases that need strict consistency and can tolerate the availability trade-off. A single database does not need 2PC; a local transaction already gives it ACID guarantees.
Use sagas when you have multiple services with independent data stores, when transactions are long-running, or when availability matters more than immediate consistency.
Design Tips#
- Keep sagas short. Each additional step increases complexity and the chance of partial failure.
- Make every operation idempotent. Services will receive duplicate messages.
- Use correlation IDs. Track the saga instance across services for debugging and observability.
- Persist the saga log. Without it, crash recovery is impossible.
- Test compensations explicitly. They are production code, not edge cases.
- Avoid nested sagas. If you need a saga within a saga, reconsider your service boundaries.
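The correlation ID tip can be sketched with a simple message envelope. The helper names and the envelope shape below are assumptions for illustration, not a standard format.

```javascript
// Hypothetical envelope: every command and event carries the saga's correlation ID.
function newMessage(type, correlationId, payload = {}) {
  return { type, correlationId, payload };
}

// A reply copies the correlation ID from the message it answers.
function replyTo(incoming, type, payload = {}) {
  return newMessage(type, incoming.correlationId, payload);
}

// Collect the message types that belong to one saga instance, e.g. when debugging.
function traceSaga(messages, correlationId) {
  return messages
    .filter((m) => m.correlationId === correlationId)
    .map((m) => m.type);
}

const command = newMessage("ReserveInventory", "saga-42", { sku: "sku-1" });
const event = replyTo(command, "InventoryReserved");
const unrelated = newMessage("ReserveInventory", "saga-43", { sku: "sku-9" });
const trace = traceSaga([command, unrelated, event], "saga-42");
```

Here `trace` contains only the two messages for `saga-42`, which is the kind of filtering a log aggregator or tracing tool would do across services.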
Key Takeaways#
The saga pattern trades strong consistency for availability and scalability. It is a practical solution for distributed transactions in microservices. Choose orchestration for complex flows, choreography for simple ones, and always design compensating transactions as carefully as the happy path.
This is article #223 in the Codelit engineering series.