Outbox Pattern: Reliable Messaging Without Distributed Transactions
When a service needs to update its database and publish a message, things break. The database commit succeeds but the message broker is down. Or the message publishes but the database transaction rolls back. The outbox pattern solves this dual-write problem.
The Dual-Write Problem#
Service receives request:
1. UPDATE orders SET status = 'confirmed' WHERE id = 42;
2. PUBLISH event: OrderConfirmed to Kafka
What can go wrong:
✗ Step 1 succeeds, step 2 fails → DB updated, no event sent
✗ Step 2 succeeds, step 1 fails → Event sent, DB not updated
✗ Both succeed but step 1 rolls back → Event already sent, can't unsend
You cannot atomically write to two different systems (database + message broker) without a distributed transaction. Distributed transactions (2PC) are slow and fragile, and most message brokers don't support them anyway.
The Solution: Transactional Outbox#
Instead of writing to two systems, write to one system — your database — and let a separate process relay messages to the broker.
Service:
BEGIN TRANSACTION
  UPDATE orders SET status = 'confirmed' WHERE id = 42;
  INSERT INTO outbox (id, event_type, payload, created_at)
    VALUES (uuid(), 'OrderConfirmed', '{"orderId": 42}', NOW());
COMMIT
Relay process (separate):
READ from outbox table → PUBLISH to Kafka → DELETE from outbox
The business write and the event are in the same database transaction. Either both happen or neither does. Atomicity solved.
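A minimal sketch of this single-transaction write, using SQLite in place of Postgres for illustration (the `confirm_order` helper and the trimmed table layouts are assumptions, not the article's code):

```python
import json
import sqlite3
import uuid

def confirm_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Update the order and enqueue its event in ONE transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "UPDATE orders SET status = 'confirmed' WHERE id = ?", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "OrderConfirmed", json.dumps({"orderId": order_id})),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, event_type TEXT, payload TEXT)")
conn.execute("INSERT INTO orders VALUES (42, 'pending')")
confirm_order(conn, 42)
```

If either statement fails, the `with conn:` block rolls both back, so the order row and its event can never disagree.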
Outbox Table Schema#
CREATE TABLE outbox (
    id             UUID PRIMARY KEY,
    aggregate_type VARCHAR(255) NOT NULL,   -- 'Order', 'Payment'
    aggregate_id   VARCHAR(255) NOT NULL,   -- '42'
    event_type     VARCHAR(255) NOT NULL,   -- 'OrderConfirmed'
    payload        JSONB NOT NULL,          -- full event data
    created_at     TIMESTAMP NOT NULL DEFAULT NOW(),
    published_at   TIMESTAMP NULL           -- set when relay publishes
);

CREATE INDEX idx_outbox_unpublished
    ON outbox (created_at) WHERE published_at IS NULL;
Key design choices:
- aggregate_type + aggregate_id — route events to the right topic (type) and partition (id)
- payload as JSONB — the event is self-contained, no joins needed at relay time
- published_at — tracks relay progress; unpublished rows are retried, giving at-least-once relay
CDC-Based Outbox (Debezium)#
Change Data Capture (CDC) reads the database transaction log and streams changes to Kafka. No polling, no delay.
PostgreSQL WAL → Debezium Connector → Kafka
Flow:
1. Service writes to outbox table
2. PostgreSQL records the INSERT in the WAL
3. Debezium reads the WAL entry
4. Debezium publishes the event to Kafka
5. Debezium tracks its position in the WAL (offset)
Debezium configuration for outbox:
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.hostname": "db-host",
  "database.dbname": "orders_db",
  "table.include.list": "public.outbox",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregate_type",
  "transforms.outbox.route.topic.replacement": "${routedByValue}",
  "transforms.outbox.table.field.event.key": "aggregate_id",
  "transforms.outbox.table.field.event.type": "event_type",
  "transforms.outbox.table.field.event.payload": "payload"
}
The EventRouter transform extracts the payload and routes events to the correct Kafka topic based on aggregate type.
Advantages of CDC:
- Near real-time — millisecond latency from commit to Kafka
- No polling load on the database
- Reliable relay — Debezium tracks WAL offsets and resumes after restarts (still at-least-once; duplicates are possible, so consumers dedupe)
- Outbox table can be truncated — CDC reads from WAL, not the table
Polling Publisher#
If CDC is too complex, use a simple polling approach:
Polling relay (runs every 100ms - 5s):
1. SELECT * FROM outbox WHERE published_at IS NULL
   ORDER BY created_at LIMIT 100;
2. For each row:
   → Publish to Kafka
   → UPDATE outbox SET published_at = NOW() WHERE id = :id;
3. Optionally: DELETE FROM outbox WHERE published_at IS NOT NULL
   AND published_at < NOW() - INTERVAL '7 days';
Trade-offs vs CDC:
| Factor | Polling | CDC (Debezium) |
|---|---|---|
| Latency | 100ms-5s | Milliseconds |
| DB load | Periodic queries | Reads WAL (minimal) |
| Complexity | Simple | Requires connector infrastructure |
| Ordering | Per-query ordering | WAL order (strict) |
| Setup | Just code | Kafka Connect + Debezium |
Use polling for lower-throughput systems or when you want to avoid Debezium infrastructure.
Inbox Pattern for Consumers#
The outbox solves the producer side. The inbox pattern solves the consumer side — ensuring messages are processed exactly once.
Consumer receives event:
BEGIN TRANSACTION
  -- Check if already processed
  SELECT 1 FROM inbox WHERE event_id = :id;
  IF EXISTS → SKIP (already processed)
  -- Process the event
  INSERT INTO inbox (event_id, processed_at) VALUES (:id, NOW());
  UPDATE inventory SET quantity = quantity - 1 WHERE product_id = :pid;
COMMIT
-- ACK message to broker
Inbox table:
CREATE TABLE inbox (
    event_id     UUID PRIMARY KEY,
    processed_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Clean up old entries periodically
DELETE FROM inbox WHERE processed_at < NOW() - INTERVAL '30 days';
The inbox deduplicates at the database level — the event_id primary key constraint prevents double processing.
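A sketch of that dedup path with SQLite (`handle_event` and the trimmed table shapes are illustrative assumptions): the duplicate insert violates the `event_id` primary key, the transaction rolls back, and the business update never runs twice.

```python
import sqlite3

def handle_event(conn: sqlite3.Connection, event_id: str, product_id: int) -> bool:
    """Return True if processed, False if the event was a duplicate."""
    try:
        with conn:  # inbox insert + business update commit atomically
            conn.execute("INSERT INTO inbox (event_id) VALUES (?)", (event_id,))
            conn.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ?",
                (product_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already processed: skip, but still ACK to the broker

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES (7, 5)")
results = [handle_event(conn, "evt-1", 7), handle_event(conn, "evt-1", 7)]
```

Relying on the primary-key violation instead of a prior SELECT also closes the race where two consumer instances check the inbox concurrently.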
Exactly-Once Delivery#
True exactly-once delivery across distributed systems is impossible (see the Two Generals Problem). What we achieve is effectively exactly-once through:
Producer side (outbox):
→ At-least-once publishing (relay retries on failure)
→ Deduplication via event ID
Consumer side (inbox):
→ At-least-once consumption (broker redelivers on no ACK)
→ Deduplication via inbox table
Combined:
→ Effectively exactly-once processing
The key insight: at-least-once delivery + idempotent processing = effectively exactly-once.
Idempotent Consumers#
Even without an inbox table, you can design consumers to be naturally idempotent:
Idempotent operations (safe to repeat):
SET status = 'confirmed' WHERE id = 42
UPSERT user SET email = 'new@example.com' WHERE id = 7
DELETE FROM cart WHERE user_id = 42
Non-idempotent operations (dangerous to repeat):
UPDATE balance SET amount = amount - 100 -- deducts twice!
INSERT INTO ledger (amount) VALUES (-100) -- duplicate entry!
Strategies for making non-idempotent operations safe:
- Inbox table — explicit deduplication (covered above)
- Idempotency key — store a unique key per operation, reject duplicates
- Conditional updates — UPDATE balance SET amount = 400 WHERE amount = 500 (expected value)
- Version/sequence numbers — reject events with sequence <= last processed
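The last two strategies combine naturally into one guarded UPDATE. This sketch (the `last_seq` column and `apply_event` helper are assumptions) applies a balance delta only when the event's sequence number is newer than the last one processed:

```python
import sqlite3

def apply_event(conn: sqlite3.Connection, account_id: int, seq: int, delta: int) -> bool:
    """Apply the delta only if seq advances past the last processed sequence."""
    cur = conn.execute(
        "UPDATE balance SET amount = amount + ?, last_seq = ? "
        "WHERE id = ? AND last_seq < ?",
        (delta, seq, account_id, seq),
    )
    conn.commit()
    return cur.rowcount == 1  # False: stale or duplicate event, safely skipped

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance (id INTEGER PRIMARY KEY, amount INTEGER, last_seq INTEGER)"
)
conn.execute("INSERT INTO balance VALUES (1, 500, 0)")
outcomes = [apply_event(conn, 1, 1, -100), apply_event(conn, 1, 1, -100)]
```

The redelivered event matches zero rows (`last_seq < 1` is false once `last_seq = 1`), so the non-idempotent decrement becomes safe to repeat.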
Implementation Architecture#
Order Service                         Inventory Service
┌──────────────────┐                  ┌──────────────────┐
│ API handler      │                  │ Kafka consumer   │
│   ↓              │                  │   ↓              │
│ Business logic   │                  │ Check inbox table│
│   ↓              │                  │   ↓              │
│ DB transaction:  │                  │ DB transaction:  │
│  - orders table  │                  │  - inbox table   │
│  - outbox table  │                  │ - inventory table│
└────────┬─────────┘                  └──────────────────┘
         │                                     ↑
    ┌────┴─────┐                               │
    │ Debezium │ ──── Kafka ───────────────────┘
    │  (CDC)   │   topic: order-events
    └──────────┘
Ordering Guarantees#
Events for the same aggregate must be processed in order:
Kafka topic: order-events
Partition key: aggregate_id (order ID)
Partition 0: [OrderCreated:42, OrderConfirmed:42, OrderShipped:42]
Partition 1: [OrderCreated:99, OrderCancelled:99]
By using aggregate_id as the partition key, all events for order 42 land in the same partition, guaranteeing order within that aggregate.
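That determinism is easy to see with a toy partitioner. Kafka's default partitioner actually hashes the key bytes with murmur2, so the md5-based function below is only an illustrative stand-in:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition deterministically (md5 as a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# every event for order 42 is keyed by "42", so all land in the same partition
events = ["OrderCreated:42", "OrderConfirmed:42", "OrderShipped:42"]
partitions = {partition_for(e.split(":")[1], 8) for e in events}
```

Same key, same hash, same partition: ordering within the aggregate falls out of the keying, not from any broker-side coordination.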
Anti-Patterns#
- Skipping the outbox — publishing directly from the service. Works until it doesn't.
- Large payloads in outbox — store only event data, not entire database snapshots.
- No cleanup — outbox table grows forever. Truncate or archive processed rows.
- Ignoring consumer idempotency — the outbox guarantees at-least-once, not exactly-once. Consumers must handle duplicates.
- Single global outbox — for high-throughput systems, partition the outbox table per aggregate type.
Summary#
- The dual-write problem is real — you cannot atomically write to DB + broker
- Transactional outbox writes the event to the same DB transaction as the business data
- CDC with Debezium provides low-latency, reliable relay from outbox to Kafka
- Polling publisher is simpler but adds latency and DB load
- Inbox pattern deduplicates on the consumer side
- Idempotent consumers are essential — at-least-once + idempotency = effectively exactly-once
- Partition by aggregate ID to maintain event ordering