Event-Driven Architecture: Patterns, Brokers & Production Strategies
Request/response served us well for decades. But as systems scale, synchronous calls become bottlenecks — tight coupling, cascading failures, and throughput ceilings. Event-driven architecture (EDA) flips the model: services communicate by producing and consuming events asynchronously.
What Is Event-Driven Architecture?
In an event-driven system, components emit events when something meaningful happens. Other components react to those events independently. The producer does not know — or care — who consumes the event.
┌──────────┐ event ┌─────────────┐ event ┌──────────────┐
│ Order │ ──────────▶ │ Message │ ──────────▶ │ Inventory │
│ Service │ │ Broker │ │ Service │
└──────────┘ │ │ └──────────────┘
│ │ event ┌──────────────┐
│ │ ──────────▶ │ Notification│
└─────────────┘ │ Service │
└──────────────┘
This decoupling is the core value proposition. Services evolve independently, scale independently, and fail independently.
Event Types
Not all events are the same. Understanding the distinctions matters for schema design and routing.
Domain Events
Facts about something that happened within a bounded context. They are immutable records of state changes.
{
  "type": "OrderPlaced",
  "orderId": "ord-9182",
  "customerId": "cust-441",
  "total": 129.99,
  "timestamp": "2026-03-28T14:22:00Z"
}
Integration Events
Published across service boundaries. They carry only the data other services need — never internal implementation details.
{
  "type": "PaymentCompleted",
  "orderId": "ord-9182",
  "amount": 129.99,
  "currency": "USD"
}
Commands
Unlike events, commands express intent — a request for something to happen. They target a specific consumer.
{
  "type": "ShipOrder",
  "orderId": "ord-9182",
  "warehouse": "us-east-1"
}
Core Patterns
Pub/Sub Pattern
The simplest EDA pattern: producers publish to a topic, and every subscriber receives its own copy of each event. There is no direct coupling between them.
# Publisher
await broker.publish("orders.placed", event)

# Subscriber A — inventory
@subscribe("orders.placed")
async def reserve_stock(event):
    await inventory.reserve(event["orderId"], event["items"])

# Subscriber B — notifications
@subscribe("orders.placed")
async def notify_customer(event):
    await email.send_order_confirmation(event["customerId"])
Pub/sub works well when multiple consumers need to react to the same event independently.
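The `broker` and `@subscribe` names above are illustrative rather than a real library API. The fan-out behavior they imply can be sketched with a minimal in-memory broker:

```python
import asyncio
from collections import defaultdict

class InMemoryBroker:
    """Toy pub/sub broker: each published event is fanned out to every subscriber."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    async def publish(self, topic, event):
        # Every subscriber gets its own copy of the event
        await asyncio.gather(*(h(event) for h in self._subscribers[topic]))

broker = InMemoryBroker()
received = []

async def reserve_stock(event):
    received.append(("inventory", event["orderId"]))

async def notify_customer(event):
    received.append(("email", event["orderId"]))

broker.subscribe("orders.placed", reserve_stock)
broker.subscribe("orders.placed", notify_customer)
asyncio.run(broker.publish("orders.placed", {"orderId": "ord-9182"}))
# Both subscribers reacted to the same event, with no knowledge of each other
```

A production broker adds durability, acknowledgements, and delivery across process boundaries, but the decoupling shape is the same: the publisher never references its consumers.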
Event Sourcing
Instead of storing current state, store the sequence of events that produced it. The current state is derived by replaying events.
Event Store for Order ord-9182:
─────────────────────────────────
1. OrderCreated { items: [...], customer: "cust-441" }
2. PaymentReceived { amount: 129.99 }
3. OrderShipped { trackingId: "TRK-882" }
4. OrderDelivered { signature: "J. Smith" }
Current state = fold(events) → { status: "delivered", ... }
Benefits: full audit trail, time-travel debugging, ability to rebuild read models. Costs: increased storage, eventual consistency, replay complexity.
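The `fold(events)` step above can be made concrete. A minimal sketch, using the event names from the example (the reducer logic and status mapping are assumptions for illustration):

```python
from functools import reduce

# Event log for order ord-9182, as in the example above
events = [
    {"type": "OrderCreated", "customer": "cust-441"},
    {"type": "PaymentReceived", "amount": 129.99},
    {"type": "OrderShipped", "trackingId": "TRK-882"},
    {"type": "OrderDelivered", "signature": "J. Smith"},
]

# Illustrative mapping from event type to the status it leaves the order in
STATUS = {
    "OrderCreated": "created",
    "PaymentReceived": "paid",
    "OrderShipped": "shipped",
    "OrderDelivered": "delivered",
}

def apply(state, event):
    # One fold step: merge the event's payload and advance the status
    payload = {k: v for k, v in event.items() if k != "type"}
    return {**state, **payload, "status": STATUS[event["type"]]}

# Current state = fold(events)
state = reduce(apply, events, {})
```

Replaying a prefix of the log (`events[:2]`) reconstructs any historical state, which is what makes time-travel debugging and read-model rebuilds possible.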
CQRS (Command Query Responsibility Segregation)
Separate the write model from the read model. Writes go through command handlers that emit events. Reads are served from projections optimized for queries.
┌─────────┐ command ┌───────────┐ event ┌────────────┐
│ Client │ ─────────▶ │ Write │ ────────▶ │ Event │
│ │ │ Model │ │ Store │
└─────────┘ └───────────┘ └────────────┘
│ │
│ query ┌───────────┐ projection ┌─────┘
└────────▶ │ Read │ ◀───────────────┘
│ Model │
└───────────┘
CQRS pairs naturally with event sourcing. The write side appends events; the read side projects them into denormalized views tuned for specific queries.
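A minimal sketch of that split, with an in-memory event store and a hand-rolled projection (all names here are illustrative):

```python
# Write side: an append-only event store
event_store = []

def append(event):
    event_store.append(event)

# Read side: a denormalized view keyed by order, rebuilt by projecting events
def project_order_summaries(events):
    view = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            view[e["orderId"]] = {"total": e["total"], "status": "placed"}
        elif e["type"] == "PaymentCompleted":
            view[e["orderId"]]["status"] = "paid"
    return view

append({"type": "OrderPlaced", "orderId": "ord-9182", "total": 129.99})
append({"type": "PaymentCompleted", "orderId": "ord-9182"})

read_model = project_order_summaries(event_store)
# Queries hit the projection; the event log is never queried directly
```

In practice the projection runs continuously as new events arrive and is stored in whatever database suits the query shape, but the one-way flow from log to view is the essential structure.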
Saga Pattern
Long-running business processes that span multiple services. Each step emits an event that triggers the next. If a step fails, compensating actions undo previous steps.
OrderSaga:
1. OrderPlaced → reserve inventory
2. InventoryReserved → charge payment
3. PaymentCharged → ship order
4. ShipmentFailed → refund payment (compensate)
5. PaymentRefunded → release inventory (compensate)
Sagas replace distributed transactions (2PC) with eventual consistency and explicit compensation logic.
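The compensation flow above can be sketched as a toy orchestrator. Each step pairs an action with its compensation; on failure, completed steps are undone in reverse order. The step functions are placeholders, not a real saga framework:

```python
class OrderSaga:
    """Toy saga orchestrator: run steps in order; on failure, run the
    compensations of all completed steps in reverse."""
    def __init__(self, steps):
        self.steps = steps  # list of (action, compensation) pairs

    def run(self):
        done, log = [], []
        for action, compensate in self.steps:
            try:
                action()
                log.append(f"{action.__name__} ok")
                done.append(compensate)
            except Exception:
                log.append(f"{action.__name__} failed, compensating")
                for comp in reversed(done):
                    comp()
                    log.append(comp.__name__)
                break
        return log

# Placeholder step functions for the OrderSaga example above
def reserve_inventory(): pass
def release_inventory(): pass
def charge_payment(): pass
def refund_payment(): pass
def ship_order(): raise RuntimeError("carrier unavailable")
def no_compensation(): pass

log = OrderSaga([
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
    (ship_order, no_compensation),
]).run()
# Shipment fails → payment is refunded, then inventory is released
```

Note the compensation order matches the example: the most recent successful step is undone first.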
Message Brokers Compared
| Broker | Model | Ordering | Replay | Best For |
|---|---|---|---|---|
| Apache Kafka | Log-based | Per-partition | Yes | High-throughput streaming, event sourcing |
| RabbitMQ | Queue-based | Per-queue | No | Task distribution, RPC, routing |
| AWS SQS | Queue-based | FIFO optional | No | Serverless workloads, simple async |
| Redis Streams | Log-based | Per-stream | Yes | Low-latency, lightweight event logs |
Kafka
The default choice for event-driven systems at scale. Its partitioned log provides ordering within a partition, replay from any offset, and consumer groups for parallel processing.
producer.send("orders", key=order_id, value=event)

# Consumer group — each partition handled by one consumer
for message in consumer:
    process(message.value)
RabbitMQ
Excels at complex routing with exchanges, bindings, and queues. Better for task-queue patterns than event streaming.
SQS
Managed, serverless-friendly. Pair with SNS for fan-out (pub/sub). FIFO queues add ordering guarantees at lower throughput.
Redis Streams
Lightweight alternative when you need stream semantics without Kafka's operational overhead. Good for moderate-scale event logs.
When to Use Event-Driven vs Request/Response
Use event-driven when:
- Multiple services need to react to the same trigger
- You need temporal decoupling (producer and consumer run at different times)
- Workloads are bursty and benefit from buffering
- You want an audit trail of state changes
- Services should evolve independently
Stick with request/response when:
- You need an immediate, synchronous answer
- The interaction is a simple query with no side effects
- Latency requirements are sub-millisecond
- The system is simple enough that async adds unnecessary complexity
Most production systems use both. API gateways handle synchronous queries; background processing flows through events.
Error Handling in Event-Driven Systems
Async systems demand deliberate error strategies. Failures are invisible unless you design for them.
Dead Letter Queues (DLQ)
Messages that fail processing after N retries land in a DLQ for inspection and manual replay.
Main Queue → Consumer → [fails 3x] → Dead Letter Queue
│
inspect & fix
│
replay to main queue
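The flow above can be sketched with two in-memory queues (the retry bookkeeping on the message is an assumption; real brokers track delivery counts for you):

```python
from collections import deque

main_queue = deque()
dead_letter_queue = deque()
MAX_ATTEMPTS = 3

def consume(handler):
    """Drain the main queue; a message that fails MAX_ATTEMPTS times is
    parked in the DLQ for inspection and later replay."""
    while main_queue:
        message = main_queue.popleft()
        try:
            handler(message)
        except Exception:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letter_queue.append(message)  # park for inspection
            else:
                main_queue.append(message)  # requeue for another attempt

def always_fails(message):
    raise RuntimeError("downstream unavailable")

main_queue.append({"id": "msg-1"})
consume(always_fails)
# After 3 failed attempts, msg-1 sits in the DLQ instead of being lost
```

The key property: a poison message stops blocking the main queue, and nothing is silently dropped.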
Retry Strategies
Use exponential backoff with jitter to avoid thundering herds.
import asyncio
import random

async def process_with_retry(event, max_retries=3):
    for attempt in range(max_retries):
        try:
            await handle(event)
            return
        except TransientError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s
            delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
    # Retries exhausted — route to the DLQ
    await dead_letter_queue.send(event)
Idempotency
Events may be delivered more than once. Every consumer must handle duplicates safely.
async def handle_payment(event):
    idempotency_key = event["eventId"]
    if await already_processed(idempotency_key):
        return  # skip duplicate
    await charge(event["amount"])
    await mark_processed(idempotency_key)
Store processed event IDs in a set or database. This is non-negotiable in production.
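The `already_processed`/`mark_processed` helpers above are assumed. A minimal in-memory version shows the duplicate-suppression behavior end to end (an in-memory set stands in for the database):

```python
import asyncio

processed_ids = set()  # in production: a database table or Redis set
charges = []           # stands in for the real payment side effect

async def handle_payment(event):
    key = event["eventId"]
    if key in processed_ids:
        return  # duplicate delivery — safe to skip
    charges.append(event["amount"])
    processed_ids.add(key)

event = {"eventId": "evt-77", "amount": 129.99}
# At-least-once delivery: the same event arrives twice
asyncio.run(handle_payment(event))
asyncio.run(handle_payment(event))
# The customer is charged exactly once
```

One caveat this sketch glosses over: in a real system the duplicate check, the side effect, and the mark must be atomic (e.g. a unique-key insert in the same database transaction as the charge), or a crash between them reintroduces duplicates.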
Putting It All Together
A real-world e-commerce flow combining these patterns:
Customer places order (HTTP)
→ API Gateway (sync)
→ Order Service writes OrderPlaced event to Kafka
├─▶ Inventory Service reserves stock
├─▶ Payment Service charges card → emits PaymentCompleted
├─▶ Notification Service sends confirmation email
└─▶ Analytics Service updates dashboards
If payment fails:
→ Saga orchestrator emits compensating events
→ Inventory releases reservation
→ Customer receives failure notification
Each service owns its data, processes events at its own pace, and scales independently. Kafka provides the durable backbone. DLQs catch failures. Idempotency keys prevent duplicates.
Key Takeaways
- Events are facts, commands are requests — design schemas accordingly
- Pub/sub decouples producers from consumers; add subscribers without changing producers
- Event sourcing trades storage for auditability and flexibility
- CQRS separates read and write concerns for independent optimization
- Sagas replace distributed transactions with compensating actions
- Choose your broker based on ordering, replay, and throughput needs
- Never skip dead letter queues, retries with backoff, and idempotent consumers