Event-Driven Microservices: Events as First-Class Citizens
Most microservice architectures start with synchronous HTTP calls between services. That works until one downstream service is slow or unavailable and the entire request chain stalls. Event-driven microservices solve this by making events first-class citizens — every meaningful state change is published as an immutable fact that any interested service can consume on its own schedule.
Why Events Matter#
An event captures something that already happened: OrderPlaced, PaymentReceived, InventoryReserved. Unlike commands (which ask a service to do something), events are statements of fact. This distinction unlocks three architectural benefits:
- Temporal decoupling — The producer does not wait for the consumer to process the event.
- Spatial decoupling — Producers and consumers do not need to know each other's network addresses.
- Replay and audit — Because events are persisted, you can rebuild state or debug production issues by replaying them.
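The "statement of fact" property can be made concrete in code: an event is an immutable record. A minimal sketch using a frozen dataclass (the event and field names here are illustrative, not from any specific framework):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Frozen dataclasses make events immutable: once published, a fact never changes.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int
    occurred_at: str  # ISO-8601 timestamp of when the fact happened

event = OrderPlaced(
    order_id="ord-42",
    customer_id="cust-7",
    total_cents=1999,
    occurred_at=datetime.now(timezone.utc).isoformat(),
)

# Attempting to mutate a fact raises FrozenInstanceError.
try:
    event.total_cents = 0
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```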
The Event Bus#
The event bus is the backbone of an event-driven system. It accepts events from producers, stores them durably, and delivers them to consumers.
┌───────────┐ ┌──────────────────┐ ┌───────────┐
│ Order │──publish──▶│ │──deliver──▶│ Payment │
│ Service │ │ Event Bus │ │ Service │
└───────────┘ │ (Kafka / Pulsar) │ └───────────┘
│ │
┌───────────┐ │ │ ┌───────────┐
│ Shipping │◀─deliver──│ │◀─publish──│ Inventory│
│ Service │ └──────────────────┘ │ Service │
└───────────┘ └───────────┘
Popular implementations include Apache Kafka, Apache Pulsar, Amazon EventBridge, NATS JetStream, and Google Pub/Sub. Each differs in ordering guarantees, retention semantics, and delivery modes — but the core abstraction is the same: topics (or channels) that decouple writers from readers.
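The core abstraction can be sketched with a toy in-memory bus. Real brokers add durable storage, partitioning, and delivery guarantees; this sketch only shows how topics decouple writers from readers and retain events for replay:

```python
from collections import defaultdict
from typing import Any, Callable

class InMemoryEventBus:
    """Toy event bus: topics decouple publishers from subscribers."""

    def __init__(self) -> None:
        self._log: dict[str, list[Any]] = defaultdict(list)        # retained events
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        self._log[topic].append(event)            # persist for replay and audit
        for handler in self._subscribers[topic]:
            handler(event)                        # deliver to every consumer

    def replay(self, topic: str) -> list[Any]:
        return list(self._log[topic])

bus = InMemoryEventBus()
received = []
bus.subscribe("order-events", received.append)
bus.publish("order-events", {"eventType": "OrderPlaced", "orderId": "ord-1"})
print(received)  # the subscriber saw the event; the log can also be replayed
```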
Event Schema Registry#
When dozens of services emit and consume events, schema drift becomes a production-breaking risk. A schema registry acts as the single source of truth for event shapes.
How It Works#
- Producers register their event schema before publishing.
- The registry validates that the new schema is compatible with previous versions.
- Consumers fetch the schema at read time so they can deserialize correctly.
Producer ──▶ Schema Registry ──▶ Compatibility Check
│ │
▼ ▼
Store schema v3 Reject if breaking
│
▼
Consumer fetches v3 schema on read
Confluent Schema Registry, AWS Glue Schema Registry, and Apicurio are popular choices.
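The register/validate/fetch flow can be sketched with a toy registry. Here a "schema" is just a set of required field names and backward compatibility is modeled crudely as "the new version may not add required fields"; real registries work with Avro or Protobuf definitions and richer compatibility rules:

```python
class SchemaRegistry:
    """Toy registry: stores versioned schemas and rejects breaking changes."""

    def __init__(self) -> None:
        # subject -> ordered list of schemas (sets of required field names)
        self._versions: dict[str, list[set[str]]] = {}

    def register(self, subject: str, required_fields: set[str]) -> int:
        history = self._versions.setdefault(subject, [])
        if history and not set(required_fields) <= history[-1]:
            # Adding a required field would break backward compatibility:
            # events written with the previous schema would lack it.
            raise ValueError(f"breaking change for {subject}")
        history.append(set(required_fields))
        return len(history)  # version number

    def fetch(self, subject: str, version: int) -> set[str]:
        return self._versions[subject][version - 1]

registry = SchemaRegistry()
v1 = registry.register("orders.placed", {"orderId", "total"})
v2 = registry.register("orders.placed", {"orderId"})  # dropping a field is fine here
print(v1, v2)  # 1 2
try:
    registry.register("orders.placed", {"orderId", "currency"})  # rejected
except ValueError as exc:
    print(exc)
```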
Avro vs Protobuf for Event Serialization#
Two dominant formats for encoding events are Apache Avro and Protocol Buffers (Protobuf).
| Aspect | Avro | Protobuf |
|---|---|---|
| Schema location | Embedded or registry | .proto files compiled to code |
| Evolution model | Reader/writer schemas resolved at runtime | Field numbers provide forward/backward compat |
| Payload size | Compact binary, no field tags | Compact binary with field tags |
| Code generation | Optional | Required |
| Ecosystem fit | Kafka-native (Confluent) | gRPC-native (Google) |
Rule of thumb: If your bus is Kafka and your ecosystem is JVM-heavy, Avro is the path of least resistance. If you already use gRPC or need strong cross-language codegen, Protobuf is the better choice.
Event Versioning Strategies#
Events evolve over time. Fields get added, renamed, or deprecated. A versioning strategy prevents consumer breakage.
Strategy 1 — Schema Evolution with Compatibility Modes#
Schema registries enforce one of four compatibility modes:
- Backward — New schema can read data written by the old schema. Consumers upgrade first.
- Forward — Old schema can read data written by the new schema. Producers upgrade first.
- Full — Both directions. Safest, but most restrictive.
- None — No checks. Dangerous in production.
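Backward and forward compatibility can be illustrated with two tiny readers. The field names and defaults are hypothetical; the point is who tolerates whom:

```python
# Backward compatibility: a consumer on schema v2 reads a v1 event by
# supplying a default for the field that v2 added.
V2_DEFAULTS = {"currency": "USD"}  # hypothetical optional field added in v2

def read_v2(raw: dict) -> dict:
    return {**V2_DEFAULTS, **raw}

v1_event = {"orderId": "ord-1", "total": 1999}   # written by an old producer
print(read_v2(v1_event))

# Forward compatibility: a consumer still on schema v1 reads a v2 event by
# ignoring unknown fields instead of failing on them.
V1_FIELDS = {"orderId", "total"}

def read_v1(raw: dict) -> dict:
    return {k: v for k, v in raw.items() if k in V1_FIELDS}

v2_event = {"orderId": "ord-2", "total": 500, "currency": "EUR"}
print(read_v1(v2_event))  # {'orderId': 'ord-2', 'total': 500}
```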
Strategy 2 — Event Type Versioning#
Include the version in the event type name:
topic: orders.v2.placed
This lets v1 and v2 consumers coexist during migration. The downside is topic proliferation if not managed carefully.
Strategy 3 — Envelope Pattern#
Wrap every event in a standard envelope:
{
  "eventId": "a1f81d4f-2c3b-4e7a-bb12-deadbeef0000",
"eventType": "OrderPlaced",
"schemaVersion": 3,
"timestamp": "2026-03-29T10:00:00Z",
"source": "order-service",
"data": { }
}
The schemaVersion field lets consumers route to the correct deserialization logic.
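Version routing on the envelope can be sketched as a lookup table of per-version deserializers. The parsers and the v2-to-v3 field rename below are hypothetical:

```python
import json

# Hypothetical per-version deserializers; suppose v3 renamed "total" to "totalCents".
def parse_v2(data: dict) -> dict:
    return {"order_id": data["orderId"], "total_cents": data["total"]}

def parse_v3(data: dict) -> dict:
    return {"order_id": data["orderId"], "total_cents": data["totalCents"]}

PARSERS = {2: parse_v2, 3: parse_v3}

def handle(raw: str) -> dict:
    envelope = json.loads(raw)
    parser = PARSERS[envelope["schemaVersion"]]  # route on the envelope field
    return parser(envelope["data"])

msg = json.dumps({
    "eventId": "evt-1",
    "eventType": "OrderPlaced",
    "schemaVersion": 3,
    "timestamp": "2026-03-29T10:00:00Z",
    "source": "order-service",
    "data": {"orderId": "ord-1", "totalCents": 1999},
})
print(handle(msg))  # {'order_id': 'ord-1', 'total_cents': 1999}
```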
Consumer Groups and Partitioning#
In Kafka (and similar systems), a consumer group is a set of consumer instances that cooperate to read from a topic. Each partition within the topic is assigned to exactly one consumer in the group, enabling parallel processing without duplicate delivery.
Topic: order-events (4 partitions)
Consumer Group: payment-service
├── instance-1 → partition 0, partition 1
└── instance-2 → partition 2, partition 3
Consumer Group: analytics-service
├── instance-1 → partition 0
├── instance-2 → partition 1
├── instance-3 → partition 2
└── instance-4 → partition 3
Key Concepts#
- Partition key — Determines which partition an event lands in. Use a natural key (e.g., orderId) to guarantee ordering for related events.
- Rebalancing — When a consumer joins or leaves, partitions are reassigned. Cooperative rebalancing minimizes disruption.
- Offset management — Consumers track their position (offset) in each partition. Committing offsets too early risks data loss; committing too late causes duplicates.
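Why a natural partition key preserves ordering: events with the same key always hash to the same partition, and each partition is consumed in order. A sketch with a stable hash (Kafka's default partitioner actually uses murmur2; any deterministic hash shows the idea):

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash of the partition key, mapped onto the partition count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events for one order land on the same partition, so their order is preserved.
events = [("ord-42", "OrderPlaced"), ("ord-42", "PaymentReceived"), ("ord-42", "Shipped")]
partitions = {partition_for(key) for key, _ in events}
print(len(partitions))  # 1
```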
Idempotency and Exactly-Once Semantics#
Network failures mean events can be delivered more than once. Design consumers to be idempotent — processing the same event twice produces the same result.
Techniques include:
- Deduplication table — Store processed eventId values and skip duplicates.
- Idempotent writes — Use database upserts keyed on the event's natural identifier.
- Transactional outbox — Write the event and the business state change in a single database transaction, then relay events from the outbox table.
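The deduplication-table technique can be sketched in a few lines. The in-memory set stands in for a database table with a unique index on eventId:

```python
processed: set[str] = set()    # stand-in for a dedup table with a unique index
balances: dict[str, int] = {}  # business state updated by the consumer

def handle_payment(event: dict) -> None:
    if event["eventId"] in processed:
        return                       # duplicate delivery: skip, no double-count
    account = event["account"]
    balances[account] = balances.get(account, 0) + event["amount"]
    processed.add(event["eventId"])

event = {"eventId": "evt-1", "account": "acct-1", "amount": 100}
handle_payment(event)
handle_payment(event)                # redelivered by the broker after a timeout
print(balances["acct-1"])  # 100
```

Processing the same event twice produced the same result, which is exactly the idempotency property the section describes.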
Dead Letter Queues#
When a consumer cannot process an event after several retries, send it to a dead letter queue (DLQ). This prevents one poison message from blocking the entire partition.
Consumer ──▶ Process ──▶ Success
│
▼ (after N retries)
Dead Letter Queue ──▶ Alert + Manual Review
Monitor DLQ depth as a key operational metric.
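The retry-then-park flow can be sketched as follows; the poison-message marker and retry count are illustrative:

```python
MAX_RETRIES = 3
dead_letter_queue: list[dict] = []

def process(event: dict) -> None:
    if event.get("poison"):          # hypothetical marker for an unprocessable event
        raise ValueError("cannot deserialize payload")

def consume(event: dict) -> None:
    for attempt in range(MAX_RETRIES):
        try:
            process(event)
            return
        except Exception:
            pass                     # real consumers log and back off between attempts
    # Park the poison message so it stops blocking the partition.
    dead_letter_queue.append(event)

consume({"eventId": "evt-1"})                   # succeeds
consume({"eventId": "evt-2", "poison": True})   # fails 3 times, lands in the DLQ
print(len(dead_letter_queue))  # 1
```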
Event Sourcing vs Event-Driven#
These terms are related but distinct:
- Event-driven architecture — Services communicate through events. The event bus is the integration layer.
- Event sourcing — A service stores its internal state as an append-only log of events rather than a mutable row in a database.
You can use event-driven architecture without event sourcing (and vice versa). Combining both gives you full audit trails and temporal queries, but increases operational complexity.
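The event-sourcing half of that distinction is easy to show: current state is derived by folding over the event log rather than read from a mutable row. A minimal sketch with invented event types:

```python
from functools import reduce

# The order's state is never stored directly; it is rebuilt from its event log.
log = [
    {"type": "OrderPlaced", "total": 1999},
    {"type": "PaymentReceived"},
    {"type": "OrderShipped"},
]

def apply(state: dict, event: dict) -> dict:
    if event["type"] == "OrderPlaced":
        return {"status": "placed", "total": event["total"]}
    if event["type"] == "PaymentReceived":
        return {**state, "status": "paid"}
    if event["type"] == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown events are ignored

state = reduce(apply, log, {})
print(state)  # {'status': 'shipped', 'total': 1999}
```

Replaying a prefix of the log answers temporal queries ("what did this order look like before shipping?"), which is the audit benefit mentioned above.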
Production Checklist#
Before going live with an event-driven system:
- Define a schema registry and enforce backward or full compatibility.
- Use a standard event envelope with eventId, eventType, schemaVersion, timestamp, and source.
- Choose a partition key that preserves ordering for business-critical flows.
- Make every consumer idempotent.
- Configure dead letter queues with alerting.
- Set topic retention based on replay requirements (7 days is a common default; event-sourced systems may retain indefinitely).
- Monitor consumer lag — a growing lag indicates consumers cannot keep up.
- Test schema evolution in CI before deploying new event versions.
Wrapping Up#
Event-driven microservices shift the architecture from "call and wait" to "publish and react." The event bus, schema registry, and consumer group model form a production-grade backbone that scales horizontally and degrades gracefully. Start with a single domain event, prove the pattern, and expand from there.