Event-Driven Microservices: Events as First-Class Citizens
Most microservice architectures start with synchronous HTTP calls between services. That works until one downstream service is slow or unavailable and the entire request chain stalls. Event-driven microservices solve this by making events first-class citizens — every meaningful state change is published as an immutable fact that any interested service can consume on its own schedule.
Why Events Matter#
An event captures something that already happened: OrderPlaced, PaymentReceived, InventoryReserved. Unlike commands (which ask a service to do something), events are statements of fact. This distinction unlocks three architectural benefits:
- Temporal decoupling — The producer does not wait for the consumer to process the event.
- Spatial decoupling — Producers and consumers do not need to know each other's network addresses.
- Replay and audit — Because events are persisted, you can rebuild state or debug production issues by replaying them.
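The "statement of fact" property can be made concrete in code: an event is an immutable record. A minimal sketch using a frozen dataclass (the event and field names here are illustrative, not from any specific framework):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Frozen dataclasses make events immutable: once published, a fact never changes.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int
    occurred_at: str  # ISO-8601 timestamp of when the fact happened

event = OrderPlaced(
    order_id="ord-42",
    customer_id="cust-7",
    total_cents=1999,
    occurred_at=datetime.now(timezone.utc).isoformat(),
)

# Attempting to mutate a fact raises FrozenInstanceError.
try:
    event.total_cents = 0
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```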
The Event Bus#
The event bus is the backbone of an event-driven system. It accepts events from producers, stores them durably, and delivers them to consumers.
┌───────────┐ ┌──────────────────┐ ┌───────────┐
│ Order │──publish──▶│ │──deliver──▶│ Payment │
│ Service │ │ Event Bus │ │ Service │
└───────────┘ │ (Kafka / Pulsar) │ └───────────┘
│ │
┌───────────┐ │ │ ┌───────────┐
│ Shipping │◀─deliver──│ │◀─publish──│ Inventory│
│ Service │ └──────────────────┘ │ Service │
└───────────┘ └───────────┘
Popular implementations include Apache Kafka, Apache Pulsar, Amazon EventBridge, NATS JetStream, and Google Pub/Sub. Each differs in ordering guarantees, retention semantics, and delivery modes — but the core abstraction is the same: topics (or channels) that decouple writers from readers.
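The core abstraction can be sketched with a toy in-memory bus. Real brokers add durable storage, partitioning, and delivery guarantees; this sketch only shows how topics decouple writers from readers and retain events for replay:

```python
from collections import defaultdict
from typing import Any, Callable

class InMemoryEventBus:
    """Toy event bus: topics decouple publishers from subscribers."""

    def __init__(self) -> None:
        self._log: dict[str, list[Any]] = defaultdict(list)        # retained events
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        self._log[topic].append(event)            # persist for replay and audit
        for handler in self._subscribers[topic]:
            handler(event)                        # deliver to every consumer

    def replay(self, topic: str) -> list[Any]:
        return list(self._log[topic])

bus = InMemoryEventBus()
received = []
bus.subscribe("order-events", received.append)
bus.publish("order-events", {"eventType": "OrderPlaced", "orderId": "ord-1"})
print(received)  # the subscriber saw the event; the log can also be replayed
```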
Event Schema Registry#
When dozens of services emit and consume events, schema drift becomes a production-breaking risk. A schema registry acts as the single source of truth for event shapes.
How It Works#
- Producers register their event schema before publishing.
- The registry validates that the new schema is compatible with previous versions.
- Consumers fetch the schema at read time so they can deserialize correctly.
Producer ──▶ Schema Registry ──▶ Compatibility Check
│ │
▼ ▼
Store schema v3 Reject if breaking
│
▼
Consumer fetches v3 schema on read
Confluent Schema Registry, AWS Glue Schema Registry, and Apicurio are popular choices.
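The register/validate/fetch flow can be sketched with a toy registry. Here a "schema" is just a set of required field names and backward compatibility is modeled crudely as "the new version may not add required fields"; real registries work with Avro or Protobuf definitions and richer compatibility rules:

```python
class SchemaRegistry:
    """Toy registry: stores versioned schemas and rejects breaking changes."""

    def __init__(self) -> None:
        # subject -> ordered list of schemas (sets of required field names)
        self._versions: dict[str, list[set[str]]] = {}

    def register(self, subject: str, required_fields: set[str]) -> int:
        history = self._versions.setdefault(subject, [])
        if history and not set(required_fields) <= history[-1]:
            # Adding a required field would break backward compatibility:
            # events written with the previous schema would lack it.
            raise ValueError(f"breaking change for {subject}")
        history.append(set(required_fields))
        return len(history)  # version number

    def fetch(self, subject: str, version: int) -> set[str]:
        return self._versions[subject][version - 1]

registry = SchemaRegistry()
v1 = registry.register("orders.placed", {"orderId", "total"})
v2 = registry.register("orders.placed", {"orderId"})  # dropping a field is fine here
print(v1, v2)  # 1 2
try:
    registry.register("orders.placed", {"orderId", "currency"})  # rejected
except ValueError as exc:
    print(exc)
```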
Avro vs Protobuf for Event Serialization#
Two dominant formats for encoding events are Apache Avro and Protocol Buffers (Protobuf).
| Aspect | Avro | Protobuf |
|---|---|---|
| Schema location | Embedded or registry | .proto files compiled to code |
| Evolution model | Reader/writer schemas resolved at runtime | Field numbers provide forward/backward compat |
| Payload size | Compact binary, no field tags | Compact binary with field tags |
| Code generation | Optional | Required |
| Ecosystem fit | Kafka-native (Confluent) | gRPC-native (Google) |
Rule of thumb: If your bus is Kafka and your ecosystem is JVM-heavy, Avro is the path of least resistance. If you already use gRPC or need strong cross-language codegen, Protobuf is the better choice.
Event Versioning Strategies#
Events evolve over time. Fields get added, renamed, or deprecated. A versioning strategy prevents consumer breakage.
Strategy 1 — Schema Evolution with Compatibility Modes#
Schema registries enforce one of four compatibility modes:
- Backward — New schema can read data written by the old schema. Consumers upgrade first.
- Forward — Old schema can read data written by the new schema. Producers upgrade first.
- Full — Both directions. Safest, but most restrictive.
- None — No checks. Dangerous in production.
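Backward and forward compatibility can be illustrated with two tiny readers. The field names and defaults are hypothetical; the point is who tolerates whom:

```python
# Backward compatibility: a consumer on schema v2 reads a v1 event by
# supplying a default for the field that v2 added.
V2_DEFAULTS = {"currency": "USD"}  # hypothetical optional field added in v2

def read_v2(raw: dict) -> dict:
    return {**V2_DEFAULTS, **raw}

v1_event = {"orderId": "ord-1", "total": 1999}   # written by an old producer
print(read_v2(v1_event))

# Forward compatibility: a consumer still on schema v1 reads a v2 event by
# ignoring unknown fields instead of failing on them.
V1_FIELDS = {"orderId", "total"}

def read_v1(raw: dict) -> dict:
    return {k: v for k, v in raw.items() if k in V1_FIELDS}

v2_event = {"orderId": "ord-2", "total": 500, "currency": "EUR"}
print(read_v1(v2_event))  # {'orderId': 'ord-2', 'total': 500}
```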
Strategy 2 — Event Type Versioning#
Include the version in the event type name:
topic: orders.v2.placed
This lets v1 and v2 consumers coexist during migration. The downside is topic proliferation if not managed carefully.
Strategy 3 — Envelope Pattern#
Wrap every event in a standard envelope:
{
  "eventId": "a1f81d4f-2c3b-4e7a-bb12-deadbeef0000",
"eventType": "OrderPlaced",
"schemaVersion": 3,
"timestamp": "2026-03-29T10:00:00Z",
"source": "order-service",
"data": { }
}
The schemaVersion field lets consumers route to the correct deserialization logic.
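Version routing on the envelope can be sketched as a lookup table of per-version deserializers. The parsers and the v2-to-v3 field rename below are hypothetical:

```python
import json

# Hypothetical per-version deserializers; suppose v3 renamed "total" to "totalCents".
def parse_v2(data: dict) -> dict:
    return {"order_id": data["orderId"], "total_cents": data["total"]}

def parse_v3(data: dict) -> dict:
    return {"order_id": data["orderId"], "total_cents": data["totalCents"]}

PARSERS = {2: parse_v2, 3: parse_v3}

def handle(raw: str) -> dict:
    envelope = json.loads(raw)
    parser = PARSERS[envelope["schemaVersion"]]  # route on the envelope field
    return parser(envelope["data"])

msg = json.dumps({
    "eventId": "evt-1",
    "eventType": "OrderPlaced",
    "schemaVersion": 3,
    "timestamp": "2026-03-29T10:00:00Z",
    "source": "order-service",
    "data": {"orderId": "ord-1", "totalCents": 1999},
})
print(handle(msg))  # {'order_id': 'ord-1', 'total_cents': 1999}
```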
Consumer Groups and Partitioning#
In Kafka (and similar systems), a consumer group is a set of consumer instances that cooperate to read from a topic. Each partition within the topic is assigned to exactly one consumer in the group, enabling parallel processing without duplicate delivery.
Topic: order-events (4 partitions)
Consumer Group: payment-service
├── instance-1 → partition 0, partition 1
└── instance-2 → partition 2, partition 3
Consumer Group: analytics-service
├── instance-1 → partition 0
├── instance-2 → partition 1
├── instance-3 → partition 2
└── instance-4 → partition 3
Key Concepts#
- Partition key — Determines which partition an event lands in. Use a natural key (e.g., orderId) to guarantee ordering for related events.
- Rebalancing — When a consumer joins or leaves, partitions are reassigned. Cooperative rebalancing minimizes disruption.
- Offset management — Consumers track their position (offset) in each partition. Committing offsets too early risks data loss; committing too late causes duplicates.
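Why a natural partition key preserves ordering: events with the same key always hash to the same partition, and each partition is consumed in order. A sketch with a stable hash (Kafka's default partitioner actually uses murmur2; any deterministic hash shows the idea):

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash of the partition key, mapped onto the partition count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events for one order land on the same partition, so their order is preserved.
events = [("ord-42", "OrderPlaced"), ("ord-42", "PaymentReceived"), ("ord-42", "Shipped")]
partitions = {partition_for(key) for key, _ in events}
print(len(partitions))  # 1
```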
Idempotency and Exactly-Once Semantics#
Network failures mean events can be delivered more than once. Design consumers to be idempotent — processing the same event twice produces the same result.
Techniques include:
- Deduplication table — Store processed eventId values and skip duplicates.
- Idempotent writes — Use database upserts keyed on the event's natural identifier.
- Transactional outbox — Write the event and the business state change in a single database transaction, then relay events from the outbox table.
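The deduplication-table technique can be sketched in a few lines. The in-memory set stands in for a database table with a unique index on eventId:

```python
processed: set[str] = set()    # stand-in for a dedup table with a unique index
balances: dict[str, int] = {}  # business state updated by the consumer

def handle_payment(event: dict) -> None:
    if event["eventId"] in processed:
        return                       # duplicate delivery: skip, no double-count
    account = event["account"]
    balances[account] = balances.get(account, 0) + event["amount"]
    processed.add(event["eventId"])

event = {"eventId": "evt-1", "account": "acct-1", "amount": 100}
handle_payment(event)
handle_payment(event)                # redelivered by the broker after a timeout
print(balances["acct-1"])  # 100
```

Processing the same event twice produced the same result, which is exactly the idempotency property the section describes.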
Dead Letter Queues#
When a consumer cannot process an event after several retries, send it to a dead letter queue (DLQ). This prevents one poison message from blocking the entire partition.
Consumer ──▶ Process ──▶ Success
│
▼ (after N retries)
Dead Letter Queue ──▶ Alert + Manual Review
Monitor DLQ depth as a key operational metric.
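The retry-then-park flow can be sketched as follows; the poison-message marker and retry count are illustrative:

```python
MAX_RETRIES = 3
dead_letter_queue: list[dict] = []

def process(event: dict) -> None:
    if event.get("poison"):          # hypothetical marker for an unprocessable event
        raise ValueError("cannot deserialize payload")

def consume(event: dict) -> None:
    for attempt in range(MAX_RETRIES):
        try:
            process(event)
            return
        except Exception:
            pass                     # real consumers log and back off between attempts
    # Park the poison message so it stops blocking the partition.
    dead_letter_queue.append(event)

consume({"eventId": "evt-1"})                   # succeeds
consume({"eventId": "evt-2", "poison": True})   # fails 3 times, lands in the DLQ
print(len(dead_letter_queue))  # 1
```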
Event Sourcing vs Event-Driven#
These terms are related but distinct:
- Event-driven architecture — Services communicate through events. The event bus is the integration layer.
- Event sourcing — A service stores its internal state as an append-only log of events rather than a mutable row in a database.
You can use event-driven architecture without event sourcing (and vice versa). Combining both gives you full audit trails and temporal queries, but increases operational complexity.
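The event-sourcing half of that distinction is easy to show: current state is derived by folding over the event log rather than read from a mutable row. A minimal sketch with invented event types:

```python
from functools import reduce

# The order's state is never stored directly; it is rebuilt from its event log.
log = [
    {"type": "OrderPlaced", "total": 1999},
    {"type": "PaymentReceived"},
    {"type": "OrderShipped"},
]

def apply(state: dict, event: dict) -> dict:
    if event["type"] == "OrderPlaced":
        return {"status": "placed", "total": event["total"]}
    if event["type"] == "PaymentReceived":
        return {**state, "status": "paid"}
    if event["type"] == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown events are ignored

state = reduce(apply, log, {})
print(state)  # {'status': 'shipped', 'total': 1999}
```

Replaying a prefix of the log answers temporal queries ("what did this order look like before shipping?"), which is the audit benefit mentioned above.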
Production Checklist#
Before going live with an event-driven system:
- Define a schema registry and enforce backward or full compatibility.
- Use a standard event envelope with eventId, eventType, schemaVersion, timestamp, and source.
- Choose a partition key that preserves ordering for business-critical flows.
- Make every consumer idempotent.
- Configure dead letter queues with alerting.
- Set topic retention based on replay requirements (7 days is a common default; event-sourced systems may retain indefinitely).
- Monitor consumer lag — a growing lag indicates consumers cannot keep up.
- Test schema evolution in CI before deploying new event versions.
Wrapping Up#
Event-driven microservices shift the architecture from "call and wait" to "publish and react." The event bus, schema registry, and consumer group model form a production-grade backbone that scales horizontally and degrades gracefully. Start with a single domain event, prove the pattern, and expand from there.