Message Ordering Guarantees — FIFO, Partition Keys, and Causal Consistency
Why message ordering matters#
When Service A sends three events — created, updated, deleted — the consumer must process them in that exact sequence. If deleted arrives before created, the consumer either crashes or silently corrupts state.
Ordering sounds trivial in a single-process system. In distributed messaging, it is anything but.
FIFO queues — the simplest guarantee#
A strict FIFO (first-in, first-out) queue ensures messages are delivered in exactly the order they were enqueued.
How it works:
- Producer sends messages sequentially to a single queue
- The broker persists them in arrival order
- A single consumer reads them one at a time, acknowledging each before receiving the next
Trade-offs:
- Throughput is capped at what a single consumer can process, because only one consumer may read the queue at a time
- If the consumer fails mid-processing, redelivery can cause duplicates
- Scaling requires partitioning, which weakens the global ordering guarantee
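Amazon SQS FIFO queues follow this model. Without high-throughput mode, a FIFO queue supports up to 300 API calls per second per action, which works out to 300 messages per second unbatched or 3,000 with batches of 10. Ordering is enforced within each message group ID.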
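The acknowledge-before-next behaviour described above can be sketched with an in-memory queue. This is a minimal illustration, not the SQS API: the `FifoQueue` class and its `send`, `receive`, and `ack` methods are hypothetical names chosen for this example.

```python
from collections import deque


class FifoQueue:
    """In-memory stand-in for a strict FIFO queue with one in-flight message.

    Hypothetical class for illustration; it does not model any real broker API.
    """

    def __init__(self):
        self._messages = deque()
        self._in_flight = None  # delivered but not yet acknowledged

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Until the in-flight message is acknowledged, keep returning it,
        # so a later message can never overtake an earlier one.
        if self._in_flight is None and self._messages:
            self._in_flight = self._messages.popleft()
        return self._in_flight

    def ack(self):
        self._in_flight = None


queue = FifoQueue()
for event in ["created", "updated", "deleted"]:
    queue.send(event)

first = queue.receive()
redelivered = queue.receive()  # no ack yet, so the same message comes back

processed = [first]
queue.ack()
while (msg := queue.receive()) is not None:
    processed.append(msg)
    queue.ack()
```

The redelivery of an unacknowledged message is also where duplicates come from: a consumer that crashes after processing but before calling `ack` will see the same message again.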
Partition-based ordering — the Kafka model#
Apache Kafka takes a different approach: ordering is guaranteed within a partition, not across partitions.
Topic: order-events (3 partitions)
Partition 0: [order-101-created] → [order-101-updated] → [order-101-shipped]
Partition 1: [order-202-created] → [order-202-cancelled]
Partition 2: [order-303-created] → [order-303-updated]
Partition key selection determines which messages share ordering guarantees. Choosing order_id as the partition key ensures all events for a single order land in the same partition and are consumed in sequence.
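A minimal sketch of how a partition key maps to a partition, assuming a simple hash-modulo scheme. CRC32 is used here only because it is stable across runs; Kafka's default partitioner uses a different hash (murmur2), and `partition_for` is a hypothetical helper, not a Kafka client function.

```python
import zlib

NUM_PARTITIONS = 3  # same partition count as the order-events example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index with a stable hash.

    Assumption: CRC32 modulo the partition count, for illustration only.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-101", "created"),
    ("order-202", "created"),
    ("order-101", "updated"),
    ("order-202", "cancelled"),
    ("order-101", "shipped"),
]

# Append each event to the partition selected by its order_id.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for order_id, event_type in events:
    partitions[partition_for(order_id)].append((order_id, event_type))
```

Because the hash depends only on the key, every event for `order-101` is appended to the same partition in the order it was produced, regardless of how events for other orders interleave.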
Choosing the right partition key#
| Key choice | Ordering scope | Risk |
|---|---|---|
| user_id | All events for a user are ordered | Hot partitions for active users |
| order_id | Per-order consistency | Good distribution, limited cross-order ordering |
| entity_type | All entities of a type are ordered | Severe hot-spotting |
| Random / round-robin | No ordering | Maximum throughput |
Rule of thumb: pick the narrowest entity that still satisfies your ordering requirement.
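One rough way to compare candidate keys is to measure how much traffic the busiest key value would attract. The sketch below uses a small made-up sample of events; the field names and the `hottest_key_share` helper are assumptions for illustration.

```python
from collections import Counter

# Made-up sample: one active user (u1) with two orders, one quieter user (u2).
events = [
    {"entity_type": "order", "user_id": "u1", "order_id": "o1"},
    {"entity_type": "order", "user_id": "u1", "order_id": "o1"},
    {"entity_type": "order", "user_id": "u1", "order_id": "o2"},
    {"entity_type": "order", "user_id": "u1", "order_id": "o2"},
    {"entity_type": "order", "user_id": "u2", "order_id": "o3"},
    {"entity_type": "order", "user_id": "u2", "order_id": "o3"},
]


def hottest_key_share(events, key_field):
    """Fraction of all events that carry the most common value of key_field."""
    counts = Counter(event[key_field] for event in events)
    return max(counts.values()) / len(events)


shares = {
    field: hottest_key_share(events, field)
    for field in ("entity_type", "user_id", "order_id")
}
```

In this sample the share shrinks as the key gets narrower: every event shares one `entity_type`, the active user accounts for two thirds of events, and no single order exceeds one third.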
Sequence numbers#
Sequence numbers let consumers detect gaps and reorder messages independently of the broker.
Message { sequence: 42, payload: "updated", entity: "order-101" }
Message { sequence: 43, payload: "shipped", entity: "order-101" }
Implementation patterns:
- Per-producer sequences — each producer maintains a monotonically increasing counter. Consumers track the last-seen sequence per producer.
- Per-entity sequences — the sequence counter is scoped to the entity (e.g., per order). This detects gaps within a single entity's event stream.
- Global sequences — a single counter across all messages. Difficult to scale but provides total ordering.
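As a sketch of the per-entity pattern, a producer can keep one counter per entity and stamp each outgoing message. The `Message` and `EntitySequencer` names are hypothetical, and the counters live in memory only; a real producer would need to persist them across restarts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    entity: str
    sequence: int
    payload: str


class EntitySequencer:
    """Assigns a monotonically increasing sequence number per entity."""

    def __init__(self):
        # Each entity's sequence starts at 1 (an arbitrary choice for this sketch).
        self._next = defaultdict(lambda: 1)

    def stamp(self, entity: str, payload: str) -> Message:
        sequence = self._next[entity]
        self._next[entity] = sequence + 1
        return Message(entity=entity, sequence=sequence, payload=payload)


sequencer = EntitySequencer()
m1 = sequencer.stamp("order-101", "created")
m2 = sequencer.stamp("order-202", "created")
m3 = sequencer.stamp("order-101", "updated")
```

Each order gets its own independent sequence, so a consumer can detect a gap for `order-101` without knowing anything about `order-202`.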
Gap detection#
When a consumer receives sequence 45 but last processed 42, it knows messages 43 and 44 are missing. The consumer can:
- Buffer and wait — hold message 45 until 43 and 44 arrive (adds latency)
- Request retransmission — ask the producer or broker to resend the missing messages
- Skip with a warning — process 45 and flag the gap for investigation
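The detection step itself is small. A minimal sketch, assuming integer sequence numbers that increase by one; the `classify` helper and its labels are hypothetical, and which of the three strategies to apply to a gap is left to the caller.

```python
def classify(last_processed: int, received: int):
    """Compare an incoming sequence number with the last one processed.

    Returns a (label, missing) pair, where missing lists the sequence
    numbers that were skipped.
    """
    if received <= last_processed:
        # Already seen (or older than what was seen): a duplicate or stale message.
        return "duplicate_or_stale", []
    missing = list(range(last_processed + 1, received))
    return ("gap" if missing else "in_order"), missing


gap_result = classify(last_processed=42, received=45)
next_result = classify(last_processed=42, received=43)
repeat_result = classify(last_processed=42, received=42)
```

For the example in the text, `gap_result` reports a gap with sequences 43 and 44 missing.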
Causal ordering#
Causal ordering guarantees that if event B was caused by event A, every consumer sees A before B. Events with no causal relationship can arrive in any order.
Example: a user posts a comment (event A), then edits it (event B). Causal ordering ensures no consumer sees the edit before the original post. But two independent users posting comments simultaneously have no causal relationship — their events can arrive in either order.
Vector clocks#
Vector clocks are the classic mechanism for tracking causality:
Node A: [A:1, B:0] → sends message → Node B
Node B: [A:1, B:1] → sends reply → Node A
Node A: [A:2, B:1] → knows B's reply was caused by A's message
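Each node maintains a vector of counters, one per node, and increments its own entry whenever it records an event. When a message is sent, the sender's vector is attached. The receiver merges the two vectors by taking the element-wise maximum, then increments its own entry to record the receive.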
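The exchange in the diagram can be reproduced with a small sketch. It uses a variant that increments a node's counter on local events and on receives, chosen so the values match the diagram; many formulations also increment on send. The `VectorClock` class and `happened_before` helper are hypothetical names.

```python
class VectorClock:
    """Vector clock for one node; counters are keyed by node ID."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.clock = {node: 0 for node in nodes}

    def tick(self):
        """Record a local event."""
        self.clock[self.node_id] += 1

    def snapshot(self):
        """Copy of the current vector, to attach to an outgoing message."""
        return dict(self.clock)

    def receive(self, incoming):
        """Merge by element-wise maximum, then count the receive as an event."""
        for node, count in incoming.items():
            self.clock[node] = max(self.clock.get(node, 0), count)
        self.tick()


def happened_before(a, b):
    """True if vector a causally precedes vector b."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


node_a = VectorClock("A", ["A", "B"])
node_b = VectorClock("B", ["A", "B"])

node_a.tick()                  # A creates the message: [A:1, B:0]
message = node_a.snapshot()
node_b.receive(message)        # B: [A:1, B:1]
reply = node_b.snapshot()
node_a.receive(reply)          # A: [A:2, B:1]
```

Two vectors where neither `happened_before` the other, such as `{"A": 1, "B": 0}` and `{"A": 0, "B": 1}`, describe concurrent events with no causal relationship.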
Limitations:
- Vector size grows with the number of nodes
- Impractical for systems with thousands of producers
- Hybrid logical clocks (HLC) offer a more compact alternative
Total ordering#
Total ordering means every consumer sees every message in the exact same sequence. This is the strongest guarantee and the hardest to achieve.
Approaches:
- Single-leader replication — one node assigns sequence numbers to all messages. Simple but creates a bottleneck and single point of failure.
- Consensus protocols — Raft or Paxos elect a leader that serializes messages. Tolerates failures but adds latency per message.
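- Lamport timestamps — logical counters that, combined with a tie-breaker such as the node ID, yield a total order consistent with causality. That order may not match real-time (wall-clock) order: two events with no causal relationship are ordered arbitrarily but consistently.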
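A minimal sketch of the Lamport approach, assuming the node ID is used as the tie-breaker so that every pair of events is comparable. The `LamportClock` class is a hypothetical name for this example.

```python
class LamportClock:
    """Lamport logical clock; events are stamped as (time, node_id) tuples."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.time = 0

    def local_event(self):
        """Stamp a local event (including a send)."""
        self.time += 1
        return (self.time, self.node_id)

    def receive(self, sender_time):
        """Advance past the sender's time, then stamp the receive event."""
        self.time = max(self.time, sender_time) + 1
        return (self.time, self.node_id)


clock_a = LamportClock("A")
clock_b = LamportClock("B")

sent = clock_a.local_event()         # A sends a message
unrelated = clock_b.local_event()    # concurrent event on B
received = clock_b.receive(sent[0])  # B receives A's message

# Sorting the tuples gives every consumer the same total order.
total_order = sorted([received, unrelated, sent])
```

`sent` and `unrelated` carry the same counter value, so the node ID decides their relative position. The choice is arbitrary, but every consumer that sorts the same way reaches the same sequence, and `received` always sorts after `sent`.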
| Guarantee | Strength | Cost |
|---|---|---|
| No ordering | Weakest | Lowest latency, highest throughput |
| FIFO per-producer | Moderate | Single-producer bottleneck |
| Partition ordering | Practical | Good throughput with careful key selection |
| Causal ordering | Strong | Vector clock overhead |
| Total ordering | Strongest | Consensus protocol latency |
Handling out-of-order messages#
Even with ordering guarantees, network partitions, retries, and consumer failures can deliver messages out of order. Robust systems plan for this.
Reorder buffer#
Hold messages in a buffer, sorted by sequence number. Release them to the application layer only when all preceding messages have arrived.
Last released: seq:47
Buffer: [seq:49, seq:50]
Waiting for: seq:48
→ On arrival of 48: release 48, 49, 50 in order
Set a buffer timeout — if the missing message does not arrive within N seconds, either skip it or escalate to a dead letter queue.
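A reorder buffer along these lines can be sketched as follows. The `ReorderBuffer` class is hypothetical; it does not run a timer itself, and `skip_gap` is what a caller would invoke once its own timeout expires and it decides to skip rather than dead-letter.

```python
class ReorderBuffer:
    """Holds out-of-order messages and releases them in sequence order."""

    def __init__(self, next_expected: int):
        self.next_expected = next_expected
        self._pending = {}  # sequence number -> payload

    def _drain(self):
        """Release the longest contiguous run starting at next_expected."""
        released = []
        while self.next_expected in self._pending:
            payload = self._pending.pop(self.next_expected)
            released.append((self.next_expected, payload))
            self.next_expected += 1
        return released

    def push(self, sequence: int, payload):
        """Buffer one message; return everything that is now releasable."""
        if sequence >= self.next_expected:
            # Older sequence numbers were already released, so they are dropped.
            self._pending.setdefault(sequence, payload)
        return self._drain()

    def skip_gap(self):
        """Give up on missing messages and resume at the lowest buffered one."""
        if not self._pending:
            return []
        self.next_expected = min(self._pending)
        return self._drain()


buffer = ReorderBuffer(next_expected=47)
out_47 = buffer.push(47, "a")   # in order, released immediately
out_49 = buffer.push(49, "c")   # held: 48 is missing
out_50 = buffer.push(50, "d")   # held
out_48 = buffer.push(48, "b")   # fills the gap and releases the run
```

Keeping pending messages in a dictionary keyed by sequence number also makes duplicate deliveries harmless: a second copy of a buffered or already released message is ignored.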
Idempotent consumers#
Design consumers so that processing the same message twice produces the same result. This lets you retry safely, because a redelivered or duplicated message cannot change the outcome.
- Store processed message IDs in a deduplication table
- Use database upserts instead of inserts
- Make state transitions idempotent (e.g., "set status to shipped" rather than "increment version")
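The first two techniques can be combined in one transaction. The sketch below uses an in-memory SQLite database from Python's standard library; the table names, message fields, and `handle` function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")


def handle(message) -> bool:
    """Apply a message at most once; return False if it was a duplicate."""
    with conn:  # dedup record and state change commit together
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message["id"],),
        )
        if cursor.rowcount == 0:
            return False  # message ID already recorded, skip the side effect
        # Upsert: set the status whether or not the order row exists yet.
        conn.execute(
            "INSERT OR REPLACE INTO orders (order_id, status) VALUES (?, ?)",
            (message["order_id"], message["status"]),
        )
        return True


message = {"id": "msg-1", "order_id": "order-101", "status": "shipped"}
first_attempt = handle(message)
second_attempt = handle(message)  # simulated redelivery
```

Writing the deduplication record and the state change in the same transaction matters: if they were committed separately, a crash between the two could mark a message as processed without applying it.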
Last-writer-wins (LWW)#
For some use cases, strict ordering is unnecessary. Attach a timestamp to each message and always apply the message with the latest timestamp, discarding older ones.
Warning: LWW can silently drop valid updates if clocks are skewed. Only use it when the latest state is all that matters (e.g., caching, status indicators).
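A minimal last-writer-wins sketch, assuming each message carries a numeric timestamp set by its producer. The `LwwRegister` class is a hypothetical name, and the timestamps are made-up values.

```python
class LwwRegister:
    """Keeps only the value with the highest timestamp seen so far."""

    def __init__(self):
        self.value = None
        self.timestamp = float("-inf")

    def apply(self, value, timestamp) -> bool:
        """Apply an update if it is newer; return False if it was discarded."""
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp
            return True
        return False


status = LwwRegister()
applied_online = status.apply("online", timestamp=100.0)
applied_away = status.apply("away", timestamp=90.0)      # arrives late, discarded
applied_offline = status.apply("offline", timestamp=120.0)
```

The discarded `away` update shows the risk the warning describes: if the producer's clock was simply running slow, a genuinely newer update would be dropped in exactly the same way, with no error raised.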
Ordering across microservices#
When an event must flow through multiple services in order, consider:
- Choreography with partition keys — each service publishes to the next topic using the same partition key, preserving per-entity order throughout the pipeline
- Orchestration with a saga — a central coordinator sequences the steps explicitly
- Event sourcing — store all events in an ordered log. Services replay the log to rebuild state, inherently preserving order.
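As a sketch of the event sourcing option, state can be rebuilt by folding over the log from the beginning. The event shape and the `replay` function are assumptions for illustration.

```python
def replay(log):
    """Rebuild current state by applying events in log order."""
    state = {}
    for event in log:
        if event["type"] == "deleted":
            state.pop(event["entity"], None)
        else:
            # For this sketch, the state is just the latest event type per entity.
            state[event["entity"]] = event["type"]
    return state


log = [
    {"entity": "order-101", "type": "created"},
    {"entity": "order-101", "type": "updated"},
    {"entity": "order-101", "type": "deleted"},
    {"entity": "order-202", "type": "created"},
]

state = replay(log)
scrambled_state = replay(list(reversed(log)))
```

Replaying the log in order removes `order-101`, while replaying the same events in reverse leaves it present. Any service that reads the log from the start reaches the same state because the order is fixed by the log, not by delivery timing.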
Choosing the right guarantee#
Start with the weakest guarantee that satisfies your requirements. Stronger ordering always costs throughput, latency, or both.
- Chat messages: causal ordering (replies appear after the messages they reference)
- Financial transactions: total ordering within an account (partition by account ID)
- Analytics events: no ordering required (maximize throughput)
- Inventory updates: FIFO per SKU (partition by SKU)
Explore message ordering visually#
On Codelit, generate a Kafka consumer group or an SQS FIFO pipeline to see how partition keys, sequence numbers, and reorder buffers interact in a live architecture diagram.
This is article #310 in the Codelit engineering blog series.