Queue-Based Architecture: Decouple Services With Asynchronous Messaging
Queues are the backbone of resilient distributed systems. They decouple producers from consumers, absorb traffic spikes, and, when backed by durable storage, ensure that no work is lost even when downstream services crash.
Why Queues Matter#
Without queues:
User request → Service A → Service B → Service C → response
If Service B is down → entire request fails
With queues:
User request → Service A → Queue → response (immediate)
Service B picks up from queue when ready
Service C picks up from next queue when ready
The producer doesn't wait. The consumer doesn't rush. The queue absorbs the difference.
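That handoff can be sketched in a few lines with Python's standard-library `queue` module. This is a hypothetical in-process stand-in, not a real broker: `service_a`, `service_b_worker`, and the `"order-42"` payload are illustrative names.

```python
import queue
import threading

# Sketch: Service A enqueues work and returns immediately;
# a worker thread (standing in for Service B) drains the queue when ready.
work_queue = queue.Queue()
results = []

def service_a(request):
    work_queue.put(request)   # producer doesn't wait for processing
    return "accepted"         # respond to the user immediately

def service_b_worker():
    while True:
        item = work_queue.get()
        if item is None:      # sentinel: shut the worker down
            break
        results.append(f"processed {item}")
        work_queue.task_done()

worker = threading.Thread(target=service_b_worker)
worker.start()
ack = service_a("order-42")   # returns "accepted" right away
work_queue.put(None)
worker.join()
```

A real deployment swaps the in-memory queue for a durable broker, but the shape is identical: the producer's latency is one enqueue, regardless of how slow the consumer is.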
Work Queue vs Pub/Sub#
These are fundamentally different patterns:
Work Queue (Point-to-Point)#
One message, one consumer. The message is removed after processing.
Producer → [Queue] → Consumer A (gets message 1)
→ Consumer B (gets message 2)
→ Consumer C (gets message 3)
Use cases: task processing, job scheduling, order fulfillment.
Pub/Sub (Fan-Out)#
One message, many consumers. Every subscriber gets a copy.
Producer → [Topic] → Subscriber A (gets ALL messages)
→ Subscriber B (gets ALL messages)
→ Subscriber C (gets ALL messages)
Use cases: event notification, audit logging, cache invalidation.
Hybrid Pattern#
Most systems combine both. A topic fans out to multiple queues, each with its own consumer group:
Order placed → [Topic]
→ [Inventory Queue] → Inventory Service
→ [Email Queue] → Notification Service
→ [Analytics Queue] → Analytics Service
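The hybrid pattern above reduces to a small amount of bookkeeping: a topic that copies each published message into every subscribed queue. A minimal sketch, with hypothetical `Topic`, `subscribe`, and `publish` names:

```python
from collections import deque

# Sketch: a topic fans each message out to one queue per subscriber.
class Topic:
    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        q = deque()
        self.queues[name] = q
        return q

    def publish(self, message):
        for q in self.queues.values():  # every queue gets its own copy
            q.append(message)

orders = Topic()
inventory_q = orders.subscribe("inventory")
email_q = orders.subscribe("email")
orders.publish({"order_id": 123})
```

Each downstream service then consumes its own queue at its own pace, so a slow Analytics Service never delays the Inventory Service.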
Priority Queues#
Not all messages are equal. Priority queues ensure critical work happens first.
# Celery priority example
@app.task(queue='high-priority')
def process_payment(order_id):
    ...

@app.task(queue='low-priority')
def generate_report(report_id):
    ...
Implementation strategies:
- Separate queues per priority — simplest, most common
- Single queue with priority field — RabbitMQ supports up to 255 priority levels (SQS has no native message priority; use separate queues instead)
- Weighted consumers — 3 workers on high, 1 worker on low
Watch out for starvation — low-priority messages that never get processed. Set a maximum wait time that promotes messages automatically.
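One way to implement that promotion is an aging priority queue: entries that wait past `max_wait` are bumped to top priority on the next read. A hypothetical sketch (the `AgingPriorityQueue` class and its rescan-on-get approach are illustrative, not how any particular broker does it):

```python
import heapq

# Sketch: a priority queue that promotes messages whose wait time
# exceeds max_wait, so low-priority work can never starve.
# Lower priority number = more urgent.
class AgingPriorityQueue:
    def __init__(self, max_wait):
        self.max_wait = max_wait
        self._heap = []
        self._seq = 0  # tie-breaker preserves FIFO within a priority

    def put(self, priority, message, now):
        heapq.heappush(self._heap, (priority, self._seq, now, message))
        self._seq += 1

    def get(self, now):
        # Promote any entry that has waited longer than max_wait.
        # Rescanning the whole heap is O(n) per get -- fine for a sketch.
        aged = [(0, s, t, m) if now - t > self.max_wait else (p, s, t, m)
                for p, s, t, m in self._heap]
        heapq.heapify(aged)
        self._heap = aged
        return heapq.heappop(self._heap)[3]
```

With `max_wait=10`, a priority-5 report enqueued at t=0 loses to a priority-1 payment at t=5, but by t=20 it has aged past the limit and is served regardless of what else arrives.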
Delay Queues#
Messages that should be processed later, not now.
User signs up → [Delay Queue: 24h] → Send welcome email
Payment pending → [Delay Queue: 30min] → Check payment status
Trial started → [Delay Queue: 7d] → Send trial reminder
SQS supports up to 15 minutes of delay natively. For longer delays, use a scheduled job that moves messages from a "pending" store into the queue at the right time.
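The "pending store plus mover job" approach can be sketched as a min-heap keyed by due time, drained by a job that runs on a schedule. The `schedule` and `mover` names and the message strings are hypothetical:

```python
import heapq

# Sketch: a "pending" store ordered by due time, plus a mover job
# that releases due messages into the real work queue.
pending = []     # min-heap of (due_at, message)
work_queue = []  # stands in for the actual queue

def schedule(message, delay, now):
    heapq.heappush(pending, (now + delay, message))

def mover(now):
    # Run periodically, e.g. every minute by a cron-style scheduler.
    while pending and pending[0][0] <= now:
        _, message = heapq.heappop(pending)
        work_queue.append(message)

schedule("welcome-email", delay=24 * 3600, now=0)
schedule("payment-check", delay=30 * 60, now=0)
mover(now=1800)  # 30 minutes later: only payment-check is due
```

Because the heap is ordered by due time, each mover run only touches messages that are actually due, so the scan cost stays proportional to the work released.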
FIFO Guarantees#
Standard queues offer at-least-once delivery with best-effort ordering. FIFO queues guarantee:
- Exactly-once processing — deduplication within a 5-minute window
- Strict ordering — messages processed in exact send order
Standard Queue: Message A, B, C → might arrive as B, A, C
FIFO Queue: Message A, B, C → always arrives as A, B, C
The tradeoff: FIFO queues have lower throughput. SQS FIFO supports 300 messages/second (3,000 with batching) vs unlimited for standard queues.
Use FIFO when order matters: financial transactions, state machines, sequential workflows.
Use standard when throughput matters: log processing, analytics events, notifications.
Message Group IDs#
FIFO doesn't mean global ordering across all messages. Use message group IDs to maintain order within a logical group:
Group "user-123": msg1 → msg2 → msg3 (ordered)
Group "user-456": msg1 → msg2 → msg3 (ordered)
But user-123 and user-456 messages can interleave
This gives you per-entity ordering with parallel processing across entities.
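Conceptually, a message group ID just selects which per-group FIFO a message lands in. A minimal sketch of those semantics (one deque per group; a real FIFO broker also enforces that only one consumer reads a group at a time):

```python
from collections import defaultdict, deque

# Sketch: one FIFO per message group ID. Order is preserved within
# a group, while different groups can be consumed in parallel.
groups = defaultdict(deque)

def send(group_id, message):
    groups[group_id].append(message)

def receive(group_id):
    return groups[group_id].popleft()

send("user-123", "msg1")
send("user-123", "msg2")
send("user-456", "msg1")
```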
Visibility Timeout#
When a consumer picks up a message, it becomes invisible to other consumers for a set period. This prevents two consumers from processing the same message at the same time.
1. Consumer A receives message (visibility timeout = 30s)
2. Message hidden from Consumer B and C
3a. Consumer A finishes → deletes message ✓
3b. Consumer A crashes → message reappears after 30s → Consumer B picks it up
Setting the right timeout:
- Too short — message reappears before processing finishes, causing duplicates
- Too long — if consumer crashes, message sits invisible for too long
Best practice: set timeout to 6x your average processing time. Extend it dynamically if processing takes longer than expected.
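The receive/hide/reappear cycle above can be modeled directly. This `VisibleQueue` class is a hypothetical in-memory sketch of the semantics, with explicit `now` parameters instead of a real clock:

```python
import itertools

# Sketch of visibility timeout semantics: a received message is hidden
# until its deadline; if it is not deleted by then, a later receive
# returns it again.
class VisibleQueue:
    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}  # id -> (body, invisible_until)
        self.ids = itertools.count()

    def send(self, body):
        self.messages[next(self.ids)] = (body, 0)

    def receive(self, now):
        for mid, (body, until) in self.messages.items():
            if until <= now:  # visible again (or never received)
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None  # everything is currently in flight

    def delete(self, mid):
        del self.messages[mid]
```

Step 3b from the list above falls out for free: a consumer that crashes simply never calls `delete`, and the message becomes visible again once `now` passes its deadline.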
Poison Pill Handling#
A poison pill is a message that can never be processed successfully. It fails, returns to the queue, gets picked up, fails again — an infinite loop.
Message "corrupt-data" → Consumer fails → back to queue
→ Consumer fails → back to queue → Consumer fails → ...
Dead Letter Queue (DLQ)#
After N failed attempts, move the message to a separate queue for investigation:
Main Queue → Consumer (fail #1) → retry
→ Consumer (fail #2) → retry
→ Consumer (fail #3) → Dead Letter Queue
Configure maxReceiveCount (typically 3-5). Every message in the DLQ represents a bug or data issue that needs human attention.
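The redrive logic is a receive counter plus a threshold check. A hypothetical sketch (the `consume` helper, `handler`, and the `"poison-pill"` message are illustrative; in SQS this bookkeeping is the broker's job, driven by `maxReceiveCount`):

```python
from collections import deque

# Sketch: track receive counts and move a message to the DLQ once it
# reaches MAX_RECEIVE_COUNT, mirroring an SQS redrive policy of 3.
MAX_RECEIVE_COUNT = 3
main_queue = deque(["ok-message", "poison-pill"])
dlq = []
receive_counts = {}

def consume(handler):
    message = main_queue.popleft()
    receive_counts[message] = receive_counts.get(message, 0) + 1
    try:
        handler(message)                # success: message is gone
    except Exception:
        if receive_counts[message] >= MAX_RECEIVE_COUNT:
            dlq.append(message)         # give up: park it for humans
        else:
            main_queue.append(message)  # retry later

def handler(message):
    if message == "poison-pill":
        raise ValueError("cannot parse")

for _ in range(4):
    if main_queue:
        consume(handler)
```

After the healthy message succeeds and the poison pill fails three times, the main queue is empty and the DLQ holds exactly the message that needs investigation.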
DLQ Best Practices#
- Monitor DLQ depth — alert when messages arrive
- Preserve original metadata — timestamp, source, error reason
- Build a redrive mechanism — after fixing the bug, replay DLQ messages back to the main queue
- Set DLQ retention longer than the main queue — you need time to investigate
Queue Monitoring#
The metrics that matter:
| Metric | What It Tells You | Alert When |
|---|---|---|
| Queue depth | Backlog size | Growing steadily |
| Age of oldest message | How behind you are | Exceeds SLA |
| Messages in flight | Active processing | Near max consumers |
| DLQ depth | Failure rate | Any message arrives |
| Processing time | Consumer performance | P99 exceeds timeout |
Scaling Based on Queue Depth#
Queue depth 0-100 → 1 consumer
Queue depth 100-1,000 → 5 consumers
Queue depth 1,000-10,000 → 20 consumers
Queue depth 10,000+ → 20 consumers + page on-call
Auto-scaling consumers based on queue depth is one of the most reliable scaling patterns in distributed systems.
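The tiers above translate into a simple policy function. The thresholds here just mirror the example; in practice they would feed an autoscaler and be tuned to your workload:

```python
# Sketch: map queue depth to a desired consumer count, plus a flag
# for whether the backlog is bad enough to page on-call.
def desired_consumers(queue_depth):
    if queue_depth >= 10_000:
        return 20, True   # max out consumers and page on-call
    if queue_depth >= 1_000:
        return 20, False
    if queue_depth >= 100:
        return 5, False
    return 1, False
```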
Tools and Platforms#
Amazon SQS#
Fully managed, virtually unlimited throughput. Standard and FIFO variants. Best for AWS-native architectures.
Standard: unlimited throughput, at-least-once, best-effort order
FIFO: 300 msg/s, exactly-once, strict order
Celery (Python)#
Distributed task queue with support for multiple brokers (Redis, RabbitMQ). Best for Python applications.
# Define a task that retries up to 3 times on failure
@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    try:
        do_work(order_id)
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)
Bull (Node.js)#
Redis-backed queue for Node.js. Supports priorities, delays, rate limiting, and repeatable jobs.
const Bull = require('bull');

const queue = new Bull('orders');
queue.process(async (job) => {
  await processOrder(job.data.orderId);
});

// Add with delay and priority
queue.add({ orderId: '123' }, {
  delay: 5000,
  priority: 1,
});
RabbitMQ#
Full-featured message broker supporting AMQP. Offers exchanges, routing, and flexible topologies. Best for complex routing needs.
Apache Kafka#
Distributed log, not a traditional queue. Messages persist and can be replayed. Best for event streaming and high-throughput pipelines.
Architecture Patterns#
Request-Response Over Queues#
When you need a response but want async processing:
Client → Request Queue → Worker processes → Response Queue → Client
Use a correlation ID to match responses to requests.
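A hypothetical sketch of that matching, with in-memory deques standing in for the two queues and `send_request` / `match_response` as illustrative names:

```python
import uuid
from collections import deque

# Sketch: the client tags each request with a correlation ID; the
# worker copies it onto the response so replies can be matched even
# when they arrive out of order.
request_q, response_q = deque(), deque()

def send_request(payload):
    corr_id = str(uuid.uuid4())
    request_q.append({"correlation_id": corr_id, "payload": payload})
    return corr_id

def worker():
    req = request_q.popleft()
    response_q.append({"correlation_id": req["correlation_id"],
                       "result": req["payload"].upper()})

def match_response(corr_id):
    for resp in response_q:
        if resp["correlation_id"] == corr_id:
            return resp["result"]

cid = send_request("hello")
worker()
```

In a real system the client would also set a timeout per correlation ID, since a crashed worker means some responses never arrive.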
Competing Consumers#
Multiple consumers on one queue for horizontal scaling:
[Queue] → Consumer 1 (handles 33%)
→ Consumer 2 (handles 33%)
→ Consumer 3 (handles 33%)
Add consumers to scale, remove to save cost. The queue balances load automatically.
Key Takeaways#
- Work queues for task distribution, pub/sub for event fan-out
- Priority queues prevent critical work from waiting behind bulk operations
- FIFO when order matters, standard when throughput matters
- Visibility timeout should be 6x average processing time
- Dead letter queues catch poison pills — monitor and redrive them
- Auto-scale consumers based on queue depth for cost-efficient processing
281 articles on system design at codelit.io/blog.