Queue-Based Architecture: Decouple Services With Asynchronous Messaging
Queues are the backbone of resilient distributed systems. They decouple producers from consumers, absorb traffic spikes, and, when backed by durable storage, ensure that no work is lost even when downstream services crash.
Why Queues Matter#
Without queues:
User request → Service A → Service B → Service C → response
If Service B is down → entire request fails
With queues:
User request → Service A → Queue → response (immediate)
Service B picks up from queue when ready
Service C picks up from next queue when ready
The producer doesn't wait. The consumer doesn't rush. The queue absorbs the difference.
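That handoff can be sketched in a few lines with Python's standard-library `queue` module. This is a hypothetical in-process stand-in, not a real broker: `service_a`, `service_b_worker`, and the `"order-42"` payload are illustrative names.

```python
import queue
import threading

# Sketch: Service A enqueues work and returns immediately;
# a worker thread (standing in for Service B) drains the queue when ready.
work_queue = queue.Queue()
results = []

def service_a(request):
    work_queue.put(request)   # producer doesn't wait for processing
    return "accepted"         # respond to the user immediately

def service_b_worker():
    while True:
        item = work_queue.get()
        if item is None:      # sentinel: shut the worker down
            break
        results.append(f"processed {item}")
        work_queue.task_done()

worker = threading.Thread(target=service_b_worker)
worker.start()
ack = service_a("order-42")   # returns "accepted" right away
work_queue.put(None)
worker.join()
```

A real deployment swaps the in-memory queue for a durable broker, but the shape is identical: the producer's latency is one enqueue, regardless of how slow the consumer is.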
Work Queue vs Pub/Sub#
These are fundamentally different patterns:
Work Queue (Point-to-Point)#
One message, one consumer. The message is removed after processing.
Producer → [Queue] → Consumer A (gets message 1)
→ Consumer B (gets message 2)
→ Consumer C (gets message 3)
Use cases: task processing, job scheduling, order fulfillment.
Pub/Sub (Fan-Out)#
One message, many consumers. Every subscriber gets a copy.
Producer → [Topic] → Subscriber A (gets ALL messages)
→ Subscriber B (gets ALL messages)
→ Subscriber C (gets ALL messages)
Use cases: event notification, audit logging, cache invalidation.
Hybrid Pattern#
Most systems combine both. A topic fans out to multiple queues, each with its own consumer group:
Order placed → [Topic]
→ [Inventory Queue] → Inventory Service
→ [Email Queue] → Notification Service
→ [Analytics Queue] → Analytics Service
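The hybrid pattern above reduces to a small amount of bookkeeping: a topic that copies each published message into every subscribed queue. A minimal sketch, with hypothetical `Topic`, `subscribe`, and `publish` names:

```python
from collections import deque

# Sketch: a topic fans each message out to one queue per subscriber.
class Topic:
    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        q = deque()
        self.queues[name] = q
        return q

    def publish(self, message):
        for q in self.queues.values():  # every queue gets its own copy
            q.append(message)

orders = Topic()
inventory_q = orders.subscribe("inventory")
email_q = orders.subscribe("email")
orders.publish({"order_id": 123})
```

Each downstream service then consumes its own queue at its own pace, so a slow Analytics Service never delays the Inventory Service.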
Priority Queues#
Not all messages are equal. Priority queues ensure critical work happens first.
# Celery priority example
@app.task(queue='high-priority')
def process_payment(order_id):
    ...

@app.task(queue='low-priority')
def generate_report(report_id):
    ...
Implementation strategies:
- Separate queues per priority — simplest, most common
- Single queue with priority field — RabbitMQ supports up to 255 priority levels (SQS has no native message priority; use separate queues instead)
- Weighted consumers — 3 workers on high, 1 worker on low
Watch out for starvation — low-priority messages that never get processed. Set a maximum wait time that promotes messages automatically.
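One way to implement that promotion is an aging priority queue: entries that wait past `max_wait` are bumped to top priority on the next read. A hypothetical sketch (the `AgingPriorityQueue` class and its rescan-on-get approach are illustrative, not how any particular broker does it):

```python
import heapq

# Sketch: a priority queue that promotes messages whose wait time
# exceeds max_wait, so low-priority work can never starve.
# Lower priority number = more urgent.
class AgingPriorityQueue:
    def __init__(self, max_wait):
        self.max_wait = max_wait
        self._heap = []
        self._seq = 0  # tie-breaker preserves FIFO within a priority

    def put(self, priority, message, now):
        heapq.heappush(self._heap, (priority, self._seq, now, message))
        self._seq += 1

    def get(self, now):
        # Promote any entry that has waited longer than max_wait.
        # Rescanning the whole heap is O(n) per get -- fine for a sketch.
        aged = [(0, s, t, m) if now - t > self.max_wait else (p, s, t, m)
                for p, s, t, m in self._heap]
        heapq.heapify(aged)
        self._heap = aged
        return heapq.heappop(self._heap)[3]
```

With `max_wait=10`, a priority-5 report enqueued at t=0 loses to a priority-1 payment at t=5, but by t=20 it has aged past the limit and is served regardless of what else arrives.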
Delay Queues#
Messages that should be processed later, not now.
User signs up → [Delay Queue: 24h] → Send welcome email
Payment pending → [Delay Queue: 30min] → Check payment status
Trial started → [Delay Queue: 7d] → Send trial reminder
SQS supports up to 15 minutes of delay natively. For longer delays, use a scheduled job that moves messages from a "pending" store into the queue at the right time.
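The "pending store plus mover job" approach can be sketched as a min-heap keyed by due time, drained by a job that runs on a schedule. The `schedule` and `mover` names and the message strings are hypothetical:

```python
import heapq

# Sketch: a "pending" store ordered by due time, plus a mover job
# that releases due messages into the real work queue.
pending = []     # min-heap of (due_at, message)
work_queue = []  # stands in for the actual queue

def schedule(message, delay, now):
    heapq.heappush(pending, (now + delay, message))

def mover(now):
    # Run periodically, e.g. every minute by a cron-style scheduler.
    while pending and pending[0][0] <= now:
        _, message = heapq.heappop(pending)
        work_queue.append(message)

schedule("welcome-email", delay=24 * 3600, now=0)
schedule("payment-check", delay=30 * 60, now=0)
mover(now=1800)  # 30 minutes later: only payment-check is due
```

Because the heap is ordered by due time, each mover run only touches messages that are actually due, so the scan cost stays proportional to the work released.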
FIFO Guarantees#
Standard queues offer at-least-once delivery with best-effort ordering. FIFO queues guarantee:
- Exactly-once processing — deduplication within a 5-minute window
- Strict ordering — messages processed in exact send order
Standard Queue: Message A, B, C → might arrive as B, A, C
FIFO Queue: Message A, B, C → always arrives as A, B, C
The tradeoff: FIFO queues have lower throughput. SQS FIFO supports 300 messages/second (3,000 with batching) vs unlimited for standard queues.
Use FIFO when order matters: financial transactions, state machines, sequential workflows.
Use standard when throughput matters: log processing, analytics events, notifications.
Message Group IDs#
FIFO doesn't mean global ordering across all messages. Use message group IDs to maintain order within a logical group:
Group "user-123": msg1 → msg2 → msg3 (ordered)
Group "user-456": msg1 → msg2 → msg3 (ordered)
But user-123 and user-456 messages can interleave
This gives you per-entity ordering with parallel processing across entities.
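Conceptually, a message group ID just selects which per-group FIFO a message lands in. A minimal sketch of those semantics (one deque per group; a real FIFO broker also enforces that only one consumer reads a group at a time):

```python
from collections import defaultdict, deque

# Sketch: one FIFO per message group ID. Order is preserved within
# a group, while different groups can be consumed in parallel.
groups = defaultdict(deque)

def send(group_id, message):
    groups[group_id].append(message)

def receive(group_id):
    return groups[group_id].popleft()

send("user-123", "msg1")
send("user-123", "msg2")
send("user-456", "msg1")
```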
Visibility Timeout#
When a consumer picks up a message, it becomes invisible to other consumers for a set period. This prevents two consumers from processing the same message at the same time.
1. Consumer A receives message (visibility timeout = 30s)
2. Message hidden from Consumer B and C
3a. Consumer A finishes → deletes message ✓
3b. Consumer A crashes → message reappears after 30s → Consumer B picks it up
Setting the right timeout:
- Too short — message reappears before processing finishes, causing duplicates
- Too long — if consumer crashes, message sits invisible for too long
Best practice: set timeout to 6x your average processing time. Extend it dynamically if processing takes longer than expected.
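The receive/hide/reappear cycle above can be modeled directly. This `VisibleQueue` class is a hypothetical in-memory sketch of the semantics, with explicit `now` parameters instead of a real clock:

```python
import itertools

# Sketch of visibility timeout semantics: a received message is hidden
# until its deadline; if it is not deleted by then, a later receive
# returns it again.
class VisibleQueue:
    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}  # id -> (body, invisible_until)
        self.ids = itertools.count()

    def send(self, body):
        self.messages[next(self.ids)] = (body, 0)

    def receive(self, now):
        for mid, (body, until) in self.messages.items():
            if until <= now:  # visible again (or never received)
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None  # everything is currently in flight

    def delete(self, mid):
        del self.messages[mid]
```

Step 3b from the list above falls out for free: a consumer that crashes simply never calls `delete`, and the message becomes visible again once `now` passes its deadline.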
Poison Pill Handling#
A poison pill is a message that can never be processed successfully. It fails, returns to the queue, gets picked up, fails again — an infinite loop.
Message "corrupt-data" → Consumer fails → back to queue
→ Consumer fails → back to queue → Consumer fails → ...
Dead Letter Queue (DLQ)#
After N failed attempts, move the message to a separate queue for investigation:
Main Queue → Consumer (fail #1) → retry
→ Consumer (fail #2) → retry
→ Consumer (fail #3) → Dead Letter Queue
Configure maxReceiveCount (typically 3-5). Every message in the DLQ represents a bug or data issue that needs human attention.
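The redrive logic is a receive counter plus a threshold check. A hypothetical sketch (the `consume` helper, `handler`, and the `"poison-pill"` message are illustrative; in SQS this bookkeeping is the broker's job, driven by `maxReceiveCount`):

```python
from collections import deque

# Sketch: track receive counts and move a message to the DLQ once it
# reaches MAX_RECEIVE_COUNT, mirroring an SQS redrive policy of 3.
MAX_RECEIVE_COUNT = 3
main_queue = deque(["ok-message", "poison-pill"])
dlq = []
receive_counts = {}

def consume(handler):
    message = main_queue.popleft()
    receive_counts[message] = receive_counts.get(message, 0) + 1
    try:
        handler(message)                # success: message is gone
    except Exception:
        if receive_counts[message] >= MAX_RECEIVE_COUNT:
            dlq.append(message)         # give up: park it for humans
        else:
            main_queue.append(message)  # retry later

def handler(message):
    if message == "poison-pill":
        raise ValueError("cannot parse")

for _ in range(4):
    if main_queue:
        consume(handler)
```

After the healthy message succeeds and the poison pill fails three times, the main queue is empty and the DLQ holds exactly the message that needs investigation.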
DLQ Best Practices#
- Monitor DLQ depth — alert when messages arrive
- Preserve original metadata — timestamp, source, error reason
- Build a redrive mechanism — after fixing the bug, replay DLQ messages back to the main queue
- Set DLQ retention longer than the main queue — you need time to investigate
Queue Monitoring#
The metrics that matter:
| Metric | What It Tells You | Alert When |
|---|---|---|
| Queue depth | Backlog size | Growing steadily |
| Age of oldest message | How behind you are | Exceeds SLA |
| Messages in flight | Active processing | Near max consumers |
| DLQ depth | Failure rate | Any message arrives |
| Processing time | Consumer performance | P99 exceeds timeout |
Scaling Based on Queue Depth#
Queue depth 0-100 → 1 consumer
Queue depth 100-1,000 → 5 consumers
Queue depth 1,000-10,000 → 20 consumers
Queue depth 10,000+ → 20 consumers + page on-call
Auto-scaling consumers based on queue depth is one of the most reliable scaling patterns in distributed systems.
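The tiers above translate into a simple policy function. The thresholds here just mirror the example; in practice they would feed an autoscaler and be tuned to your workload:

```python
# Sketch: map queue depth to a desired consumer count, plus a flag
# for whether the backlog is bad enough to page on-call.
def desired_consumers(queue_depth):
    if queue_depth >= 10_000:
        return 20, True   # max out consumers and page on-call
    if queue_depth >= 1_000:
        return 20, False
    if queue_depth >= 100:
        return 5, False
    return 1, False
```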
Tools and Platforms#
Amazon SQS#
Fully managed, virtually unlimited throughput. Standard and FIFO variants. Best for AWS-native architectures.
Standard: unlimited throughput, at-least-once, best-effort order
FIFO: 300 msg/s, exactly-once, strict order
Celery (Python)#
Distributed task queue with support for multiple brokers (Redis, RabbitMQ). Best for Python applications.
# Define a task that retries up to 3 times on failure
@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    try:
        do_work(order_id)
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)
Bull (Node.js)#
Redis-backed queue for Node.js. Supports priorities, delays, rate limiting, and repeatable jobs.
const Bull = require('bull');

const queue = new Bull('orders');
queue.process(async (job) => {
  await processOrder(job.data.orderId);
});

// Add with delay and priority
queue.add({ orderId: '123' }, {
  delay: 5000,
  priority: 1,
});
RabbitMQ#
Full-featured message broker supporting AMQP. Offers exchanges, routing, and flexible topologies. Best for complex routing needs.
Apache Kafka#
Distributed log, not a traditional queue. Messages persist and can be replayed. Best for event streaming and high-throughput pipelines.
Architecture Patterns#
Request-Response Over Queues#
When you need a response but want async processing:
Client → Request Queue → Worker processes → Response Queue → Client
Use a correlation ID to match responses to requests.
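A hypothetical sketch of that matching, with in-memory deques standing in for the two queues and `send_request` / `match_response` as illustrative names:

```python
import uuid
from collections import deque

# Sketch: the client tags each request with a correlation ID; the
# worker copies it onto the response so replies can be matched even
# when they arrive out of order.
request_q, response_q = deque(), deque()

def send_request(payload):
    corr_id = str(uuid.uuid4())
    request_q.append({"correlation_id": corr_id, "payload": payload})
    return corr_id

def worker():
    req = request_q.popleft()
    response_q.append({"correlation_id": req["correlation_id"],
                       "result": req["payload"].upper()})

def match_response(corr_id):
    for resp in response_q:
        if resp["correlation_id"] == corr_id:
            return resp["result"]

cid = send_request("hello")
worker()
```

In a real system the client would also set a timeout per correlation ID, since a crashed worker means some responses never arrive.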
Competing Consumers#
Multiple consumers on one queue for horizontal scaling:
[Queue] → Consumer 1 (handles 33%)
→ Consumer 2 (handles 33%)
→ Consumer 3 (handles 33%)
Add consumers to scale, remove to save cost. The queue balances load automatically.
Key Takeaways#
- Work queues for task distribution, pub/sub for event fan-out
- Priority queues prevent critical work from waiting behind bulk operations
- FIFO when order matters, standard when throughput matters
- Visibility timeout should be 6x average processing time
- Dead letter queues catch poison pills — monitor and redrive them
- Auto-scale consumers based on queue depth for cost-efficient processing
281 articles on system design at codelit.io/blog.