# Backpressure & Flow Control in Distributed Systems
When a producer generates data faster than a consumer can process it, something has to give. Without backpressure the system eventually fails: queues overflow, memory is exhausted, latency spikes, and users see errors.
Backpressure is how systems communicate "slow down" upstream.
## The Problem

```text
Producer (1,000 msg/sec) → Queue → Consumer (100 msg/sec)

After 10 seconds:  9,000 messages queued
After 100 seconds: 90,000 messages queued → OOM crash
```
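The arithmetic above follows from a linear growth model: an unbounded queue grows at the producer rate minus the consumer rate. A minimal sketch:

```javascript
// Net queue growth = producer rate - consumer rate (msg/sec).
// Without a bound, depth grows linearly until memory runs out.
function queueDepth(producerRate, consumerRate, seconds) {
  return Math.max(0, (producerRate - consumerRate) * seconds);
}

console.log(queueDepth(1000, 100, 10));  // 9000
console.log(queueDepth(1000, 100, 100)); // 90000
```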
## Three Strategies
### 1. Drop (Load Shedding)

Drop excess messages when overloaded:

```text
Producer → Queue (max 10K) → Consumer
               ↓ full
         drop new messages (or the oldest)
```
**When:** real-time data where old data is worthless (metrics, live video, stock tickers). Missing a few data points is better than crashing.
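A drop policy can be sketched as a bounded queue; `BoundedQueue` and its options are hypothetical names for illustration, not a real library's API:

```javascript
// Bounded queue that sheds load when full. With dropOldest, the oldest
// message is evicted to make room; otherwise the new message is rejected.
class BoundedQueue {
  constructor(maxSize, { dropOldest = false } = {}) {
    this.maxSize = maxSize;
    this.dropOldest = dropOldest;
    this.items = [];
    this.dropped = 0;
  }
  push(item) {
    if (this.items.length >= this.maxSize) {
      this.dropped++;
      if (!this.dropOldest) return false; // shed the new message
      this.items.shift();                 // or evict the oldest
    }
    this.items.push(item);
    return true;
  }
}

const q = new BoundedQueue(2, { dropOldest: true });
[1, 2, 3].forEach((n) => q.push(n));
console.log(q.items);   // [2, 3] — oldest evicted
console.log(q.dropped); // 1
```

Evicting the oldest keeps the freshest data, which suits metrics and tickers; rejecting the new message instead lets the producer detect the failure and back off.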
### 2. Buffer (Absorb Spikes)

Queue messages and process them eventually:

```text
Producer → Kafka (days of retention) → Consumer (catches up when it can)
```
**When:** all messages must be processed but latency is flexible (event sourcing, analytics, email queues).
### 3. Signal (Slow Down the Producer)

Tell the producer to reduce its rate:

```text
Consumer → "I'm overloaded" → Producer reduces its rate
```
**When:** the producer can actually slow down (API rate limiting, TCP flow control, reactive streams).
## Patterns
### Queue Depth Monitoring

Watch queue size and alert before overflow:

```text
depth < 1,000:  normal ✓
depth > 5,000:  warning ⚠  (consumers may be falling behind)
depth > 10,000: critical 🔴 (add consumers or shed load)
```
```javascript
// Auto-scale consumers based on queue depth (AWS SDK for JavaScript v2)
const { Attributes } = await sqs
  .getQueueAttributes({ QueueUrl, AttributeNames: ["ApproximateNumberOfMessages"] })
  .promise();
const depth = Number(Attributes.ApproximateNumberOfMessages);
if (depth > 5000) scaleUp(workers);
else if (depth < 100) scaleDown(workers);
```
### Circuit Breaker + Backpressure

When a downstream service is slow, stop sending to it:

```text
Service A → Circuit Breaker → Service B (slow)
                ↓ open
         return 503 to the caller (shed load)
         retry after a timeout
```
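A minimal sketch of that breaker, with hypothetical names and a consecutive-failure threshold (real libraries such as resilience4j add half-open trial limits, metrics, and fallbacks):

```javascript
// After `threshold` consecutive failures the breaker opens and calls
// fail fast until `resetMs` elapses, shedding load instead of piling
// requests onto a struggling downstream service.
class CircuitBreaker {
  constructor({ threshold = 3, resetMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.resetMs = resetMs;
    this.now = now;        // injectable clock, for testing
    this.failures = 0;
    this.openedAt = null;
  }
  get isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.resetMs) {
      // Half-open: let a trial request through.
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }
  async call(fn) {
    if (this.isOpen) throw new Error("circuit open: shedding load");
    try {
      const result = await fn();
      this.failures = 0; // success resets the failure count
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = this.now();
      throw err; // propagate the original failure
    }
  }
}
```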
### Reactive Streams

The consumer tells the producer how much it can handle:

```text
Consumer: "send me 100 items"
Producer: sends 100
Consumer: processes... "send me 50 more"
Producer: sends 50
```
Tools: Project Reactor (Java), RxJS (JavaScript), Akka Streams (Scala)
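The handshake above can be sketched as a pull-based producer that never sends more than the outstanding demand; `makeProducer` and `request` are illustrative names, and real Reactive Streams implementations add asynchronous delivery, cancellation, and error signalling:

```javascript
// Pull-based demand: the producer delivers at most the requested
// number of items, so a slow consumer is never flooded.
function makeProducer(items) {
  let cursor = 0;
  return {
    request(n, onNext) {
      for (let i = 0; i < n && cursor < items.length; i++) {
        onNext(items[cursor++]);
      }
    },
  };
}

const received = [];
const producer = makeProducer([...Array(200).keys()]);
producer.request(100, (x) => received.push(x)); // "send me 100 items"
producer.request(50, (x) => received.push(x));  // "send me 50 more"
console.log(received.length); // 150
```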
### TCP Flow Control

TCP has built-in backpressure: the receive window.

```text
Sender → [data] → Receiver
       ← window: 64KB   ("I can accept 64KB more")
Sender → [64KB data] → Receiver
       ← window: 0      ("stop! I'm processing")
         ... receiver processes ...
       ← window: 32KB   ("OK, send more")
```
Every TCP connection does this automatically, and protocols layered on top inherit it; HTTP/2, and therefore gRPC, adds its own stream-level flow control on top of TCP's.
### Kafka Consumer Lag

Kafka consumers track their position in each partition as an offset. Lag is how far behind the latest message that position is:

```text
Partition:       [msg1, msg2, ..., msg1000]  ← latest (log-end offset)
Consumer offset: msg800                      ← current position
Lag:             200 messages behind
```
Monitor lag. If it grows, add consumers or investigate processing bottleneck.
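The lag computation itself is plain offset arithmetic, per partition: the log-end offset minus the consumer's committed offset. A sketch using plain numbers rather than a real Kafka client API:

```javascript
// Lag per partition = log-end offset - committed consumer offset.
function consumerLag(partitions) {
  return partitions.map(({ partition, endOffset, committedOffset }) => ({
    partition,
    lag: endOffset - committedOffset,
  }));
}

const lags = consumerLag([
  { partition: 0, endOffset: 1000, committedOffset: 800 },
  { partition: 1, endOffset: 500, committedOffset: 500 },
]);
console.log(lags); // partition 0 is 200 behind, partition 1 is caught up
```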
### Rate Limiting as Backpressure

API rate limits are a form of backpressure:

```text
Client: 1,000 req/sec
Server: 429 Too Many Requests  (limit: 100/sec)
        Retry-After: 1
Client: slows to 100 req/sec   ← backpressure worked
```
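Server-side, limits like this are commonly enforced with a token bucket: tokens refill at the steady rate, the bucket capacity sets the allowed burst, and a request without a token gets a 429. A sketch (illustrative, not any particular limiter library; the injectable clock is just for testing):

```javascript
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now) {
    this.capacity = capacity;       // burst size
    this.refillPerSec = refillPerSec; // steady rate
    this.tokens = capacity;
    this.now = now;
    this.last = now();
  }
  allow() {
    const t = this.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.refillPerSec
    );
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // 200 OK
    }
    return false;   // 429 Too Many Requests + Retry-After
  }
}

// 100 req/sec with a burst of 100: request 101 in the same instant is rejected.
let fakeTime = 0;
const bucket = new TokenBucket(100, 100, () => fakeTime);
const results = Array.from({ length: 101 }, () => bucket.allow());
console.log(results[99], results[100]); // true false
```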
## Architecture Examples
### Event Processing Pipeline

```text
App events → Kafka (buffer) → consumer group (auto-scaled)
                 ↓ if lag > threshold:  scale up consumers
                 ↓ if processing fails: send to a dead-letter topic (don't block)
```
### API with Backpressure

```text
Client → Rate Limiter (token bucket)
             ↓ allowed
         API Server ──→ Circuit Breaker ──→ Database
             ↓ DB slow        ↓ open
   queue request (buffer)     return cached/degraded response
```
### Streaming Data Pipeline

```text
Sensors (10K/sec) → Load Balancer → Ingestion Service
                                        ↓ if queue > 80% full:
                                          drop low-priority events,
                                          keep critical events
                                        ↓
                                    Stream Processor → Storage
```
## Anti-Patterns

- **Unbounded queues**: always set a maximum size; an unbounded queue is an eventual OOM crash.
- **Retry storms**: when every client retries a failed request immediately and simultaneously, the retries amplify the overload. Use exponential backoff plus jitter.
- **Ignoring lag**: Kafka consumer lag that grows silently until the consumer is hours behind.
- **No circuit breakers**: one slow service takes down everything upstream of it.
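The retry-storm fix can be sketched concretely. This is the "full jitter" variant described on the AWS Architecture Blog: each retry waits a random delay in [0, min(cap, base·2^attempt)], so clients spread out instead of retrying in synchronized waves:

```javascript
// Exponential backoff with full jitter. The injectable `rand` is only
// for deterministic testing; callers just use the defaults.
function backoffWithJitter(attempt, baseMs = 100, capMs = 30000, rand = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return rand() * ceiling; // anywhere in [0, ceiling)
}

// attempt 0 → up to 100ms, attempt 5 → up to 3.2s, attempt 9+ → capped at 30s
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: retry in ${backoffWithJitter(attempt).toFixed(0)}ms`);
}
```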
## Monitoring
| Metric | Alert Threshold | Action |
|---|---|---|
| Queue depth | > 5x normal | Scale consumers |
| Consumer lag | > 5 minutes | Investigate bottleneck |
| Error rate | > 5% | Circuit breaker / shed load |
| P99 latency | > 3x normal | Add capacity or degrade |
| Memory usage | > 80% | Drop or buffer to disk |
## Summary

- Every system needs backpressure: unbounded growth always ends in a crash
- **Drop** when data is time-sensitive (metrics, video)
- **Buffer** when all data matters (events, orders)
- **Signal** when producers can slow down (APIs, streams)
- Monitor queue depth and consumer lag, and alert before the crash
- Auto-scale consumers based on queue depth