Fan-Out Fan-In Pattern: Parallel Processing and Aggregation
When a single task is too large or too slow for one worker, you split it into independent sub-tasks, process them in parallel, and merge the results. That is the fan-out fan-in pattern — one of the most versatile concurrency primitives in distributed system design.
The Core Idea#
Fan-out fan-in has two phases:
- Fan-out — Dispatch work to multiple workers, queues, or functions in parallel.
- Fan-in — Collect, aggregate, or reduce the individual results into a single output.
```
              ┌──── Worker A ────┐
              │                  │
Request ──────┼──── Worker B ────┼──── Aggregator ──── Response
              │                  │
              └──── Worker C ────┘
```
The pattern appears under many names: scatter-gather, fork-join, map-reduce, and parallel pipeline. The mechanics differ, but the shape is always the same — split, process, merge.
Scatter-Gather#
Scatter-gather is the synchronous flavour. A coordinator sends the same request to N services in parallel and waits for all (or enough) responses.
Use cases:
- Price comparison across providers
- Search federation (query multiple indices, merge ranked results)
- Multi-region reads for lowest latency
Key decisions:
- Wait-for-all vs wait-for-quorum — Do you need every response, or can you return once K of N reply?
- Timeout handling — Set a deadline. Return partial results when stragglers exceed the budget.
- Deduplication — If services overlap, merge and deduplicate results before returning.
Map-Reduce#
Map-reduce is fan-out fan-in applied to data processing:
- Map — Each mapper receives a partition of the input and emits key-value pairs.
- Shuffle — The framework groups pairs by key.
- Reduce — Each reducer aggregates values for a given key.
```
Input Splits       Mappers            Reducers

Split 0 ──────▶ Mapper 0 ──┐
                           ├──▶ Reducer A ──▶ Output A
Split 1 ──────▶ Mapper 1 ──┤
                           ├──▶ Reducer B ──▶ Output B
Split 2 ──────▶ Mapper 2 ──┘
```
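The three phases can be compressed into an in-process Go sketch. Word count is the canonical example: goroutines stand in for mappers, and a single grouping loop stands in for shuffle plus reduce.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapReduceWordCount runs the three phases in-process: each mapper emits
// (word, 1) pairs for its split, and the grouping loop plays both shuffle
// (group by key) and reduce (sum per key).
func mapReduceWordCount(splits []string) map[string]int {
	type pair struct {
		key   string
		value int
	}
	pairs := make(chan pair)

	// Map phase: one goroutine per input split.
	var wg sync.WaitGroup
	for _, split := range splits {
		wg.Add(1)
		go func(split string) {
			defer wg.Done()
			for _, word := range strings.Fields(split) {
				pairs <- pair{word, 1}
			}
		}(split)
	}
	go func() { wg.Wait(); close(pairs) }()

	// Shuffle + reduce: addition is associative, so partial sums
	// could be merged in any order across real reducers.
	counts := make(map[string]int)
	for p := range pairs {
		counts[p.key] += p.value
	}
	return counts
}

func main() {
	fmt.Println(mapReduceWordCount([]string{"to be or", "not to be"}))
	// map[be:2 not:1 or:1 to:2]
}
```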
Hadoop popularized the model, but the same principle powers Spark stages, BigQuery slots, and even database parallel query plans.
Fork-Join#
Fork-join is the in-process variant. A thread pool splits a task recursively until sub-tasks are small enough to run directly, then joins results bottom-up.
Java's ForkJoinPool implements the recursive form. Go's errgroup expresses the flatter fork-and-join shape:
```go
g, ctx := errgroup.WithContext(ctx)
for _, url := range urls {
	url := url // capture the loop variable (unnecessary on Go 1.22+)
	g.Go(func() error {
		return fetch(ctx, url)
	})
}
if err := g.Wait(); err != nil {
	// first error wins; ctx is cancelled for the remaining goroutines
}
```
Fork-join works best when sub-tasks are CPU-bound and roughly equal in cost. Imbalanced splits create stragglers.
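A minimal recursive fork-join in plain Go might look like the sketch below; the threshold of 4 is arbitrary, and a real implementation would tune it (and reuse workers rather than spawning a goroutine per fork).

```go
package main

import "fmt"

// parallelSum splits the slice recursively until chunks are small enough
// to sum directly, then joins the two halves' results bottom-up.
func parallelSum(nums []int) int {
	const threshold = 4 // below this, sequential is cheaper than forking
	if len(nums) <= threshold {
		total := 0
		for _, n := range nums {
			total += n
		}
		return total
	}
	mid := len(nums) / 2
	left := make(chan int, 1)
	go func() { left <- parallelSum(nums[:mid]) }() // fork one half
	right := parallelSum(nums[mid:])                // current goroutine takes the other
	return <-left + right                           // join
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(parallelSum(nums)) // 5050
}
```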
Fan-Out on Write vs Fan-Out on Read#
This distinction matters most in social and activity-feed systems:
Fan-out on write (push model):
- When a user publishes a post, immediately write it to every follower's feed.
- Reads are fast — each user's feed is pre-materialized.
- Writes are expensive — a celebrity with 10M followers triggers 10M writes.
Fan-out on read (pull model):
- Store the post once. When a follower opens their feed, query all followed users and merge results at read time.
- Writes are cheap — one write per post.
- Reads are expensive — every feed load triggers N queries.
Hybrid approach:
- Fan-out on write for users with fewer than K followers.
- Fan-out on read for high-follower accounts.
- Twitter (X) and Instagram use variations of this hybrid.
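A toy in-memory sketch of the hybrid rule follows; the type names, the `Feed` merge, and the threshold `K` are all invented for illustration, with maps standing in for the feed and post stores.

```go
package main

import "fmt"

type Post struct {
	Author, Body string
}

type FeedService struct {
	K         int                 // push below this follower count
	Followers map[string][]string // author -> follower IDs
	pushed    map[string][]Post   // pre-materialized feeds (push path)
	byAuthor  map[string][]Post   // each author's posts (pull path)
}

func NewFeedService(k int) *FeedService {
	return &FeedService{
		K:         k,
		Followers: map[string][]string{},
		pushed:    map[string][]Post{},
		byAuthor:  map[string][]Post{},
	}
}

// Publish fans out on write only for small accounts.
func (s *FeedService) Publish(p Post) {
	s.byAuthor[p.Author] = append(s.byAuthor[p.Author], p)
	if f := s.Followers[p.Author]; len(f) < s.K {
		for _, id := range f {
			s.pushed[id] = append(s.pushed[id], p) // one write per follower
		}
	}
}

// Feed merges the pre-materialized feed with read-time pulls from
// high-follower accounts.
func (s *FeedService) Feed(user string, follows []string) []Post {
	feed := append([]Post(nil), s.pushed[user]...)
	for _, a := range follows {
		if len(s.Followers[a]) >= s.K {
			feed = append(feed, s.byAuthor[a]...)
		}
	}
	return feed
}

func main() {
	s := NewFeedService(2)
	s.Followers["alice"] = []string{"bob"}                  // small account: push
	s.Followers["celeb"] = []string{"bob", "carol", "dave"} // big account: pull
	s.Publish(Post{"alice", "hi"})
	s.Publish(Post{"celeb", "new album"})
	fmt.Println(s.Feed("bob", []string{"alice", "celeb"})) // both posts appear
}
```

A production system would also sort the merged feed by timestamp and cache the pull-side query; both are omitted here.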
Queue-Based Fan-Out#
Queues decouple the fan-out producer from the fan-in consumer:
```
Producer ──▶ Queue ──▶ Consumer 1
                   ──▶ Consumer 2
                   ──▶ Consumer 3
                            │
                            ▼
                      Result Store ──▶ Aggregator
```
SQS + Lambda pattern:
- Drop a message onto an SQS queue (or SNS topic for true fan-out).
- Lambda polls the queue, processing messages in parallel across invocations.
- Each Lambda writes its result to DynamoDB or S3.
- A final Lambda (or Step Functions state) reads all results and aggregates.
Benefits:
- Automatic scaling — Lambda spins up concurrency to match queue depth.
- Retry semantics — failed messages return to the queue.
- Backpressure — configure reserved concurrency to cap parallelism.
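The same shape can be sketched in-process: a channel plays the queue, a fixed pool of goroutines plays the capped Lambda concurrency, and a reduce loop plays the aggregator. The `process` callback stands in for the handler; everything else is illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutViaQueue pushes jobs onto a queue, lets a capped pool of workers
// drain it in parallel (the "reserved concurrency"), and aggregates
// results as they arrive.
func fanOutViaQueue(jobs []int, workers int, process func(int) int) int {
	queue := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				results <- process(j)
			}
		}()
	}
	go func() {
		for _, j := range jobs {
			queue <- j // blocks when all workers are busy: natural backpressure
		}
		close(queue)
	}()
	go func() { wg.Wait(); close(results) }()

	// Fan-in: reduce results as they arrive.
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	double := func(n int) int { return 2 * n }
	fmt.Println(fanOutViaQueue([]int{1, 2, 3, 4, 5}, 3, double)) // 30
}
```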
AWS Step Functions: Orchestrated Fan-Out Fan-In#
Step Functions provides a Map state that fans out over a collection:
```json
{
  "Type": "Map",
  "ItemsPath": "$.items",
  "MaxConcurrency": 40,
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:...:processItem",
        "End": true
      }
    }
  },
  "ResultPath": "$.results"
}
```
The Map state automatically collects all iteration results into an array — fan-in is built in. MaxConcurrency controls parallelism.
For large datasets (over 10,000 items), use the Distributed Map state, which reads its input from S3 and scales to millions of items.
Aggregation Strategies#
The fan-in phase needs a merge strategy:
| Strategy | Description | Example |
|---|---|---|
| Concatenate | Append all results into a list | Search results from multiple indices |
| Reduce | Apply an associative function (sum, max, min) | Total word count across partitions |
| Merge-sort | Combine pre-sorted streams | Time-ordered event feeds |
| Quorum vote | Accept the majority answer | Distributed consensus reads |
| First-wins | Return the fastest response, discard the rest | Hedged requests for latency |
Error Handling#
Partial failures are the norm in fan-out fan-in:
- Retry failed branches independently. Do not restart the entire fan-out.
- Dead-letter queues capture poison messages that fail repeatedly.
- Compensation logic — if some branches succeed and others fail, decide whether to roll back the successes or proceed with partial results.
- Idempotency — workers must produce the same result if invoked twice with the same input. Queues deliver at-least-once.
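One hedged sketch of an idempotency guard: key each message's result by its ID so a redelivery returns the stored answer instead of re-running the side effect. An in-memory map stands in here for a durable store (e.g. DynamoDB conditional writes).

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotentWorker records each message's result under its ID, so a
// redelivered message (at-least-once queues guarantee these happen)
// returns the stored result instead of executing the work twice.
type IdempotentWorker struct {
	mu      sync.Mutex
	results map[string]int // stands in for a durable result store
	runs    int            // how many times the real work executed
}

func (w *IdempotentWorker) Handle(msgID string, n int) int {
	w.mu.Lock()
	defer w.mu.Unlock()
	if r, done := w.results[msgID]; done {
		return r // duplicate delivery: same answer, no second side effect
	}
	w.runs++
	r := n * n // the actual work
	w.results[msgID] = r
	return r
}

func main() {
	w := &IdempotentWorker{results: make(map[string]int)}
	fmt.Println(w.Handle("msg-1", 7)) // 49
	fmt.Println(w.Handle("msg-1", 7)) // 49 again: duplicate, served from the store
	fmt.Println(w.runs)               // 1
}
```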
Monitoring and Observability#
Track these metrics for fan-out fan-in workloads:
- Fan-out degree — How many parallel branches per request?
- Straggler latency — P99 of the slowest branch determines total latency.
- Aggregation time — How long does the merge phase take?
- Partial failure rate — What percentage of fan-outs complete with missing branches?
Use correlation IDs to trace a single request across all branches. Distributed tracing tools (Jaeger, X-Ray, Tempo) visualize the parallel spans.
When to Use Fan-Out Fan-In#
Good fit:
- Sub-tasks are independent (no shared mutable state).
- Parallelism reduces wall-clock time meaningfully.
- The aggregation step is straightforward.
Poor fit:
- Sub-tasks have ordering dependencies.
- The fan-out degree is unpredictable and can explode (use bounded concurrency).
- Network overhead of dispatching exceeds the work itself.
Key Takeaways#
The fan-out fan-in pattern is the backbone of parallel processing in distributed systems. Whether you call it scatter-gather, map-reduce, or fork-join, the shape is the same: split work, process in parallel, merge results. Choose your fan-out mechanism (threads, queues, serverless functions) based on latency requirements, failure tolerance, and operational complexity.
This is article #263 of the Codelit system design series. For more deep dives on distributed system patterns, explore the full blog archive.