Fan-Out Fan-In Pattern: Parallel Processing and Aggregation
When a single task is too large or too slow for one worker, you split it into independent sub-tasks, process them in parallel, and merge the results. That is the fan-out fan-in pattern — one of the most versatile concurrency primitives in distributed system design.
The Core Idea#
Fan-out fan-in has two phases:
- Fan-out — Dispatch work to multiple workers, queues, or functions in parallel.
- Fan-in — Collect, aggregate, or reduce the individual results into a single output.
```
              ┌──── Worker A ────┐
              │                  │
Request ──────┼──── Worker B ────┼──── Aggregator ──── Response
              │                  │
              └──── Worker C ────┘
```
The pattern appears under many names: scatter-gather, fork-join, map-reduce, and parallel pipeline. The mechanics differ, but the shape is always the same — split, process, merge.
Scatter-Gather#
Scatter-gather is the synchronous flavour. A coordinator sends the same request to N services in parallel and waits for all (or enough) responses.
Use cases:
- Price comparison across providers
- Search federation (query multiple indices, merge ranked results)
- Multi-region reads for lowest latency
Key decisions:
- Wait-for-all vs wait-for-quorum — Do you need every response, or can you return once K of N reply?
- Timeout handling — Set a deadline. Return partial results when stragglers exceed the budget.
- Deduplication — If services overlap, merge and deduplicate results before returning.
Map-Reduce#
Map-reduce is fan-out fan-in applied to data processing:
- Map — Each mapper receives a partition of the input and emits key-value pairs.
- Shuffle — The framework groups pairs by key.
- Reduce — Each reducer aggregates values for a given key.
```
Input Splits       Mappers            Reducers

Split 0 ──────▶ Mapper 0 ──┐
                           ├──▶ Reducer A ──▶ Output A
Split 1 ──────▶ Mapper 1 ──┤
                           ├──▶ Reducer B ──▶ Output B
Split 2 ──────▶ Mapper 2 ──┘
```
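The three phases can be compressed into an in-process Go sketch. Word count is the canonical example: goroutines stand in for mappers, and a single grouping loop stands in for shuffle plus reduce.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapReduceWordCount runs the three phases in-process: each mapper emits
// (word, 1) pairs for its split, and the grouping loop plays both shuffle
// (group by key) and reduce (sum per key).
func mapReduceWordCount(splits []string) map[string]int {
	type pair struct {
		key   string
		value int
	}
	pairs := make(chan pair)

	// Map phase: one goroutine per input split.
	var wg sync.WaitGroup
	for _, split := range splits {
		wg.Add(1)
		go func(split string) {
			defer wg.Done()
			for _, word := range strings.Fields(split) {
				pairs <- pair{word, 1}
			}
		}(split)
	}
	go func() { wg.Wait(); close(pairs) }()

	// Shuffle + reduce: addition is associative, so partial sums
	// could be merged in any order across real reducers.
	counts := make(map[string]int)
	for p := range pairs {
		counts[p.key] += p.value
	}
	return counts
}

func main() {
	fmt.Println(mapReduceWordCount([]string{"to be or", "not to be"}))
	// map[be:2 not:1 or:1 to:2]
}
```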
Hadoop popularized the model, but the same principle powers Spark stages, BigQuery slots, and even database parallel query plans.
Fork-Join#
Fork-join is the in-process variant. A thread pool splits a task recursively until sub-tasks are small enough to run directly, then joins results bottom-up.
Java's ForkJoinPool implements the recursive form. Go's errgroup expresses the flatter fork-and-join shape:
```go
g, ctx := errgroup.WithContext(ctx)
for _, url := range urls {
	url := url // capture the loop variable (unnecessary on Go 1.22+)
	g.Go(func() error {
		return fetch(ctx, url)
	})
}
if err := g.Wait(); err != nil {
	// first error wins; ctx is cancelled for the remaining goroutines
}
```
Fork-join works best when sub-tasks are CPU-bound and roughly equal in cost. Imbalanced splits create stragglers.
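A minimal recursive fork-join in plain Go might look like the sketch below; the threshold of 4 is arbitrary, and a real implementation would tune it (and reuse workers rather than spawning a goroutine per fork).

```go
package main

import "fmt"

// parallelSum splits the slice recursively until chunks are small enough
// to sum directly, then joins the two halves' results bottom-up.
func parallelSum(nums []int) int {
	const threshold = 4 // below this, sequential is cheaper than forking
	if len(nums) <= threshold {
		total := 0
		for _, n := range nums {
			total += n
		}
		return total
	}
	mid := len(nums) / 2
	left := make(chan int, 1)
	go func() { left <- parallelSum(nums[:mid]) }() // fork one half
	right := parallelSum(nums[mid:])                // current goroutine takes the other
	return <-left + right                           // join
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(parallelSum(nums)) // 5050
}
```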
Fan-Out on Write vs Fan-Out on Read#
This distinction matters most in social and activity-feed systems:
Fan-out on write (push model):
- When a user publishes a post, immediately write it to every follower's feed.
- Reads are fast — each user's feed is pre-materialized.
- Writes are expensive — a celebrity with 10M followers triggers 10M writes.
Fan-out on read (pull model):
- Store the post once. When a follower opens their feed, query all followed users and merge results at read time.
- Writes are cheap — one write per post.
- Reads are expensive — every feed load triggers N queries.
Hybrid approach:
- Fan-out on write for users with fewer than K followers.
- Fan-out on read for high-follower accounts.
- Twitter (X) and Instagram use variations of this hybrid.
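A toy in-memory sketch of the hybrid rule follows; the type names, the `Feed` merge, and the threshold `K` are all invented for illustration, with maps standing in for the feed and post stores.

```go
package main

import "fmt"

type Post struct {
	Author, Body string
}

type FeedService struct {
	K         int                 // push below this follower count
	Followers map[string][]string // author -> follower IDs
	pushed    map[string][]Post   // pre-materialized feeds (push path)
	byAuthor  map[string][]Post   // each author's posts (pull path)
}

func NewFeedService(k int) *FeedService {
	return &FeedService{
		K:         k,
		Followers: map[string][]string{},
		pushed:    map[string][]Post{},
		byAuthor:  map[string][]Post{},
	}
}

// Publish fans out on write only for small accounts.
func (s *FeedService) Publish(p Post) {
	s.byAuthor[p.Author] = append(s.byAuthor[p.Author], p)
	if f := s.Followers[p.Author]; len(f) < s.K {
		for _, id := range f {
			s.pushed[id] = append(s.pushed[id], p) // one write per follower
		}
	}
}

// Feed merges the pre-materialized feed with read-time pulls from
// high-follower accounts.
func (s *FeedService) Feed(user string, follows []string) []Post {
	feed := append([]Post(nil), s.pushed[user]...)
	for _, a := range follows {
		if len(s.Followers[a]) >= s.K {
			feed = append(feed, s.byAuthor[a]...)
		}
	}
	return feed
}

func main() {
	s := NewFeedService(2)
	s.Followers["alice"] = []string{"bob"}                  // small account: push
	s.Followers["celeb"] = []string{"bob", "carol", "dave"} // big account: pull
	s.Publish(Post{"alice", "hi"})
	s.Publish(Post{"celeb", "new album"})
	fmt.Println(s.Feed("bob", []string{"alice", "celeb"})) // both posts appear
}
```

A production system would also sort the merged feed by timestamp and cache the pull-side query; both are omitted here.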
Queue-Based Fan-Out#
Queues decouple the fan-out producer from the fan-in consumer:
```
Producer ──▶ Queue ──▶ Consumer 1
                   ──▶ Consumer 2
                   ──▶ Consumer 3
                            │
                            ▼
                      Result Store ──▶ Aggregator
```
SQS + Lambda pattern:
- Drop a message onto an SQS queue (or SNS topic for true fan-out).
- Lambda polls the queue, processing messages in parallel across invocations.
- Each Lambda writes its result to DynamoDB or S3.
- A final Lambda (or Step Functions state) reads all results and aggregates.
Benefits:
- Automatic scaling — Lambda spins up concurrency to match queue depth.
- Retry semantics — failed messages return to the queue.
- Backpressure — configure reserved concurrency to cap parallelism.
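The same shape can be sketched in-process: a channel plays the queue, a fixed pool of goroutines plays the capped Lambda concurrency, and a reduce loop plays the aggregator. The `process` callback stands in for the handler; everything else is illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutViaQueue pushes jobs onto a queue, lets a capped pool of workers
// drain it in parallel (the "reserved concurrency"), and aggregates
// results as they arrive.
func fanOutViaQueue(jobs []int, workers int, process func(int) int) int {
	queue := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				results <- process(j)
			}
		}()
	}
	go func() {
		for _, j := range jobs {
			queue <- j // blocks when all workers are busy: natural backpressure
		}
		close(queue)
	}()
	go func() { wg.Wait(); close(results) }()

	// Fan-in: reduce results as they arrive.
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	double := func(n int) int { return 2 * n }
	fmt.Println(fanOutViaQueue([]int{1, 2, 3, 4, 5}, 3, double)) // 30
}
```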
AWS Step Functions: Orchestrated Fan-Out Fan-In#
Step Functions provides a Map state that fans out over a collection:
```json
{
  "Type": "Map",
  "ItemsPath": "$.items",
  "MaxConcurrency": 40,
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:...:processItem",
        "End": true
      }
    }
  },
  "ResultPath": "$.results"
}
```
The Map state automatically collects all iteration results into an array — fan-in is built in. MaxConcurrency controls parallelism.
For large datasets (over 10,000 items), use the Distributed Map state, which reads its input from S3 and scales to millions of items.
Aggregation Strategies#
The fan-in phase needs a merge strategy:
| Strategy | Description | Example |
|---|---|---|
| Concatenate | Append all results into a list | Search results from multiple indices |
| Reduce | Apply an associative function (sum, max, min) | Total word count across partitions |
| Merge-sort | Combine pre-sorted streams | Time-ordered event feeds |
| Quorum vote | Accept the majority answer | Distributed consensus reads |
| First-wins | Return the fastest response, discard the rest | Hedged requests for latency |
Error Handling#
Partial failures are the norm in fan-out fan-in:
- Retry failed branches independently. Do not restart the entire fan-out.
- Dead-letter queues capture poison messages that fail repeatedly.
- Compensation logic — if some branches succeed and others fail, decide whether to roll back the successes or proceed with partial results.
- Idempotency — workers must produce the same result if invoked twice with the same input. Queues deliver at-least-once.
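One hedged sketch of an idempotency guard: key each message's result by its ID so a redelivery returns the stored answer instead of re-running the side effect. An in-memory map stands in here for a durable store (e.g. DynamoDB conditional writes).

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotentWorker records each message's result under its ID, so a
// redelivered message (at-least-once queues guarantee these happen)
// returns the stored result instead of executing the work twice.
type IdempotentWorker struct {
	mu      sync.Mutex
	results map[string]int // stands in for a durable result store
	runs    int            // how many times the real work executed
}

func (w *IdempotentWorker) Handle(msgID string, n int) int {
	w.mu.Lock()
	defer w.mu.Unlock()
	if r, done := w.results[msgID]; done {
		return r // duplicate delivery: same answer, no second side effect
	}
	w.runs++
	r := n * n // the actual work
	w.results[msgID] = r
	return r
}

func main() {
	w := &IdempotentWorker{results: make(map[string]int)}
	fmt.Println(w.Handle("msg-1", 7)) // 49
	fmt.Println(w.Handle("msg-1", 7)) // 49 again: duplicate, served from the store
	fmt.Println(w.runs)               // 1
}
```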
Monitoring and Observability#
Track these metrics for fan-out fan-in workloads:
- Fan-out degree — How many parallel branches per request?
- Straggler latency — P99 of the slowest branch determines total latency.
- Aggregation time — How long does the merge phase take?
- Partial failure rate — What percentage of fan-outs complete with missing branches?
Use correlation IDs to trace a single request across all branches. Distributed tracing tools (Jaeger, X-Ray, Tempo) visualize the parallel spans.
When to Use Fan-Out Fan-In#
Good fit:
- Sub-tasks are independent (no shared mutable state).
- Parallelism reduces wall-clock time meaningfully.
- The aggregation step is straightforward.
Poor fit:
- Sub-tasks have ordering dependencies.
- The fan-out degree is unpredictable and can explode (use bounded concurrency).
- Network overhead of dispatching exceeds the work itself.
Key Takeaways#
The fan-out fan-in pattern is the backbone of parallel processing in distributed systems. Whether you call it scatter-gather, map-reduce, or fork-join, the shape is the same: split work, process in parallel, merge results. Choose your fan-out mechanism (threads, queues, serverless functions) based on latency requirements, failure tolerance, and operational complexity.
This is article #263 of the Codelit system design series. For more deep dives on distributed system patterns, explore the full blog archive.