Micro-Batching Architecture — Balancing Latency and Throughput in Stream Processing
What is micro-batching?#
Micro-batching is a stream-processing strategy that collects incoming events into small, time-bounded groups and processes each group as a mini batch job. Instead of handling one record at a time (true streaming) or waiting hours for a large batch, micro-batching finds the middle ground — typically processing data every few hundred milliseconds to a few seconds.
Micro-batch vs true streaming#
| Dimension | Micro-batch | True streaming |
|---|---|---|
| Latency | Hundreds of ms to seconds | Sub-millisecond to low ms |
| Throughput | Very high (amortised overhead) | High but per-record overhead |
| Exactly-once | Easier to achieve via batch commit | Requires careful checkpointing |
| Complexity | Moderate | Higher |
| Back-pressure | Natural (batch size limits) | Must be explicitly managed |
| Examples | Spark Structured Streaming | Apache Flink, Kafka Streams |
True streaming engines like Flink process each event as it arrives. Micro-batch engines like Spark Structured Streaming wait for a trigger interval, collect all events that arrived during that window, and process them together.
When micro-batching wins#
- High-throughput analytics — aggregating millions of events per second where sub-second latency is acceptable
- Cost-sensitive workloads — fewer task launches means less scheduler overhead
- Exactly-once requirements — batch-level commits simplify offset management
- Teams already using Spark — reuse existing Spark knowledge and infrastructure
When true streaming wins#
- Ultra-low latency — fraud detection, real-time bidding, alerting
- Event-driven architectures — where each event triggers immediate downstream action
- Complex event processing — pattern detection across event sequences
Spark Structured Streaming internals#
Spark Structured Streaming is the most widely deployed micro-batch engine. Under the hood, each micro-batch goes through these stages:
1. Trigger fires#
The trigger interval (default: as fast as possible) determines when the next micro-batch starts. Common configurations:
```python
# Process as fast as possible (default)
query = df.writeStream.trigger(processingTime="0 seconds")

# Fixed interval
query = df.writeStream.trigger(processingTime="10 seconds")

# Process all available data once, then stop
query = df.writeStream.trigger(once=True)

# Available-now: process all available data in multiple batches, then stop
query = df.writeStream.trigger(availableNow=True)
```
2. Offset range computation#
The engine queries the source (Kafka, file system, etc.) for the latest available offsets. It computes the range: from the last committed offset to the latest available offset.
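The bookkeeping can be sketched in plain Python. This is a hypothetical helper, not Spark's internal API: for each source partition, the next micro-batch reads from the last committed offset up to the latest available one.

```python
# Sketch of per-partition offset range computation (hypothetical helper,
# not Spark's actual internals).
def compute_offset_ranges(committed: dict, latest: dict) -> dict:
    """Map partition -> (start, end) half-open offset range to read."""
    ranges = {}
    for partition, end in latest.items():
        start = committed.get(partition, 0)  # new partitions start at offset 0
        if end > start:                      # skip partitions with no new data
            ranges[partition] = (start, end)
    return ranges

committed = {"topic-0": 100, "topic-1": 250}
latest = {"topic-0": 180, "topic-1": 250, "topic-2": 40}
print(compute_offset_ranges(committed, latest))
# {'topic-0': (100, 180), 'topic-2': (0, 40)}
```

Note that `topic-1` produces no range: its latest offset equals the committed one, so there is nothing new to read.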
3. Micro-batch planning#
Spark creates a standard batch execution plan: logical plan, physical plan, and optimised DAG. The key insight is that all of Spark's Catalyst optimisations apply to micro-batches too.
4. Execution#
The micro-batch runs exactly like a normal Spark batch job — tasks are distributed across executors, data is shuffled as needed, and results are computed.
5. Commit#
Once the batch completes, the engine commits the end offsets to the checkpoint location. This atomic commit is what enables exactly-once semantics.
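The commit step can be sketched as writing the batch's end offsets under its batch id. This is a hypothetical, simplified stand-in for Spark's checkpoint format: a batch counts as committed only once its offsets file exists, so a crash before the write simply replays the batch from the previously committed offsets.

```python
# Sketch of the commit step (hypothetical layout, not Spark's checkpoint format).
import json
import os
import tempfile

def commit_batch(checkpoint_dir: str, batch_id: int, end_offsets: dict) -> None:
    path = os.path.join(checkpoint_dir, f"{batch_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(end_offsets, f)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file

def last_committed(checkpoint_dir: str):
    batches = [int(f.split(".")[0]) for f in os.listdir(checkpoint_dir)
               if f.endswith(".json")]
    if not batches:
        return None  # fresh query: start from the earliest offsets
    with open(os.path.join(checkpoint_dir, f"{max(batches)}.json")) as f:
        return json.load(f)

ckpt = tempfile.mkdtemp()
commit_batch(ckpt, 0, {"topic-0": 180})
commit_batch(ckpt, 1, {"topic-0": 260})
print(last_committed(ckpt))  # {'topic-0': 260}
```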
Windowing strategies#
Windowing determines how events are grouped for aggregation. Micro-batch engines support all standard window types:
Tumbling windows#
Fixed-size, non-overlapping time intervals. Every event belongs to exactly one window.
```python
from pyspark.sql.functions import window

df.groupBy(
    window("event_time", "5 minutes")
).agg({"value": "sum"})
```
Sliding windows#
Fixed-size windows that advance by a slide interval. Events can belong to multiple windows.
```python
df.groupBy(
    window("event_time", "10 minutes", "2 minutes")
).agg({"value": "avg"})
```
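Window assignment itself is simple arithmetic, which the following hypothetical helper illustrates: a tumbling window is just a sliding window whose slide equals its size, so every event lands in exactly size ÷ slide windows.

```python
# Sketch of window assignment (hypothetical helper, times in seconds).
def assign_windows(event_ts: int, size: int, slide: int):
    """Return the (start, end) half-open windows containing event_ts."""
    last_start = event_ts - (event_ts % slide)   # latest window starting <= event
    starts = range(last_start, event_ts - size, -slide)
    return sorted((s, s + size) for s in starts if s >= 0)

# Tumbling: 5-minute windows, exactly one window per event
print(assign_windows(720, 300, 300))  # [(600, 900)]

# Sliding: 10-minute windows every 2 minutes, five windows per event
print(assign_windows(720, 600, 120))
```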
Session windows#
Dynamic windows that close after a gap of inactivity. Useful for user session analysis.
```python
from pyspark.sql.functions import session_window

df.groupBy(
    session_window("event_time", "30 minutes")
).agg({"page_views": "count"})
```
Late data and watermarks#
Real-world data arrives late. Watermarks tell the engine how long to wait for late data before finalising a window:
```python
df.withWatermark("event_time", "10 minutes") \
    .groupBy(window("event_time", "5 minutes")) \
    .agg({"value": "sum"})
```
Events that arrive more than 10 minutes behind the maximum event time seen so far fall behind the watermark and are dropped. This bounds the state the engine must maintain.
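The filtering rule can be sketched as follows. This is a simplified, hypothetical model: Spark tracks the watermark per query across micro-batches, but the core idea is just "maximum event time seen minus allowed delay".

```python
# Sketch of watermark-based late-data filtering (simplified, hypothetical).
def filter_late_events(events, max_event_time: int, delay: int):
    """Split events into (kept, dropped) relative to the watermark."""
    watermark = max_event_time - delay
    kept = [t for t in events if t >= watermark]
    dropped = [t for t in events if t < watermark]
    return kept, dropped

# Max event time seen is 12:00 (43200 s); with a 10-minute (600 s) delay the
# watermark sits at 11:50, so an event stamped 11:45 is dropped.
kept, dropped = filter_late_events([43200, 42900, 42300], 43200, 600)
print(kept, dropped)  # [43200, 42900] [42300]
```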
Exactly-once semantics in micro-batch#
Micro-batching simplifies exactly-once delivery through a three-part mechanism:
1. Source replay#
Sources like Kafka support offset-based replay. If a micro-batch fails mid-way, the engine re-reads from the last committed offset.
2. Deterministic processing#
Given the same input offsets, the engine produces the same output. No randomness, no external calls that could vary between retries.
3. Idempotent sinks#
The sink must handle duplicate writes gracefully. Strategies include:
- Transactional writes — write output and commit offsets in the same transaction
- Idempotent inserts — use unique keys so re-inserts are no-ops
- Two-phase commit — coordinate between the engine and the sink
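The idempotent-insert strategy can be demonstrated with an in-memory stand-in for a keyed store (think of a database table with unique keys). Re-writing the same batch after a retry leaves the table unchanged, so replays are safe.

```python
# Sketch of an idempotent sink (in-memory stand-in for a keyed store).
def write_batch(table: dict, rows) -> None:
    for key, value in rows:
        table[key] = value  # upsert: insert or overwrite by unique key

table = {}
batch = [("user-1", 5), ("user-2", 3)]
write_batch(table, batch)
write_batch(table, batch)  # retry after a simulated failure: no duplicates
print(table)  # {'user-1': 5, 'user-2': 3}
```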
Spark Structured Streaming achieves end-to-end exactly-once with its checkpoint mechanism combined with idempotent sinks like Delta Lake.
Latency tradeoffs#
The trigger interval is the primary latency knob:
| Trigger interval | Typical end-to-end latency | Throughput | Resource usage |
|---|---|---|---|
| 100 ms | 200–500 ms | Lower | Higher (frequent scheduling) |
| 1 second | 1–3 seconds | Good | Moderate |
| 10 seconds | 10–15 seconds | Very high | Lower |
| 1 minute | 1–2 minutes | Maximum | Minimum |
The scheduling tax#
Each micro-batch incurs fixed overhead: query the source for new offsets, plan the batch, launch tasks, commit offsets. For a 100 ms trigger, this overhead can consume a significant fraction of the interval.
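A back-of-the-envelope calculation makes the tax concrete. The overhead figure below is illustrative, not a benchmark: assuming roughly 50 ms of fixed per-batch cost, a 100 ms trigger spends half of each interval on bookkeeping, while a 10-second trigger spends under 1%.

```python
# Illustrative scheduling tax (assumed numbers, not measured benchmarks).
fixed_overhead_ms = 50  # assumed: offset query + planning + task launch + commit
for trigger_ms in (100, 1_000, 10_000):
    tax = fixed_overhead_ms / trigger_ms
    print(f"{trigger_ms:>6} ms trigger -> {tax:.1%} overhead")
```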
Adaptive batching#
Some systems adapt batch size dynamically. When load is high, batches grow larger (higher throughput). When load is low, batches shrink (lower latency). Spark's default trigger (processingTime="0 seconds") achieves this naturally — it starts the next batch as soon as the previous one finishes.
Micro-batching at scale#
Checkpoint management#
Checkpoints grow over time and must be cleaned. Strategies:
- Compaction — periodically merge small checkpoint files
- Retention policies — delete checkpoints older than a threshold
- External state stores — use RocksDB-backed state for large stateful operations
Monitoring key metrics#
- Batch duration vs trigger interval — if batches take longer than the trigger, you are falling behind
- Input rate vs processing rate — sustained imbalance means you need more resources
- State size — unbounded state growth indicates missing watermarks
- Scheduler delay — time between trigger fire and batch start
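The first two checks above can be automated against a progress report shaped like Spark's `StreamingQuery.lastProgress`. The field names here are assumed from that shape; verify them against your Spark version before relying on this.

```python
# Sketch of a streaming health check over a lastProgress-style dict
# (field names assumed; confirm against your Spark version).
def is_falling_behind(progress: dict, trigger_interval_ms: int) -> bool:
    batch_ms = progress["durationMs"]["triggerExecution"]
    input_rate = progress["inputRowsPerSecond"]
    processing_rate = progress["processedRowsPerSecond"]
    # Behind if batches outlast the trigger, or input outpaces processing.
    return batch_ms > trigger_interval_ms or input_rate > processing_rate

progress = {
    "durationMs": {"triggerExecution": 12_500},
    "inputRowsPerSecond": 80_000.0,
    "processedRowsPerSecond": 64_000.0,
}
print(is_falling_behind(progress, 10_000))  # True: batches outlast the trigger
```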
Scaling patterns#
- Vertical — increase executor memory and cores per micro-batch
- Horizontal — add more Kafka partitions and Spark executors
- Source parallelism — match the number of Spark partitions to source partitions
When to move beyond micro-batch#
Consider migrating to true streaming when:
- You need consistent sub-100 ms latency
- Your processing logic is inherently per-event (no aggregations)
- You need complex event processing with temporal patterns
- Your team has the expertise to manage Flink or Kafka Streams
For most analytics, monitoring, and ETL workloads, micro-batching delivers the right balance of simplicity, throughput, and latency.