Distributed Tracing Sampling Strategies: Head-Based, Tail-Based & Beyond
Distributed tracing captures the path of a request as it flows through microservices. In production, a busy system can generate millions of traces per minute. Storing and analyzing all of them is expensive and often unnecessary. Sampling decides which traces to keep and which to discard — and the strategy you choose determines whether you catch the problems that matter.
Why Sampling Is Necessary#
A service handling 10,000 requests per second across 20 services produces roughly 200,000 spans per second. At 1 KB per span, that is 200 MB/s — over 17 TB per day. Without sampling, your observability costs dwarf your infrastructure costs.
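The back-of-envelope arithmetic above can be checked in a few lines (decimal units, 1 KB = 1,000 bytes):

```python
# Span-volume estimate for the example above (decimal units).
spans_per_sec = 10_000 * 20                  # 10k req/s across 20 services
mb_per_sec = spans_per_sec * 1_000 / 1e6     # 1 KB per span -> 200 MB/s
tb_per_day = mb_per_sec * 86_400 / 1e6       # seconds per day -> ~17.3 TB
print(mb_per_sec, round(tb_per_day, 2))
```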
But sampling introduces risk: you might discard the one trace that shows a critical bug. The goal is to keep interesting traces and drop boring ones.
All traces (100%)
│
▼
┌──────────────┐
│ Sampling │──▶ Kept traces (1-10%)
│ Strategy │ → stored, indexed, searchable
│ │──▶ Dropped traces (90-99%)
└──────────────┘ → discarded at source or collector
Head-Based Sampling#
Head-based sampling makes the keep-or-drop decision at the start of a trace, before any spans are generated. The decision propagates via trace context headers so all downstream services respect it.
How It Works#
- The entry service generates a trace ID and flips a coin (e.g., 10% probability).
- The sampling decision is encoded in the sampled flag of the W3C traceparent header (or a B3 propagation header), with vendor-specific data carried in tracestate.
- Every downstream service reads the header and follows the same decision.
Client → Gateway (sample? YES) → Service A → Service B → Database
↓ ↓
span kept span kept
Client → Gateway (sample? NO) → Service A → Service B → Database
↓ ↓
span dropped span dropped
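The "coin flip" is usually made deterministic by hashing the trace ID: every service that sees the same trace ID computes the same answer, with no coordination needed. A minimal sketch (function name and hash choice are illustrative, not a specific SDK's API):

```python
import hashlib

def should_sample(trace_id: str, ratio: float = 0.10) -> bool:
    """Deterministic head-based sampling: map the trace ID into [0, 1)."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < ratio
```

In practice the entry service makes this call once and encodes the result in the propagation header, so downstream services simply follow it.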
Pros and Cons#
| Pros | Cons |
|---|---|
| Simple to implement | Cannot know if a trace is interesting at the start |
| Low overhead — decision is instant | Rare errors may be missed entirely |
| Consistent — all spans in a trace are kept or dropped | Fixed rate ignores traffic patterns |
| Supported by all tracing SDKs | No way to keep all error traces |
Head-based sampling is the default in OpenTelemetry and most tracing libraries. It works well as a baseline but misses important traces.
Tail-Based Sampling#
Tail-based sampling defers the decision until the trace is complete. A collector buffers all spans, assembles full traces, and then decides which to keep based on the trace's characteristics.
Decision Criteria#
- Error presence — Keep any trace containing a span with an error status.
- Latency threshold — Keep traces where total duration exceeds a percentile (e.g., p99).
- Specific operations — Always keep traces involving payment processing or auth flows.
- Status codes — Keep traces with 5xx responses.
- Span count — Unusually deep traces may indicate a problem.
┌───────────────────────────────────┐
│ Tail-Based Sampling Collector │
│ │
│ Buffer: trace-abc [5 spans] │
│ trace-def [12 spans] │
│ trace-ghi [3 spans] │
│ │
│ Rules: │
│ error == true → KEEP │
│ duration > 2s → KEEP │
│ route == /pay → KEEP │
│ otherwise → 5% random │
└───────────────────────────────────┘
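The rule set above can be expressed as a small decision function over a completed trace. This is a sketch, not the OpenTelemetry processor's API; the span fields (`error`, `duration_ms`, `route`) are illustrative:

```python
import random

def keep_trace(spans: list[dict], baseline: float = 0.05) -> bool:
    """Decide after the trace is complete, using its full content."""
    if any(s.get("error") for s in spans):
        return True                       # error == true -> KEEP
    if sum(s.get("duration_ms", 0) for s in spans) > 2000:
        return True                       # duration > 2s  -> KEEP
    if any(s.get("route") == "/pay" for s in spans):
        return True                       # route == /pay  -> KEEP
    return random.random() < baseline     # otherwise      -> 5% random
```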
Pros and Cons#
| Pros | Cons |
|---|---|
| Captures all error and slow traces | Requires buffering — memory intensive |
| Policy-driven, flexible rules | Adds latency before traces appear in storage |
| Adapts to trace content | Complex to operate at scale |
| Better signal-to-noise ratio | Must handle incomplete traces (timeout) |
Collector Architecture#
Tail-based sampling requires a stateful collector that can assemble traces from spans arriving across multiple services. In OpenTelemetry, the tail sampling processor (tailsamplingprocessor) runs in the collector and groups spans by trace ID.
To ensure all spans for a trace reach the same collector, use a load balancer with trace-ID-based routing:
Services → Exporters → LB (hash on trace ID) → Collector Pool
│
┌─────┴─────┐
│ Collector 1│ (traces a-m)
│ Collector 2│ (traces n-z)
└────────────┘
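Trace-ID-based routing is just hashing over the collector pool. A sketch (note that this simple modulo scheme reshuffles traces whenever the pool resizes; real deployments mitigate that with consistent hashing):

```python
import hashlib

def route(trace_id: str, collectors: list[str]) -> str:
    """Hash the trace ID so all spans of a trace reach the same collector."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return collectors[h % len(collectors)]
```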
Priority Sampling#
Priority sampling assigns a numeric priority to each trace. The priority can be set by the application (e.g., high priority for admin operations) or computed by the SDK based on rules.
Datadog popularized this approach with three priority levels:
- USER_KEEP (2) — Always keep, set by application code.
- AUTO_KEEP (1) — Sampler decided to keep.
- AUTO_DROP (0) — Sampler decided to drop.
- USER_DROP (-1) — Application explicitly drops (e.g., health checks).
Priority sampling combines well with both head-based and tail-based approaches. The SDK sets a default priority, and the collector can override it.
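A sketch of how these priorities resolve to a final keep-or-drop decision; the constants mirror Datadog's levels, but the resolve function itself is illustrative:

```python
# Priority levels as described above (Datadog's scheme).
USER_KEEP, AUTO_KEEP, AUTO_DROP, USER_DROP = 2, 1, 0, -1

def resolve(sampler_priority, app_priority=None):
    """An explicit application decision overrides the sampler's default."""
    priority = app_priority if app_priority is not None else sampler_priority
    return priority > 0   # USER_KEEP and AUTO_KEEP are stored
```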
Adaptive Sampling#
Adaptive sampling adjusts the sampling rate dynamically based on traffic volume. When traffic is low, sample more (even 100%). When traffic spikes, reduce the rate to control costs.
if traces_per_second < 100:
    sample_rate = 1.0                      # keep everything
elif traces_per_second < 1000:
    sample_rate = 0.1                      # keep 10%
else:
    sample_rate = 100 / traces_per_second  # keep ~100 traces/sec
This ensures a consistent volume of stored traces regardless of traffic patterns. AWS X-Ray and Jaeger both support adaptive sampling with a target traces-per-second configuration.
Rate-Limiting Sampler#
A rate-limiting sampler keeps a fixed number of traces per time window (e.g., 50 traces per second). It uses a token bucket or leaky bucket internally.
This is simpler than adaptive sampling and provides predictable cost control. The downside is that during low traffic, you waste budget, and during high traffic, you may miss important traces.
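A token-bucket version can be sketched in a few lines (illustrative, not a specific SDK's class):

```python
import time

class RateLimitedSampler:
    """Keep at most `max_per_sec` traces per second via a token bucket."""

    def __init__(self, max_per_sec: float):
        self.rate = max_per_sec
        self.tokens = max_per_sec          # bucket starts full
        self.last = time.monotonic()

    def should_sample(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```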
Jaeger's client libraries ship a RateLimitingSampler that caps traces per second; OpenTelemetry SDKs support the same pattern via custom or vendor-provided samplers. In both cases trace context still propagates for unsampled requests, so downstream services stay consistent.
Always-On for Errors#
The most pragmatic strategy combines a baseline sampler with an always-on rule for errors:
- Baseline: Head-based sampling at 5% for normal traces.
- Error override: Tail-based collector keeps 100% of traces with errors.
- Latency override: Keep 100% of traces above p99 latency.
- Manual override: Application code can force-keep traces for debugging.
This hybrid approach captures the traces you actually need while controlling volume for routine traffic.
Sampling at SDK vs Collector#
Where you sample matters for cost and completeness:
| Location | Pros | Cons |
|---|---|---|
| SDK (in-process) | Lowest network cost — dropped spans never leave the service | Cannot make tail-based decisions |
| Agent (sidecar) | Offloads work from the application process | Still limited to local span data |
| Collector (centralized) | Full trace visibility, tail-based sampling possible | Higher network cost, operational complexity |
Recommended architecture: Use a lightweight SDK sampler (head-based at 10-20%) to reduce noise, then a collector-level tail-based sampler to rescue error and slow traces that the head-based sampler would otherwise drop.
OpenTelemetry Sampling Configuration#
OpenTelemetry supports composable samplers:
# SDK-level (head-based)
sampler:
type: parentbased_traceidratio
ratio: 0.1 # 10% baseline
# Collector-level (tail-based)
processors:
tail_sampling:
decision_wait: 10s
policies:
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
- name: slow
type: latency
latency:
threshold_ms: 2000
- name: baseline
type: probabilistic
probabilistic:
sampling_percentage: 5
Key Takeaways#
- Head-based sampling is simple and low-overhead but blind to trace outcomes — errors and slow requests may be dropped.
- Tail-based sampling keeps interesting traces by evaluating complete trace data, at the cost of buffering and operational complexity.
- Priority sampling lets application code influence sampling decisions for critical flows.
- Adaptive and rate-limiting samplers control cost by targeting a fixed trace volume regardless of traffic.
- The best production setup combines head-based SDK sampling with tail-based collector sampling and always-on rules for errors and high latency.
- Use trace-ID-based load balancing to route all spans for a trace to the same tail-sampling collector.