Distributed Tracing Sampling Strategies: Head-Based, Tail-Based & Beyond
Distributed tracing captures the path of a request as it flows through microservices. In production, a busy system can generate millions of traces per minute. Storing and analyzing all of them is expensive and often unnecessary. Sampling decides which traces to keep and which to discard — and the strategy you choose determines whether you catch the problems that matter.
Why Sampling Is Necessary#
A service handling 10,000 requests per second across 20 services produces roughly 200,000 spans per second. At 1 KB per span, that is 200 MB/s — over 17 TB per day. Without sampling, your observability costs dwarf your infrastructure costs.
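The back-of-envelope arithmetic above can be checked in a few lines (decimal units, 1 KB = 1,000 bytes):

```python
# Span-volume estimate for the example above (decimal units).
spans_per_sec = 10_000 * 20                  # 10k req/s across 20 services
mb_per_sec = spans_per_sec * 1_000 / 1e6     # 1 KB per span -> 200 MB/s
tb_per_day = mb_per_sec * 86_400 / 1e6       # seconds per day -> ~17.3 TB
print(mb_per_sec, round(tb_per_day, 2))
```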
But sampling introduces risk: you might discard the one trace that shows a critical bug. The goal is to keep interesting traces and drop boring ones.
All traces (100%)
│
▼
┌──────────────┐
│ Sampling │──▶ Kept traces (1-10%)
│ Strategy │ → stored, indexed, searchable
│ │──▶ Dropped traces (90-99%)
└──────────────┘ → discarded at source or collector
Head-Based Sampling#
Head-based sampling makes the keep-or-drop decision at the start of a trace, before any spans are generated. The decision propagates via trace context headers so all downstream services respect it.
How It Works#
- The entry service generates a trace ID and flips a coin (e.g., 10% probability).
- The sampling decision is encoded in the sampled flag of the W3C traceparent header (or a B3 propagation header), with vendor-specific data carried in tracestate.
- Every downstream service reads the header and follows the same decision.
Client → Gateway (sample? YES) → Service A → Service B → Database
↓ ↓
span kept span kept
Client → Gateway (sample? NO) → Service A → Service B → Database
↓ ↓
span dropped span dropped
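The "coin flip" is usually made deterministic by hashing the trace ID: every service that sees the same trace ID computes the same answer, with no coordination needed. A minimal sketch (function name and hash choice are illustrative, not a specific SDK's API):

```python
import hashlib

def should_sample(trace_id: str, ratio: float = 0.10) -> bool:
    """Deterministic head-based sampling: map the trace ID into [0, 1)."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < ratio
```

In practice the entry service makes this call once and encodes the result in the propagation header, so downstream services simply follow it.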
Pros and Cons#
| Pros | Cons |
|---|---|
| Simple to implement | Cannot know if a trace is interesting at the start |
| Low overhead — decision is instant | Rare errors may be missed entirely |
| Consistent — all spans in a trace are kept or dropped | Fixed rate ignores traffic patterns |
| Supported by all tracing SDKs | No way to keep all error traces |
Head-based sampling is the default in OpenTelemetry and most tracing libraries. It works well as a baseline but misses important traces.
Tail-Based Sampling#
Tail-based sampling defers the decision until the trace is complete. A collector buffers all spans, assembles full traces, and then decides which to keep based on the trace's characteristics.
Decision Criteria#
- Error presence — Keep any trace containing a span with an error status.
- Latency threshold — Keep traces where total duration exceeds a percentile (e.g., p99).
- Specific operations — Always keep traces involving payment processing or auth flows.
- Status codes — Keep traces with 5xx responses.
- Span count — Unusually deep traces may indicate a problem.
┌───────────────────────────────────┐
│ Tail-Based Sampling Collector │
│ │
│ Buffer: trace-abc [5 spans] │
│ trace-def [12 spans] │
│ trace-ghi [3 spans] │
│ │
│ Rules: │
│ error == true → KEEP │
│ duration > 2s → KEEP │
│ route == /pay → KEEP │
│ otherwise → 5% random │
└───────────────────────────────────┘
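The rule set above can be expressed as a small decision function over a completed trace. This is a sketch, not the OpenTelemetry processor's API; the span fields (`error`, `duration_ms`, `route`) are illustrative:

```python
import random

def keep_trace(spans: list[dict], baseline: float = 0.05) -> bool:
    """Decide after the trace is complete, using its full content."""
    if any(s.get("error") for s in spans):
        return True                       # error == true -> KEEP
    if sum(s.get("duration_ms", 0) for s in spans) > 2000:
        return True                       # duration > 2s  -> KEEP
    if any(s.get("route") == "/pay" for s in spans):
        return True                       # route == /pay  -> KEEP
    return random.random() < baseline     # otherwise      -> 5% random
```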
Pros and Cons#
| Pros | Cons |
|---|---|
| Captures all error and slow traces | Requires buffering — memory intensive |
| Policy-driven, flexible rules | Adds latency before traces appear in storage |
| Adapts to trace content | Complex to operate at scale |
| Better signal-to-noise ratio | Must handle incomplete traces (timeout) |
Collector Architecture#
Tail-based sampling requires a stateful collector that can assemble traces from spans arriving across multiple services. In OpenTelemetry, the tail sampling processor (tailsamplingprocessor) runs in the collector and groups spans by trace ID.
To ensure all spans for a trace reach the same collector, use a load balancer with trace-ID-based routing:
Services → Exporters → LB (hash on trace ID) → Collector Pool
│
┌─────┴─────┐
│ Collector 1│ (traces a-m)
│ Collector 2│ (traces n-z)
└────────────┘
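Trace-ID-based routing is just hashing over the collector pool. A sketch (note that this simple modulo scheme reshuffles traces whenever the pool resizes; real deployments mitigate that with consistent hashing):

```python
import hashlib

def route(trace_id: str, collectors: list[str]) -> str:
    """Hash the trace ID so all spans of a trace reach the same collector."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return collectors[h % len(collectors)]
```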
Priority Sampling#
Priority sampling assigns a numeric priority to each trace. The priority can be set by the application (e.g., high priority for admin operations) or computed by the SDK based on rules.
Datadog popularized this approach with three priority levels:
- USER_KEEP (2) — Always keep, set by application code.
- AUTO_KEEP (1) — Sampler decided to keep.
- AUTO_DROP (0) — Sampler decided to drop.
- USER_DROP (-1) — Application explicitly drops (e.g., health checks).
Priority sampling combines well with both head-based and tail-based approaches. The SDK sets a default priority, and the collector can override it.
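A sketch of how these priorities resolve to a final keep-or-drop decision; the constants mirror Datadog's levels, but the resolve function itself is illustrative:

```python
# Priority levels as described above (Datadog's scheme).
USER_KEEP, AUTO_KEEP, AUTO_DROP, USER_DROP = 2, 1, 0, -1

def resolve(sampler_priority, app_priority=None):
    """An explicit application decision overrides the sampler's default."""
    priority = app_priority if app_priority is not None else sampler_priority
    return priority > 0   # USER_KEEP and AUTO_KEEP are stored
```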
Adaptive Sampling#
Adaptive sampling adjusts the sampling rate dynamically based on traffic volume. When traffic is low, sample more (even 100%). When traffic spikes, reduce the rate to control costs.
if traces_per_second < 100:
    sample_rate = 1.0                      # keep everything
elif traces_per_second < 1000:
    sample_rate = 0.1                      # keep 10%
else:
    sample_rate = 100 / traces_per_second  # keep ~100 traces/sec
This ensures a consistent volume of stored traces regardless of traffic patterns. AWS X-Ray and Jaeger both support adaptive sampling with a target traces-per-second configuration.
Rate-Limiting Sampler#
A rate-limiting sampler keeps a fixed number of traces per time window (e.g., 50 traces per second). It uses a token bucket or leaky bucket internally.
This is simpler than adaptive sampling and provides predictable cost control. The downside is that during low traffic, you waste budget, and during high traffic, you may miss important traces.
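A token-bucket version can be sketched in a few lines (illustrative, not a specific SDK's class):

```python
import time

class RateLimitedSampler:
    """Keep at most `max_per_sec` traces per second via a token bucket."""

    def __init__(self, max_per_sec: float):
        self.rate = max_per_sec
        self.tokens = max_per_sec          # bucket starts full
        self.last = time.monotonic()

    def should_sample(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```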
Jaeger's client libraries ship a RateLimitingSampler that caps traces per second; OpenTelemetry SDKs support the same pattern via custom or vendor-provided samplers. In both cases trace context still propagates for unsampled requests, so downstream services stay consistent.
Always-On for Errors#
The most pragmatic strategy combines a baseline sampler with an always-on rule for errors:
- Baseline: Head-based sampling at 5% for normal traces.
- Error override: Tail-based collector keeps 100% of traces with errors.
- Latency override: Keep 100% of traces above p99 latency.
- Manual override: Application code can force-keep traces for debugging.
This hybrid approach captures the traces you actually need while controlling volume for routine traffic.
Sampling at SDK vs Collector#
Where you sample matters for cost and completeness:
| Location | Pros | Cons |
|---|---|---|
| SDK (in-process) | Lowest network cost — dropped spans never leave the service | Cannot make tail-based decisions |
| Agent (sidecar) | Offloads work from the application process | Still limited to local span data |
| Collector (centralized) | Full trace visibility, tail-based sampling possible | Higher network cost, operational complexity |
Recommended architecture: Use a lightweight SDK sampler (head-based at 10-20%) to reduce noise, then a collector-level tail-based sampler to rescue error and slow traces that the head-based sampler would otherwise drop.
OpenTelemetry Sampling Configuration#
OpenTelemetry supports composable samplers:
# SDK-level (head-based)
sampler:
type: parentbased_traceidratio
ratio: 0.1 # 10% baseline
# Collector-level (tail-based)
processors:
tail_sampling:
decision_wait: 10s
policies:
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
- name: slow
type: latency
latency:
threshold_ms: 2000
- name: baseline
type: probabilistic
probabilistic:
sampling_percentage: 5
Key Takeaways#
- Head-based sampling is simple and low-overhead but blind to trace outcomes — errors and slow requests may be dropped.
- Tail-based sampling keeps interesting traces by evaluating complete trace data, at the cost of buffering and operational complexity.
- Priority sampling lets application code influence sampling decisions for critical flows.
- Adaptive and rate-limiting samplers control cost by targeting a fixed trace volume regardless of traffic.
- The best production setup combines head-based SDK sampling with tail-based collector sampling and always-on rules for errors and high latency.
- Use trace-ID-based load balancing to route all spans for a trace to the same tail-sampling collector.