# OpenTelemetry Instrumentation Guide: Auto vs Manual, SDK Setup & Vendor-Agnostic Observability
OpenTelemetry (OTel) has become the industry standard for collecting telemetry data — traces, metrics, and logs — across distributed systems. This guide covers everything from auto-instrumentation to manual spans, SDK setup across languages, exporters, collectors, context propagation, sampling strategies, and migrating away from proprietary agents.
## Why OpenTelemetry?
Vendor lock-in has long plagued observability. Teams adopt Datadog, New Relic, or Dynatrace, then find migration painful because instrumentation is tightly coupled to the vendor SDK.
OpenTelemetry solves this by providing:
- A single, vendor-agnostic API for traces, metrics, and logs
- Auto-instrumentation libraries that require zero code changes
- A collector that decouples data production from data consumption
- Wide ecosystem support — every major observability vendor accepts OTel data
## Auto vs Manual Instrumentation

### Auto-Instrumentation
Auto-instrumentation intercepts well-known libraries (HTTP clients, database drivers, messaging frameworks) and generates spans automatically.
Node.js — use `@opentelemetry/auto-instrumentations-node`:

```javascript
const { NodeSDK } = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");

const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```
Python — use `opentelemetry-distro` and `opentelemetry-bootstrap`:

```shell
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument python app.py
```
Go — auto-instrumentation is more limited; use instrumented library wrappers like `otelhttp` and `otelgrpc`.
### Manual Instrumentation
Manual instrumentation gives you full control. You create spans around business-critical operations that auto-instrumentation cannot detect.
```javascript
const { trace, SpanStatusCode } = require("@opentelemetry/api");

const tracer = trace.getTracer("checkout-service");

async function processOrder(order) {
  return tracer.startActiveSpan("processOrder", async (span) => {
    span.setAttribute("order.id", order.id);
    span.setAttribute("order.total", order.total);
    try {
      await chargePayment(order);
      await reserveInventory(order);
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
}
```
Best practice: use auto-instrumentation as a baseline, then add manual spans for domain-specific operations.
## SDK Setup

### Node.js
```javascript
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");
const { OTLPMetricExporter } = require("@opentelemetry/exporter-metrics-otlp-grpc");
const { PeriodicExportingMetricReader } = require("@opentelemetry/sdk-metrics");

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: "http://collector:4317" }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: "http://collector:4317" }),
    exportIntervalMillis: 15000,
  }),
});
sdk.start();
```
### Python
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
```
### Go
```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("collector:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}
```
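Regardless of language, the SDKs also honor the standard OpenTelemetry environment variables, so much of this setup can live outside the code. A typical configuration might look like the following (the service name and endpoint are illustrative):

```shell
# Identify the service and point all OTLP exporters at the collector.
export OTEL_SERVICE_NAME=checkout-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317
# Sample 10% of root traces, inheriting the parent's decision downstream.
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
```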
## Exporters and the Collector

### Exporters
Exporters send telemetry data from your application to a backend. Common choices:
- OTLP (gRPC or HTTP) — the native OTel protocol; preferred for collector communication
- Jaeger — popular for tracing
- Prometheus — standard for metrics
- Zipkin — lightweight tracing alternative
### The OpenTelemetry Collector
The collector sits between your applications and backends. It receives, processes, and exports telemetry data.
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlp:
    endpoint: "tempo:4317"
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```
## Context Propagation
Context propagation ensures trace context flows across service boundaries. OTel supports two primary propagators:
- W3C TraceContext — the standard (`traceparent`/`tracestate` headers)
- B3 — used by Zipkin-based systems
The SDK automatically injects and extracts context for HTTP requests when auto-instrumentation is enabled. For messaging systems (Kafka, RabbitMQ), you must manually inject context into message headers.
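It helps to see what actually travels over the wire: a W3C `traceparent` header is a single string of the form `version-traceid-spanid-flags`. Here is a minimal sketch of building and parsing one by hand (the IDs are made up); in real code you would call `propagation.inject()`/`extract()` from `@opentelemetry/api` against your message headers rather than formatting the string yourself:

```javascript
// Build a W3C traceparent header: version-traceId-spanId-flags.
function buildTraceparent(traceId, spanId, sampled) {
  const flags = sampled ? "01" : "00";
  return `00-${traceId}-${spanId}-${flags}`;
}

// Parse it back into its parts, e.g. on the consumer side of a queue.
function parseTraceparent(header) {
  const [version, traceId, spanId, flags] = header.split("-");
  return { version, traceId, spanId, sampled: flags === "01" };
}

const header = buildTraceparent(
  "4bf92f3577b34da6a3ce929d0e0e4736",
  "00f067aa0ba902b7",
  true
);
// → "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```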
## Sampling Strategies
At scale, tracing every request is expensive. Sampling controls which traces are recorded:
| Strategy | Description | Use Case |
|---|---|---|
| AlwaysOn | Record everything | Development, low-traffic services |
| AlwaysOff | Record nothing | Disabled services |
| TraceIdRatio | Probabilistic sampling | General production use |
| ParentBased | Inherit parent decision | Consistent cross-service sampling |
| Tail-based | Decide after span completes | Capture errors and slow requests |
Head-based sampling (decided at trace start) is simple but misses interesting traces. Tail-based sampling (decided at the collector) captures anomalies but requires buffering all spans temporarily.
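To make the TraceIdRatio idea concrete, here is an illustrative sketch (not the SDK's exact algorithm): the decision is a pure function of the trace ID, so every service that sees the same trace makes the same choice without any coordination:

```javascript
// Illustrative TraceIdRatio-style sampler: map the trace ID onto [0, 1)
// and sample when it falls below the configured ratio. Deterministic per
// trace ID, so all services agree on the decision independently.
function shouldSample(traceId, ratio) {
  // Treat the last 8 hex chars of the 128-bit trace ID as a uniform value.
  const value = parseInt(traceId.slice(-8), 16);
  return value / 0x100000000 < ratio;
}

// Roughly 10% of trace IDs pass at ratio 0.1.
shouldSample("4bf92f3577b34da6a3ce929d0e0e4736", 0.1);
```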
## Migrating from Proprietary Agents
Migrating to OTel follows a phased approach:
1. Deploy the OTel Collector alongside your existing agent
2. Dual-ship telemetry — send data to both old and new backends
3. Replace SDK instrumentation service by service, starting with the least critical
4. Validate parity — ensure dashboards and alerts produce equivalent results
5. Remove the proprietary agent once confidence is established
The collector's fan-out capability makes dual-shipping trivial — just add multiple exporters to the same pipeline.
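For example, a traces pipeline can fan out to an existing vendor and a new OTel-native backend at the same time. A sketch of the relevant collector config (the exporter names and endpoints here are illustrative; `otlp/<name>` is the collector's convention for multiple instances of the same exporter type):

```yaml
exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
  otlp/vendor:
    endpoint: "vendor-agent:4317"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo, otlp/vendor]
```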
## Key Takeaways
- Start with auto-instrumentation, add manual spans for business logic
- Use the OTel Collector as a central telemetry gateway
- Adopt W3C TraceContext for cross-service propagation
- Use tail-based sampling at the collector for cost-effective anomaly capture
- Migrate incrementally — the collector enables dual-shipping with zero application changes
OpenTelemetry eliminates vendor lock-in while giving you best-in-class instrumentation. Combined with a well-configured collector, it forms the backbone of modern observability.