Time Series Database Architecture — Storage, Compression & Query Patterns
Data with a timestamp changes everything#
Time series data is any data where each point is associated with a timestamp: server CPU metrics every 10 seconds, stock prices every millisecond, IoT sensor readings every minute. The defining characteristics are that data arrives ordered by time, is appended rather than updated, and queries almost always filter by time range.
This access pattern is so different from general-purpose workloads that it demands specialized storage engines.
What makes time series data different#
Traditional databases assume random reads and writes across the entire dataset. Time series workloads have unique properties:
- Write-heavy — millions of data points per second, rarely updated or deleted
- Time-ordered — data arrives roughly in chronological order
- Recent data is hot — most queries target the last hours or days
- Old data is cold — historical data is queried rarely and can be lower resolution
- High cardinality metadata — thousands of unique series (host + metric combinations)
- Predictable patterns — values change slowly, enabling aggressive compression
Write-optimized storage engines#
Time series databases optimize for high-throughput sequential writes.
Log-Structured Merge Trees (LSM)#
Many TSDBs use LSM trees or variants. Writes go to an in-memory buffer (memtable), then flush to sorted, immutable files on disk (SSTables). Background compaction merges files.
- Write amplification is traded for write throughput
- No random I/O on writes — everything is sequential
- InfluxDB's TSM storage engine (paired with its TSI, the Time Series Index, for series lookup) takes this approach
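To make the write path concrete, here is a minimal toy sketch of the LSM idea: writes land in an in-memory buffer, and full buffers are flushed as sorted, immutable runs. All names are illustrative, and real engines add write-ahead logs, bloom filters, and background compaction.

```python
import bisect

class MiniLSM:
    """Toy LSM tree: writes buffer in a memtable, then flush to sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}        # in-memory write buffer
        self.sstables = []        # sorted, immutable (key, value) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sequential write: the whole buffer becomes one sorted run on "disk".
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest run wins, mimicking the LSM read path (newer data shadows older).
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```

Note that `put` never touches old runs — every write is either an in-memory update or a sequential flush, which is exactly the trade the article describes.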
Time-Structured Merge Tree (TSM)#
InfluxDB's TSM engine is a purpose-built variant. Data is organized into shards by time range, each shard containing a set of TSM files. This makes retention policies trivial — dropping old data means deleting entire shards.
Append-only columnar storage#
Prometheus uses a custom append-only storage format. Each time series gets its own chunk of samples. Chunks are immutable once full and are memory-mapped for fast reads.
ClickHouse uses a columnar MergeTree engine where data is stored column-by-column, enabling vectorized query execution and extreme compression ratios.
Compression — fitting billions of points in memory#
Time series data compresses extraordinarily well because consecutive values are similar.
Gorilla compression (Facebook)#
Facebook's Gorilla paper introduced a compression scheme that achieves 12x compression on real-world metrics:
- Timestamps: Delta-of-delta encoding. Most consecutive timestamps have the same delta (e.g., exactly 10 seconds apart), so the delta-of-delta is zero — encoded in a single bit.
- Values: XOR encoding. Consecutive floating-point values are XORed — when values change slowly, most bits are zero, requiring very few bits to encode.
Raw: 1709251200, 45.2, 1709251210, 45.3, 1709251220, 45.2
Deltas: 10, 10, 10 (timestamps)
Delta-of-delta: 0, 0 → 1 bit each
XOR values: small differences → few bits each
This compression is used by Prometheus, VictoriaMetrics, and Thanos.
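A rough way to see why this works is to count the bits a Gorilla-style encoder would spend. The sketch below is a simplified cost model, not the real bitstream format: the control-bit sizes are approximations, and only the 14-bit first-delta detail matches the paper.

```python
import struct

def float_bits(x):
    """Reinterpret a float64 as a 64-bit integer so we can XOR bit patterns."""
    return struct.unpack('>Q', struct.pack('>d', x))[0]

def gorilla_cost(points):
    """Approximate bit cost of Gorilla-style encoding for (timestamp, value) samples.

    Timestamps: delta-of-delta (zero costs a single bit; other cases are
    approximated). Values: XOR with the previous value; a zero XOR costs one
    bit, otherwise we charge roughly the significant bits of the XOR.
    """
    bits = 64 + 64                  # first sample stored raw
    prev_ts, prev_val = points[0]
    prev_delta = None
    for ts, val in points[1:]:
        delta = ts - prev_ts
        if prev_delta is None:
            bits += 14              # Gorilla stores the first delta in 14 bits
        elif delta == prev_delta:
            bits += 1               # the common case: a single '0' bit
        else:
            bits += 9 + abs(delta - prev_delta).bit_length()
        xor = float_bits(val) ^ float_bits(prev_val)
        bits += 1 if xor == 0 else 2 + xor.bit_length()
        prev_delta, prev_ts, prev_val = delta, ts, val
    return bits
```

Feeding it 100 samples taken exactly 10 seconds apart with a constant value costs a few hundred bits instead of the 12,800 bits of raw storage — the regular-interval, slow-changing case the article describes.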
Delta-of-delta encoding#
Beyond Gorilla, integer metrics (counters, gauges with integer values) benefit from simple delta-of-delta encoding:
Raw values: 100, 105, 110, 115, 120
Deltas: 5, 5, 5, 5
Delta-of-delta: 0, 0, 0 → nearly free to store
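The integer case is simple enough to implement end-to-end. This is a minimal, lossless sketch: the encoder keeps the first value and first delta, then stores only delta-of-deltas, and the decoder reverses the process.

```python
def dod_encode(values):
    """Encode integers as (first_value, first_delta, delta-of-deltas)."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[0], deltas[0], dods

def dod_decode(first, first_delta, dods):
    """Rebuild the original sequence from the delta-of-delta encoding."""
    values = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        values.append(values[-1] + delta)
    return values
```

For the linear sequence above, the delta-of-delta stream is all zeros, which a real encoder would pack into roughly one bit per point.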
Dictionary encoding for tags#
High-cardinality tag values (hostnames, region names) repeat constantly. Dictionary encoding maps each unique string to a small integer, drastically reducing storage for metadata.
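A dictionary encoder is just a bidirectional mapping between strings and integer IDs. A minimal sketch:

```python
class TagDictionary:
    """Map repeated tag strings to small integer IDs and back."""

    def __init__(self):
        self.str_to_id = {}
        self.id_to_str = []

    def encode(self, tag):
        # Assign the next ID on first sight; reuse it for every repeat.
        if tag not in self.str_to_id:
            self.str_to_id[tag] = len(self.id_to_str)
            self.id_to_str.append(tag)
        return self.str_to_id[tag]

    def decode(self, tag_id):
        return self.id_to_str[tag_id]
```

A hostname like `eu-west-1-prod-api-42` that appears in millions of rows is stored once, with each row holding only a small integer.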
Retention policies and tiered storage#
Not all data deserves the same storage treatment.
Retention policies automatically delete data older than a threshold. A typical setup:
| Tier | Resolution | Retention | Storage |
|---|---|---|---|
| Hot | Raw (10s) | 7 days | SSD / memory |
| Warm | 1-minute avg | 90 days | SSD |
| Cold | 1-hour avg | 2 years | HDD / object storage |
| Archive | Daily summary | Forever | S3 / GCS |
This tiered approach keeps costs manageable while preserving historical trends.
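Routing data to a tier is usually just an age check against configured thresholds. A sketch mirroring the table above (tier names and cutoffs are illustrative):

```python
from datetime import timedelta

# Ordered youngest-first; thresholds mirror the tier table above.
TIERS = [
    (timedelta(days=7),   "hot"),      # raw resolution, SSD / memory
    (timedelta(days=90),  "warm"),     # 1-minute averages, SSD
    (timedelta(days=730), "cold"),     # 1-hour averages, HDD / object storage
]

def tier_for_age(age):
    """Return the storage tier for data of the given age."""
    for max_age, name in TIERS:
        if age <= max_age:
            return name
    return "archive"                   # daily summaries, kept forever
```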
Downsampling — trading resolution for efficiency#
Downsampling aggregates high-resolution data into lower-resolution summaries. Instead of keeping every 10-second CPU reading for a year, store 5-minute averages after 30 days.
Common aggregation functions:
- avg — average value in the window
- min / max — extremes for alerting and review
- sum — total for counters
- count — number of raw points (for weighted re-aggregation)
- percentile — p50, p95, p99 for latency data
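The core of downsampling is grouping samples into fixed windows and applying one of these aggregates. A minimal sketch:

```python
from collections import defaultdict

def downsample(samples, window, agg):
    """Aggregate (unix_ts, value) samples into fixed-width windows.

    window is the bucket width in seconds; agg is any function over the
    values in a bucket (average, max, sum, ...).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window].append(value)   # align to the window start
    return {start: agg(vals) for start, vals in sorted(buckets.items())}

def avg(xs):
    return sum(xs) / len(xs)
```

Keeping `count` alongside `avg` per bucket is what makes later re-aggregation correct: averaging two averages is only valid when weighted by how many raw points each one summarizes.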
Tools handling downsampling:
- InfluxDB — continuous queries and tasks
- Prometheus — recording rules
- TimescaleDB — continuous aggregates (materialized views that auto-refresh)
- VictoriaMetrics — downsampling via the -downsampling.period flag
The tool landscape#
InfluxDB#
Purpose-built TSDB with its own query language (Flux, InfluxQL). Strong ecosystem, cloud-hosted option, built-in dashboarding with Chronograf.
- Best for: Metrics, IoT, and application monitoring
- Storage: TSM engine with built-in compression
- Query: Flux (functional) or InfluxQL (SQL-like)
TimescaleDB#
PostgreSQL extension that adds time series superpowers. Full SQL compatibility, joins with relational data, and hypertables that auto-partition by time.
- Best for: Teams already on PostgreSQL who need time series alongside relational data
- Storage: PostgreSQL heap with chunk-based partitioning
- Query: Full PostgreSQL SQL — joins, CTEs, window functions
Prometheus#
Pull-based monitoring system and TSDB. The standard for Kubernetes and cloud-native monitoring. Paired with Grafana for visualization.
- Best for: Infrastructure and application monitoring, alerting
- Storage: Local append-only chunks with Gorilla compression
- Query: PromQL — purpose-built for time series aggregation
- Limitation: Local storage only — use Thanos or Cortex for long-term retention
ClickHouse#
Columnar OLAP database that excels at time series analytics. Handles petabytes of data with sub-second query performance.
- Best for: Analytics on time series data, high-cardinality workloads, log analysis
- Storage: MergeTree with columnar compression
- Query: SQL with time series extensions
QuestDB#
High-performance TSDB written in Java and C++, optimized for fast SQL queries on time series data. Uses memory-mapped files and SIMD instructions.
- Best for: Financial data, high-frequency ingestion, SQL-native teams
- Storage: Column-based with append-only design
- Query: PostgreSQL-compatible SQL with time series extensions
Query patterns for time series#
Time series queries follow predictable patterns:
Range queries: "Give me CPU usage for host-42 in the last 6 hours"
SELECT time, cpu_usage FROM metrics
WHERE host = 'host-42'
AND time > now() - INTERVAL '6 hours'
ORDER BY time;
Aggregation over windows: "Average CPU per 5-minute bucket"
SELECT time_bucket('5 minutes', time) AS bucket,
avg(cpu_usage) AS avg_cpu
FROM metrics
WHERE time > now() - INTERVAL '24 hours'
GROUP BY bucket ORDER BY bucket;
Top-N series: "Which 10 hosts have the highest p99 latency?"
SELECT host, percentile_cont(0.99) WITHIN GROUP (ORDER BY latency) AS p99
FROM requests
WHERE time > now() - INTERVAL '1 hour'
GROUP BY host ORDER BY p99 DESC LIMIT 10;
Rate of change: "What's the request rate per second?"
rate(http_requests_total[5m])
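The idea behind rate() can be sketched in a few lines: sum the increases of a monotonic counter within the window, treat any drop as a counter reset, and divide by the elapsed span. This is a simplification — real PromQL also extrapolates to the window boundaries — but it captures the reset-handling logic.

```python
def rate(samples, window):
    """Per-second rate of a monotonically increasing counter.

    samples: list of (unix_ts, counter_value), sorted by time.
    window: lookback in seconds, measured from the newest sample.
    """
    start = samples[-1][0] - window
    in_window = [(t, v) for t, v in samples if t >= start]
    if len(in_window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A decrease means the counter reset (e.g. process restart);
        # count the full new value as the post-reset increase.
        increase += cur - prev if cur >= prev else cur
    span = in_window[-1][0] - in_window[0][0]
    return increase / span
```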
Use cases by industry#
Monitoring and observability#
Server metrics, application traces, log volumes. Prometheus + Grafana is the de facto stack. At scale, companies use Thanos, Cortex, or VictoriaMetrics for long-term storage.
IoT and industrial#
Sensor readings from thousands of devices — temperature, pressure, vibration. InfluxDB and TimescaleDB dominate this space. Key challenge: handling device cardinality and intermittent connectivity.
Financial markets#
Tick data, order book snapshots, trade execution metrics. QuestDB and ClickHouse handle the throughput. Microsecond-precision timestamps and out-of-order ingestion are critical requirements.
Visualize your time series architecture#
See how ingestion pipelines, storage tiers, and query layers connect — try Codelit to generate an interactive diagram showing your time series infrastructure from collectors to dashboards.
Key takeaways#
- Time series data has unique access patterns — write-heavy, time-ordered, recent-hot
- Gorilla compression is transformative — 12x compression makes in-memory storage viable
- Retention policies are mandatory — raw data at full resolution cannot be kept forever
- Downsampling preserves trends — trade resolution for storage efficiency on historical data
- Choose tools by use case — Prometheus for monitoring, TimescaleDB for SQL, ClickHouse for analytics, QuestDB for finance
- Columnar storage wins — column-oriented designs enable both compression and fast aggregation
This is article #174 on the Codelit engineering blog — we publish in-depth guides on system design, infrastructure, and software architecture. Explore all of them at codelit.io.