Real-Time Analytics Architecture: From Events to Dashboards
Batch analytics with nightly ETL jobs served us well for decades. But modern products demand real-time insights — fraud detection in milliseconds, live dashboards updating every second, A/B test results as they happen. Building a real-time analytics architecture requires rethinking every layer of the data pipeline, from ingestion to visualization.
High-Level Architecture#
A real-time analytics system has four layers:
┌────────────────────────────────────────────────────────┐
│                      Data Sources                      │
│  (clickstream, transactions, IoT sensors, app events)  │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                    Stream Ingestion                    │
│           (Kafka, Kinesis, Pulsar, Redpanda)           │
└───────────────────────────┬────────────────────────────┘
                            │
                    ┌───────┴────────┐
                    ▼                ▼
       ┌───────────────────┐   ┌──────────────────────┐
       │ Stream Processing │   │   Direct Ingestion   │
       │   (Flink, Spark   │   │   (Kafka Connect,    │
       │    Streaming)     │   │  native connectors)  │
       └─────────┬─────────┘   └──────────┬───────────┘
                 │                        │
                 └────────────┬───────────┘
                              ▼
┌────────────────────────────────────────────────────────┐
│                 OLAP / Analytics Store                 │
│  (ClickHouse, Apache Druid, Apache Pinot, StarRocks)   │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                 Query / Visualization                  │
│    (Grafana, Superset, Metabase, custom dashboards)    │
└────────────────────────────────────────────────────────┘
Stream Processing Layer#
Stream processors transform, enrich, and aggregate events before they land in the analytics store.
Apache Flink is arguably the most capable open-source stream processor, offering event-time windowing, exactly-once state checkpointing, and a mature SQL layer:
-- Flink SQL: Compute real-time revenue per product category
-- over tumbling 1-minute windows
SELECT
    category,
    TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*) AS order_count,
    SUM(amount) AS revenue
FROM orders
GROUP BY
    category,
    TUMBLE(event_time, INTERVAL '1' MINUTE);
When to use stream processing vs. direct ingestion:
| Scenario | Approach |
|---|---|
| Raw event storage for ad-hoc queries | Direct ingestion |
| Pre-aggregation to reduce storage | Stream processing |
| Enrichment with dimension data | Stream processing |
| Filtering / PII redaction | Stream processing |
| Simple append-only logging | Direct ingestion |
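To make the stream-processing rows of the table concrete, here is a hypothetical enrichment-and-redaction step in Python. The event shape, field names, and dimension map are assumptions for illustration, not a real pipeline:

```python
# Hypothetical stream-processing step: enrich each event with dimension
# data and redact PII before it lands in the analytics store. The event
# shape, field names, and dimension map are illustrative assumptions.

PII_FIELDS = {"email", "ip_address"}

def enrich(event, categories_by_product):
    """Drop PII fields and join in the product's category."""
    enriched = {k: v for k, v in event.items() if k not in PII_FIELDS}
    enriched["category"] = categories_by_product.get(
        event.get("product_id"), "unknown"
    )
    return enriched

# One clickstream event flowing through the step:
dims = {"p1": "electronics"}
event = {"product_id": "p1", "amount": 19.99, "email": "a@b.com"}
print(enrich(event, dims))
# → {'product_id': 'p1', 'amount': 19.99, 'category': 'electronics'}
```

In production this logic would live in a Flink or Spark Streaming job; the point is only what per-event enrichment and PII redaction mean.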
Choosing an OLAP Engine#
The OLAP engine is the core of your analytics architecture. Here is how the major options compare:
ClickHouse#
ClickHouse is a column-oriented database optimized for analytical queries on large datasets.
Strengths: Fastest single-node performance, excellent compression, rich SQL dialect, materialized views, wide ecosystem.
Architecture: Shared-nothing, each node stores and queries its own data. Replication via ZooKeeper or ClickHouse Keeper.
-- ClickHouse: Create a table optimized for time-series analytics
CREATE TABLE events (
    event_id    UUID,
    user_id     UInt64,
    event_type  LowCardinality(String),
    properties  Map(String, String),
    amount      Decimal64(2),
    event_time  DateTime64(3),
    date        Date MATERIALIZED toDate(event_time)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, user_id, event_time)
TTL event_time + INTERVAL 90 DAY;
Apache Druid#
Druid is designed for sub-second OLAP queries on streaming data with high concurrency.
Strengths: Native Kafka ingestion, automatic data tiering (hot/warm/cold), high query concurrency, segment-level indexing.
Architecture: Separated ingestion, query, and storage nodes. Segments are immutable and stored in deep storage (S3/HDFS).
Apache Pinot#
Pinot was built at LinkedIn for user-facing real-time analytics at massive scale.
Strengths: Upsert support, star-tree index for pre-aggregation, native Kafka consumer, designed for high-QPS user-facing queries.
Comparison#
| Feature | ClickHouse | Druid | Pinot |
|---|---|---|---|
| Query latency (p99) | 10–500 ms | 50–500 ms | 10–200 ms |
| Ingestion latency | Seconds | Seconds | Seconds |
| SQL support | Full SQL | Druid SQL (limited) | Multi-stage SQL |
| Upserts | ReplacingMergeTree | Limited | Native |
| Joins | Yes | Limited | Multi-stage |
| Managed offering | ClickHouse Cloud | Imply | StarTree |
| Best for | Ad-hoc analytics | High-concurrency | User-facing analytics |
Materialized Views#
Materialized views pre-compute aggregations at ingestion time, trading storage for query speed:
-- ClickHouse: Materialized view for real-time dashboard
CREATE MATERIALIZED VIEW revenue_by_minute
ENGINE = SummingMergeTree()
ORDER BY (category, minute)
AS SELECT
    category,
    toStartOfMinute(event_time) AS minute,
    count() AS order_count,
    sum(amount) AS revenue
FROM events
WHERE event_type = 'purchase'
GROUP BY category, minute;

-- Query the materialized view — instant response
SELECT
    category,
    sum(order_count) AS orders,
    sum(revenue) AS total_revenue
FROM revenue_by_minute
WHERE minute >= now() - INTERVAL 1 HOUR
GROUP BY category
ORDER BY total_revenue DESC;
Materialized view design principles:
- Pre-aggregate the most common dashboard queries — Identify the top 5–10 queries and create views for them.
- Use appropriate merge engines — SummingMergeTree for additive metrics, AggregatingMergeTree for complex aggregations like uniques.
- Partition by time — Enables efficient TTL-based cleanup and partition pruning.
- Keep raw data — Materialized views complement raw tables; they do not replace them. Ad-hoc exploration needs raw granularity.
Approximate Algorithms#
When exact counts on billions of rows are too expensive, approximate algorithms trade precision for speed:
HyperLogLog — Count Distinct#
Estimates the number of unique values in a few kilobytes of fixed memory, regardless of cardinality:
-- ClickHouse: Approximate unique users per day
SELECT
    toDate(event_time) AS day,
    uniqHLL12(user_id) AS approx_unique_users
FROM events
GROUP BY day
ORDER BY day DESC
LIMIT 30;
Typical error rate: 0.5–2%. For dashboards showing "1.2M unique users today," this is perfectly acceptable.
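To show the mechanics behind functions like uniqHLL12, here is a toy HyperLogLog in Python. It is a sketch for illustration, not ClickHouse's implementation; p = 12 mirrors the 2^12 registers the function name refers to:

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog sketch with 2^p registers.

    Illustrative only; ClickHouse's uniqHLL12 is the production
    version (its name implies the same p = 12 precision).
    """

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                  # number of registers (4096 at p=12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction, m >= 128

    def add(self, value):
        # 64-bit hash: the first p bits pick a register, the rest give a rank
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        est = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            # Linear counting is more accurate at low cardinality
            est = self.m * math.log(self.m / zeros)
        return est
```

Each register keeps only the maximum observed rank, which is why duplicates never change the estimate and memory stays fixed no matter how many events stream through.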
Count-Min Sketch — Frequency Estimation#
Estimates how often a specific item appears in a stream. Useful for top-N queries:
-- Approximate top 10 most viewed pages (last hour).
-- ClickHouse's topK uses Filtered Space-Saving, a sketch in the
-- same family as Count-Min.
SELECT topK(10)(page_url) AS top_pages
FROM events
WHERE event_type = 'pageview'
  AND event_time >= now() - INTERVAL 1 HOUR;
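A minimal Count-Min Sketch in Python shows the data structure itself; this is a toy, engine-independent illustration:

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min Sketch: may overestimate counts, never underestimates."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item):
        # One independent hash per row, derived by salting with the row index
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in enumerate(self._hashes(item)):
            self.table[row][col] += count

    def estimate(self, item):
        # The true count is <= every cell; collisions only inflate cells,
        # so the minimum across rows is the tightest estimate
        return min(self.table[row][col]
                   for row, col in enumerate(self._hashes(item)))
```

Because the table size is fixed, memory stays constant even for streams with millions of distinct keys; accuracy degrades gracefully as collisions accumulate.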
T-Digest — Approximate Percentiles#
Computes approximate quantiles (p50, p95, p99) with high accuracy at the tails:
SELECT
    quantilesTDigest(0.5, 0.95, 0.99)(response_time_ms) AS latency_percentiles
FROM api_events
WHERE event_time >= now() - INTERVAL 5 MINUTE;
Dashboard Architecture#
The visualization layer connects users to the analytics store. Key design decisions:
Query Caching#
Dashboards with many viewers should not each fire independent queries:
┌───────────┐      ┌─────────────┐      ┌────────────┐
│ Dashboard │─────▶│ Query Cache │─────▶│ ClickHouse │
│ (browser) │      │  (Redis /   │      │            │
│           │◀─────│ in-memory)  │◀─────│            │
└───────────┘      └─────────────┘      └────────────┘

Cache key = hash(query + time_bucket)
TTL       = refresh interval (e.g., 10 seconds)
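The cache-key scheme can be sketched in a few lines of Python. A plain dict stands in for Redis, and run_query is a placeholder for whatever client executes SQL against the store:

```python
import hashlib
import time

class DashboardQueryCache:
    """Share one backend query among all viewers of a dashboard.

    The key includes a time bucket, so every request inside one
    refresh interval maps to the same cache entry. A plain dict
    stands in for Redis here; run_query is a placeholder for the
    real SQL client.
    """

    def __init__(self, run_query, refresh_interval_s=10):
        self.run_query = run_query      # callable: sql -> rows
        self.interval = refresh_interval_s
        self.store = {}                 # cache_key -> rows

    def _key(self, sql):
        bucket = int(time.time() // self.interval)
        return hashlib.sha256(f"{sql}|{bucket}".encode()).hexdigest()

    def get(self, sql):
        key = self._key(sql)
        if key not in self.store:
            self.store[key] = self.run_query(sql)  # only the first viewer pays
        return self.store[key]
```

With a 10-second interval, a dashboard open in 500 browsers costs one backend query per widget per 10 seconds instead of 500.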
Push vs. Poll#
| Approach | Mechanism | Best for |
|---|---|---|
| Polling | Dashboard queries every N sec | Simple, tolerates N-sec lag |
| WebSocket | Server pushes on data change | Sub-second updates |
| SSE | Server-sent events | One-directional push |
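For the SSE row, the wire format is simple enough to show directly. This helper formats one frame per the Server-Sent Events wire format; the event name "metrics" is an arbitrary example:

```python
import json

def sse_event(data, event="metrics"):
    """Format one Server-Sent Events frame (the standard SSE wire format)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A dashboard client subscribed via EventSource would receive frames like:
print(sse_event({"revenue": 1250.0, "orders": 42}), end="")
```

SSE rides on a plain long-lived HTTP response, which makes it a lighter choice than WebSockets when updates only flow server-to-client.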
Multi-Tenant Isolation#
For SaaS analytics dashboards serving many customers:
- Row-level security — Filter queries by tenant_id at the query layer.
- Separate tables/databases — Stronger isolation but higher operational cost.
- Query quotas — Rate-limit queries per tenant to prevent noisy neighbors.
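The query-quota idea can be sketched as a per-tenant token bucket. The rate and burst numbers are illustrative, and the explicit now parameter just makes the refill logic easy to exercise:

```python
import time

class TenantRateLimiter:
    """Per-tenant token bucket for query quotas (illustrative numbers)."""

    def __init__(self, rate_per_s=5.0, burst=10):
        self.rate = rate_per_s          # tokens refilled per second
        self.burst = burst              # bucket capacity
        self.state = {}                 # tenant_id -> (tokens, last_timestamp)

    def allow(self, tenant_id, now=None):
        """Return True if this tenant may run one more query right now."""
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(tenant_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[tenant_id] = (tokens - 1, now)
            return True
        self.state[tenant_id] = (tokens, now)
        return False
```

Each tenant draws from its own bucket, so one customer's runaway dashboard cannot starve the others.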
Ingestion Patterns#
Exactly-Once Semantics#
Exactly-once results from Kafka to ClickHouse are typically built from at-least-once delivery plus deduplication:
Kafka Consumer Group
          │
          ▼
┌──────────────────────┐
│  Batch events (10s   │
│     or 10K rows)     │
├──────────────────────┤
│  Insert batch into   │
│      ClickHouse      │
├──────────────────────┤
│ Commit Kafka offset  │
│ AFTER insert success │
└──────────────────────┘
If the consumer crashes after insert but before commit, the batch is re-inserted. Use ReplacingMergeTree with a deduplication key to absorb the duplicates (note that ReplacingMergeTree deduplicates at merge time, so read with FINAL or an aggregating query when exactness matters before merges run):
CREATE TABLE events (
    event_id UUID,
    -- ... other columns
) ENGINE = ReplacingMergeTree()
ORDER BY event_id;
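The insert-then-commit loop can be sketched with stand-in clients. FakeConsumer and FakeClickHouse below are hypothetical stubs (real code would use clients such as confluent-kafka and clickhouse-connect); the point is the ordering: write the batch first, commit the offset second:

```python
class FakeConsumer:
    """Stand-in for a Kafka consumer (real code: e.g. confluent-kafka)."""

    def __init__(self, events):
        self.events = list(events)
        self.offset = 0    # committed offset
        self.polled = 0    # rows handed out since the last commit

    def poll_batch(self, max_rows):
        batch = self.events[self.offset:self.offset + max_rows]
        self.polled = len(batch)
        return batch

    def commit(self):
        self.offset += self.polled

class FakeClickHouse:
    """Stand-in for a ClickHouse client (real code: e.g. clickhouse-connect)."""

    def __init__(self):
        self.rows = []

    def insert(self, table, rows):
        self.rows.extend(rows)

def ingest_batch(consumer, clickhouse, max_rows=10_000):
    """One turn of the loop: insert the batch, THEN commit the offset."""
    batch = consumer.poll_batch(max_rows)
    if not batch:
        return 0
    clickhouse.insert("events", batch)  # 1. write first
    consumer.commit()                   # 2. commit only after a successful insert
    return len(batch)
```

A crash between the insert and the commit re-delivers the whole batch on restart, which is exactly the duplicate case the ReplacingMergeTree key absorbs.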
Backfill Strategy#
Real-time systems need a way to reprocess historical data:
- Dual-write — Write to both the real-time path and a batch path (S3/data lake).
- Replay — Re-read from Kafka (if retention allows) or from the data lake.
- Swap — Load backfilled data into a shadow table, then swap with an atomic EXCHANGE TABLES (ClickHouse) or segment replacement (Druid).
Scaling Considerations#
- Ingestion throughput — ClickHouse handles 1–5 million rows/second per node. Scale horizontally with sharding.
- Query concurrency — Druid and Pinot are designed for 1000+ QPS. ClickHouse handles fewer concurrent queries but each is faster.
- Storage — Column compression typically achieves 5–10x reduction. Plan capacity on compressed size.
- Retention — Use TTL to automatically drop old data. Archive to object storage for compliance.
Real-time analytics is no longer a luxury — users expect instant feedback, and businesses need immediate visibility. The combination of stream processing, modern OLAP engines, and smart pre-aggregation makes it achievable at reasonable cost.
That is article #389 on Codelit. Browse all articles or explore the platform to level up your engineering skills.