# Materialized Views — Precomputed Query Results for Fast Reads at Scale
## What is a materialized view?
A materialized view is a precomputed query result stored as a physical table. Unlike a regular (virtual) view that re-executes the query on every access, a materialized view holds the data on disk and serves reads directly — no computation required at query time.
The tradeoff: you pay storage and maintenance cost in exchange for dramatically faster reads.
## Materialized views vs virtual views
| Aspect | Virtual view | Materialized view |
|---|---|---|
| Storage | None — query runs each time | Physical table on disk |
| Read latency | Depends on query complexity | Near-instant (table scan) |
| Staleness | Always fresh | Can be stale until refreshed |
| Write overhead | None | Maintenance cost on source changes |
| Use case | Abstraction, access control | Dashboards, analytics, read-heavy APIs |
Virtual views are aliases for queries. Materialized views are caches with structure.
## Full refresh vs incremental maintenance

### Full refresh

The simplest approach: discard the materialized view's contents and recompute everything from scratch.

```sql
REFRESH MATERIALIZED VIEW monthly_revenue;
```

This works for small datasets or infrequent updates. For a 100-million-row aggregation, a full refresh can take minutes, and in PostgreSQL the plain form locks the view and blocks reads for the duration; `REFRESH MATERIALIZED VIEW CONCURRENTLY` avoids blocking readers, but requires a unique index on the view and does more work per refresh.
### Incremental maintenance
Instead of recomputing everything, incremental maintenance applies only the delta — the rows that changed since the last refresh.
The database tracks insertions, updates, and deletions on source tables. When a refresh is triggered, it:
- Identifies changed rows using change tracking or triggers
- Computes the effect of those changes on the materialized result
- Applies the delta to the existing materialized data
Incremental maintenance is orders of magnitude faster for large datasets with small change rates. Oracle and SQL Server have supported it for decades. PostgreSQL does not natively support incremental refresh — you need extensions like pg_ivm or application-level logic.
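The delta-application step above can be sketched in a few lines. This is an illustrative in-memory model of incrementally maintaining a `SUM(amount) GROUP BY customer_id` aggregate, not any particular database's implementation; the `apply_delta` function and the change-event shapes are assumptions for the example.

```python
# Sketch: incrementally maintaining SUM(amount) GROUP BY customer_id.
# The change-event format here is illustrative, not a real engine's API.

def apply_delta(view: dict, change: dict) -> None:
    """Apply one source-table change to the materialized sums in place."""
    op = change["op"]
    if op == "insert":
        row = change["row"]
        view[row["customer_id"]] = view.get(row["customer_id"], 0) + row["amount"]
    elif op == "delete":
        row = change["row"]
        view[row["customer_id"]] -= row["amount"]
    elif op == "update":
        # An update is a delete of the old row plus an insert of the new one.
        old, new = change["old"], change["new"]
        view[old["customer_id"]] -= old["amount"]
        view[new["customer_id"]] = view.get(new["customer_id"], 0) + new["amount"]

view = {}
changes = [
    {"op": "insert", "row": {"customer_id": "c1", "amount": 100}},
    {"op": "insert", "row": {"customer_id": "c1", "amount": 50}},
    {"op": "update", "old": {"customer_id": "c1", "amount": 50},
                     "new": {"customer_id": "c1", "amount": 75}},
]
for change in changes:
    apply_delta(view, change)

print(view)  # {'c1': 175}
```

Three changes touch the view three times; a full refresh would rescan every source row. That gap is where the "orders of magnitude" savings come from.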
## CQRS and read models
Command Query Responsibility Segregation (CQRS) separates write operations from read operations. The write side processes commands and emits events. The read side consumes those events and builds optimized read models — which are, conceptually, materialized views.
### How CQRS read models work
- A command modifies the write model (normalized, optimized for writes)
- An event is published describing the change
- A read-model projector consumes the event and updates a denormalized read store
- Queries hit the read store directly — no joins, no aggregations at query time
Each read model is tailored to a specific query pattern. An e-commerce system might maintain separate read models for product listings, order histories, and inventory dashboards — each shaped differently from the same source events.
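The projector step can be sketched as a function that folds events into a denormalized store. The event types (`OrderPlaced`, `OrderShipped`) and field names below are hypothetical, chosen only to illustrate the shape of a projector.

```python
# Sketch: a read-model projector folding order events into a
# denormalized per-customer order history. Event names are illustrative.

read_model = {}  # customer_id -> list of order summaries

def project(event: dict) -> None:
    if event["type"] == "OrderPlaced":
        read_model.setdefault(event["customer_id"], []).append(
            {"order_id": event["order_id"], "status": "placed",
             "total": event["total"]}
        )
    elif event["type"] == "OrderShipped":
        for order in read_model.get(event["customer_id"], []):
            if order["order_id"] == event["order_id"]:
                order["status"] = "shipped"

events = [
    {"type": "OrderPlaced", "customer_id": "c1", "order_id": "o1", "total": 42},
    {"type": "OrderShipped", "customer_id": "c1", "order_id": "o1"},
]
for e in events:
    project(e)

# Queries hit the denormalized model directly: no joins at query time.
print(read_model["c1"])
```

A real projector would persist the model (e.g., to Redis or Elasticsearch) and track its position in the event stream so it can resume after a restart.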
### Benefits of CQRS read models
- Independent scaling — read and write stores can scale separately
- Query-optimized shapes — each read model matches its access pattern exactly
- Technology freedom — the read store can be Elasticsearch, Redis, DynamoDB, or anything else
- Eventual consistency — the tradeoff; read models lag behind writes by the event processing delay
## Denormalized views
Denormalization precomputes joins by embedding related data directly into the materialized result. Instead of joining orders, customers, and products at query time, the denormalized view stores a flat row with all fields.
Common denormalization patterns:
- Embedded references — store the customer name and email alongside each order row
- Pre-aggregated summaries — store daily totals instead of computing from individual transactions
- Nested documents — in document stores, embed related arrays (e.g., order items inside the order)
The cost is data duplication and increased write amplification. Every source change may require updating multiple denormalized rows.
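The write-amplification cost is easy to see in a toy model. This sketch (all table and field names hypothetical) shows the embedded-references pattern: each order row copies the customer's name, so one customer rename fans out to every copy.

```python
# Sketch: embedded references trade query-time joins for write amplification.
# Each denormalized order row copies the customer's name at write time.

customers = {"c1": {"name": "Ada"}}
orders = [
    {"order_id": "o1", "customer_id": "c1", "customer_name": "Ada", "total": 10},
    {"order_id": "o2", "customer_id": "c1", "customer_name": "Ada", "total": 20},
]

def rename_customer(customer_id: str, new_name: str) -> int:
    """One logical change fans out to every denormalized row."""
    customers[customer_id]["name"] = new_name
    touched = 0
    for row in orders:
        if row["customer_id"] == customer_id:
            row["customer_name"] = new_name
            touched += 1
    return touched

updated = rename_customer("c1", "Ada L.")
print(updated)  # 2 denormalized rows rewritten for one source change
```

With two orders the fan-out is harmless; with millions of rows per customer, the same rename becomes a bulk rewrite, which is why hot, frequently updated fields are poor candidates for embedding.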
## Streaming materialized views

Traditional materialized views are refreshed on a schedule or on demand. Streaming materialized views update continuously as source data changes, so staleness is bounded by the pipeline's propagation delay rather than by a refresh interval.
### ksqlDB

ksqlDB creates materialized views over Kafka topics using SQL syntax:

```sql
CREATE TABLE order_totals AS
  SELECT customer_id, SUM(amount) AS total_spent
  FROM orders_stream
  GROUP BY customer_id
  EMIT CHANGES;
```
The order_totals table updates in real time as new events arrive on the orders_stream topic. Applications query it via pull queries for point lookups or push queries for streaming results.
### Materialize
Materialize is a streaming database that maintains SQL materialized views over change streams (Kafka, PostgreSQL CDC, S3). It uses differential dataflow to incrementally maintain complex queries including joins, aggregations, and temporal windows.
```sql
CREATE MATERIALIZED VIEW active_subscriptions AS
  SELECT customer_id, plan, started_at
  FROM subscriptions
  WHERE canceled_at IS NULL;
```
Unlike batch systems, Materialize updates the view within milliseconds of the source change. This makes it suitable for operational dashboards, real-time analytics, and serving live API responses.
### Apache Flink materialized views

Flink SQL supports defining dynamic tables over streaming sources. Combined with a state backend such as RocksDB, Flink maintains materialized aggregations continuously; applications typically consume them by sinking the results to an external store that serves queries.
## Cache as a materialized view
Martin Kleppmann argues that a cache is just an informal materialized view. Both store precomputed results derived from source data. The difference is that caches are typically managed by application code with ad-hoc invalidation, while materialized views are maintained by a system with well-defined consistency guarantees.
Treating your cache as a materialized view means:
- Derive it from a changelog — subscribe to database change events (CDC) and update the cache deterministically
- Make it rebuildable — the cache can be reconstructed from the event log at any time
- Eliminate manual invalidation — the event stream tells you exactly what changed and when
- Version it — include a sequence number so consumers know how fresh the cache is
This pattern eliminates the classic cache invalidation problem by turning it into a stream processing problem.
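The four properties above can be demonstrated with a minimal sketch: a cache built as a pure function of a changelog. The entry shape (`seq`, `op`, `key`, `value`) is an assumption for illustration, but the key idea holds: replaying the same log always rebuilds the same cache, so there is nothing to invalidate by hand.

```python
# Sketch: a cache maintained as a materialized view over a changelog.
# Replaying the same log deterministically rebuilds the same cache.
# The changelog entry shape is illustrative.

def build_cache(changelog: list) -> dict:
    cache = {"_version": 0}
    for entry in changelog:
        cache["_version"] = entry["seq"]  # consumers can check freshness
        if entry["op"] == "upsert":
            cache[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            cache.pop(entry["key"], None)
    return cache

changelog = [
    {"seq": 1, "op": "upsert", "key": "user:1", "value": {"name": "Ada"}},
    {"seq": 2, "op": "upsert", "key": "user:2", "value": {"name": "Lin"}},
    {"seq": 3, "op": "delete", "key": "user:2"},
]

cache = build_cache(changelog)
assert cache == build_cache(changelog)  # rebuildable: same log, same cache
print(cache)  # {'_version': 3, 'user:1': {'name': 'Ada'}}
```

In production the changelog would be a CDC stream or Kafka topic and the cache a store like Redis, but the determinism and rebuildability properties are the same.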
## Event-driven refresh strategies
### Change Data Capture (CDC)
CDC captures row-level changes from the database transaction log and publishes them as events. Tools like Debezium stream these changes to Kafka, where downstream consumers update materialized views.
This is the most reliable refresh mechanism because it captures every change, including those made by direct SQL or background jobs that bypass the application layer.
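A consumer of these change events is a small state machine. Debezium envelopes carry an op code — `c` (create), `u` (update), `d` (delete), `r` (snapshot read) — plus `before`/`after` row images; the sketch below applies simplified events of that shape to a keyed in-memory view (the `id` key field and event payloads are assumptions for the example).

```python
# Sketch: applying Debezium-style change events to a keyed view.
# Debezium op codes: "c" create, "u" update, "d" delete, "r" snapshot read;
# envelopes carry "before"/"after" row images. Payloads here are simplified.

def apply_cdc(view: dict, event: dict) -> None:
    op, before, after = event["op"], event.get("before"), event.get("after")
    if op in ("c", "r", "u"):   # create, snapshot read, update: take "after"
        view[after["id"]] = after
    elif op == "d":             # delete: drop the row by its old key
        del view[before["id"]]

view = {}
events = [
    {"op": "c", "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
                "after": {"id": 1, "email": "ada@example.com"}},
    {"op": "d", "before": {"id": 1, "email": "ada@example.com"}},
]
for e in events:
    apply_cdc(view, e)

print(view)  # {} -- the row was created, updated, then deleted
```

Because the transaction log captures every change regardless of origin, a consumer like this sees updates made by direct SQL and background jobs, not just application writes.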
### Trigger-based refresh
Database triggers fire on INSERT, UPDATE, or DELETE and update the materialized view synchronously. This guarantees zero lag but adds latency to every write operation.
### Scheduled refresh
A cron job or scheduler refreshes the view at fixed intervals. Simple and predictable, but the view can be stale between refreshes. Suitable for dashboards and reports where minute-level freshness is acceptable.
### Hybrid approach
Many production systems combine strategies:
- CDC for near-real-time updates to critical read models
- Scheduled full refresh as a consistency safety net (catches any missed events)
- On-demand refresh for ad-hoc analytical queries
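The "consistency safety net" can be sketched as a reconciliation pass: recompute the view from source rows, diff it against the incrementally maintained copy, and overwrite any drift. The function names and row shapes are illustrative, not a specific tool's API.

```python
# Sketch: scheduled reconciliation as a safety net for an incrementally
# maintained per-customer sum. Names and row shapes are illustrative.

def recompute(source_rows: list) -> dict:
    """Ground truth: full recomputation from the source table."""
    totals = {}
    for row in source_rows:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0) + row["amount"]
    return totals

def reconcile(view: dict, source_rows: list) -> list:
    """Overwrite stale entries; return which keys had drifted."""
    truth = recompute(source_rows)
    drifted = [k for k in set(view) | set(truth) if view.get(k) != truth.get(k)]
    view.clear()
    view.update(truth)
    return sorted(drifted)

source = [{"customer_id": "c1", "amount": 100},
          {"customer_id": "c2", "amount": 5}]
view = {"c1": 100, "c2": 7}  # c2 drifted, e.g. from a missed event
fixed = reconcile(view, source)
print(fixed, view)  # ['c2'] {'c1': 100, 'c2': 5}
```

Logging the drifted keys is useful in practice: frequent drift usually points at a bug in the incremental path, not just bad luck.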
## When materialized views break down
- High write volume with complex joins — maintenance cost can exceed the read savings
- Frequent schema changes — every source schema change requires updating the view definition and backfilling
- Strong consistency requirements — materialized views are inherently eventually consistent (except synchronous trigger-based approaches)
- Storage constraints — denormalized views can be significantly larger than the source data
## Visualize materialized views in your architecture
On Codelit, generate a CQRS system or a streaming pipeline with ksqlDB to see how materialized views connect to source data, event streams, and query endpoints. Click on any view to explore its refresh strategy and data flow.
This is article #232 in the Codelit engineering blog series.