Database Query Optimization: From EXPLAIN Plans to Materialized Views
A feature that works in development with 100 rows can grind to a halt in production with 10 million. Database query optimization is not a nice-to-have — it is the difference between a snappy product and one that hemorrhages users waiting for pages to load.
Reading EXPLAIN Plans#
Every major database engine provides an EXPLAIN command that reveals how the engine plans to execute your query. Learning to read it is the single highest-leverage optimization skill.
PostgreSQL EXPLAIN ANALYZE#
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2026-01-01'
ORDER BY o.total DESC
LIMIT 20;
Key fields to examine:
| Field | What it tells you |
|---|---|
| Seq Scan | Full table scan — often a sign of a missing index |
| Index Scan | Using an index — usually good |
| Bitmap Heap Scan | Index narrows rows, then fetches from heap — efficient for medium selectivity |
| Nested Loop | Row-by-row join — fast for small outer sets, catastrophic for large ones |
| Hash Join | Builds hash table from one side — good for large equi-joins |
| Sort | Explicit sort operation — check if an index could provide order |
| Actual Time | Wall-clock time in milliseconds |
| Rows | Estimated vs. actual row count — large gaps indicate stale statistics |
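Checking the plan before and after adding an index is a habit worth automating. Below is a minimal sketch using Python's stdlib `sqlite3` as a stand-in engine (SQLite's `EXPLAIN QUERY PLAN` is a rough analog of the PostgreSQL output discussed above, not the same format); the table and index names are illustrative.

```python
import sqlite3

def plan_steps(conn, sql):
    """Return the engine's plan steps for a query."""
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)")

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan_steps(conn, query)   # a full scan of orders ("SCAN ...")

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan_steps(conn, query)    # an index lookup ("SEARCH ... USING INDEX ...")
```

The same discipline applies to any engine: capture the plan, change one thing, capture it again.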
MySQL EXPLAIN#
MySQL uses a tabular format. Watch for:
- `type: ALL` (full table scan) vs. `type: ref` or `type: range` (index usage).
- `Extra: Using filesort` — the engine cannot use an index for ORDER BY.
- `Extra: Using temporary` — a temporary table is created, often for GROUP BY or DISTINCT.
When Estimates Are Wrong#
If the planner estimates 100 rows but the actual count is 100,000, the chosen plan may be wildly suboptimal. Fix this by:
- Running `ANALYZE` (PostgreSQL) or `ANALYZE TABLE` (MySQL) to refresh statistics.
- Increasing the statistics target for columns with skewed distributions.
- Using `pg_stat_statements` to find the slowest queries in production.
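Estimate-vs-actual gaps can be hunted programmatically. The sketch below walks the tree produced by PostgreSQL's `EXPLAIN (ANALYZE, FORMAT JSON)`, whose nodes carry `Plan Rows`, `Actual Rows`, and children under `Plans`; the function name and the 10x threshold are choices of this sketch, not anything standard.

```python
def misestimated_nodes(plan, ratio=10.0):
    """Collect plan nodes whose actual row count diverges from the
    planner's estimate by more than `ratio` in either direction."""
    flagged = []
    est, act = plan.get("Plan Rows", 0), plan.get("Actual Rows", 0)
    if est and act and (act / est > ratio or est / act > ratio):
        flagged.append((plan.get("Node Type"), est, act))
    for child in plan.get("Plans", []):          # recurse into child nodes
        flagged.extend(misestimated_nodes(child, ratio))
    return flagged
```

Feed it the dict parsed from the JSON plan output; any flagged node is a candidate for a fresh `ANALYZE` or a higher statistics target.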
Index Strategy#
Types of Indexes#
- B-tree — Default for most columns. Supports equality, range, and ordering.
- Hash — Equality only. Rarely the best choice in modern PostgreSQL (B-tree is competitive).
- GIN (Generalized Inverted Index) — Full-text search, JSONB containment, array overlap.
- GiST — Geometric data, range types, nearest-neighbor search.
- Partial index — Index only rows matching a predicate. Example: `CREATE INDEX idx_active_orders ON orders(created_at) WHERE status = 'active';`
- Covering index (INCLUDE) — Stores extra columns in the index leaf pages so the query never touches the heap.
Index Selection Heuristics#
- Start with WHERE clauses — Columns that appear in filters are the first candidates.
- Consider JOIN keys — Foreign key columns should almost always be indexed.
- ORDER BY and GROUP BY — An index matching the sort order avoids a filesort.
- Composite indexes — Column order matters. Place equality columns first, then range columns.
- Don't over-index — Every index slows down writes and consumes storage. Audit unused indexes with `pg_stat_user_indexes`.
Composite Index Example#
-- Query pattern: filter by status, range on created_at, order by total
SELECT * FROM orders
WHERE status = 'shipped'
AND created_at >= '2026-01-01'
ORDER BY total DESC
LIMIT 50;
-- Optimal composite index
CREATE INDEX idx_orders_status_created_total
ON orders(status, created_at, total DESC);
The index satisfies the equality filter and the range filter in a single B-tree traversal. One subtlety: once the planner applies the range predicate on `created_at`, the matching index entries are no longer ordered by `total`, so the ORDER BY may still require an explicit sort. If the query filtered `created_at` with equality instead, the index would satisfy the sort as well.
Solving N+1 Queries#
The N+1 problem occurs when your application executes one query to fetch a list and then N additional queries to fetch related data for each item.
Detection#
- ORM logging — Enable query logging and look for repeated patterns.
- APM tools — Datadog, New Relic, and Scout highlight N+1 patterns automatically.
- pg_stat_statements — A query executed thousands of times per second with a low per-execution cost is a strong N+1 signal.
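A lightweight version of what the APM tools above do is to normalize each logged statement down to its "shape" and count repeats within one request. This is a sketch, not any particular tool's algorithm; the regexes and the threshold of 10 are illustrative.

```python
import re
from collections import Counter

def query_shape(sql):
    """Normalize a SQL statement so only its shape remains:
    replace quoted and numeric literals with placeholders."""
    shape = re.sub(r"'[^']*'", "?", sql)
    shape = re.sub(r"\b\d+\b", "?", shape)
    return re.sub(r"\s+", " ", shape).strip()

def n_plus_one_suspects(query_log, threshold=10):
    """Flag query shapes repeated more than `threshold` times in one
    request: the signature of an N+1 loop."""
    counts = Counter(query_shape(q) for q in query_log)
    return {shape: n for shape, n in counts.items() if n > threshold}
```

Hook this into your ORM's per-request query log and alert when the result is non-empty.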
Solutions#
Eager loading (ORM-level)
# Django: select_related for FK, prefetch_related for M2M
orders = Order.objects.select_related('customer').prefetch_related('items')
# Rails: includes
orders = Order.includes(:customer, :items).where(status: 'active')
JOIN in raw SQL
SELECT o.id, o.total, c.name, i.product_name, i.quantity
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items i ON i.order_id = o.id
WHERE o.status = 'active';
Batch loading (GraphQL / DataLoader pattern)
Collect all requested IDs, then issue a single WHERE id IN (...) query. Libraries like DataLoader (JavaScript) and Strawberry DataLoader (Python) formalize this pattern.
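The core of the pattern fits in a few lines. This is a simplified, synchronous sketch (real DataLoader implementations batch asynchronously per event-loop tick); `BatchLoader` and `fetch_customers` are hypothetical names, and the dict stands in for a real `WHERE id IN (...)` query.

```python
class BatchLoader:
    """Minimal sketch of the DataLoader pattern: collect keys,
    then resolve them all with one batched fetch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # takes a list of keys, returns {key: row}
        self.calls = 0             # how many batched queries were issued

    def load_many(self, keys):
        unique = list(dict.fromkeys(keys))   # dedupe, preserve order
        self.calls += 1
        rows = self.batch_fn(unique)         # one IN-list query, not N queries
        return [rows.get(k) for k in keys]

def fetch_customers(ids):
    # Stand-in for: SELECT * FROM customers WHERE id IN (...)
    db = {1: "Ada", 2: "Grace", 3: "Edsger"}
    return {i: db[i] for i in ids if i in db}
```

One batched query replaces N lookups, and duplicate keys cost nothing extra.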
Query Caching#
Application-Level Cache#
Cache the result set, not the query string. Use a deterministic cache key derived from the query parameters.
cache_key = "orders:status=active:page=3:per=20"
TTL = 60 seconds
Invalidate on writes using:
- Write-through — Update cache immediately after the database write.
- Cache-aside with TTL — Let entries expire naturally. Simpler but allows brief staleness.
- Event-driven invalidation — Listen to CDC (Change Data Capture) events and evict affected keys.
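The cache-aside strategy with a deterministic key can be sketched as follows; the class and method names are illustrative, and the injectable `clock` exists only to make expiry testable.

```python
import time

class QueryCache:
    """Cache-aside sketch: deterministic keys derived from query
    parameters, entries expiring after `ttl` seconds."""

    def __init__(self, ttl=60, clock=time.monotonic):
        self.ttl, self.clock, self.store = ttl, clock, {}

    @staticmethod
    def key(table, **params):
        # Sorting makes the key independent of argument order.
        return table + ":" + ":".join(f"{k}={v}" for k, v in sorted(params.items()))

    def get_or_load(self, key, loader):
        entry = self.store.get(key)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            return entry[1]                    # fresh hit
        value = loader()                       # miss or expired: query the database
        self.store[key] = (self.clock(), value)
        return value

    def invalidate_table(self, table):
        # Event-driven style eviction: drop every key for one table.
        for k in [k for k in self.store if k.startswith(table + ":")]:
            del self.store[k]
```

In production the `store` would be Redis or Memcached rather than a dict, but the key discipline and the TTL logic are the same.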
Database-Level Cache#
- PostgreSQL `shared_buffers` — Tune to 25% of available RAM as a starting point.
- MySQL query cache — Removed in MySQL 8.0 (deprecated since 5.7.20). Use ProxySQL or application-level caching instead.
- PgBouncer / ProxySQL — Connection poolers that reduce overhead from connection setup and prepared statement re-parsing.
Materialized Views#
A materialized view stores the result of a query physically on disk. It trades write-time cost for read-time speed.
When to Use#
- Dashboards and analytics queries that aggregate millions of rows.
- Queries with expensive JOINs across multiple large tables.
- Data that changes infrequently relative to how often it is read.
PostgreSQL Example#
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
date_trunc('day', created_at) AS day,
SUM(total) AS revenue,
COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1;
-- Refresh (blocks reads during refresh)
REFRESH MATERIALIZED VIEW mv_daily_revenue;
-- Concurrent refresh (requires a unique index)
CREATE UNIQUE INDEX idx_mv_daily_revenue_day ON mv_daily_revenue(day);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
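The trade-off is easy to demonstrate in miniature. SQLite (used here only because it ships with Python) has no materialized views, so this sketch emulates one with a plain table plus an explicit refresh function; the `mv_daily_revenue` name mirrors the SQL above, and the mechanics stand in for `REFRESH MATERIALIZED VIEW`.

```python
import sqlite3

MV_QUERY = """
    SELECT date(created_at) AS day,
           SUM(total)       AS revenue,
           COUNT(*)         AS order_count
    FROM orders
    WHERE status = 'completed'
    GROUP BY 1
"""

def refresh_daily_revenue(conn):
    # Recompute the precomputed result set from scratch, as a plain
    # (non-concurrent) REFRESH would.
    conn.execute("DROP TABLE IF EXISTS mv_daily_revenue")
    conn.execute("CREATE TABLE mv_daily_revenue AS " + MV_QUERY)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10.0, "completed", "2026-01-01"), (5.0, "completed", "2026-01-01"),
     (99.0, "pending",   "2026-01-01")],
)
refresh_daily_revenue(conn)
```

Reads against `mv_daily_revenue` now cost a single indexed lookup instead of an aggregation over the whole `orders` table.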
Refresh Strategies#
| Strategy | Latency | Complexity |
|---|---|---|
| Cron job (e.g., every 5 min) | Minutes | Low |
| Trigger-based refresh | Seconds | Medium |
| CDC + event-driven refresh | Seconds | High |
| Incremental materialized view (Materialize, ksqlDB) | Sub-second | High |
Denormalization#
Normalization eliminates redundancy. Denormalization re-introduces it strategically to avoid expensive JOINs at read time.
Common Patterns#
- Embedding aggregates — Store `order_count` directly on the `customers` table. Update it via a trigger or application logic.
- Duplicating columns — Store `customer_name` on the `orders` table so dashboards skip the JOIN.
- JSON columns — Store nested data (e.g., line items) as JSONB on the parent row for single-row reads.
Trade-Offs#
| Benefit | Cost |
|---|---|
| Faster reads (fewer JOINs) | Slower writes (must update multiple locations) |
| Simpler read queries | Risk of data inconsistency |
| Reduced query complexity for analytics | Harder schema evolution |
Rule of thumb: Denormalize when your read-to-write ratio is high (100:1 or more) and the duplicated data changes infrequently.
Batch Queries#
Batch Inserts#
Instead of inserting rows one at a time, use multi-value inserts:
INSERT INTO events (user_id, event_type, created_at)
VALUES
(1, 'click', NOW()),
(2, 'view', NOW()),
(3, 'purchase', NOW());
Most ORMs support bulk creation:
# Django
Event.objects.bulk_create([Event(...), Event(...), Event(...)])
# SQLAlchemy
session.bulk_save_objects([Event(...), Event(...)])
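Under the hood these bulk helpers reduce round trips, which the DB-API exposes directly as `executemany`. A minimal sketch with stdlib `sqlite3` (the 500-row chunk size is an arbitrary illustrative choice):

```python
import sqlite3

def insert_events_batched(conn, rows, chunk=500):
    """Insert rows in multi-row batches rather than one statement per row."""
    for i in range(0, len(rows), chunk):
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows[i:i + chunk])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INT, event_type TEXT, created_at TEXT)")
insert_events_batched(conn, [(i, "click", "2026-01-01") for i in range(1000)])
```

Chunking also keeps each statement under the engine's parameter and packet limits.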
Batch Updates#
Use UPDATE ... FROM (PostgreSQL) or CASE expressions:
UPDATE products
SET price = v.new_price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, new_price)
WHERE products.id = v.id;
Batch Deletes#
Delete in chunks to avoid long-running transactions and lock contention:
DELETE FROM logs
WHERE id IN (
SELECT id FROM logs
WHERE created_at < '2025-01-01'
LIMIT 10000
);
-- Repeat until zero rows affected
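The "repeat until zero rows" loop belongs in application code. A sketch with stdlib `sqlite3` (the schema and chunk size are illustrative; in PostgreSQL each iteration would be its own short transaction, holding locks only briefly):

```python
import sqlite3

def delete_in_chunks(conn, cutoff, chunk=100):
    """Delete old rows a chunk at a time until none remain."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN "
            "(SELECT id FROM logs WHERE created_at < ? LIMIT ?)",
            (cutoff, chunk),
        )
        conn.commit()                 # release locks between chunks
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO logs (created_at) VALUES (?)",
                 [("2024-06-01",)] * 250 + [("2025-06-01",)] * 10)
```

Between chunks, other transactions get a chance to acquire locks, so a multi-million-row purge never blocks the table for long.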
Optimization Checklist#
- Measure first — Use EXPLAIN ANALYZE, not guesswork.
- Index strategically — Cover your most frequent query patterns without over-indexing.
- Eliminate N+1 — Eager load, JOIN, or batch-load related data.
- Cache hot paths — Application-level cache with smart invalidation.
- Materialize expensive aggregations — Refresh on a schedule or via events.
- Denormalize deliberately — Only where read-heavy patterns justify the write overhead.
- Batch writes — Multi-value inserts, chunked deletes, bulk updates.
- Monitor continuously — Track slow query logs, `pg_stat_statements`, and query count trends.
The best-performing database is one where every query has a plan, every index has a purpose, and every optimization decision is backed by measurement.
This is article #288 on Codelit.io — leveling up your engineering knowledge, one deep dive at a time. Explore more at codelit.io.