Database Query Optimization: From EXPLAIN Plans to Materialized Views
A feature that works in development with 100 rows can grind to a halt in production with 10 million. Database query optimization is not a nice-to-have — it is the difference between a snappy product and one that hemorrhages users waiting for pages to load.
Reading EXPLAIN Plans#
Every major database engine provides an EXPLAIN command that reveals how the engine plans to execute your query. Learning to read it is the single highest-leverage optimization skill.
PostgreSQL EXPLAIN ANALYZE#
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2026-01-01'
ORDER BY o.total DESC
LIMIT 20;
Key fields to examine:
| Field | What it tells you |
|---|---|
| Seq Scan | Full table scan — often a sign of a missing index |
| Index Scan | Using an index — usually good |
| Bitmap Heap Scan | Index narrows rows, then fetches from heap — efficient for medium selectivity |
| Nested Loop | Row-by-row join — fast for small outer sets, catastrophic for large ones |
| Hash Join | Builds hash table from one side — good for large equi-joins |
| Sort | Explicit sort operation — check if an index could provide order |
| Actual Time | Wall-clock time in milliseconds |
| Rows | Estimated vs. actual row count — large gaps indicate stale statistics |
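Checking the plan before and after adding an index is a habit worth automating. Below is a minimal sketch using Python's stdlib `sqlite3` as a stand-in engine (SQLite's `EXPLAIN QUERY PLAN` is a rough analog of the PostgreSQL output discussed above, not the same format); the table and index names are illustrative.

```python
import sqlite3

def plan_steps(conn, sql):
    """Return the engine's plan steps for a query."""
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)")

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan_steps(conn, query)   # a full scan of orders ("SCAN ...")

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan_steps(conn, query)    # an index lookup ("SEARCH ... USING INDEX ...")
```

The same discipline applies to any engine: capture the plan, change one thing, capture it again.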
MySQL EXPLAIN#
MySQL uses a tabular format. Watch for:
- `type: ALL` (full table scan) vs. `type: ref` or `type: range` (index usage).
- `Extra: Using filesort` — the engine cannot use an index for ORDER BY.
- `Extra: Using temporary` — a temporary table is created, often for GROUP BY or DISTINCT.
When Estimates Are Wrong#
If the planner estimates 100 rows but the actual count is 100,000, the chosen plan may be wildly suboptimal. Fix this by:
- Running `ANALYZE` (PostgreSQL) or `ANALYZE TABLE` (MySQL) to refresh statistics.
- Increasing the statistics target for columns with skewed distributions.
- Using `pg_stat_statements` to find the slowest queries in production.
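Estimate-vs-actual gaps can be hunted programmatically. The sketch below walks the tree produced by PostgreSQL's `EXPLAIN (ANALYZE, FORMAT JSON)`, whose nodes carry `Plan Rows`, `Actual Rows`, and children under `Plans`; the function name and the 10x threshold are choices of this sketch, not anything standard.

```python
def misestimated_nodes(plan, ratio=10.0):
    """Collect plan nodes whose actual row count diverges from the
    planner's estimate by more than `ratio` in either direction."""
    flagged = []
    est, act = plan.get("Plan Rows", 0), plan.get("Actual Rows", 0)
    if est and act and (act / est > ratio or est / act > ratio):
        flagged.append((plan.get("Node Type"), est, act))
    for child in plan.get("Plans", []):          # recurse into child nodes
        flagged.extend(misestimated_nodes(child, ratio))
    return flagged
```

Feed it the dict parsed from the JSON plan output; any flagged node is a candidate for a fresh `ANALYZE` or a higher statistics target.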
Index Strategy#
Types of Indexes#
- B-tree — Default for most columns. Supports equality, range, and ordering.
- Hash — Equality only. Rarely the best choice in modern PostgreSQL (B-tree is competitive).
- GIN (Generalized Inverted Index) — Full-text search, JSONB containment, array overlap.
- GiST — Geometric data, range types, nearest-neighbor search.
- Partial index — Index only rows matching a predicate. Example: `CREATE INDEX idx_active_orders ON orders(created_at) WHERE status = 'active';`
- Covering index (INCLUDE) — Stores extra columns in the index leaf pages so the query never touches the heap.
Index Selection Heuristics#
- Start with WHERE clauses — Columns that appear in filters are the first candidates.
- Consider JOIN keys — Foreign key columns should almost always be indexed.
- ORDER BY and GROUP BY — An index matching the sort order avoids a filesort.
- Composite indexes — Column order matters. Place equality columns first, then range columns.
- Don't over-index — Every index slows down writes and consumes storage. Audit unused indexes with `pg_stat_user_indexes`.
Composite Index Example#
-- Query pattern: filter by status, range on created_at, order by total
SELECT * FROM orders
WHERE status = 'shipped'
AND created_at >= '2026-01-01'
ORDER BY total DESC
LIMIT 50;
-- Optimal composite index
CREATE INDEX idx_orders_status_created_total
ON orders(status, created_at, total DESC);
The index satisfies the equality filter and the range filter in a single B-tree traversal. One subtlety: once the planner applies the range predicate on `created_at`, the matching index entries are no longer ordered by `total`, so the ORDER BY may still require an explicit sort. If the query filtered `created_at` with equality instead, the index would satisfy the sort as well.
Solving N+1 Queries#
The N+1 problem occurs when your application executes one query to fetch a list and then N additional queries to fetch related data for each item.
Detection#
- ORM logging — Enable query logging and look for repeated patterns.
- APM tools — Datadog, New Relic, and Scout highlight N+1 patterns automatically.
- pg_stat_statements — A query executed thousands of times per second with a low per-execution cost is a strong N+1 signal.
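A lightweight version of what the APM tools above do is to normalize each logged statement down to its "shape" and count repeats within one request. This is a sketch, not any particular tool's algorithm; the regexes and the threshold of 10 are illustrative.

```python
import re
from collections import Counter

def query_shape(sql):
    """Normalize a SQL statement so only its shape remains:
    replace quoted and numeric literals with placeholders."""
    shape = re.sub(r"'[^']*'", "?", sql)
    shape = re.sub(r"\b\d+\b", "?", shape)
    return re.sub(r"\s+", " ", shape).strip()

def n_plus_one_suspects(query_log, threshold=10):
    """Flag query shapes repeated more than `threshold` times in one
    request: the signature of an N+1 loop."""
    counts = Counter(query_shape(q) for q in query_log)
    return {shape: n for shape, n in counts.items() if n > threshold}
```

Hook this into your ORM's per-request query log and alert when the result is non-empty.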
Solutions#
Eager loading (ORM-level)
# Django: select_related for FK, prefetch_related for M2M
orders = Order.objects.select_related('customer').prefetch_related('items')
# Rails: includes
orders = Order.includes(:customer, :items).where(status: 'active')
JOIN in raw SQL
SELECT o.id, o.total, c.name, i.product_name, i.quantity
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items i ON i.order_id = o.id
WHERE o.status = 'active';
Batch loading (GraphQL / DataLoader pattern)
Collect all requested IDs, then issue a single WHERE id IN (...) query. Libraries like DataLoader (JavaScript) and Strawberry DataLoader (Python) formalize this pattern.
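The core of the pattern fits in a few lines. This is a simplified, synchronous sketch (real DataLoader implementations batch asynchronously per event-loop tick); `BatchLoader` and `fetch_customers` are hypothetical names, and the dict stands in for a real `WHERE id IN (...)` query.

```python
class BatchLoader:
    """Minimal sketch of the DataLoader pattern: collect keys,
    then resolve them all with one batched fetch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # takes a list of keys, returns {key: row}
        self.calls = 0             # how many batched queries were issued

    def load_many(self, keys):
        unique = list(dict.fromkeys(keys))   # dedupe, preserve order
        self.calls += 1
        rows = self.batch_fn(unique)         # one IN-list query, not N queries
        return [rows.get(k) for k in keys]

def fetch_customers(ids):
    # Stand-in for: SELECT * FROM customers WHERE id IN (...)
    db = {1: "Ada", 2: "Grace", 3: "Edsger"}
    return {i: db[i] for i in ids if i in db}
```

One batched query replaces N lookups, and duplicate keys cost nothing extra.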
Query Caching#
Application-Level Cache#
Cache the result set, not the query string. Use a deterministic cache key derived from the query parameters.
cache_key = "orders:status=active:page=3:per=20"
TTL = 60 seconds
Invalidate on writes using:
- Write-through — Update cache immediately after the database write.
- Cache-aside with TTL — Let entries expire naturally. Simpler but allows brief staleness.
- Event-driven invalidation — Listen to CDC (Change Data Capture) events and evict affected keys.
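The cache-aside strategy with a deterministic key can be sketched as follows; the class and method names are illustrative, and the injectable `clock` exists only to make expiry testable.

```python
import time

class QueryCache:
    """Cache-aside sketch: deterministic keys derived from query
    parameters, entries expiring after `ttl` seconds."""

    def __init__(self, ttl=60, clock=time.monotonic):
        self.ttl, self.clock, self.store = ttl, clock, {}

    @staticmethod
    def key(table, **params):
        # Sorting makes the key independent of argument order.
        return table + ":" + ":".join(f"{k}={v}" for k, v in sorted(params.items()))

    def get_or_load(self, key, loader):
        entry = self.store.get(key)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            return entry[1]                    # fresh hit
        value = loader()                       # miss or expired: query the database
        self.store[key] = (self.clock(), value)
        return value

    def invalidate_table(self, table):
        # Event-driven style eviction: drop every key for one table.
        for k in [k for k in self.store if k.startswith(table + ":")]:
            del self.store[k]
```

In production the `store` would be Redis or Memcached rather than a dict, but the key discipline and the TTL logic are the same.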
Database-Level Cache#
- PostgreSQL `shared_buffers` — Tune to 25% of available RAM as a starting point.
- MySQL query cache — Removed in MySQL 8.0 (deprecated since 5.7.20). Use ProxySQL or application-level caching instead.
- PgBouncer / ProxySQL — Connection poolers that reduce overhead from connection setup and prepared statement re-parsing.
Materialized Views#
A materialized view stores the result of a query physically on disk. It trades write-time cost for read-time speed.
When to Use#
- Dashboards and analytics queries that aggregate millions of rows.
- Queries with expensive JOINs across multiple large tables.
- Data that changes infrequently relative to how often it is read.
PostgreSQL Example#
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
date_trunc('day', created_at) AS day,
SUM(total) AS revenue,
COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1;
-- Refresh (blocks reads during refresh)
REFRESH MATERIALIZED VIEW mv_daily_revenue;
-- Concurrent refresh (requires a unique index)
CREATE UNIQUE INDEX idx_mv_daily_revenue_day ON mv_daily_revenue(day);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
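The trade-off is easy to demonstrate in miniature. SQLite (used here only because it ships with Python) has no materialized views, so this sketch emulates one with a plain table plus an explicit refresh function; the `mv_daily_revenue` name mirrors the SQL above, and the mechanics stand in for `REFRESH MATERIALIZED VIEW`.

```python
import sqlite3

MV_QUERY = """
    SELECT date(created_at) AS day,
           SUM(total)       AS revenue,
           COUNT(*)         AS order_count
    FROM orders
    WHERE status = 'completed'
    GROUP BY 1
"""

def refresh_daily_revenue(conn):
    # Recompute the precomputed result set from scratch, as a plain
    # (non-concurrent) REFRESH would.
    conn.execute("DROP TABLE IF EXISTS mv_daily_revenue")
    conn.execute("CREATE TABLE mv_daily_revenue AS " + MV_QUERY)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10.0, "completed", "2026-01-01"), (5.0, "completed", "2026-01-01"),
     (99.0, "pending",   "2026-01-01")],
)
refresh_daily_revenue(conn)
```

Reads against `mv_daily_revenue` now cost a single indexed lookup instead of an aggregation over the whole `orders` table.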
Refresh Strategies#
| Strategy | Latency | Complexity |
|---|---|---|
| Cron job (e.g., every 5 min) | Minutes | Low |
| Trigger-based refresh | Seconds | Medium |
| CDC + event-driven refresh | Seconds | High |
| Incremental materialized view (Materialize, ksqlDB) | Sub-second | High |
Denormalization#
Normalization eliminates redundancy. Denormalization re-introduces it strategically to avoid expensive JOINs at read time.
Common Patterns#
- Embedding aggregates — Store `order_count` directly on the `customers` table. Update it via a trigger or application logic.
- Duplicating columns — Store `customer_name` on the `orders` table so dashboards skip the JOIN.
- JSON columns — Store nested data (e.g., line items) as JSONB on the parent row for single-row reads.
Trade-Offs#
| Benefit | Cost |
|---|---|
| Faster reads (fewer JOINs) | Slower writes (must update multiple locations) |
| Simpler read queries | Risk of data inconsistency |
| Reduced query complexity for analytics | Harder schema evolution |
Rule of thumb: Denormalize when your read-to-write ratio is high (100:1 or more) and the duplicated data changes infrequently.
Batch Queries#
Batch Inserts#
Instead of inserting rows one at a time, use multi-value inserts:
INSERT INTO events (user_id, event_type, created_at)
VALUES
(1, 'click', NOW()),
(2, 'view', NOW()),
(3, 'purchase', NOW());
Most ORMs support bulk creation:
# Django
Event.objects.bulk_create([Event(...), Event(...), Event(...)])
# SQLAlchemy
session.bulk_save_objects([Event(...), Event(...)])
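Under the hood these bulk helpers reduce round trips, which the DB-API exposes directly as `executemany`. A minimal sketch with stdlib `sqlite3` (the 500-row chunk size is an arbitrary illustrative choice):

```python
import sqlite3

def insert_events_batched(conn, rows, chunk=500):
    """Insert rows in multi-row batches rather than one statement per row."""
    for i in range(0, len(rows), chunk):
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows[i:i + chunk])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INT, event_type TEXT, created_at TEXT)")
insert_events_batched(conn, [(i, "click", "2026-01-01") for i in range(1000)])
```

Chunking also keeps each statement under the engine's parameter and packet limits.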
Batch Updates#
Use UPDATE ... FROM (PostgreSQL) or CASE expressions:
UPDATE products
SET price = v.new_price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, new_price)
WHERE products.id = v.id;
Batch Deletes#
Delete in chunks to avoid long-running transactions and lock contention:
DELETE FROM logs
WHERE id IN (
SELECT id FROM logs
WHERE created_at < '2025-01-01'
LIMIT 10000
);
-- Repeat until zero rows affected
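The "repeat until zero rows" loop belongs in application code. A sketch with stdlib `sqlite3` (the schema and chunk size are illustrative; in PostgreSQL each iteration would be its own short transaction, holding locks only briefly):

```python
import sqlite3

def delete_in_chunks(conn, cutoff, chunk=100):
    """Delete old rows a chunk at a time until none remain."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN "
            "(SELECT id FROM logs WHERE created_at < ? LIMIT ?)",
            (cutoff, chunk),
        )
        conn.commit()                 # release locks between chunks
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO logs (created_at) VALUES (?)",
                 [("2024-06-01",)] * 250 + [("2025-06-01",)] * 10)
```

Between chunks, other transactions get a chance to acquire locks, so a multi-million-row purge never blocks the table for long.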
Optimization Checklist#
- Measure first — Use EXPLAIN ANALYZE, not guesswork.
- Index strategically — Cover your most frequent query patterns without over-indexing.
- Eliminate N+1 — Eager load, JOIN, or batch-load related data.
- Cache hot paths — Application-level cache with smart invalidation.
- Materialize expensive aggregations — Refresh on a schedule or via events.
- Denormalize deliberately — Only where read-heavy patterns justify the write overhead.
- Batch writes — Multi-value inserts, chunked deletes, bulk updates.
- Monitor continuously — Track slow query logs, `pg_stat_statements`, and query count trends.
The best-performing database is one where every query has a plan, every index has a purpose, and every optimization decision is backed by measurement.
This is article #288 on Codelit.io — leveling up your engineering knowledge, one deep dive at a time. Explore more at codelit.io.