Database Index Design Patterns — Covering, Partial, Expression, and Beyond
Why index design matters more than index existence#
Adding an index is easy. Adding the right index is the difference between a 2ms query and a 2-second one. Most teams create indexes reactively — a query is slow, so they slap an index on the WHERE column. That works until it doesn't.
Real index design requires understanding your query patterns, data distribution, and the trade-offs between read speed and write overhead.
B-tree indexes — the default workhorse#
B-tree is the default index type in PostgreSQL, MySQL, and most relational databases. It works for equality checks, range scans, and sorting.
CREATE INDEX idx_users_email ON users (email);
B-trees are balanced — lookups are O(log n). They support operators like =, <, >, <=, >=, and BETWEEN.
When B-tree falls short: full-text search, JSON queries, array containment, and geometric lookups all need specialized index types.
Covering indexes — eliminate table lookups#
A covering index includes all columns a query needs. The database can answer the query entirely from the index without touching the heap (table data).
-- Query: SELECT email, name FROM users WHERE email = 'x@y.com'
-- Covering index:
CREATE INDEX idx_users_email_name ON users (email) INCLUDE (name);
The INCLUDE clause (PostgreSQL 11+) adds columns to the leaf pages without affecting the sort order. MySQL achieves this with composite indexes.
Index-only scans#
When a covering index satisfies the query, PostgreSQL performs an index-only scan. You can verify this in EXPLAIN ANALYZE:
Index Only Scan using idx_users_email_name on users
Index Cond: (email = 'x@y.com')
Heap Fetches: 0
Heap Fetches: 0 means the table was never touched. This is the fastest possible read path.
Caveat: index-only scans require the visibility map to be up-to-date. Run VACUUM regularly on heavily updated tables.
Multi-column indexes — column order matters#
A multi-column index on (a, b, c) supports queries filtering on:
aaloneaandba,b, andc
It does not efficiently support queries filtering only on b or c. This is the leftmost prefix rule.
-- Good: uses the index
SELECT * FROM orders WHERE customer_id = 5 AND status = 'shipped';
-- Bad: cannot use index on (customer_id, status)
SELECT * FROM orders WHERE status = 'shipped';
Choosing column order#
- Equality columns first — columns compared with
=go leftmost - Range columns last — columns with
<,>,BETWEENgo rightmost - High cardinality first — when multiple equality columns exist, put the most selective one first
-- Optimal for: WHERE tenant_id = ? AND created_at > ?
CREATE INDEX idx_orders_tenant_created
ON orders (tenant_id, created_at);
Partial indexes — index only what you query#
A partial index includes only rows matching a predicate. Smaller index = faster scans, less storage, less write overhead.
-- Only 2% of orders are 'pending', but you query them constantly
CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';
This index is a fraction of the size of a full index on created_at. PostgreSQL uses it when the query includes the matching WHERE clause.
Common use cases#
- Soft deletes:
WHERE deleted_at IS NULL— index only active records - Status filters:
WHERE status = 'active'— index the hot subset - Null exclusion:
WHERE column IS NOT NULL— skip sparse data
Expression indexes — index computed values#
When queries filter on a function or expression, a regular column index won't help. Expression indexes solve this.
-- Query: WHERE LOWER(email) = 'x@y.com'
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query: WHERE (data ->> 'type') = 'premium'
CREATE INDEX idx_events_type ON events ((data ->> 'type'));
The expression in the index must exactly match the expression in the query. PostgreSQL compares expressions literally.
Performance tip: expression indexes are recomputed on every INSERT and UPDATE. Use them when the expression is cheap and the read benefit is large.
GIN indexes — full-text search and JSON#
GIN (Generalized Inverted Index) maps values to the rows containing them. It's the backbone of full-text search, array operations, and JSONB queries in PostgreSQL.
Full-text search#
-- Create a tsvector column or use an expression index
CREATE INDEX idx_articles_search
ON articles USING GIN (to_tsvector('english', title || ' ' || body));
-- Query
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body)
@@ to_tsquery('english', 'kubernetes & scaling');
GIN breaks the text into lexemes and builds an inverted index. Searches are fast even on millions of documents.
JSONB containment#
CREATE INDEX idx_events_data ON events USING GIN (data);
-- Query: find events where data contains {"type": "click"}
SELECT * FROM events WHERE data @> '{"type": "click"}';
Trade-off: GIN indexes are slower to update than B-tree. They use a pending list that gets merged during VACUUM. For write-heavy tables, tune gin_pending_list_limit.
BRIN indexes — time-series and append-only data#
BRIN (Block Range Index) stores summary information (min/max) for ranges of physical pages. It's tiny — often 1000x smaller than a B-tree — and perfect for naturally ordered data.
-- Time-series table where rows are inserted in timestamp order
CREATE INDEX idx_events_time ON events USING BRIN (created_at);
BRIN works because physically adjacent rows have similar created_at values. When you query WHERE created_at BETWEEN '2026-03-01' AND '2026-03-29', BRIN skips entire page ranges that fall outside the window.
When BRIN works well#
- Append-only or insert-mostly tables
- Data is physically ordered by the indexed column
- Table is large (millions of rows)
- You need low storage overhead
When BRIN fails#
- Randomly inserted data (physical order doesn't match logical order)
- Frequent updates that scatter values across pages
- Point lookups (B-tree is faster for exact matches)
Choosing the right index type — decision guide#
| Pattern | Index type | Why |
|---|---|---|
| Equality and range | B-tree | Default, fast for most queries |
| Query reads only indexed columns | Covering (INCLUDE) | Avoids heap fetches |
| Subset of rows queried | Partial | Smaller, faster index |
| Function in WHERE clause | Expression | Matches computed predicates |
| Full-text search | GIN | Inverted index for lexemes |
| JSONB containment | GIN | Supports @> and ? operators |
| Time-series range scans | BRIN | Tiny index, huge tables |
| Geospatial queries | GiST/SP-GiST | R-tree for geometry |
Anti-patterns to avoid#
Indexing every column — each index slows writes and consumes storage. Only index columns that appear in WHERE, JOIN, and ORDER BY.
Redundant indexes — an index on (a, b) makes a separate index on (a) redundant. Audit with pg_stat_user_indexes.
Unused indexes — check idx_scan in pg_stat_user_indexes. If it's zero for weeks, drop the index.
Wrong column order — placing a low-cardinality column first in a composite index wastes selectivity.
Monitoring index health#
-- Find unused indexes
SELECT schemaname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check index size vs table size
SELECT pg_size_pretty(pg_indexes_size('orders')) AS index_size,
pg_size_pretty(pg_table_size('orders')) AS table_size;
If your total index size exceeds your table size, you likely have redundant or unused indexes.
Visualize your database architecture#
Map out your index strategy alongside your schema and query patterns — try Codelit to generate interactive architecture diagrams.
Key takeaways#
- Covering indexes eliminate heap fetches — use INCLUDE for read-heavy queries
- Column order matters — equality first, range last, high cardinality first
- Partial indexes reduce size and overhead for hot subsets
- Expression indexes match computed WHERE clauses exactly
- GIN powers full-text search and JSONB — accept slower writes
- BRIN is ideal for time-series data with natural physical ordering
- Audit regularly — unused and redundant indexes cost you on every write
Article #428 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.
Try it on Codelit
Chaos Mode
Simulate node failures and watch cascading impact across your architecture
AI Architecture Review
Get an AI audit covering security gaps, bottlenecks, and scaling risks
Related articles
API Backward Compatibility: Ship Changes Without Breaking Consumers
6 min read
api designBatch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency
8 min read
system designCircuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j
7 min read
Try these templates
Build this architecture
Generate an interactive architecture for Database Index Design Patterns in seconds.
Try it in Codelit →
Comments