Database Indexing Strategies — B-Tree, Hash, GIN, Partial, and Covering Indexes
Why indexing matters#
Without an index, every query performs a full table scan — the database reads every row to find matches. On a table with millions of rows, that means seconds of latency for a single lookup. An index is a separate data structure that lets the database jump directly to the rows it needs.
B-tree indexes#
B-tree (balanced tree) is the default index type in virtually every relational database. It keeps keys sorted, supports equality and range queries, and self-balances on insert.
CREATE INDEX idx_users_email ON users (email);
-- Uses the index for both:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email BETWEEN 'a' AND 'b';
B-tree vs B+tree#
| Property | B-tree | B+tree |
|---|---|---|
| Data stored in | All nodes | Leaf nodes only |
| Leaf node linking | No | Yes (doubly linked) |
| Range scan efficiency | Lower | Higher |
| Used by | Some key-value stores | PostgreSQL, MySQL InnoDB |
Most relational databases use B+tree internally even when they call it "B-tree" in their documentation.
Hash indexes#
Hash indexes map keys through a hash function to bucket locations. They are extremely fast for equality lookups but cannot serve range queries.
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
-- Fast:
SELECT * FROM sessions WHERE token = 'abc123';
-- Cannot use the hash index:
SELECT * FROM sessions WHERE token > 'abc';
PostgreSQL hash indexes became WAL-logged and crash-safe in version 10. Before that they were discouraged.
GiST indexes (PostgreSQL)#
Generalized Search Tree indexes support complex data types: geometric shapes, full-text search, ranges, and IP addresses.
CREATE INDEX idx_locations_point ON locations USING gist (coordinates);
SELECT * FROM locations
WHERE coordinates <@> point(40.7128, -74.0060) < 5.0;
GiST indexes are lossy — they may return false positives that the database filters in a recheck step.
GIN indexes (PostgreSQL)#
Generalized Inverted Indexes are designed for values that contain multiple elements: arrays, JSONB fields, and full-text search vectors.
CREATE INDEX idx_posts_tags ON posts USING gin (tags);
CREATE INDEX idx_docs_body ON documents USING gin (to_tsvector('english', body));
-- Array containment:
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'indexing'];
-- Full-text search:
SELECT * FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('database & indexing');
GIN indexes are slower to build and update than B-tree but dramatically speed up containment and search queries.
Composite indexes#
A composite index covers multiple columns. Column order matters — the index is useful for queries that filter on a leftmost prefix of the indexed columns.
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);
-- Uses the index (both columns):
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2026-01-01';
-- Uses the index (leftmost prefix):
SELECT * FROM orders WHERE user_id = 42;
-- Cannot use the index efficiently:
SELECT * FROM orders WHERE created_at > '2026-01-01';
Rule of thumb: put high-selectivity equality columns first, then range columns.
Covering indexes#
A covering index includes all columns a query needs, enabling an index-only scan — the database never touches the heap table.
CREATE INDEX idx_orders_covering
ON orders (user_id, created_at)
INCLUDE (total_amount, status);
-- Index-only scan — no heap access:
SELECT total_amount, status
FROM orders
WHERE user_id = 42 AND created_at > '2026-01-01';
In PostgreSQL, INCLUDE columns are stored in the index but not used for sorting or searching.
Partial indexes#
A partial index only covers rows matching a predicate. This keeps the index small and focused.
CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';
-- Uses the partial index:
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-03-01';
Partial indexes are excellent for columns with skewed distributions — index the rare values you actually query.
Index-only scans#
An index-only scan occurs when the index contains every column the query needs. PostgreSQL checks the visibility map to confirm rows are visible without accessing the heap.
To verify an index-only scan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT total_amount FROM orders WHERE user_id = 42;
Look for Index Only Scan in the plan output. If you see Heap Fetches: 0, the scan is fully satisfied by the index.
Tip: run VACUUM regularly to keep the visibility map up to date.
When to index#
Good candidates for indexing:
- Columns in
WHERE,JOIN, andORDER BYclauses - Foreign keys (prevent sequential scans during cascading deletes)
- Columns with high selectivity (many distinct values)
Avoid indexing when:
- The table is small (sequential scan is faster)
- The column has very low cardinality (e.g., a boolean flag on a large table)
- Write throughput is critical and reads are infrequent
Index bloat#
As rows are updated and deleted, dead tuples leave holes in the index. Over time, the index grows larger than necessary — this is index bloat.
Detecting bloat#
-- pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('idx_users_email');
Key metrics: avg_leaf_density below 50% suggests significant bloat.
Fixing bloat#
REINDEX INDEX idx_users_email;— rebuilds the index (locks writes)REINDEX INDEX CONCURRENTLY idx_users_email;— rebuilds without blocking (PostgreSQL 12+)VACUUM (VERBOSE) table_name;— reclaims dead tuples
Monitoring index usage#
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
Indexes with idx_scan = 0 over a long period are unused — drop them to save storage and speed up writes.
Quick reference#
| Index type | Equality | Range | Multi-value | Best for |
|---|---|---|---|---|
| B-tree | Yes | Yes | No | General purpose |
| Hash | Yes | No | No | Exact lookups |
| GiST | Yes | Yes | Yes | Geometry, ranges |
| GIN | Yes | No | Yes | Arrays, JSONB, FTS |
| Covering | Yes | Yes | No | Index-only scans |
| Partial | Yes | Yes | No | Filtered subsets |
Wrapping up#
Indexing is the single highest-leverage tool for database performance. Start with the query patterns your application actually uses, add indexes to support them, monitor usage over time, and remove what is not pulling its weight.
This is article #241 on Codelit. If this guide helped you think about database indexing more clearly, explore the rest of our library for deep dives on PostgreSQL, system design, and backend performance.
Try it on Codelit
AI Architecture Review
Get an AI audit covering security gaps, bottlenecks, and scaling risks
Related articles
API Backward Compatibility: Ship Changes Without Breaking Consumers
6 min read
api designBatch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency
8 min read
system designCircuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j
7 min read
Try these templates
Build this architecture
Generate an interactive architecture for Database Indexing Strategies in seconds.
Try it in Codelit →
Comments