databasesperformancepostgresqlbackend

Database Indexing Strategies — B-Tree, Hash, GIN, Partial, and Covering Indexes

March 29, 2026 6 min readBy Codelit Team Discussion

Why indexing matters#

Without an index, every query performs a full table scan — the database reads every row to find matches. On a table with millions of rows, that means seconds of latency for a single lookup. An index is a separate data structure that lets the database jump directly to the rows it needs.

B-tree indexes#

B-tree (balanced tree) is the default index type in virtually every relational database. It keeps keys sorted, supports equality and range queries, and self-balances on insert.

CREATE INDEX idx_users_email ON users (email);

-- Uses the index for both:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email BETWEEN 'a' AND 'b';

B-tree vs B+tree#

Property	B-tree	B+tree
Data stored in	All nodes	Leaf nodes only
Leaf node linking	No	Yes (doubly linked)
Range scan efficiency	Lower	Higher
Used by	Some key-value stores	PostgreSQL, MySQL InnoDB

Most relational databases use B+tree internally even when they call it "B-tree" in their documentation.

Hash indexes#

Hash indexes map keys through a hash function to bucket locations. They are extremely fast for equality lookups but cannot serve range queries.

CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Fast:
SELECT * FROM sessions WHERE token = 'abc123';

-- Cannot use the hash index:
SELECT * FROM sessions WHERE token &gt; 'abc';

PostgreSQL hash indexes became WAL-logged and crash-safe in version 10. Before that they were discouraged.

GiST indexes (PostgreSQL)#

Generalized Search Tree indexes support complex data types: geometric shapes, full-text search, ranges, and IP addresses.

CREATE INDEX idx_locations_point ON locations USING gist (coordinates);

SELECT * FROM locations
WHERE coordinates &lt;@&gt; point(40.7128, -74.0060) &lt; 5.0;

GiST indexes are lossy — they may return false positives that the database filters in a recheck step.

GIN indexes (PostgreSQL)#

Generalized Inverted Indexes are designed for values that contain multiple elements: arrays, JSONB fields, and full-text search vectors.

CREATE INDEX idx_posts_tags ON posts USING gin (tags);
CREATE INDEX idx_docs_body ON documents USING gin (to_tsvector('english', body));

-- Array containment:
SELECT * FROM posts WHERE tags @&gt; ARRAY['postgresql', 'indexing'];

-- Full-text search:
SELECT * FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('database &amp; indexing');

GIN indexes are slower to build and update than B-tree but dramatically speed up containment and search queries.

Composite indexes#

A composite index covers multiple columns. Column order matters — the index is useful for queries that filter on a leftmost prefix of the indexed columns.

CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

-- Uses the index (both columns):
SELECT * FROM orders WHERE user_id = 42 AND created_at &gt; '2026-01-01';

-- Uses the index (leftmost prefix):
SELECT * FROM orders WHERE user_id = 42;

-- Cannot use the index efficiently:
SELECT * FROM orders WHERE created_at &gt; '2026-01-01';

Rule of thumb: put high-selectivity equality columns first, then range columns.

Covering indexes#

A covering index includes all columns a query needs, enabling an index-only scan — the database never touches the heap table.

CREATE INDEX idx_orders_covering
ON orders (user_id, created_at)
INCLUDE (total_amount, status);

-- Index-only scan — no heap access:
SELECT total_amount, status
FROM orders
WHERE user_id = 42 AND created_at &gt; '2026-01-01';

In PostgreSQL, INCLUDE columns are stored in the index but not used for sorting or searching.

Partial indexes#

A partial index only covers rows matching a predicate. This keeps the index small and focused.

CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

-- Uses the partial index:
SELECT * FROM orders WHERE status = 'pending' AND created_at &gt; '2026-03-01';

Partial indexes are excellent for columns with skewed distributions — index the rare values you actually query.

Index-only scans#

An index-only scan occurs when the index contains every column the query needs. PostgreSQL checks the visibility map to confirm rows are visible without accessing the heap.

To verify an index-only scan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT total_amount FROM orders WHERE user_id = 42;

Look for Index Only Scan in the plan output. If you see Heap Fetches: 0, the scan is fully satisfied by the index.

Tip: run VACUUM regularly to keep the visibility map up to date.

When to index#

Good candidates for indexing:

Columns in WHERE, JOIN, and ORDER BY clauses
Foreign keys (prevent sequential scans during cascading deletes)
Columns with high selectivity (many distinct values)

Avoid indexing when:

The table is small (sequential scan is faster)
The column has very low cardinality (e.g., a boolean flag on a large table)
Write throughput is critical and reads are infrequent

Index bloat#

As rows are updated and deleted, dead tuples leave holes in the index. Over time, the index grows larger than necessary — this is index bloat.

Detecting bloat#

-- pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('idx_users_email');

Key metrics: avg_leaf_density below 50% suggests significant bloat.

Fixing bloat#

REINDEX INDEX idx_users_email; — rebuilds the index (locks writes)
REINDEX INDEX CONCURRENTLY idx_users_email; — rebuilds without blocking (PostgreSQL 12+)
VACUUM (VERBOSE) table_name; — reclaims dead tuples

Monitoring index usage#

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Indexes with idx_scan = 0 over a long period are unused — drop them to save storage and speed up writes.

Quick reference#

Index type	Equality	Range	Multi-value	Best for
B-tree	Yes	Yes	No	General purpose
Hash	Yes	No	No	Exact lookups
GiST	Yes	Yes	Yes	Geometry, ranges
GIN	Yes	No	Yes	Arrays, JSONB, FTS
Covering	Yes	Yes	No	Index-only scans
Partial	Yes	Yes	No	Filtered subsets

Wrapping up#

Indexing is the single highest-leverage tool for database performance. Start with the query patterns your application actually uses, add indexes to support them, monitor usage over time, and remove what is not pulling its weight.

This is article #241 on Codelit. If this guide helped you think about database indexing more clearly, explore the rest of our library for deep dives on PostgreSQL, system design, and backend performance.

Try it on Codelit

AI Architecture Review

Get an AI audit covering security gaps, bottlenecks, and scaling risks

Build this architecture →

Comments

API design

API Backward Compatibility: Ship Changes Without Breaking Consumers

6 min read

api design

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

8 min read

system design

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

7 min read

Try these templates

Multiplayer Game Backend

Real-time multiplayer game server with matchmaking, state sync, leaderboards, and anti-cheat.

8 components

PostgreSQL High Availability Cluster

Production PostgreSQL with streaming replication, connection pooling, automated failover, and monitoring.

10 components

Build this architecture

Generate an interactive architecture for Database Indexing Strategies in seconds.

Try it in Codelit →

databasesperformancepostgresqlbackend

Database Indexing Strategies — B-Tree, Hash, GIN, Partial, and Covering Indexes

March 29, 2026 6 min readBy Codelit Team Discussion

Why indexing matters#

B-tree indexes#

B-tree (balanced tree) is the default index type in virtually every relational database. It keeps keys sorted, supports equality and range queries, and self-balances on insert.

CREATE INDEX idx_users_email ON users (email);

-- Uses the index for both:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email BETWEEN 'a' AND 'b';

B-tree vs B+tree#

Property	B-tree	B+tree
Data stored in	All nodes	Leaf nodes only
Leaf node linking	No	Yes (doubly linked)
Range scan efficiency	Lower	Higher
Used by	Some key-value stores	PostgreSQL, MySQL InnoDB

Most relational databases use B+tree internally even when they call it "B-tree" in their documentation.

Hash indexes#

Hash indexes map keys through a hash function to bucket locations. They are extremely fast for equality lookups but cannot serve range queries.

CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Fast:
SELECT * FROM sessions WHERE token = 'abc123';

-- Cannot use the hash index:
SELECT * FROM sessions WHERE token &gt; 'abc';

PostgreSQL hash indexes became WAL-logged and crash-safe in version 10. Before that they were discouraged.

GiST indexes (PostgreSQL)#

Generalized Search Tree indexes support complex data types: geometric shapes, full-text search, ranges, and IP addresses.

CREATE INDEX idx_locations_point ON locations USING gist (coordinates);

SELECT * FROM locations
WHERE coordinates &lt;@&gt; point(40.7128, -74.0060) &lt; 5.0;

GiST indexes are lossy — they may return false positives that the database filters in a recheck step.

GIN indexes (PostgreSQL)#

Generalized Inverted Indexes are designed for values that contain multiple elements: arrays, JSONB fields, and full-text search vectors.

CREATE INDEX idx_posts_tags ON posts USING gin (tags);
CREATE INDEX idx_docs_body ON documents USING gin (to_tsvector('english', body));

-- Array containment:
SELECT * FROM posts WHERE tags @&gt; ARRAY['postgresql', 'indexing'];

-- Full-text search:
SELECT * FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('database &amp; indexing');

GIN indexes are slower to build and update than B-tree but dramatically speed up containment and search queries.

Composite indexes#

A composite index covers multiple columns. Column order matters — the index is useful for queries that filter on a leftmost prefix of the indexed columns.

CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

-- Uses the index (both columns):
SELECT * FROM orders WHERE user_id = 42 AND created_at &gt; '2026-01-01';

-- Uses the index (leftmost prefix):
SELECT * FROM orders WHERE user_id = 42;

-- Cannot use the index efficiently:
SELECT * FROM orders WHERE created_at &gt; '2026-01-01';

Rule of thumb: put high-selectivity equality columns first, then range columns.

Covering indexes#

A covering index includes all columns a query needs, enabling an index-only scan — the database never touches the heap table.

CREATE INDEX idx_orders_covering
ON orders (user_id, created_at)
INCLUDE (total_amount, status);

-- Index-only scan — no heap access:
SELECT total_amount, status
FROM orders
WHERE user_id = 42 AND created_at &gt; '2026-01-01';

In PostgreSQL, INCLUDE columns are stored in the index but not used for sorting or searching.

Partial indexes#

A partial index only covers rows matching a predicate. This keeps the index small and focused.

CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

-- Uses the partial index:
SELECT * FROM orders WHERE status = 'pending' AND created_at &gt; '2026-03-01';

Partial indexes are excellent for columns with skewed distributions — index the rare values you actually query.

Index-only scans#

An index-only scan occurs when the index contains every column the query needs. PostgreSQL checks the visibility map to confirm rows are visible without accessing the heap.

To verify an index-only scan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT total_amount FROM orders WHERE user_id = 42;

Look for Index Only Scan in the plan output. If you see Heap Fetches: 0, the scan is fully satisfied by the index.

Tip: run VACUUM regularly to keep the visibility map up to date.

When to index#

Good candidates for indexing:

Columns in WHERE, JOIN, and ORDER BY clauses
Foreign keys (prevent sequential scans during cascading deletes)
Columns with high selectivity (many distinct values)

Avoid indexing when:

The table is small (sequential scan is faster)
The column has very low cardinality (e.g., a boolean flag on a large table)
Write throughput is critical and reads are infrequent

Index bloat#

As rows are updated and deleted, dead tuples leave holes in the index. Over time, the index grows larger than necessary — this is index bloat.

Detecting bloat#

-- pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('idx_users_email');

Key metrics: avg_leaf_density below 50% suggests significant bloat.

Fixing bloat#

REINDEX INDEX idx_users_email; — rebuilds the index (locks writes)
REINDEX INDEX CONCURRENTLY idx_users_email; — rebuilds without blocking (PostgreSQL 12+)
VACUUM (VERBOSE) table_name; — reclaims dead tuples

Monitoring index usage#

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Indexes with idx_scan = 0 over a long period are unused — drop them to save storage and speed up writes.

Quick reference#

Index type	Equality	Range	Multi-value	Best for
B-tree	Yes	Yes	No	General purpose
Hash	Yes	No	No	Exact lookups
GiST	Yes	Yes	Yes	Geometry, ranges
GIN	Yes	No	Yes	Arrays, JSONB, FTS
Covering	Yes	Yes	No	Index-only scans
Partial	Yes	Yes	No	Filtered subsets

Wrapping up#

Try it on Codelit

AI Architecture Review

Get an AI audit covering security gaps, bottlenecks, and scaling risks

Build this architecture →

Comments

API design

Try these templates

Multiplayer Game Backend

Real-time multiplayer game server with matchmaking, state sync, leaderboards, and anti-cheat.

8 components

PostgreSQL High Availability Cluster

Production PostgreSQL with streaming replication, connection pooling, automated failover, and monitoring.

10 components

Build this architecture

Generate an interactive architecture for Database Indexing Strategies in seconds.

Try it in Codelit →

Database Indexing Strategies — B-Tree, Hash, GIN, Partial, and Covering Indexes

Why indexing matters#

B-tree indexes#

B-tree vs B+tree#

Hash indexes#

GiST indexes (PostgreSQL)#

GIN indexes (PostgreSQL)#

Composite indexes#

Covering indexes#

Partial indexes#

Index-only scans#

When to index#

Index bloat#

Detecting bloat#

Fixing bloat#

Monitoring index usage#

Quick reference#

Wrapping up#

Comments

Related articles

API Backward Compatibility: Ship Changes Without Breaking Consumers

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

Try these templates

Multiplayer Game Backend

PostgreSQL High Availability Cluster

Build this architecture

Database Indexing Strategies — B-Tree, Hash, GIN, Partial, and Covering Indexes

Why indexing matters#

B-tree indexes#

B-tree vs B+tree#

Hash indexes#

GiST indexes (PostgreSQL)#

GIN indexes (PostgreSQL)#

Composite indexes#

Covering indexes#

Partial indexes#

Index-only scans#

When to index#

Index bloat#

Detecting bloat#

Fixing bloat#

Monitoring index usage#

Quick reference#

Wrapping up#

Comments

Related articles

API Backward Compatibility: Ship Changes Without Breaking Consumers

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

Try these templates

Multiplayer Game Backend

PostgreSQL High Availability Cluster

Build this architecture