Database Sharding Strategies — A Deep Dive into Partitioning at Scale
What is database sharding?#
Sharding is a horizontal scaling strategy that splits a single database into multiple smaller databases — called shards — each holding a subset of the data. Every shard runs on its own server, so reads and writes distribute across machines instead of bottlenecking on one.
Unlike vertical scaling (buying bigger hardware), sharding scales out: add more shards to handle more load, though cross-shard coordination keeps the scaling from being perfectly linear.
Why shard at all?#
You hit sharding territory when:
- A single database server cannot handle the write throughput
- The dataset exceeds the storage capacity of one machine
- Read latency suffers because indexes no longer fit in memory
- Replication lag on read replicas becomes unacceptable
Sharding is not free. It adds complexity to queries, transactions, and operations. Exhaust vertical scaling and read replicas first.
Shard key selection criteria#
The shard key determines which shard holds a given row. A poor choice creates hot shards, uneven storage, or impossible queries. Evaluate candidates on four axes:
- Cardinality — the key must have enough distinct values to distribute evenly across shards
- Write distribution — writes should spread uniformly, not cluster on one shard
- Query isolation — common queries should target a single shard, not scatter across all of them
- Growth pattern — the key should not cause monotonic routing (e.g., auto-increment IDs all landing on the latest shard)
| Criterion | Good example | Bad example |
|---|---|---|
| Cardinality | user_id (millions) | country (dozens) |
| Write distribution | UUID-based tenant ID | timestamp (all writes hit latest shard) |
| Query isolation | tenant ID for SaaS apps | random hash (every query fans out) |
Hash sharding#
Hash sharding applies a hash function to the shard key, then maps the hash to a shard using modulo or consistent hashing.
```
shard_number = hash(shard_key) % total_shards
```
Advantages:
- Even data distribution regardless of key patterns
- Simple to implement
- Works well for point lookups
Disadvantages:
- Range queries require scatter-gather across all shards
- Adding shards requires rehashing (unless using consistent hashing)
- No data locality for related records
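To make the rehashing drawback concrete, here is a small Python sketch (key names and shard counts are illustrative): growing from 4 to 5 shards under plain modulo routing remaps the large majority of keys.

```python
import hashlib

def shard_for(key: str, total_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % total_shards

keys = [f"user-{i}" for i in range(10_000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
# With modulo routing, roughly 80% of keys change shards on a 4 -> 5 resize
print(f"{moved / len(keys):.0%} of keys moved")
```

This is exactly the problem consistent hashing addresses, as discussed under resharding below.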
Range sharding#
Range sharding assigns contiguous key ranges to each shard. For example, users A-M go to shard 1 and N-Z go to shard 2.
```
Shard 1: user_id         1 – 1,000,000
Shard 2: user_id 1,000,001 – 2,000,000
Shard 3: user_id 2,000,001 – 3,000,000
```
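The range table above can be expressed as a routing function. This Python sketch uses the standard library's `bisect` to find the owning shard; the bounds and shard numbering are the illustrative ones from the table:

```python
import bisect

# Inclusive upper bound of each shard's user_id range, in ascending order
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id: int) -> int:
    # bisect_left finds the first boundary >= user_id, i.e. the owning shard
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise KeyError(f"user_id {user_id} is outside all shard ranges")
    return idx + 1  # shards are numbered from 1, as in the table

print(shard_for(999_999))    # shard 1
print(shard_for(1_000_001))  # shard 2
```

Splitting a hot shard is then just inserting a new boundary into the sorted list.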
Advantages:
- Range queries execute on a single shard (or a small subset)
- Adjacent data lives together, improving cache locality
- Easy to reason about data placement
Disadvantages:
- Hot shards when writes cluster in one range (e.g., recent timestamps)
- Uneven shard sizes as data grows asymmetrically
- Requires manual or automated split/merge operations
Directory-based sharding#
A lookup service (the directory) maintains a mapping of each shard key to its shard. Every query first checks the directory, then routes to the correct shard.
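A minimal sketch of the idea in Python, with an in-memory dict standing in for the directory (production systems back the mapping with a replicated store such as etcd or ZooKeeper and cache it aggressively in each client):

```python
class ShardDirectory:
    """Hypothetical key -> shard lookup service (in-memory sketch)."""

    def __init__(self):
        self._mapping: dict[str, str] = {}

    def assign(self, key: str, shard: str) -> None:
        self._mapping[key] = shard

    def lookup(self, key: str) -> str:
        return self._mapping[key]

    def move(self, key: str, new_shard: str) -> None:
        # Resharding is just a directory update; the data copy happens out of band
        self._mapping[key] = new_shard

directory = ShardDirectory()
directory.assign("tenant-42", "shard-eu-1")  # e.g. compliance: EU tenant stays in the EU
print(directory.lookup("tenant-42"))
directory.move("tenant-42", "shard-eu-2")
```

The flexibility is the selling point; the operational cost is keeping this mapping highly available and consistent.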
Advantages:
- Maximum flexibility — any key can live on any shard
- Resharding is a directory update, not a data migration
- Supports complex placement policies (geography, compliance)
Disadvantages:
- The directory is a single point of failure and a latency bottleneck
- Directory must be cached aggressively and kept consistent
- Adds an extra network hop to every query
Cross-shard joins and transactions#
Sharding breaks joins. When order data lives on shard A and product data lives on shard B, a JOIN requires network round-trips.
Strategies for cross-shard queries:
- Denormalization — duplicate data so each shard has everything it needs
- Application-level joins — fetch from each shard separately, merge in the application
- Materialized views — precompute joined data into a read-optimized store
- Broadcast tables — replicate small reference tables to every shard
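An application-level join can be sketched in a few lines of Python; the in-memory lists below are hypothetical stand-ins for rows fetched separately from two shards:

```python
# Rows fetched from shard A (orders) and shard B (products) in separate queries
shard_a_orders = [
    {"order_id": 1, "product_id": "p9", "qty": 2},
    {"order_id": 2, "product_id": "p3", "qty": 1},
]
shard_b_products = {
    "p9": {"name": "widget"},
    "p3": {"name": "gadget"},
}

def join_orders_with_products(orders, products):
    # Equivalent of: SELECT o.*, p.name FROM orders o JOIN products p USING (product_id)
    return [
        {**order, "product_name": products[order["product_id"]]["name"]}
        for order in orders
    ]

for row in join_orders_with_products(shard_a_orders, shard_b_products):
    print(row)
```

Note the two network round-trips this implies; denormalization or broadcast tables exist precisely to avoid them.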
Cross-shard transactions are even harder. Options include:
- Two-phase commit (2PC) — correct but slow and blocking
- Saga pattern — break the transaction into compensatable local transactions
- Avoid them — design the schema so transactions stay within a single shard
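The saga pattern can be sketched generically: run each local transaction in order and, on failure, execute the compensations for the completed steps in reverse. The step names below are illustrative:

```python
# Saga sketch: each local transaction pairs with a compensating action.
def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        # Undo completed steps in reverse order
        for compensate in reversed(done):
            compensate()
        return False
    return True

def fail(msg):
    raise RuntimeError(msg)

log = []
steps = [
    (lambda: log.append("debit account on shard A"),
     lambda: log.append("refund account on shard A")),
    (lambda: fail("shard B unavailable"),  # second step fails
     lambda: log.append("never runs")),
]
ok = run_saga(steps)
print(ok, log)  # False ['debit account on shard A', 'refund account on shard A']
```

Unlike 2PC, a saga never holds locks across shards, but it only guarantees eventual consistency and requires every step to have a meaningful compensation.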
Resharding strategies#
When shards become unbalanced or you need to add capacity, you reshard.
Split and merge#
Split a hot shard into two smaller ones, or merge underutilized shards. This works well with range sharding.
Consistent hashing#
Use a hash ring so adding a shard only moves a fraction of keys. Virtual nodes improve balance. Systems like DynamoDB and Cassandra use this approach.
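A minimal Python sketch of a hash ring with virtual nodes; growing from three shards to four moves only about a quarter of the keys, versus the large majority under plain modulo (shard names and vnode count are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, shards, vnodes=100):
        # Each shard owns `vnodes` points on the ring for better balance
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's point
        idx = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["s1", "s2", "s3"])
bigger = ConsistentHashRing(["s1", "s2", "s3", "s4"])
keys = [f"user-{i}" for i in range(5_000)]
moved = sum(ring.shard_for(k) != bigger.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4
```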
Double-write migration#
- Start writing to both old and new shard configurations
- Backfill historical data to the new shards
- Verify consistency
- Cut reads over to the new configuration
- Stop writes to the old configuration
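The first step of this sequence can be sketched as a write path that targets both layouts; the in-memory dicts and cutover flag below are hypothetical stand-ins for the two shard configurations:

```python
# Double-write sketch: during migration every write goes to both the old and
# new shard layouts; reads cut over only after backfill and verification.
old_layout: dict = {}
new_layout: dict = {}

read_from_new = False  # flipped at cutover, once verification passes

def write(key, value):
    old_layout[key] = value  # old layout stays authoritative...
    new_layout[key] = value  # ...while the new layout is kept in sync

def read(key):
    return (new_layout if read_from_new else old_layout)[key]

write("user-1", {"name": "Ada"})
assert old_layout["user-1"] == new_layout["user-1"]  # consistent during migration
```

Backfill (step 2) copies pre-existing rows into `new_layout` in the background; the double writes guarantee that nothing written during the copy is lost.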
Ghost table approach#
Create the new shard layout as shadow tables, replicate data in the background using change data capture, then swap atomically.
Vitess implementation#
Vitess is sharding middleware for MySQL, originally built to scale YouTube's databases and now used by many large MySQL deployments.
- VSchema defines the sharding strategy (hash, range, or lookup) per table
- VTGate is the query router — it parses SQL, determines target shards, and merges results
- VTTablet manages individual MySQL instances
- Supports scatter-gather for cross-shard queries and sequence tables for auto-increment IDs
- Resharding uses VReplication to stream changes to new shards with zero downtime
```yaml
# Vitess VSchema example
keyspaces:
  commerce:
    sharded: true
    vindexes:
      hash:
        type: "hash"
    tables:
      orders:
        column_vindexes:
          - column: customer_id
            name: hash
```
Citus implementation#
Citus extends PostgreSQL with transparent sharding.
- Distributed tables are sharded across worker nodes by a distribution column
- Reference tables are replicated to every node (broadcast tables)
- Co-location groups tables by the same distribution column so joins stay local
- The coordinator node rewrites queries into per-shard fragments
- Supports tenant isolation — move a hot tenant to a dedicated node
```sql
-- Distribute the orders table by customer_id
SELECT create_distributed_table('orders', 'customer_id');

-- Co-locate line_items with orders
SELECT create_distributed_table('line_items', 'customer_id',
       colocate_with => 'orders');
```
Hot shard mitigation#
A hot shard receives disproportionate traffic. Mitigation strategies:
- Shard splitting — break the hot shard into smaller pieces
- Read replicas per shard — offload read traffic
- Caching layer — put Redis or Memcached in front of hot shards
- Rate limiting — throttle abusive tenants at the application layer
- Shard rebalancing — move some key ranges to less loaded shards
- Salting — append a random suffix to the shard key to spread writes (trade-off: reads must fan out)
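A sketch of salting in Python (the bucket count and key format are illustrative): writes pick a random salted variant of the key, while reads must enumerate every variant and fan out:

```python
import random

SALT_BUCKETS = 4  # number of salted variants per logical key

def salted_key(key: str) -> str:
    # Writes: append a random suffix so a hot key spreads across buckets
    return f"{key}#{random.randrange(SALT_BUCKETS)}"

def all_salted_keys(key: str) -> list[str]:
    # Reads: must fan out to every possible salted variant
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]

print(salted_key("events:2024-06-01"))
print(all_salted_keys("events:2024-06-01"))
```

The write side gets `SALT_BUCKETS`-way spreading at the cost of a `SALT_BUCKETS`-way read fan-out, which is why salting suits write-heavy, append-style keys.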
Monitoring sharded databases#
Track these metrics per shard:
- Query latency (p50, p95, p99) — detect hot shards early
- Storage utilization — trigger resharding before disks fill
- Connection count — identify connection pool exhaustion
- Replication lag — ensure replicas stay current
- Cross-shard query ratio — high ratios indicate a poor shard key choice
Explore sharding visually#
On Codelit, generate a sharded database architecture to see how queries route through VTGate or a Citus coordinator. Click on individual shards to explore data distribution, replication, and failover paths.
This is article #375 in the Codelit engineering blog series.