Database Sharding Strategies — A Deep Dive into Partitioning at Scale
What is database sharding?#
Sharding is a horizontal scaling strategy that splits a single database into multiple smaller databases — called shards — each holding a subset of the data. Every shard runs on its own server, so reads and writes distribute across machines instead of bottlenecking on one.
Unlike vertical scaling (buying bigger hardware), sharding scales out: add more shards to handle more load, though cross-shard coordination keeps the scaling from being perfectly linear.
Why shard at all?#
You hit sharding territory when:
- A single database server cannot handle the write throughput
- The dataset exceeds the storage capacity of one machine
- Read latency suffers because indexes no longer fit in memory
- Replication lag on read replicas becomes unacceptable
Sharding is not free. It adds complexity to queries, transactions, and operations. Exhaust vertical scaling and read replicas first.
Shard key selection criteria#
The shard key determines which shard holds a given row. A poor choice creates hot shards, uneven storage, or impossible queries. Evaluate candidates on four axes:
- Cardinality — the key must have enough distinct values to distribute evenly across shards
- Write distribution — writes should spread uniformly, not cluster on one shard
- Query isolation — common queries should target a single shard, not scatter across all of them
- Growth pattern — the key should not cause monotonic routing (e.g., auto-increment IDs all landing on the latest shard)
| Criterion | Good example | Bad example |
|---|---|---|
| Cardinality | user_id (millions) | country (dozens) |
| Write distribution | UUID-based tenant ID | timestamp (all writes hit latest shard) |
| Query isolation | tenant ID for SaaS apps | random hash (every query fans out) |
Hash sharding#
Hash sharding applies a hash function to the shard key, then maps the hash to a shard using modulo or consistent hashing.
```
shard_number = hash(shard_key) % total_shards
```
Advantages:
- Even data distribution regardless of key patterns
- Simple to implement
- Works well for point lookups
Disadvantages:
- Range queries require scatter-gather across all shards
- Adding shards requires rehashing (unless using consistent hashing)
- No data locality for related records
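To make the rehashing drawback concrete, here is a small Python sketch (key names and shard counts are illustrative): growing from 4 to 5 shards under plain modulo routing remaps the large majority of keys.

```python
import hashlib

def shard_for(key: str, total_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % total_shards

keys = [f"user-{i}" for i in range(10_000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
# With modulo routing, roughly 80% of keys change shards on a 4 -> 5 resize
print(f"{moved / len(keys):.0%} of keys moved")
```

This is exactly the problem consistent hashing addresses, as discussed under resharding below.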
Range sharding#
Range sharding assigns contiguous key ranges to each shard. For example, users A-M go to shard 1 and N-Z go to shard 2.
```
Shard 1: user_id         1 – 1,000,000
Shard 2: user_id 1,000,001 – 2,000,000
Shard 3: user_id 2,000,001 – 3,000,000
```
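The range table above can be expressed as a routing function. This Python sketch uses the standard library's `bisect` to find the owning shard; the bounds and shard numbering are the illustrative ones from the table:

```python
import bisect

# Inclusive upper bound of each shard's user_id range, in ascending order
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id: int) -> int:
    # bisect_left finds the first boundary >= user_id, i.e. the owning shard
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise KeyError(f"user_id {user_id} is outside all shard ranges")
    return idx + 1  # shards are numbered from 1, as in the table

print(shard_for(999_999))    # shard 1
print(shard_for(1_000_001))  # shard 2
```

Splitting a hot shard is then just inserting a new boundary into the sorted list.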
Advantages:
- Range queries execute on a single shard (or a small subset)
- Adjacent data lives together, improving cache locality
- Easy to reason about data placement
Disadvantages:
- Hot shards when writes cluster in one range (e.g., recent timestamps)
- Uneven shard sizes as data grows asymmetrically
- Requires manual or automated split/merge operations
Directory-based sharding#
A lookup service (the directory) maintains a mapping of each shard key to its shard. Every query first checks the directory, then routes to the correct shard.
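A minimal sketch of the idea in Python, with an in-memory dict standing in for the directory (production systems back the mapping with a replicated store such as etcd or ZooKeeper and cache it aggressively in each client):

```python
class ShardDirectory:
    """Hypothetical key -> shard lookup service (in-memory sketch)."""

    def __init__(self):
        self._mapping: dict[str, str] = {}

    def assign(self, key: str, shard: str) -> None:
        self._mapping[key] = shard

    def lookup(self, key: str) -> str:
        return self._mapping[key]

    def move(self, key: str, new_shard: str) -> None:
        # Resharding is just a directory update; the data copy happens out of band
        self._mapping[key] = new_shard

directory = ShardDirectory()
directory.assign("tenant-42", "shard-eu-1")  # e.g. compliance: EU tenant stays in the EU
print(directory.lookup("tenant-42"))
directory.move("tenant-42", "shard-eu-2")
```

The flexibility is the selling point; the operational cost is keeping this mapping highly available and consistent.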
Advantages:
- Maximum flexibility — any key can live on any shard
- Resharding is a directory update, not a data migration
- Supports complex placement policies (geography, compliance)
Disadvantages:
- The directory is a single point of failure and a latency bottleneck
- Directory must be cached aggressively and kept consistent
- Adds an extra network hop to every query
Cross-shard joins and transactions#
Sharding breaks joins. When order data lives on shard A and product data lives on shard B, a JOIN requires network round-trips.
Strategies for cross-shard queries:
- Denormalization — duplicate data so each shard has everything it needs
- Application-level joins — fetch from each shard separately, merge in the application
- Materialized views — precompute joined data into a read-optimized store
- Broadcast tables — replicate small reference tables to every shard
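An application-level join can be sketched in a few lines of Python; the in-memory lists below are hypothetical stand-ins for rows fetched separately from two shards:

```python
# Rows fetched from shard A (orders) and shard B (products) in separate queries
shard_a_orders = [
    {"order_id": 1, "product_id": "p9", "qty": 2},
    {"order_id": 2, "product_id": "p3", "qty": 1},
]
shard_b_products = {
    "p9": {"name": "widget"},
    "p3": {"name": "gadget"},
}

def join_orders_with_products(orders, products):
    # Equivalent of: SELECT o.*, p.name FROM orders o JOIN products p USING (product_id)
    return [
        {**order, "product_name": products[order["product_id"]]["name"]}
        for order in orders
    ]

for row in join_orders_with_products(shard_a_orders, shard_b_products):
    print(row)
```

Note the two network round-trips this implies; denormalization or broadcast tables exist precisely to avoid them.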
Cross-shard transactions are even harder. Options include:
- Two-phase commit (2PC) — correct but slow and blocking
- Saga pattern — break the transaction into compensatable local transactions
- Avoid them — design the schema so transactions stay within a single shard
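The saga pattern can be sketched generically: run each local transaction in order and, on failure, execute the compensations for the completed steps in reverse. The step names below are illustrative:

```python
# Saga sketch: each local transaction pairs with a compensating action.
def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        # Undo completed steps in reverse order
        for compensate in reversed(done):
            compensate()
        return False
    return True

def fail(msg):
    raise RuntimeError(msg)

log = []
steps = [
    (lambda: log.append("debit account on shard A"),
     lambda: log.append("refund account on shard A")),
    (lambda: fail("shard B unavailable"),  # second step fails
     lambda: log.append("never runs")),
]
ok = run_saga(steps)
print(ok, log)  # False ['debit account on shard A', 'refund account on shard A']
```

Unlike 2PC, a saga never holds locks across shards, but it only guarantees eventual consistency and requires every step to have a meaningful compensation.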
Resharding strategies#
When shards become unbalanced or you need to add capacity, you reshard.
Split and merge#
Split a hot shard into two smaller ones, or merge underutilized shards. This works well with range sharding.
Consistent hashing#
Use a hash ring so adding a shard only moves a fraction of keys. Virtual nodes improve balance. Systems like DynamoDB and Cassandra use this approach.
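A minimal Python sketch of a hash ring with virtual nodes; growing from three shards to four moves only about a quarter of the keys, versus the large majority under plain modulo (shard names and vnode count are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, shards, vnodes=100):
        # Each shard owns `vnodes` points on the ring for better balance
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's point
        idx = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["s1", "s2", "s3"])
bigger = ConsistentHashRing(["s1", "s2", "s3", "s4"])
keys = [f"user-{i}" for i in range(5_000)]
moved = sum(ring.shard_for(k) != bigger.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4
```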
Double-write migration#
- Start writing to both old and new shard configurations
- Backfill historical data to the new shards
- Verify consistency
- Cut reads over to the new configuration
- Stop writes to the old configuration
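The first step of this sequence can be sketched as a write path that targets both layouts; the in-memory dicts and cutover flag below are hypothetical stand-ins for the two shard configurations:

```python
# Double-write sketch: during migration every write goes to both the old and
# new shard layouts; reads cut over only after backfill and verification.
old_layout: dict = {}
new_layout: dict = {}

read_from_new = False  # flipped at cutover, once verification passes

def write(key, value):
    old_layout[key] = value  # old layout stays authoritative...
    new_layout[key] = value  # ...while the new layout is kept in sync

def read(key):
    return (new_layout if read_from_new else old_layout)[key]

write("user-1", {"name": "Ada"})
assert old_layout["user-1"] == new_layout["user-1"]  # consistent during migration
```

Backfill (step 2) copies pre-existing rows into `new_layout` in the background; the double writes guarantee that nothing written during the copy is lost.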
Ghost table approach#
Create the new shard layout as shadow tables, replicate data in the background using change data capture, then swap atomically.
Vitess implementation#
Vitess is sharding middleware for MySQL, originally built to scale YouTube's databases and now used by many large MySQL deployments.
- VSchema defines the sharding strategy (hash, range, or lookup) per table
- VTGate is the query router — it parses SQL, determines target shards, and merges results
- VTTablet manages individual MySQL instances
- Supports scatter-gather for cross-shard queries and sequence tables for auto-increment IDs
- Resharding uses VReplication to stream changes to new shards with zero downtime
```yaml
# Vitess VSchema example
keyspaces:
  commerce:
    sharded: true
    vindexes:
      hash:
        type: "hash"
    tables:
      orders:
        column_vindexes:
          - column: customer_id
            name: hash
```
Citus implementation#
Citus extends PostgreSQL with transparent sharding.
- Distributed tables are sharded across worker nodes by a distribution column
- Reference tables are replicated to every node (broadcast tables)
- Co-location groups tables by the same distribution column so joins stay local
- The coordinator node rewrites queries into per-shard fragments
- Supports tenant isolation — move a hot tenant to a dedicated node
```sql
-- Distribute the orders table by customer_id
SELECT create_distributed_table('orders', 'customer_id');

-- Co-locate line_items with orders
SELECT create_distributed_table('line_items', 'customer_id',
       colocate_with => 'orders');
```
Hot shard mitigation#
A hot shard receives disproportionate traffic. Mitigation strategies:
- Shard splitting — break the hot shard into smaller pieces
- Read replicas per shard — offload read traffic
- Caching layer — put Redis or Memcached in front of hot shards
- Rate limiting — throttle abusive tenants at the application layer
- Shard rebalancing — move some key ranges to less loaded shards
- Salting — append a random suffix to the shard key to spread writes (trade-off: reads must fan out)
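A sketch of salting in Python (the bucket count and key format are illustrative): writes pick a random salted variant of the key, while reads must enumerate every variant and fan out:

```python
import random

SALT_BUCKETS = 4  # number of salted variants per logical key

def salted_key(key: str) -> str:
    # Writes: append a random suffix so a hot key spreads across buckets
    return f"{key}#{random.randrange(SALT_BUCKETS)}"

def all_salted_keys(key: str) -> list[str]:
    # Reads: must fan out to every possible salted variant
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]

print(salted_key("events:2024-06-01"))
print(all_salted_keys("events:2024-06-01"))
```

The write side gets `SALT_BUCKETS`-way spreading at the cost of a `SALT_BUCKETS`-way read fan-out, which is why salting suits write-heavy, append-style keys.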
Monitoring sharded databases#
Track these metrics per shard:
- Query latency (p50, p95, p99) — detect hot shards early
- Storage utilization — trigger resharding before disks fill
- Connection count — identify connection pool exhaustion
- Replication lag — ensure replicas stay current
- Cross-shard query ratio — high ratios indicate a poor shard key choice
Explore sharding visually#
On Codelit, generate a sharded database architecture to see how queries route through VTGate or a Citus coordinator. Click on individual shards to explore data distribution, replication, and failover paths.
This is article #375 in the Codelit engineering blog series.