# Database Scaling — Replication, Sharding, and When to Use What
## Your database is the bottleneck. Now what?
Every system hits the same wall eventually: the database can't keep up. Queries slow down, connections max out, and someone suggests "we should shard."
Before you shard anything, let's walk through the actual decision tree. Most teams jump to sharding way too early.
## Step 1: Have you indexed properly?
I'm serious. Before any scaling discussion:
- Are your slow queries using indexes? Run `EXPLAIN ANALYZE` and check.
- Do you have composite indexes for your most common query patterns?
- Are you doing full table scans on large tables?
Adding the right index often gives you 100x improvement. That's not an exaggeration — going from a sequential scan to an index lookup on a million-row table is the difference between 2 seconds and 2 milliseconds.
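You can see the difference concretely with a small sketch. This uses SQLite from Python's standard library for portability (in PostgreSQL you'd run `EXPLAIN ANALYZE` instead); the table and index names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

def plan(sql: str) -> str:
    # SQLite's equivalent of inspecting the query plan
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM orders WHERE user_id = 42")  # e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")

# Same query now hits the index, e.g. "SEARCH orders USING INDEX idx_orders_user"
after = plan("SELECT * FROM orders WHERE user_id = 42")
print(before, "->", after)
```

The planner switches from a sequential scan to an index search the moment the right index exists; no query change required.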
## Step 2: Vertical scaling (scale up)
The simplest scaling strategy: get a bigger machine.
- More RAM → more data cached in memory → fewer disk reads
- More CPU cores → more concurrent queries
- Faster SSDs → faster writes and sequential scans
Vertical scaling is underrated because it's boring. But a single PostgreSQL instance on a 64-core machine with 256GB RAM can handle millions of rows and thousands of queries per second.
When to use: You haven't maxed out your current hardware. You have budget. You want simplicity.
When it fails: There's a ceiling. The biggest machine money can buy still has limits. And you have a single point of failure.
## Step 3: Read replicas
If your workload is read-heavy (most web apps are — 90% reads, 10% writes):
- Write to a primary database
- Replicate to one or more read replicas
- Route read queries to replicas
This multiplies your read capacity linearly. Need 3x read throughput? Add 2 replicas.
The catch: Replication lag. A write to the primary might take 10-100ms to appear on replicas. If a user creates a post and immediately views their profile, they might not see it.
Solutions:
- Read-your-own-writes: route the writing user's reads to the primary for a few seconds
- Synchronous replication: replicas confirm before the write returns (slower writes, consistent reads)
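A minimal sketch of read routing with read-your-own-writes, assuming connections are just handles passed through; the class and parameter names here are invented for illustration:

```python
import itertools
import time

class ReplicaRouter:
    """Route writes to the primary and reads to replicas, pinning each
    user to the primary briefly after they write (read-your-own-writes)."""

    def __init__(self, primary, replicas, pin_seconds=5.0):
        self.primary = primary
        self.replicas = itertools.cycle(replicas)  # simple round-robin
        self.pin_seconds = pin_seconds
        self._last_write = {}  # user_id -> timestamp of last write

    def for_write(self, user_id):
        self._last_write[user_id] = time.monotonic()
        return self.primary

    def for_read(self, user_id):
        pinned_until = self._last_write.get(user_id, 0.0) + self.pin_seconds
        if time.monotonic() < pinned_until:
            return self.primary  # avoid replication lag for this user
        return next(self.replicas)

router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

A production router would track actual replication position (e.g. PostgreSQL LSNs) rather than a fixed pin window, but the time-based pin is a common pragmatic approximation.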
## Step 4: Caching
Before sharding, add a cache layer:
- Application cache (Redis/Memcached): Cache query results, computed values, session data
- Query result cache: Database-level caching of frequent queries
- CDN for API responses: Cache GET endpoints at the edge
A well-tuned Redis cache in front of your database can reduce database load by 80-95%.
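The usual shape of the application cache is cache-aside: check the cache, fall back to the database on a miss, and populate the cache on the way out. A minimal sketch, with a TTL'd dict standing in for Redis (all names invented):

```python
import time

class CacheAside:
    """Cache-aside: try the cache, fall back to the database, repopulate."""

    def __init__(self, fetch_from_db, ttl=60.0):
        self.fetch_from_db = fetch_from_db
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.fetch_from_db(key)  # the expensive database query
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        # Call this on writes so later reads repopulate from the database
        self._store.pop(key, None)
```

Every hit is one database query avoided, which is where the 80-95% load reduction comes from on read-heavy workloads.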
## Step 5: Sharding (the last resort)
Sharding splits your data across multiple database instances. Each shard holds a subset of the data.
### Choosing a shard key
This is the most important decision. Get it wrong and you'll re-shard later (painful).
Good shard keys:

- `user_id` — each user's data lives on one shard. Great for user-centric apps
- `tenant_id` — for multi-tenant SaaS. Each customer on their own shard
- `region` — for geo-distributed systems. US data on US shards, EU data on EU shards
Bad shard keys:

- `created_at` — all new writes hit the latest shard (hot spot)
- `status` — only a few distinct values, so distribution is uneven
- Anything with low cardinality
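Whatever key you choose, routing usually boils down to a stable hash of that key modulo the shard count. A sketch (the shard count is illustrative; note that Python's built-in `hash()` is randomized per process, so a real router needs a stable hash like this one):

```python
import hashlib

NUM_SHARDS = 4  # illustrative

def shard_for(user_id: str) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

The weakness of plain modulo routing is exactly the re-sharding pain described below: change `NUM_SHARDS` and most keys move to a different shard.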
## The trade-offs nobody talks about
Cross-shard queries are expensive. "Show me all orders across all users" now requires querying every shard and merging results.
Joins across shards don't work. If users are on shard A and orders are on shard B, you can't JOIN them. You need to denormalize or use application-level joins.
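An application-level join looks roughly like this, with plain dicts standing in for the per-shard query results:

```python
# Each "shard" here is just a dict of table -> rows, standing in for a database.
shard_a = {"users": [{"id": 1, "name": "Ada"}]}
shard_b = {"orders": [{"user_id": 1, "total": 20.0}, {"user_id": 1, "total": 5.0}]}

def app_level_join(users_shard, orders_shard):
    """Fetch from each shard separately, then merge rows in memory."""
    orders_by_user = {}
    for order in orders_shard["orders"]:
        orders_by_user.setdefault(order["user_id"], []).append(order)
    return [
        {**user, "orders": orders_by_user.get(user["id"], [])}
        for user in users_shard["users"]
    ]
```

What was one SQL JOIN is now two round trips plus merge code you own, which is why choosing a shard key that keeps related rows together matters so much.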
Re-sharding is painful. If you start with 4 shards and need 16, you have to migrate data while the system is running. Consistent hashing helps but doesn't eliminate the pain.
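Consistent hashing assigns each shard many points on a hash ring; a key belongs to the next point clockwise, so adding a shard only remaps the keys that land on its new points. A toy sketch (the vnode count and names are illustrative):

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Stable 64-bit position on the ring
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, shards, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard gets `vnodes` points to smooth out the distribution
        for i in range(self.vnodes):
            p = _point(f"{shard}#{i}")
            bisect.insort(self._points, p)
            self._owners[p] = shard

    def shard_for(self, key):
        # The owner is the first point clockwise from the key's position
        p = _point(key)
        idx = bisect.bisect(self._points, p) % len(self._points)
        return self._owners[self._points[idx]]
```

With plain `hash % N`, going from 4 shards to 5 moves roughly 80% of keys; on the ring only about 1/5 of keys move, which is the help the trade-off above refers to.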
Operational complexity explodes. 4 shards means 4x the backups, 4x the monitoring, 4x the upgrade coordination. With replicas, that's 12+ database instances to manage.
## The decision tree

```
Is the database actually the bottleneck?
├── No  → Profile your app, fix N+1 queries, add indexes
└── Yes → What kind of bottleneck?
    ├── Read-heavy  → Add read replicas + Redis cache
    ├── Write-heavy → Vertical scale first, then shard
    └── Storage     → Archive old data, add partitioning
        └── Still not enough? → Shard by tenant/user/region
```
## The architecture in practice
On Codelit, you can see how database scaling looks in a real architecture. Generate a system and click on the database node — the audit tool will tell you if your database is a bottleneck and suggest specific scaling strategies.
You can also simulate database failure to see which other components are affected and whether your system has proper fallbacks.
## The golden rule
Scale the simplest way that solves your actual problem. Indexes before replicas. Replicas before caching. Caching before sharding. Sharding as the last resort.
Most applications never need to shard. The ones that do know it because they've exhausted every other option.
Explore database scaling patterns: generate any architecture on Codelit.io and audit the database layer for bottlenecks.