# Database Scaling — Replication, Sharding, and When to Use What
## Your database is the bottleneck. Now what?
Every system hits the same wall eventually: the database can't keep up. Queries slow down, connections max out, and someone suggests "we should shard."
Before you shard anything, let's walk through the actual decision tree. Most teams jump to sharding way too early.
## Step 1: Have you indexed properly?
I'm serious. Before any scaling discussion:
- Are your slow queries using indexes? Run `EXPLAIN ANALYZE` and check.
- Do you have composite indexes for your most common query patterns?
- Are you doing full table scans on large tables?
Adding the right index often gives you 100x improvement. That's not an exaggeration — going from a sequential scan to an index lookup on a million-row table is the difference between 2 seconds and 2 milliseconds.
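You can see the difference concretely with a small sketch. This uses SQLite from Python's standard library for portability (in PostgreSQL you'd run `EXPLAIN ANALYZE` instead); the table and index names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

def plan(sql: str) -> str:
    # SQLite's equivalent of inspecting the query plan
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM orders WHERE user_id = 42")  # e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")

# Same query now hits the index, e.g. "SEARCH orders USING INDEX idx_orders_user"
after = plan("SELECT * FROM orders WHERE user_id = 42")
print(before, "->", after)
```

The planner switches from a sequential scan to an index search the moment the right index exists; no query change required.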
## Step 2: Vertical scaling (scale up)
The simplest scaling strategy: get a bigger machine.
- More RAM → more data cached in memory → fewer disk reads
- More CPU cores → more concurrent queries
- Faster SSDs → faster writes and sequential scans
Vertical scaling is underrated because it's boring. But a single PostgreSQL instance on a 64-core machine with 256GB RAM can handle millions of rows and thousands of queries per second.
When to use: You haven't maxed out your current hardware. You have budget. You want simplicity.
When it fails: There's a ceiling. The biggest machine money can buy still has limits. And you have a single point of failure.
## Step 3: Read replicas
If your workload is read-heavy (most web apps are — 90% reads, 10% writes):
- Write to a primary database
- Replicate to one or more read replicas
- Route read queries to replicas
This multiplies your read capacity linearly. Need 3x read throughput? Add 2 replicas.
The catch: Replication lag. A write to the primary might take 10-100ms to appear on replicas. If a user creates a post and immediately views their profile, they might not see it.
Solutions:
- Read-your-own-writes: route the writing user's reads to the primary for a few seconds
- Synchronous replication: replicas confirm before the write returns (slower writes, consistent reads)
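A minimal sketch of read routing with read-your-own-writes, assuming connections are just handles passed through; the class and parameter names here are invented for illustration:

```python
import itertools
import time

class ReplicaRouter:
    """Route writes to the primary and reads to replicas, pinning each
    user to the primary briefly after they write (read-your-own-writes)."""

    def __init__(self, primary, replicas, pin_seconds=5.0):
        self.primary = primary
        self.replicas = itertools.cycle(replicas)  # simple round-robin
        self.pin_seconds = pin_seconds
        self._last_write = {}  # user_id -> timestamp of last write

    def for_write(self, user_id):
        self._last_write[user_id] = time.monotonic()
        return self.primary

    def for_read(self, user_id):
        pinned_until = self._last_write.get(user_id, 0.0) + self.pin_seconds
        if time.monotonic() < pinned_until:
            return self.primary  # avoid replication lag for this user
        return next(self.replicas)

router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

A production router would track actual replication position (e.g. PostgreSQL LSNs) rather than a fixed pin window, but the time-based pin is a common pragmatic approximation.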
## Step 4: Caching
Before sharding, add a cache layer:
- Application cache (Redis/Memcached): Cache query results, computed values, session data
- Query result cache: Database-level caching of frequent queries
- CDN for API responses: Cache GET endpoints at the edge
A well-tuned Redis cache in front of your database can reduce database load by 80-95%.
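The usual shape of the application cache is cache-aside: check the cache, fall back to the database on a miss, and populate the cache on the way out. A minimal sketch, with a TTL'd dict standing in for Redis (all names invented):

```python
import time

class CacheAside:
    """Cache-aside: try the cache, fall back to the database, repopulate."""

    def __init__(self, fetch_from_db, ttl=60.0):
        self.fetch_from_db = fetch_from_db
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.fetch_from_db(key)  # the expensive database query
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        # Call this on writes so later reads repopulate from the database
        self._store.pop(key, None)
```

Every hit is one database query avoided, which is where the 80-95% load reduction comes from on read-heavy workloads.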
## Step 5: Sharding (the last resort)
Sharding splits your data across multiple database instances. Each shard holds a subset of the data.
### Choosing a shard key
This is the most important decision. Get it wrong and you'll re-shard later (painful).
Good shard keys:

- `user_id` — each user's data lives on one shard. Great for user-centric apps
- `tenant_id` — for multi-tenant SaaS. Each customer on their own shard
- `region` — for geo-distributed systems. US data on US shards, EU data on EU shards
Bad shard keys:

- `created_at` — all new writes hit the latest shard (hot spot)
- `status` — only a few distinct values, so distribution is uneven
- Anything with low cardinality
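Whatever key you choose, routing usually boils down to a stable hash of that key modulo the shard count. A sketch (the shard count is illustrative; note that Python's built-in `hash()` is randomized per process, so a real router needs a stable hash like this one):

```python
import hashlib

NUM_SHARDS = 4  # illustrative

def shard_for(user_id: str) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

The weakness of plain modulo routing is exactly the re-sharding pain described below: change `NUM_SHARDS` and most keys move to a different shard.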
## The trade-offs nobody talks about
Cross-shard queries are expensive. "Show me all orders across all users" now requires querying every shard and merging results.
Joins across shards don't work. If users are on shard A and orders are on shard B, you can't JOIN them. You need to denormalize or use application-level joins.
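An application-level join looks roughly like this, with plain dicts standing in for the per-shard query results:

```python
# Each "shard" here is just a dict of table -> rows, standing in for a database.
shard_a = {"users": [{"id": 1, "name": "Ada"}]}
shard_b = {"orders": [{"user_id": 1, "total": 20.0}, {"user_id": 1, "total": 5.0}]}

def app_level_join(users_shard, orders_shard):
    """Fetch from each shard separately, then merge rows in memory."""
    orders_by_user = {}
    for order in orders_shard["orders"]:
        orders_by_user.setdefault(order["user_id"], []).append(order)
    return [
        {**user, "orders": orders_by_user.get(user["id"], [])}
        for user in users_shard["users"]
    ]
```

What was one SQL JOIN is now two round trips plus merge code you own, which is why choosing a shard key that keeps related rows together matters so much.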
Re-sharding is painful. If you start with 4 shards and need 16, you have to migrate data while the system is running. Consistent hashing helps but doesn't eliminate the pain.
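Consistent hashing assigns each shard many points on a hash ring; a key belongs to the next point clockwise, so adding a shard only remaps the keys that land on its new points. A toy sketch (the vnode count and names are illustrative):

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Stable 64-bit position on the ring
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, shards, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard gets `vnodes` points to smooth out the distribution
        for i in range(self.vnodes):
            p = _point(f"{shard}#{i}")
            bisect.insort(self._points, p)
            self._owners[p] = shard

    def shard_for(self, key):
        # The owner is the first point clockwise from the key's position
        p = _point(key)
        idx = bisect.bisect(self._points, p) % len(self._points)
        return self._owners[self._points[idx]]
```

With plain `hash % N`, going from 4 shards to 5 moves roughly 80% of keys; on the ring only about 1/5 of keys move, which is the help the trade-off above refers to.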
Operational complexity explodes. 4 shards means 4x the backups, 4x the monitoring, 4x the upgrade coordination. With replicas, that's 12+ database instances to manage.
## The decision tree

```
Is the database actually the bottleneck?
├── No  → Profile your app, fix N+1 queries, add indexes
└── Yes → What kind of bottleneck?
    ├── Read-heavy  → Add read replicas + Redis cache
    ├── Write-heavy → Vertical scale first, then shard
    └── Storage     → Archive old data, add partitioning
        └── Still not enough? → Shard by tenant/user/region
```
## The architecture in practice
On Codelit, you can see how database scaling looks in a real architecture. Generate a system and click on the database node — the audit tool will tell you if your database is a bottleneck and suggest specific scaling strategies.
You can also simulate database failure to see which other components are affected and whether your system has proper fallbacks.
## The golden rule
Scale the simplest way that solves your actual problem. Indexes before replicas. Replicas before caching. Caching before sharding. Sharding as the last resort.
Most applications never need to shard. The ones that do know it because they've exhausted every other option.
Explore database scaling patterns: generate any architecture on Codelit.io and audit the database layer for bottlenecks.