Database Scaling: Sharding, Replication & Partitioning Explained
Your app is growing. Queries are slowing down. The database CPU is at 90%. What do you do?
This guide covers every database scaling strategy — when to use each, the trade-offs, and how they fit into your architecture.
The Scaling Ladder#
Most systems scale in this order:
1. Optimize queries (indexes, query plans)
2. Add caching (Redis)
3. Vertical scaling (bigger server)
4. Read replicas
5. Connection pooling
6. Partitioning (table-level)
7. Sharding (database-level)
Don't jump to sharding. Most apps never need it.
Level 1: Query Optimization#
Before scaling hardware, scale your queries:
- Add indexes — most performance issues are missing indexes
- EXPLAIN ANALYZE — find full table scans
- Denormalize hot paths — trade storage for speed
- Batch writes — reduce round trips
- Pagination — don't SELECT * with no LIMIT
Cost: Free. Impact: Often 10-100x improvement.
Level 2: Caching (Redis)#
Cache frequent reads to avoid hitting the database:
App → Redis (cache hit?) → yes → return cached
→ no → PostgreSQL → cache result → return
Patterns:
- Cache-aside — app checks cache, falls back to DB, writes to cache
- Write-through — app writes to cache and DB simultaneously
- Write-behind — app writes to cache, which is flushed to the DB asynchronously
When: Read-heavy workloads (> 10:1 read-to-write ratio)
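The cache-aside flow above can be sketched in a few lines. This uses a plain dict standing in for Redis so it runs anywhere; with a real Redis client you would use `GET` and `SETEX` instead, and the TTL value and helper names are illustrative:

```python
import time

cache = {}          # stand-in for Redis; swap in a real client in production
TTL_SECONDS = 60

def fetch_user_from_db(user_id):
    # stand-in for a real database query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the DB, populate on miss."""
    entry = cache.get(user_id)
    if entry and entry["expires"] > time.time():
        return entry["value"]                      # cache hit
    value = fetch_user_from_db(user_id)            # cache miss -> hit the DB
    cache[user_id] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value

print(get_user(1))  # miss -> reads from the DB, then populates the cache
print(get_user(1))  # hit  -> served from the cache
```

The TTL bounds staleness: a cached value can be at most 60 seconds out of date here, which is the trade you make for skipping the database.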
Level 3: Vertical Scaling#
Upgrade the server: more CPU, more RAM, faster SSD.
| Cloud Provider | Max Instance | vCPU | RAM |
|---|---|---|---|
| AWS RDS | db.r6g.16xlarge | 64 | 512 GB |
| GCP Cloud SQL | db-custom-96-614400 | 96 | 600 GB |
| Azure | E96s_v5 | 96 | 672 GB |
When: You haven't maxed out a single server yet. Limit: You eventually hit a hardware ceiling (and the top-tier instances get very expensive).
Level 4: Read Replicas#
Route read queries to replicas, writes to the primary:
App → Write → Primary DB
→ Read → Replica 1
→ Replica 2
→ Replica 3
How it works:
- Primary streams WAL (Write-Ahead Log) to replicas
- Replicas are milliseconds behind (replication lag)
- Reads from replicas are eventually consistent
When: Read-heavy workloads where slight staleness is acceptable.
Gotcha: Replication lag. Don't read-after-write from a replica — you might get stale data. Use "read your own writes" pattern.
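One simple way to implement "read your own writes" is to pin a user's reads to the primary for a short window after their own write. A minimal sketch (the lag window, table, and function names are illustrative; in practice you might track this in the user's session instead of process memory):

```python
import time

REPLICATION_LAG_WINDOW = 5.0   # seconds; tune to your observed replica lag
last_write_at = {}             # user_id -> timestamp of that user's last write

def record_write(user_id):
    """Call after every write on behalf of this user."""
    last_write_at[user_id] = time.time()

def choose_endpoint(user_id):
    """Route a user's reads to the primary right after their own write,
    otherwise to a replica that may be slightly stale."""
    wrote_recently = time.time() - last_write_at.get(user_id, 0) < REPLICATION_LAG_WINDOW
    return "primary" if wrote_recently else "replica"

record_write("alice")
print(choose_endpoint("alice"))  # "primary" — she just wrote
print(choose_endpoint("bob"))    # "replica" — staleness is fine for him
```

Everyone else still reads from replicas, so you keep the scaling benefit while each user sees their own writes.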
Level 5: Connection Pooling#
Each database connection uses ~10MB RAM. 1000 connections = 10GB just for connections.
PgBouncer sits between your app and PostgreSQL:
App (1000 connections) → PgBouncer (50 connections) → PostgreSQL
Modes:
- Transaction pooling — a server connection is returned to the pool after each transaction, so many clients share it (most common)
- Session pooling — a server connection is pinned to one client for its entire session (safest, least sharing)
- Statement pooling — the connection is returned after every single statement (most aggressive; multi-statement transactions break)
When: Serverless apps (Lambda, Vercel Edge) that create many short-lived connections.
Tools: PgBouncer, PgCat, Supabase Supavisor, Neon connection pooling.
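A minimal PgBouncer configuration for transaction pooling might look like this (the database name, addresses, and file paths are illustrative; see the PgBouncer docs for auth setup):

```ini
[databases]
; the app connects to PgBouncer on 6432 instead of Postgres directly
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; share server connections between transactions
max_client_conn = 1000      ; app-side connections PgBouncer will accept
default_pool_size = 50      ; actual connections held open to Postgres
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

This is the 1000-to-50 funnel from the diagram above: the app can open cheap client connections freely while Postgres only ever sees 50.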
Level 6: Partitioning#
Split one large table into smaller pieces on the same server:
-- Range partitioning by date
CREATE TABLE orders (
    id BIGINT,
    created_at TIMESTAMP,
    amount DECIMAL
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
Types:
- Range — by date, ID range (most common)
- List — by category, region, tenant
- Hash — even distribution across N partitions
When: Single table > 100M rows and queries filter on the partition key.
Level 7: Sharding#
Split data across multiple database servers:
App → Shard Router
→ Shard 1 (users A-M) → DB Server 1
→ Shard 2 (users N-Z) → DB Server 2
→ Shard 3 (overflow) → DB Server 3
Sharding strategies:
- Hash-based — shard = hash(user_id) % num_shards (even distribution)
- Range-based — shard = user_id / 1M (hot spots possible)
- Geography — US users → US shard, EU users → EU shard
- Tenant-based — each enterprise customer gets their own shard
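Hash-based routing can be sketched in a few lines. The shard DSNs here are invented for illustration; the one real subtlety is using a stable hash, since Python's built-in `hash()` is salted per process and would route the same user to different shards after a restart:

```python
import hashlib

NUM_SHARDS = 4
# illustrative connection strings, one per shard
SHARD_DSNS = [f"postgres://db-shard-{i}.internal/app" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    """Hash-based sharding: a stable hash spreads users evenly across shards."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % NUM_SHARDS]

print(shard_for(42))   # always the same shard for user 42
```

Note the rebalancing pain mentioned below: changing `NUM_SHARDS` remaps almost every user, which is why systems that expect to grow use consistent hashing or a lookup table instead of a bare modulo.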
When: Single server can't handle write volume, or data exceeds single server storage.
Trade-offs:
- Cross-shard queries are expensive (JOINs across shards)
- Rebalancing shards is painful
- Application complexity increases significantly
- Transaction guarantees become harder
Who uses it: Instagram (sharded by user_id), Notion (sharded by workspace), YouTube and Slack (via Vitess).
Decision Matrix#
| Symptom | Solution |
|---|---|
| Slow queries | Add indexes, optimize queries |
| High read latency | Add Redis cache |
| CPU maxed on single server | Vertical scale or read replicas |
| Too many connections | PgBouncer connection pooling |
| Single table too large | Partitioning |
| Write throughput maxed | Sharding |
| Data sovereignty requirements | Geographic sharding |
| Multi-tenant isolation | Tenant-per-database (Turso, Neon) |
Modern Approaches#
Serverless Databases#
Neon, PlanetScale, and Turso auto-scale without manual sharding:
- Neon — Serverless Postgres with branching, scales to zero
- PlanetScale — MySQL with Vitess-powered horizontal scaling
- Turso — Edge SQLite with per-tenant databases
NewSQL#
CockroachDB and TiDB provide distributed SQL with automatic sharding:
- Horizontal scaling with ACID transactions
- No manual shard management
- Familiar wire protocols (PostgreSQL-compatible for CockroachDB, MySQL-compatible for TiDB)
Architecture Examples#
E-Commerce (10M+ orders/year)#
App → PgBouncer → PostgreSQL Primary (writes)
→ Read Replica 1 (product reads)
→ Read Replica 2 (analytics)
→ Redis (product cache, session store)
→ Elasticsearch (product search)
SaaS Multi-Tenant (1000+ tenants)#
App → Shard Router
→ Shared DB (free tier tenants)
→ Dedicated DB per enterprise tenant
→ Redis (rate limiting, caching)
Summary#
- Don't optimize prematurely — index first, cache second
- Vertical scaling works longer than you think
- Read replicas handle most read-scaling needs
- Connection pooling is essential for serverless
- Partition before you shard — less complexity
- Shard only when you must — it's a one-way door
- Consider serverless DBs — they handle scaling for you
Design your database architecture at codelit.io — generate interactive diagrams with cost estimates, performance audits, and infrastructure exports.