Database Scaling: Sharding, Replication & Partitioning Explained
Your app is growing. Queries are slowing down. The database CPU is at 90%. What do you do?
This guide covers every database scaling strategy — when to use each, the trade-offs, and how they fit into your architecture.
The Scaling Ladder#
Most systems scale in this order:
1. Optimize queries (indexes, query plans)
2. Add caching (Redis)
3. Vertical scaling (bigger server)
4. Read replicas
5. Connection pooling
6. Partitioning (table-level)
7. Sharding (database-level)
Don't jump to sharding. Most apps never need it.
Level 1: Query Optimization#
Before scaling hardware, scale your queries:
- Add indexes — most performance issues are missing indexes
- EXPLAIN ANALYZE — find full table scans
- Denormalize hot paths — trade storage for speed
- Batch writes — reduce round trips
- Pagination — don't SELECT * with no LIMIT
Cost: Free. Impact: Often 10-100x improvement.
Level 2: Caching (Redis)#
Cache frequent reads to avoid hitting the database:
App → Redis (cache hit?) → yes → return cached
→ no → PostgreSQL → cache result → return
Patterns:
- Cache-aside — app checks cache, falls back to DB, writes to cache
- Write-through — app writes to cache and DB simultaneously
- Write-behind — app writes to cache, which is flushed to the DB asynchronously
When: Read-heavy workloads (> 10:1 read-to-write ratio)
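The cache-aside flow above can be sketched in a few lines. This uses a plain dict standing in for Redis so it runs anywhere; with a real Redis client you would use `GET` and `SETEX` instead, and the TTL value and helper names are illustrative:

```python
import time

cache = {}          # stand-in for Redis; swap in a real client in production
TTL_SECONDS = 60

def fetch_user_from_db(user_id):
    # stand-in for a real database query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the DB, populate on miss."""
    entry = cache.get(user_id)
    if entry and entry["expires"] > time.time():
        return entry["value"]                      # cache hit
    value = fetch_user_from_db(user_id)            # cache miss -> hit the DB
    cache[user_id] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value

print(get_user(1))  # miss -> reads from the DB, then populates the cache
print(get_user(1))  # hit  -> served from the cache
```

The TTL bounds staleness: a cached value can be at most 60 seconds out of date here, which is the trade you make for skipping the database.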
Level 3: Vertical Scaling#
Upgrade the server: more CPU, more RAM, faster SSD.
| Cloud Provider | Max Instance | vCPU | RAM |
|---|---|---|---|
| AWS RDS | db.r6g.16xlarge | 64 | 512 GB |
| GCP Cloud SQL | db-custom-96-614400 | 96 | 600 GB |
| Azure | E96s_v5 | 96 | 672 GB |
When: You haven't maxed out a single server yet. Limit: You eventually hit a hardware ceiling (and the top-tier instances get very expensive).
Level 4: Read Replicas#
Route read queries to replicas, writes to the primary:
App → Write → Primary DB
→ Read → Replica 1
→ Replica 2
→ Replica 3
How it works:
- Primary streams WAL (Write-Ahead Log) to replicas
- Replicas are milliseconds behind (replication lag)
- Reads from replicas are eventually consistent
When: Read-heavy workloads where slight staleness is acceptable.
Gotcha: Replication lag. Don't read-after-write from a replica — you might get stale data. Use "read your own writes" pattern.
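One simple way to implement "read your own writes" is to pin a user's reads to the primary for a short window after their own write. A minimal sketch (the lag window, table, and function names are illustrative; in practice you might track this in the user's session instead of process memory):

```python
import time

REPLICATION_LAG_WINDOW = 5.0   # seconds; tune to your observed replica lag
last_write_at = {}             # user_id -> timestamp of that user's last write

def record_write(user_id):
    """Call after every write on behalf of this user."""
    last_write_at[user_id] = time.time()

def choose_endpoint(user_id):
    """Route a user's reads to the primary right after their own write,
    otherwise to a replica that may be slightly stale."""
    wrote_recently = time.time() - last_write_at.get(user_id, 0) < REPLICATION_LAG_WINDOW
    return "primary" if wrote_recently else "replica"

record_write("alice")
print(choose_endpoint("alice"))  # "primary" — she just wrote
print(choose_endpoint("bob"))    # "replica" — staleness is fine for him
```

Everyone else still reads from replicas, so you keep the scaling benefit while each user sees their own writes.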
Level 5: Connection Pooling#
Each database connection uses ~10MB RAM. 1000 connections = 10GB just for connections.
PgBouncer sits between your app and PostgreSQL:
App (1000 connections) → PgBouncer (50 connections) → PostgreSQL
Modes:
- Transaction pooling — a server connection is returned to the pool after each transaction, so many clients share it (most common)
- Session pooling — a server connection is pinned to one client for its entire session (safest, least sharing)
- Statement pooling — the connection is returned after every single statement (most aggressive; multi-statement transactions break)
When: Serverless apps (Lambda, Vercel Edge) that create many short-lived connections.
Tools: PgBouncer, PgCat, Supabase Supavisor, Neon connection pooling.
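A minimal PgBouncer configuration for transaction pooling might look like this (the database name, addresses, and file paths are illustrative; see the PgBouncer docs for auth setup):

```ini
[databases]
; the app connects to PgBouncer on 6432 instead of Postgres directly
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; share server connections between transactions
max_client_conn = 1000      ; app-side connections PgBouncer will accept
default_pool_size = 50      ; actual connections held open to Postgres
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

This is the 1000-to-50 funnel from the diagram above: the app can open cheap client connections freely while Postgres only ever sees 50.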
Level 6: Partitioning#
Split one large table into smaller pieces on the same server:
-- Range partitioning by date
CREATE TABLE orders (
    id BIGINT,
    created_at TIMESTAMP,
    amount DECIMAL
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
Types:
- Range — by date, ID range (most common)
- List — by category, region, tenant
- Hash — even distribution across N partitions
When: Single table > 100M rows and queries filter on the partition key.
Level 7: Sharding#
Split data across multiple database servers:
App → Shard Router
→ Shard 1 (users A-M) → DB Server 1
→ Shard 2 (users N-Z) → DB Server 2
→ Shard 3 (overflow) → DB Server 3
Sharding strategies:
- Hash-based — shard = hash(user_id) % num_shards (even distribution)
- Range-based — shard = user_id / 1M (hot spots possible)
- Geography — US users → US shard, EU users → EU shard
- Tenant-based — each enterprise customer gets their own shard
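Hash-based routing can be sketched in a few lines. The shard DSNs here are invented for illustration; the one real subtlety is using a stable hash, since Python's built-in `hash()` is salted per process and would route the same user to different shards after a restart:

```python
import hashlib

NUM_SHARDS = 4
# illustrative connection strings, one per shard
SHARD_DSNS = [f"postgres://db-shard-{i}.internal/app" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    """Hash-based sharding: a stable hash spreads users evenly across shards."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % NUM_SHARDS]

print(shard_for(42))   # always the same shard for user 42
```

Note the rebalancing pain mentioned below: changing `NUM_SHARDS` remaps almost every user, which is why systems that expect to grow use consistent hashing or a lookup table instead of a bare modulo.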
When: Single server can't handle write volume, or data exceeds single server storage.
Trade-offs:
- Cross-shard queries are expensive (JOINs across shards)
- Rebalancing shards is painful
- Application complexity increases significantly
- Transaction guarantees become harder
Who uses it: Instagram (sharded by user_id), Notion (sharded by workspace), YouTube and Slack (via Vitess).
Decision Matrix#
| Symptom | Solution |
|---|---|
| Slow queries | Add indexes, optimize queries |
| High read latency | Add Redis cache |
| CPU maxed on single server | Vertical scale or read replicas |
| Too many connections | PgBouncer connection pooling |
| Single table too large | Partitioning |
| Write throughput maxed | Sharding |
| Data sovereignty requirements | Geographic sharding |
| Multi-tenant isolation | Tenant-per-database (Turso, Neon) |
Modern Approaches#
Serverless Databases#
Neon, PlanetScale, and Turso auto-scale without manual sharding:
- Neon — Serverless Postgres with branching, scales to zero
- PlanetScale — MySQL with Vitess-powered horizontal scaling
- Turso — Edge SQLite with per-tenant databases
NewSQL#
CockroachDB and TiDB provide distributed SQL with automatic sharding:
- Horizontal scaling with ACID transactions
- No manual shard management
- Familiar wire protocols (PostgreSQL-compatible for CockroachDB, MySQL-compatible for TiDB)
Architecture Examples#
E-Commerce (10M+ orders/year)#
App → PgBouncer → PostgreSQL Primary (writes)
→ Read Replica 1 (product reads)
→ Read Replica 2 (analytics)
→ Redis (product cache, session store)
→ Elasticsearch (product search)
SaaS Multi-Tenant (1000+ tenants)#
App → Shard Router
→ Shared DB (free tier tenants)
→ Dedicated DB per enterprise tenant
→ Redis (rate limiting, caching)
Summary#
- Don't optimize prematurely — index first, cache second
- Vertical scaling works longer than you think
- Read replicas handle most read-scaling needs
- Connection pooling is essential for serverless
- Partition before you shard — less complexity
- Shard only when you must — it's a one-way door
- Consider serverless DBs — they handle scaling for you
Design your database architecture at codelit.io — generate interactive diagrams with cost estimates, performance audits, and infrastructure exports.