Horizontal vs Vertical Scaling: When to Scale Up and When to Scale Out
Every system hits a ceiling. The question is never if you need to scale, but how. The two fundamental strategies — scale up (vertical) and scale out (horizontal) — have radically different trade-offs.
Vertical Scaling (Scale Up)#
Add more power to a single machine: more CPU, RAM, faster disks.
Before: 4 cores, 16 GB RAM, 500 GB SSD
After: 64 cores, 256 GB RAM, 2 TB NVMe
Advantages:
- Zero code changes required
- No distributed systems complexity
- Simpler debugging and monitoring
- Single point of data consistency
Limits:
- Hardware has a ceiling (you cannot buy a 10,000-core machine)
- Single point of failure
- Downtime during upgrades
- Cost grows faster than linearly at the high end
Horizontal Scaling (Scale Out)#
Add more machines running the same workload behind a load balancer.
Before: 1 server handling 1,000 req/s
After: 10 servers handling 10,000 req/s
Advantages:
- Near-linear capacity growth
- Built-in redundancy
- Rolling upgrades with zero downtime
- Commodity hardware keeps costs linear
Limits:
- Requires stateless (or externalized state) design
- Data consistency becomes complex
- Network overhead between nodes
- Operational complexity increases
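The "behind a load balancer" part is doing real work here: every request must be routable to any healthy instance. A minimal sketch of that core job, with illustrative names (round-robin selection that skips unhealthy instances):

```javascript
// Pick a healthy instance for each request, rotating through the list.
// Real load balancers add health checks, weights, and connection draining;
// this only shows the routing decision itself.
function makeBalancer(instances) {
  let next = 0;
  return function pick() {
    for (let i = 0; i < instances.length; i++) {
      const candidate = instances[next];
      next = (next + 1) % instances.length;
      if (candidate.healthy) return candidate; // skip unhealthy nodes
    }
    throw new Error("no healthy instances");
  };
}
```

Note that an unhealthy instance is silently skipped, which is exactly what makes rolling upgrades and failover invisible to users.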
When to Choose Each#
| Factor | Vertical | Horizontal |
|---|---|---|
| Traffic pattern | Predictable, moderate | Spiky, high-volume |
| Data model | Strong consistency needed | Eventually consistent OK |
| Team size | Small ops team | Dedicated platform team |
| Budget curve | Spend more per unit | Spend less per unit at scale |
| Downtime tolerance | Some acceptable | Zero tolerance |
Rule of thumb: Start vertical until you hit a wall, then go horizontal for the workloads that need it.
Stateless Services: The Foundation of Horizontal Scaling#
Horizontal scaling only works when any instance can handle any request. That means no local state.
```javascript
// Bad: state lives on the instance
const sessions = new Map();
app.post("/login", (req, res) => {
  sessions.set(req.userId, { token: "abc" });
  res.sendStatus(200);
});

// Good: state lives in an external store
app.post("/login", async (req, res) => {
  // Redis stores strings, so serialize the session object
  await redis.set(`session:${req.userId}`, JSON.stringify({ token: "abc" }));
  res.sendStatus(200);
});
```
Move these out of the instance:
- Sessions — Redis, Memcached, or JWT tokens
- File uploads — S3, GCS, or object storage
- Caches — Shared Redis/Memcached cluster
- Job state — External queue (SQS, RabbitMQ)
Sticky Sessions: The Anti-Pattern#
Sticky sessions (session affinity) route a user to the same server every time. This undermines horizontal scaling:
- Uneven load distribution
- Failover kills the session
- Cannot scale down without disrupting users
If you must use sticky sessions temporarily, plan a migration path to externalized state.
Auto-Scaling Strategies#
Modern platforms scale instances automatically based on metrics.
CPU-Based Auto-Scaling#
The most common trigger. Simple but often laggy.
```yaml
# Kubernetes HPA (abridged)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
Problem: CPU is a lagging signal. It rises only after requests have already queued, so by the time new pods are running, users are seeing the latency.
Queue-Depth Scaling#
Scale workers based on how many messages are waiting, not CPU.
```yaml
metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length
      target:
        type: AverageValue
        averageValue: 10  # 10 messages per worker
```
This is proactive — scale before CPU even rises.
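Under the hood, an AverageValue target is just arithmetic: divide the metric total by the per-pod target and clamp to the replica bounds. A minimal sketch of that calculation (function name is illustrative, not any platform's API):

```javascript
// desired = ceil(currentMetricTotal / targetAveragePerPod), clamped to [min, max].
// This mirrors how an average-value autoscaling target maps to a replica count.
function desiredReplicas(queueLength, perWorkerTarget, min, max) {
  const desired = Math.ceil(queueLength / perWorkerTarget);
  return Math.min(max, Math.max(min, desired));
}

console.log(desiredReplicas(250, 10, 2, 20)); // 20 (demand wants 25, capped by max)
```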
Custom Metrics#
The best auto-scaling uses domain-specific signals:
| Metric | Use Case |
|---|---|
| Request latency p99 | API servers |
| Queue depth | Background workers |
| Active WebSocket connections | Real-time services |
| Pending database connections | Connection-pooling proxies |
| Business metric (orders/min) | E-commerce during sales |
Scaling Policies#
Prevent flapping with cooldown periods and step scaling:
Scale up: aggressive (1 minute cooldown)
Scale down: conservative (5 minute cooldown, 1 instance at a time)
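The asymmetry above can be sketched as a small decision function with two cooldown clocks. This is illustrative logic, not any platform's actual policy engine:

```javascript
// Asymmetric scaling policy: jump up to demand eagerly, step down slowly.
// Cooldowns prevent flapping when the metric oscillates around the target.
function makeScaler({ upCooldownMs, downCooldownMs }) {
  let lastChange = -Infinity;
  return function decide(current, desired, now) {
    if (desired > current && now - lastChange >= upCooldownMs) {
      lastChange = now;
      return desired; // scale up straight to demand
    }
    if (desired < current && now - lastChange >= downCooldownMs) {
      lastChange = now;
      return current - 1; // scale down one instance at a time
    }
    return current; // within cooldown, or already at target: hold steady
  };
}
```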
Database Scaling#
Databases are the hardest component to scale horizontally.
Vertical First#
Most databases benefit enormously from vertical scaling:
- More RAM = larger buffer pool = fewer disk reads
- Faster CPU = faster query execution
- NVMe SSDs = faster random I/O
Read Replicas (Horizontal Reads)#
Writes -> Primary DB
Reads -> Replica 1, Replica 2, Replica 3
Works well when read-to-write ratio is high (typical for most apps). Accept slight replication lag for reads.
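The application side of this split is a small router: writes go to the primary, reads rotate across replicas. A sketch where `primary` and `replicas` stand in for real database clients with a `query` method:

```javascript
// Read/write splitting: send writes to the primary, round-robin reads
// across replicas. Callers must tolerate slight replica lag on reads.
function makeRouter(primary, replicas) {
  let next = 0;
  return {
    write: (sql) => primary.query(sql),
    read: (sql) => {
      const replica = replicas[next];
      next = (next + 1) % replicas.length;
      return replica.query(sql);
    },
  };
}
```

Anything that must read its own write (e.g. show a profile immediately after updating it) should go through `write`-side reads or wait out the lag.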
Sharding (Horizontal Writes)#
Split data across multiple databases by a shard key:
Users A-M -> Shard 1
Users N-Z -> Shard 2
Sharding is complex — cross-shard queries, rebalancing, and hotspots are real problems. Use it only when vertical scaling and read replicas are exhausted.
Connection Pooling#
Before scaling the database, scale the connection layer:
100 app instances x 10 connections = 1,000 DB connections (too many)
100 app instances -> PgBouncer (50 pooled connections) -> Database
Tools: PgBouncer, ProxySQL, RDS Proxy.
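What these poolers do can be sketched as a semaphore: cap concurrent connections at a fixed size and queue extra requests instead of opening new connections. A toy version (illustrative only, not how PgBouncer is actually implemented):

```javascript
// Minimal connection-pool sketch: at most `size` slots in use; extra
// acquirers wait for a release rather than opening a new connection.
function makePool(size) {
  let inUse = 0;
  const waiters = [];
  return {
    async acquire() {
      if (inUse < size) { inUse++; return; }
      await new Promise((resolve) => waiters.push(resolve));
      // a release handed us its slot, so inUse is unchanged
    },
    release() {
      const next = waiters.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else inUse--;
    },
    stats: () => ({ inUse, waiting: waiters.length }),
  };
}
```

The database now sees at most `size` connections no matter how many app instances sit in front of the pool.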
Cost Comparison#
Vertical scaling cost curve:
$100/mo -> 4 CPU, 16 GB (baseline)
$400/mo -> 16 CPU, 64 GB (4x cost for 4x power)
$2000/mo -> 64 CPU, 256 GB (20x cost for 16x power) -- diminishing returns
Horizontal scaling cost curve:
$100/mo -> 1 instance (baseline)
$400/mo -> 4 instances (4x cost for ~4x capacity)
$1000/mo -> 10 instances (10x cost for ~10x capacity) -- linear
Horizontal scaling wins on cost at scale, but vertical scaling wins on simplicity at small scale.
A Practical Scaling Playbook#
- Start vertical — upgrade your single server until costs become unreasonable
- Add read replicas — offload read traffic from the primary database
- Add caching — Redis/Memcached to reduce database load by 80%+
- Go stateless — externalize sessions, uploads, and local caches
- Add horizontal app servers — behind a load balancer with auto-scaling
- Shard the database — only when all other options are exhausted
- Add CDN — offload static and cacheable content globally
Key Takeaways#
- Vertical scaling is simple but has a ceiling; horizontal scaling is complex but unbounded
- Stateless services are the prerequisite for horizontal scaling
- Auto-scale on the metric closest to user impact, not just CPU
- Database scaling follows its own progression: vertical, read replicas, caching, then sharding
- Horizontal scaling costs grow roughly linearly; vertical scaling costs grow faster than linearly at the high end
Article #247 in the System Design series. Keep building: codelit.io/blog.