Horizontal vs Vertical Scaling: When to Scale Up vs Scale Out
Your app is slow. Traffic is growing. Do you get a bigger server or add more servers? This decision shapes your entire architecture.
The Two Approaches#
Vertical Scaling (Scale Up)#
Make the server bigger — more CPU, RAM, faster SSD.
Before: 4 vCPU, 16GB RAM → After: 32 vCPU, 128GB RAM
Horizontal Scaling (Scale Out)#
Add more servers behind a load balancer.
Before: 1 server → After: 4 servers behind load balancer
Comparison#
| Factor | Vertical | Horizontal |
|---|---|---|
| Simplicity | Simple (no code changes) | Complex (stateless design needed) |
| Cost curve | Superlinear at the high end (2x CPU can cost more than 2x) | Linear (2x servers ≈ 2x price) |
| Ceiling | Hardware limit (largest available instance, e.g. 448 vCPU, 24TB on AWS) | No theoretical limit |
| Downtime | Server restart required | Zero downtime (rolling) |
| Failure | Single point of failure | Survives server failures |
| State | State lives on the server | State must be externalized |
| Best for | Databases, simple apps | Stateless APIs, web servers |
When to Scale Vertically#
- Database — PostgreSQL, MySQL scale vertically well (read replicas are separate)
- Single-threaded workloads — faster CPU helps more than more CPUs
- Quick fix — buy time before re-architecting for horizontal
- Team < 5 — horizontal scaling adds operational complexity
- Cost is low — going from 2 vCPU to 8 vCPU is cheap
Vertical Scaling on Cloud#
| Tier | AWS instance | vCPU | RAM | Cost/mo (approx.) |
|---|---|---|---|---|
| Largest available | u-24tb1.metal | 448 | 24TB | ~$200K |
| Practical max | r6g.16xlarge | 64 | 512GB | ~$5K |
| Sweet spot | r6g.4xlarge | 16 | 128GB | ~$1.2K |
Rule: If your workload fits on a 16-64 vCPU machine, vertical is simpler.
When to Scale Horizontally#
- Stateless APIs — each request can go to any server
- Web servers — static content, API endpoints
- Workers — background job processors
- Traffic is spiky — auto-scale up during peaks, down during troughs
- High availability required — survive server failures
Making Your App Horizontally Scalable#
The key requirement: stateless servers.
❌ Stateful (can't scale horizontally):
Server stores sessions in memory
Server stores uploads in local filesystem
Server caches in local memory
✓ Stateless (can scale horizontally):
Sessions in Redis
Files in S3
Cache in Redis/Memcached
Config from environment variables
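The difference between the two lists can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `ExternalSessionStore` is a hypothetical in-process stand-in for a shared store such as Redis, and the class and method names are assumptions made for the example.

```python
class ExternalSessionStore:
    """Hypothetical stand-in for a shared store such as Redis (a dict here)."""

    def __init__(self):
        self._data = {}

    def set(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id):
        return self._data.get(session_id)


class StatefulServer:
    """Keeps sessions in its own memory, so only this server knows them."""

    def __init__(self):
        self._sessions = {}

    def login(self, session_id, user):
        self._sessions[session_id] = user

    def whoami(self, session_id):
        return self._sessions.get(session_id)


class StatelessServer:
    """Keeps no session state; every lookup goes to the shared store."""

    def __init__(self, store):
        self._store = store

    def login(self, session_id, user):
        self._store.set(session_id, user)

    def whoami(self, session_id):
        return self._store.get(session_id)


# Stateful: the login lands on server A, the next request on server B.
a, b = StatefulServer(), StatefulServer()
a.login("s1", "alice")
stateful_result = b.whoami("s1")   # B never saw the login

# Stateless: both servers read the same external store.
store = ExternalSessionStore()
c, d = StatelessServer(store), StatelessServer(store)
c.login("s1", "alice")
stateless_result = d.whoami("s1")  # D finds the session C wrote
```

With the stateful servers, a load balancer would have to pin each user to one machine. With the stateless ones, any server can answer any request, which is what lets you add or remove instances freely.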
Auto-Scaling Patterns#
Target Tracking#
Scale based on a metric target:
Target: CPU utilization = 60%
Current: 80% (2 instances)
Action: Add 1 instance → 3 instances → CPU drops to ~53%
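The arithmetic behind that example can be written out as a small sketch. It assumes load is spread evenly across instances and that CPU scales proportionally with load; real autoscalers add cooldowns and other safeguards on top. The function names are made up for illustration.

```python
import math


def desired_instances(current_instances, current_metric, target_metric):
    """Smallest instance count that brings the average metric to or below target."""
    return math.ceil(current_instances * current_metric / target_metric)


def projected_metric(current_instances, current_metric, new_instances):
    """Average metric once the same total load is spread over new_instances."""
    return current_instances * current_metric / new_instances


# The example above: 2 instances at 80% CPU, target 60%.
new_count = desired_instances(2, 80, 60)
new_cpu = projected_metric(2, 80, new_count)
```

Two instances at 80% carry 160 "CPU points" of load in total. Dividing by the 60% target gives 2.67, which rounds up to 3 instances at roughly 53% each.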
Step Scaling#
Different actions at different thresholds:
CPU > 70% → add 1 instance
CPU > 85% → add 3 instances
CPU < 30% → remove 1 instance
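The thresholds above translate into a simple lookup, sketched here. The `min_instances` and `max_instances` bounds are assumptions added for the example; the original rules do not specify them.

```python
def step_adjustment(cpu_percent):
    """Instance delta for a CPU reading, checking the highest threshold first."""
    if cpu_percent > 85:
        return 3
    if cpu_percent > 70:
        return 1
    if cpu_percent < 30:
        return -1
    return 0  # between 30% and 70%: leave capacity alone


def apply_step(current, cpu_percent, min_instances=1, max_instances=20):
    """Apply the step and clamp to assumed fleet bounds."""
    proposed = current + step_adjustment(cpu_percent)
    return max(min_instances, min(max_instances, proposed))
```

Checking the 85% rule before the 70% rule matters: a reading of 90% satisfies both, and the larger step should win.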
Scheduled Scaling#
Predictable traffic patterns:
Mon-Fri 9am: scale to 10 instances
Sat-Sun: scale to 3 instances
Black Friday: scale to 50 instances
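A schedule like this is essentially a lookup from date to capacity. The sketch below makes two simplifying assumptions: the weekday capacity applies to the whole day rather than starting at 9am, and special dates live in a hypothetical `OVERRIDES` table (2025-11-28 is used as an example Black Friday).

```python
from datetime import date

# Hypothetical one-off overrides; these take priority over the weekly pattern.
OVERRIDES = {date(2025, 11, 28): 50}


def scheduled_capacity(day):
    """Instance count for a given calendar day."""
    if day in OVERRIDES:
        return OVERRIDES[day]
    if day.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return 3
    return 10
```

For example, `scheduled_capacity(date(2025, 11, 24))` falls on a Monday and returns the weekday capacity, while the override date returns 50 even though it is also a weekday.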
Scaling Metrics#
| Metric | Scale On | Good For |
|---|---|---|
| CPU | > 60-70% | Compute-bound workloads |
| Request count | > threshold/sec | API servers |
| Queue depth | > N messages | Background workers |
| Response time | P95 > 200ms | Latency-sensitive APIs |
| Custom metric | Business-specific | ML inference, connections |
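For the response-time row, the trigger is a percentile rather than an average. A minimal sketch of that check, using the nearest-rank definition of a percentile (one of several common definitions) and a made-up sample set:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def should_scale_out(latencies_ms, threshold_ms=200):
    """True when P95 latency exceeds the threshold."""
    return percentile(latencies_ms, 95) > threshold_ms


# Illustrative data: 90% of requests are fast, 10% are slow.
samples = [50] * 90 + [300] * 10
mean_latency = sum(samples) / len(samples)
p95_latency = percentile(samples, 95)
```

The mean of these samples sits well under 200ms, so an average-based rule would not fire, while the P95 exposes the slow tail that users actually feel.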
Scaling Different Components#
Stateless API Servers → Horizontal#
Load Balancer → API Server 1 (auto-scaled)
→ API Server 2
→ API Server 3
All share: Redis (sessions), PostgreSQL (data), S3 (files)
Databases → Vertical + Read Replicas#
Writes → Primary DB (vertical: 64 vCPU)
Reads → Read Replica 1
→ Read Replica 2
→ Read Replica 3
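The routing in that diagram can be sketched as a tiny dispatcher. This is a simplified illustration: the `Router` class is hypothetical, server names are plain strings, and the read/write check only looks at whether the statement starts with `SELECT`.

```python
import itertools


class Router:
    """Sends writes to the primary and spreads reads across replicas round-robin."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Naive classification: anything that is not a plain SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = Router("primary", ["replica-1", "replica-2", "replica-3"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("INSERT INTO users (name) VALUES ('a')"),
    router.route("select 1"),
    router.route("SELECT 2"),
    router.route("SELECT 3"),
]
```

A real setup also has to account for replication lag: a read issued right after a write may not see it on a replica, so reads that must be fresh (and anything inside a write transaction) usually go to the primary.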
Cache → Horizontal (Cluster)#
Redis Cluster:
Shard 1: hash slots 0-8191
Shard 2: hash slots 8192-16383
Each shard: primary + replica
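Redis Cluster places keys by hash slot: it takes the CRC16 of the key modulo 16384, and each shard owns a range of slots. The sketch below reproduces that mapping with Python's `binascii.crc_hqx`, including the hash-tag rule (only the text inside the first non-empty `{...}` is hashed). The `shard_for_slot` helper assumes slots are split into equal contiguous ranges, which is a simplification of how a real cluster assigns them.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_hash_slot(key):
    """Hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for_slot(slot, shard_count=2):
    """Zero-based shard index, assuming equal contiguous slot ranges."""
    return slot * shard_count // TOTAL_SLOTS


slot = key_hash_slot("foo")
shard = shard_for_slot(slot)
```

Because placement depends on the hash and not on the key's first letter, keys spread evenly across shards. Hash tags such as `{user1000}.following` and `{user1000}.followers` are the way to force related keys onto the same shard.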
Queue Workers → Horizontal#
SQS Queue → Worker 1 (auto-scaled by queue depth)
→ Worker 2
→ Worker N
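Scaling by queue depth usually means sizing the fleet so each worker has a bounded backlog. A minimal sketch, where `messages_per_worker` is a hypothetical tuning value and the worker bounds are assumptions for the example:

```python
import math


def desired_workers(queue_depth, messages_per_worker=100, min_workers=2, max_workers=10):
    """Workers needed so each handles at most messages_per_worker queued messages."""
    wanted = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

An empty queue still keeps the minimum fleet running, and a very deep queue is capped at the maximum so a backlog spike cannot launch an unbounded number of workers.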
Cost Analysis#
Small App (1000 req/sec)#
| Setup | Strategy | Monthly Cost |
|---|---|---|
| 1x large instance | Vertical | ~$300 |
| 3x small instances | Horizontal | ~$250 |
At small scale the two are close. The horizontal figure leaves out the load balancer and the extra operational work, so vertical is often the cheaper choice overall.
Medium App (10K req/sec)#
| Setup | Strategy | Monthly Cost |
|---|---|---|
| 1x xlarge instance | Vertical | ~$1,500 |
| 6x medium + ALB | Horizontal | ~$1,000 |
Horizontal becomes cheaper as you scale.
Large App (100K req/sec)#
Vertical alone is rarely an option at this load, so you have to scale horizontally.
Architecture Examples#
Simple SaaS (Vertical Start)#
Client → Nginx → Rails App (1 large server)
→ PostgreSQL (RDS, vertical scaling)
→ Redis (single instance)
Scale vertically until you can't. Then extract services.
Production SaaS (Horizontal)#
Client → CloudFront (CDN)
→ ALB → API (auto-scaling group, 3-20 instances)
→ Workers (auto-scaling, 2-10 by queue depth)
→ RDS Primary (writes) + 2 Read Replicas
→ ElastiCache Redis Cluster (3 shards)
→ S3 (files)
Decision Framework#
Can one server handle the load?
  Yes → Scale vertically (simpler)
  No → Is the workload stateless?
    Yes → Scale horizontally (add servers)
    No → Make it stateless first (externalize state), then scale horizontally
Need high availability?
  Yes → Horizontal (minimum 2 servers, different AZs)
  No → Vertical is fine for now
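The two questions can be folded into one function. This sketch is one possible reading of the framework, not the only one: it assumes a high-availability requirement forces horizontal scaling even when one server could carry the load, and that horizontal scaling always requires statelessness first. The function name and step strings are made up for illustration.

```python
def scaling_plan(fits_on_one_server, is_stateless, needs_high_availability):
    """Ordered list of steps suggested by the decision framework."""
    go_horizontal = needs_high_availability or not fits_on_one_server
    if not go_horizontal:
        return ["scale vertically"]

    steps = []
    if not is_stateless:
        steps.append("externalize state")  # prerequisite for adding servers
    steps.append("scale horizontally")
    if needs_high_availability:
        steps.append("run at least 2 servers in different AZs")
    return steps
```

For example, `scaling_plan(True, False, False)` returns only the vertical step, while a stateful app that has outgrown one server gets "externalize state" before "scale horizontally".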
Summary#
- Start vertical — it's simpler and cheaper at small scale
- Go horizontal when you hit hardware limits, need HA, or have spiky traffic
- Stateless is the prerequisite — externalize sessions, files, cache
- Auto-scale on the right metric — CPU for compute, queue depth for workers
- Databases scale differently — vertical + read replicas, then sharding
- Cost is linear with horizontal — but adds operational complexity