Horizontal vs Vertical Scaling: When to Scale Up and When to Scale Out
Every system hits a ceiling. The question is never if you need to scale, but how. The two fundamental strategies — scale up (vertical) and scale out (horizontal) — have radically different trade-offs.
Vertical Scaling (Scale Up)#
Add more power to a single machine: more CPU, RAM, faster disks.
Before: 4 cores, 16 GB RAM, 500 GB SSD
After: 64 cores, 256 GB RAM, 2 TB NVMe
Advantages:
- Zero code changes required
- No distributed systems complexity
- Simpler debugging and monitoring
- Single point of data consistency
Limits:
- Hardware has a ceiling (you cannot buy a 10,000-core machine)
- Single point of failure
- Downtime during upgrades
- Cost grows faster than linearly at the high end
Horizontal Scaling (Scale Out)#
Add more machines running the same workload behind a load balancer.
Before: 1 server handling 1,000 req/s
After: 10 servers handling 10,000 req/s
Advantages:
- Near-linear capacity growth
- Built-in redundancy
- Rolling upgrades with zero downtime
- Commodity hardware keeps costs linear
Limits:
- Requires stateless (or externalized state) design
- Data consistency becomes complex
- Network overhead between nodes
- Operational complexity increases
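The "behind a load balancer" part is doing real work here: every request must be routable to any healthy instance. A minimal sketch of that core job, with illustrative names (round-robin selection that skips unhealthy instances):

```javascript
// Pick a healthy instance for each request, rotating through the list.
// Real load balancers add health checks, weights, and connection draining;
// this only shows the routing decision itself.
function makeBalancer(instances) {
  let next = 0;
  return function pick() {
    for (let i = 0; i < instances.length; i++) {
      const candidate = instances[next];
      next = (next + 1) % instances.length;
      if (candidate.healthy) return candidate; // skip unhealthy nodes
    }
    throw new Error("no healthy instances");
  };
}
```

Note that an unhealthy instance is silently skipped, which is exactly what makes rolling upgrades and failover invisible to users.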
When to Choose Each#
| Factor | Vertical | Horizontal |
|---|---|---|
| Traffic pattern | Predictable, moderate | Spiky, high-volume |
| Data model | Strong consistency needed | Eventually consistent OK |
| Team size | Small ops team | Dedicated platform team |
| Budget curve | Spend more per unit | Spend less per unit at scale |
| Downtime tolerance | Some acceptable | Zero tolerance |
Rule of thumb: Start vertical until you hit a wall, then go horizontal for the workloads that need it.
Stateless Services: The Foundation of Horizontal Scaling#
Horizontal scaling only works when any instance can handle any request. That means no local state.
```javascript
// Bad: state lives on the instance
const sessions = new Map();
app.post("/login", (req, res) => {
  sessions.set(req.userId, { token: "abc" });
  res.sendStatus(200);
});

// Good: state lives in an external store
app.post("/login", async (req, res) => {
  // Redis stores strings, so serialize the session object
  await redis.set(`session:${req.userId}`, JSON.stringify({ token: "abc" }));
  res.sendStatus(200);
});
```
Move these out of the instance:
- Sessions — Redis, Memcached, or JWT tokens
- File uploads — S3, GCS, or object storage
- Caches — Shared Redis/Memcached cluster
- Job state — External queue (SQS, RabbitMQ)
Sticky Sessions: The Anti-Pattern#
Sticky sessions (session affinity) route a user to the same server every time. This undermines horizontal scaling:
- Uneven load distribution
- Failover kills the session
- Cannot scale down without disrupting users
If you must use sticky sessions temporarily, plan a migration path to externalized state.
Auto-Scaling Strategies#
Modern platforms scale instances automatically based on metrics.
CPU-Based Auto-Scaling#
The most common trigger. Simple but often laggy.
```yaml
# Kubernetes HPA (abridged)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
Problem: CPU is a lagging signal. It rises only after requests have already queued, so by the time new pods are running, users are seeing the latency.
Queue-Depth Scaling#
Scale workers based on how many messages are waiting, not CPU.
```yaml
metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length
      target:
        type: AverageValue
        averageValue: 10  # 10 messages per worker
```
This is proactive — scale before CPU even rises.
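Under the hood, an AverageValue target is just arithmetic: divide the metric total by the per-pod target and clamp to the replica bounds. A minimal sketch of that calculation (function name is illustrative, not any platform's API):

```javascript
// desired = ceil(currentMetricTotal / targetAveragePerPod), clamped to [min, max].
// This mirrors how an average-value autoscaling target maps to a replica count.
function desiredReplicas(queueLength, perWorkerTarget, min, max) {
  const desired = Math.ceil(queueLength / perWorkerTarget);
  return Math.min(max, Math.max(min, desired));
}

console.log(desiredReplicas(250, 10, 2, 20)); // 20 (demand wants 25, capped by max)
```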
Custom Metrics#
The best auto-scaling uses domain-specific signals:
| Metric | Use Case |
|---|---|
| Request latency p99 | API servers |
| Queue depth | Background workers |
| Active WebSocket connections | Real-time services |
| Pending database connections | Connection-pooling proxies |
| Business metric (orders/min) | E-commerce during sales |
Scaling Policies#
Prevent flapping with cooldown periods and step scaling:
Scale up: aggressive (1 minute cooldown)
Scale down: conservative (5 minute cooldown, 1 instance at a time)
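The asymmetry above can be sketched as a small decision function with two cooldown clocks. This is illustrative logic, not any platform's actual policy engine:

```javascript
// Asymmetric scaling policy: jump up to demand eagerly, step down slowly.
// Cooldowns prevent flapping when the metric oscillates around the target.
function makeScaler({ upCooldownMs, downCooldownMs }) {
  let lastChange = -Infinity;
  return function decide(current, desired, now) {
    if (desired > current && now - lastChange >= upCooldownMs) {
      lastChange = now;
      return desired; // scale up straight to demand
    }
    if (desired < current && now - lastChange >= downCooldownMs) {
      lastChange = now;
      return current - 1; // scale down one instance at a time
    }
    return current; // within cooldown, or already at target: hold steady
  };
}
```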
Database Scaling#
Databases are the hardest component to scale horizontally.
Vertical First#
Most databases benefit enormously from vertical scaling:
- More RAM = larger buffer pool = fewer disk reads
- Faster CPU = faster query execution
- NVMe SSDs = faster random I/O
Read Replicas (Horizontal Reads)#
Writes -> Primary DB
Reads -> Replica 1, Replica 2, Replica 3
Works well when read-to-write ratio is high (typical for most apps). Accept slight replication lag for reads.
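The application side of this split is a small router: writes go to the primary, reads rotate across replicas. A sketch where `primary` and `replicas` stand in for real database clients with a `query` method:

```javascript
// Read/write splitting: send writes to the primary, round-robin reads
// across replicas. Callers must tolerate slight replica lag on reads.
function makeRouter(primary, replicas) {
  let next = 0;
  return {
    write: (sql) => primary.query(sql),
    read: (sql) => {
      const replica = replicas[next];
      next = (next + 1) % replicas.length;
      return replica.query(sql);
    },
  };
}
```

Anything that must read its own write (e.g. show a profile immediately after updating it) should go through `write`-side reads or wait out the lag.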
Sharding (Horizontal Writes)#
Split data across multiple databases by a shard key:
Users A-M -> Shard 1
Users N-Z -> Shard 2
Sharding is complex — cross-shard queries, rebalancing, and hotspots are real problems. Use it only when vertical scaling and read replicas are exhausted.
Connection Pooling#
Before scaling the database, scale the connection layer:
100 app instances x 10 connections = 1,000 DB connections (too many)
100 app instances -> PgBouncer (50 pooled connections) -> Database
Tools: PgBouncer, ProxySQL, RDS Proxy.
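What these poolers do can be sketched as a semaphore: cap concurrent connections at a fixed size and queue extra requests instead of opening new connections. A toy version (illustrative only, not how PgBouncer is actually implemented):

```javascript
// Minimal connection-pool sketch: at most `size` slots in use; extra
// acquirers wait for a release rather than opening a new connection.
function makePool(size) {
  let inUse = 0;
  const waiters = [];
  return {
    async acquire() {
      if (inUse < size) { inUse++; return; }
      await new Promise((resolve) => waiters.push(resolve));
      // a release handed us its slot, so inUse is unchanged
    },
    release() {
      const next = waiters.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else inUse--;
    },
    stats: () => ({ inUse, waiting: waiters.length }),
  };
}
```

The database now sees at most `size` connections no matter how many app instances sit in front of the pool.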
Cost Comparison#
Vertical scaling cost curve:
$100/mo -> 4 CPU, 16 GB (baseline)
$400/mo -> 16 CPU, 64 GB (4x cost for 4x power)
$2000/mo -> 64 CPU, 256 GB (20x cost for 16x power) -- diminishing returns
Horizontal scaling cost curve:
$100/mo -> 1 instance (baseline)
$400/mo -> 4 instances (4x cost for ~4x capacity)
$1000/mo -> 10 instances (10x cost for ~10x capacity) -- linear
Horizontal scaling wins on cost at scale, but vertical scaling wins on simplicity at small scale.
A Practical Scaling Playbook#
- Start vertical — upgrade your single server until costs become unreasonable
- Add read replicas — offload read traffic from the primary database
- Add caching — Redis/Memcached to reduce database load by 80%+
- Go stateless — externalize sessions, uploads, and local caches
- Add horizontal app servers — behind a load balancer with auto-scaling
- Shard the database — only when all other options are exhausted
- Add CDN — offload static and cacheable content globally
Key Takeaways#
- Vertical scaling is simple but has a ceiling; horizontal scaling is complex but unbounded
- Stateless services are the prerequisite for horizontal scaling
- Auto-scale on the metric closest to user impact, not just CPU
- Database scaling follows its own progression: vertical, read replicas, caching, then sharding
- Horizontal scaling costs grow roughly linearly; vertical scaling costs grow faster than linearly at the high end
Article #247 in the System Design series. Keep building: codelit.io/blog.