Horizontal vs Vertical Scaling: When to Scale Up vs Scale Out

March 28, 2026 · 6 min read · By Codelit Team

Your app is slow. Traffic is growing. Do you get a bigger server or add more servers? This decision shapes your entire architecture.

The Two Approaches#

Vertical Scaling (Scale Up)#

Make the server bigger — more CPU, RAM, faster SSD.

Before: 4 vCPU, 16GB RAM → After: 32 vCPU, 128GB RAM

Horizontal Scaling (Scale Out)#

Add more servers behind a load balancer.

Before: 1 server → After: 4 servers behind load balancer
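
As a rough illustration of what the load balancer does in the four-server setup, here is a minimal round-robin sketch. The class name and the backend names (`app-1` to `app-4`) are hypothetical; real load balancers also perform health checks and connection draining, which this sketch leaves out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation (hypothetical helper)."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self) -> str:
        # Each call returns the next backend, wrapping around at the end.
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3", "app-4"])
picks = [balancer.pick() for _ in range(6)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-4', 'app-1', 'app-2']
```

Because any request can land on any backend, the servers must not depend on state held by one particular machine.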

Comparison#

Factor	Vertical	Horizontal
Simplicity	Simple (no code changes)	Complex (stateless design needed)
Cost curve	Superlinear at the high end (2x CPU can cost more than 2x)	Roughly linear (2x servers ≈ 2x price)
Ceiling	Hardware limit (the largest instance on offer)	No theoretical limit
Downtime	Restart usually required to resize	None if rolled out as a rolling update
Failure	Single point of failure	Survives server failures
State	State lives on the server	State must be externalized
Best for	Databases, simple apps	Stateless APIs, web servers

When to Scale Vertically#

Database — PostgreSQL, MySQL scale vertically well (read replicas are separate)
Single-threaded workloads — faster CPU helps more than more CPUs
Quick fix — buy time before re-architecting for horizontal
Team < 5 — horizontal scaling adds operational complexity
Cost is low — going from 2 vCPU to 8 vCPU is cheap

Vertical Scaling on Cloud#

Tier	Instance (AWS)	vCPU	RAM	Cost/mo
Largest available	u-24tb1.metal	448	24TB	~$200K
Practical max	r6g.16xlarge	64	512GB	~$5K
Sweet spot	r6g.4xlarge	16	128GB	~$1.2K

Rule: If your workload fits on a 16-64 vCPU machine, vertical is simpler.

When to Scale Horizontally#

Stateless APIs — each request can go to any server
Web servers — static content, API endpoints
Workers — background job processors
Traffic is spiky — auto-scale up during peaks, down during troughs
High availability required — survive server failures

Making Your App Horizontally Scalable#

The key requirement: stateless servers.

❌ Stateful (can't scale horizontally):
  Server stores sessions in memory
  Server stores uploads in local filesystem
  Server caches in local memory

✓ Stateless (can scale horizontally):
  Sessions in Redis
  Files in S3
  Cache in Redis/Memcached
  Config from environment variables
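
The difference can be sketched in a few lines. In this minimal example, `SharedStore` is an in-process stand-in for an external store such as Redis, and `AppServer` is a hypothetical server class; neither comes from a real library. The point is only that the session lives outside the server, so a second server can pick up the follow-up request.

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, data: dict) -> None: ...


class SharedStore:
    """In-process stand-in for an external store such as Redis (assumption)."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def put(self, session_id: str, data: dict) -> None:
        self._data[session_id] = data


class AppServer:
    """A server that keeps no session state of its own."""

    def __init__(self, name: str, sessions: SessionStore) -> None:
        self.name = name
        self._sessions = sessions

    def login(self, session_id: str, user: str) -> None:
        self._sessions.put(session_id, {"user": user})

    def whoami(self, session_id: str) -> Optional[str]:
        session = self._sessions.get(session_id)
        return session["user"] if session else None


store = SharedStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

server_a.login("sess-42", "alice")
# The follow-up request lands on a different server and still works.
print(server_b.whoami("sess-42"))  # alice
```

If each server kept its own dictionary instead of sharing the store, the second call would return `None`, which is the failure mode the stateful list describes.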

Auto-Scaling Patterns#

Target Tracking#

Scale based on a metric target:

Target: CPU utilization = 60%
Current: 80% (2 instances)
Action: Add 1 instance → 3 instances → CPU drops to ~53%
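
The arithmetic behind this example can be sketched with a simple proportional rule: scale the instance count by the ratio of the current metric to the target and round up. This is a sketch under the assumption that total load stays constant and spreads evenly; managed auto-scalers typically add cooldowns and min/max bounds on top. The function name is hypothetical.

```python
import math


def desired_instances(current: int, metric: float, target: float) -> int:
    """Proportional rule: capacity needed to bring the average metric to target."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    return max(1, math.ceil(current * metric / target))


current, cpu, target = 2, 80.0, 60.0
new_count = desired_instances(current, cpu, target)
# Assumes the same total load is spread evenly over the new instance count.
projected_cpu = cpu * current / new_count
print(new_count, round(projected_cpu, 1))  # 3 53.3
```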

Step Scaling#

Different actions at different thresholds:

CPU > 70% → add 1 instance
CPU > 85% → add 3 instances
CPU < 30% → remove 1 instance
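
The thresholds above map directly onto a small function. This sketch checks the highest threshold first so that a 90% reading adds 3 instances rather than 1; the min/max bounds of 1 and 20 are assumptions added for illustration.

```python
def step_adjustment(cpu_percent: float) -> int:
    """Return the change in instance count for a CPU reading."""
    # Check the highest threshold first so 90% triggers +3, not +1.
    if cpu_percent > 85:
        return 3
    if cpu_percent > 70:
        return 1
    if cpu_percent < 30:
        return -1
    return 0


def apply_step(current: int, cpu_percent: float,
               minimum: int = 1, maximum: int = 20) -> int:
    """Apply the adjustment and clamp to the (assumed) min/max bounds."""
    return min(maximum, max(minimum, current + step_adjustment(cpu_percent)))


for reading in (25, 50, 75, 90):
    print(reading, apply_step(4, reading))
# 25 3
# 50 4
# 75 5
# 90 7
```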

Scheduled Scaling#

Predictable traffic patterns:

Mon-Fri 9am: scale to 10 instances
Sat-Sun: scale to 3 instances
Black Friday: scale to 50 instances
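
A schedule like this is essentially a lookup from time to capacity. The sketch below fills in details the schedule leaves open, and these are assumptions: the weekday peak is taken to end at 18:00, off-peak weekday hours fall back to 3 instances, and the override date is Black Friday 2026.

```python
from datetime import date, datetime

# Assumptions for illustration: the weekday peak ends at 18:00, the off-peak
# baseline is 3 instances, and the override date is Black Friday 2026.
PEAK_START_HOUR = 9
PEAK_END_HOUR = 18
OVERRIDES = {date(2026, 11, 27): 50}


def scheduled_capacity(now: datetime) -> int:
    """Return the instance count the schedule asks for at a given time."""
    if now.date() in OVERRIDES:
        return OVERRIDES[now.date()]
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    if is_weekday and PEAK_START_HOUR <= now.hour < PEAK_END_HOUR:
        return 10
    return 3


print(scheduled_capacity(datetime(2026, 3, 30, 10, 0)))   # Monday morning: 10
print(scheduled_capacity(datetime(2026, 3, 28, 10, 0)))   # Saturday: 3
print(scheduled_capacity(datetime(2026, 11, 27, 10, 0)))  # override: 50
```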

Scaling Metrics#

Metric	Scale On	Good For
CPU	> 60-70%	Compute-bound workloads
Request count	> threshold/sec	API servers
Queue depth	> N messages	Background workers
Response time	P95 > 200ms	Latency-sensitive APIs
Custom metric	Business-specific	ML inference, connections
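
To make the response-time row concrete, here is a minimal sketch of a P95 check over a window of latency samples. It uses the nearest-rank method, which is one of several percentile definitions; the function names and the sample window are hypothetical.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_scale_out(latencies_ms: list[float], threshold_ms: float = 200) -> bool:
    """True when the window's P95 latency exceeds the threshold."""
    return percentile(latencies_ms, 95) > threshold_ms


window = [120.0] * 18 + [250.0, 900.0]
print(percentile(window, 95), should_scale_out(window))  # 250.0 True
```

The mean of the same window is 165.5 ms, which would not trigger a 200 ms threshold. That is why tail percentiles tend to be a better scaling signal than averages for latency-sensitive APIs.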

Scaling Different Components#

Stateless API Servers → Horizontal#

Load Balancer → API Server 1 (auto-scaled)
              → API Server 2
              → API Server 3
All share: Redis (sessions), PostgreSQL (data), S3 (files)

Databases → Vertical + Read Replicas#

Writes → Primary DB (vertical: 64 vCPU)
Reads  → Read Replica 1
       → Read Replica 2
       → Read Replica 3
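
The write/read split can be sketched as a small router. This is a simplified illustration: the class and node names are hypothetical, and it classifies statements only by their first keyword. It also ignores replication lag, so reads that must see a just-committed write would still need to go to the primary.

```python
from itertools import cycle


class QueryRouter:
    """Sends SELECTs to replicas in rotation and everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self._primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, DDL, and anything unrecognized go to the primary.
        return self._primary


router = QueryRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.route("SELECT * FROM users"))           # replica-1
print(router.route("select id FROM orders"))         # replica-2
print(router.route("INSERT INTO users VALUES (1)"))  # primary
```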

Cache → Horizontal (Cluster)#

Redis Cluster (16,384 hash slots, assigned by CRC16 of the key):
  Shard 1: slots 0-8191
  Shard 2: slots 8192-16383
  Each shard: primary + replica
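
Redis Cluster places keys by hash slot rather than by alphabetical range: the slot is the CRC16 (XMODEM variant) of the key modulo 16384, and each shard owns a range of slots. The sketch below reproduces that calculation with Python's `binascii.crc_hqx`, including the hash-tag rule where only the text inside the first non-empty `{...}` is hashed. The even two-shard split in `shard_for` is an assumption for illustration.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the braces replaces the full key.
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def shard_for(key: str, shard_count: int = 2) -> int:
    """Map a key to a shard index, assuming slots are split evenly."""
    return key_slot(key) * shard_count // SLOT_COUNT


print(key_slot("123456789"))  # 12739
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```

Hash tags are how related keys are kept on the same shard, which matters for multi-key operations.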

Queue Workers → Horizontal#

SQS Queue → Worker 1 (auto-scaled by queue depth)
          → Worker 2
          → Worker N
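
Scaling workers by queue depth usually comes down to a backlog-per-worker target. In this sketch, the target of 100 messages per worker is an assumption, and the default bounds are illustrative; the function name is hypothetical.

```python
import math


def desired_workers(queue_depth: int, backlog_per_worker: int = 100,
                    minimum: int = 2, maximum: int = 10) -> int:
    """Workers needed so each handles at most backlog_per_worker messages."""
    if backlog_per_worker < 1:
        raise ValueError("backlog_per_worker must be >= 1")
    needed = math.ceil(queue_depth / backlog_per_worker)
    # Clamp so the pool never drops to zero or grows without bound.
    return min(maximum, max(minimum, needed))


for depth in (0, 450, 5000):
    print(depth, desired_workers(depth))
# 0 2
# 450 5
# 5000 10
```

Keeping a non-zero minimum means a sudden burst is picked up immediately instead of waiting for the first worker to boot.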

Cost Analysis#

Small App (1000 req/sec)#

Strategy	Setup	Monthly Cost
1x large instance	Vertical	~$300
3x small instances	Horizontal	~$250

The horizontal figure does not include a load balancer. Once you add one, vertical is often the cheaper option at small scale.

Medium App (10K req/sec)#

Strategy	Setup	Monthly Cost
1x xlarge instance	Vertical	~$1,500
6x medium + ALB	Horizontal	~$1,000

Horizontal becomes cheaper as you scale.

Large App (100K req/sec)#

At this volume a single server is rarely a realistic option for the application tier. Plan to scale horizontally.

Architecture Examples#

Simple SaaS (Vertical Start)#

Client → Nginx → Rails App (1 large server)
                      → PostgreSQL (RDS, vertical scaling)
                      → Redis (single instance)

Scale vertically until you can't. Then extract services.

Production SaaS (Horizontal)#

Client → CloudFront (CDN)
       → ALB → API (auto-scaling group, 3-20 instances)
             → Workers (auto-scaling, 2-10 by queue depth)
       → RDS Primary (writes) + 2 Read Replicas
       → ElastiCache Redis Cluster (3 shards)
       → S3 (files)

Decision Framework#

Can one server handle the load?
  Yes → Scale vertically (simpler)
  No → Is the workload stateless?
    Yes → Scale horizontally (add servers)
    No → Make it stateless first (externalize state)
         Then scale horizontally

Need high availability?
  Yes → Horizontal (minimum 2 servers, different AZs)
  No → Vertical is fine for now
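
The two questions can be folded into one function. This sketch reflects one reading of the framework, and the combination is an assumption: a high-availability requirement forces the horizontal path even when one server could carry the load, and state is externalized before scaling out in either case. The function and parameter names are hypothetical.

```python
def scaling_plan(fits_on_one_server: bool, stateless: bool,
                 needs_ha: bool) -> list[str]:
    """Return the ordered steps suggested by the decision framework."""
    go_horizontal = needs_ha or not fits_on_one_server
    if not go_horizontal:
        return ["scale vertically"]

    steps = []
    if not stateless:
        # Externalizing state comes first; it is the prerequisite.
        steps.append("externalize state")
    steps.append("scale horizontally")
    if needs_ha:
        steps.append("run at least 2 servers in different AZs")
    return steps


print(scaling_plan(fits_on_one_server=True, stateless=False, needs_ha=False))
# ['scale vertically']
print(scaling_plan(fits_on_one_server=False, stateless=False, needs_ha=True))
# ['externalize state', 'scale horizontally', 'run at least 2 servers in different AZs']
```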

Summary#

Start vertical — it's simpler and cheaper at small scale
Go horizontal when you hit hardware limits, need HA, or have spiky traffic
Stateless is the prerequisite — externalize sessions, files, cache
Auto-scale on the right metric — CPU for compute, queue depth for workers
Databases scale differently — vertical + read replicas, then sharding
Cost is linear with horizontal — but adds operational complexity

