Capacity Planning: A Practical Guide for Engineers
Every outage has a root cause, and running out of capacity is one of the most preventable. Capacity planning is the discipline of forecasting resource needs — compute, memory, storage, network — and ensuring your infrastructure can handle both current load and projected growth without over-provisioning or under-provisioning.
Why Capacity Planning Matters#
Under-provision and your service degrades or crashes during traffic spikes. Over-provision and you burn budget on idle resources. Good capacity planning hits the sweet spot: enough headroom to absorb surges, lean enough to keep costs rational.
Capacity planning feeds into:
- Reliability — SLOs depend on having enough resources to serve traffic within latency targets.
- Cost management — Cloud bills scale with provisioned resources, not just consumed ones.
- Incident prevention — Disk-full, OOM-killed, and connection-pool-exhausted failures are all capacity problems.
Step 1: Traffic Estimation#
Start with demand. You cannot size infrastructure without understanding the load it must handle.
Baseline Metrics#
Collect current traffic data across dimensions:
- Requests per second (RPS) — broken down by endpoint or service.
- Bandwidth — ingress and egress in Mbps or Gbps.
- Concurrent connections — WebSocket, HTTP keep-alive, database connections.
- Payload sizes — average and P99 request/response body sizes.
Growth Projections#
Estimate future traffic by combining:
- Historical trends — month-over-month or quarter-over-quarter growth rates.
- Business forecasts — product launches, marketing campaigns, geographic expansion.
- Seasonal patterns — Black Friday spikes, end-of-quarter surges, holiday dips.
A common rule of thumb: design for 3x current peak to cover organic growth and unexpected spikes for the next 12 months.
Step 2: Server Sizing#
With traffic estimates in hand, translate requests into resource requirements.
CPU#
Profile your application to understand CPU cost per request. If a single core handles 500 RPS and you expect 10,000 RPS at peak, you need at least 20 cores — before accounting for headroom.
```
cores_needed = peak_rps / rps_per_core
cores_provisioned = cores_needed * headroom_factor  # typically 1.3-1.5
```
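A minimal sketch of this arithmetic in Python, using the worked example from the text (the helper name, the 1.4 default headroom factor, and the sample numbers are illustrative, not a standard API):

```python
import math

def cores_to_provision(peak_rps: float, rps_per_core: float,
                       headroom_factor: float = 1.4) -> int:
    """Translate a peak RPS target into provisioned cores with headroom."""
    cores_needed = peak_rps / rps_per_core            # raw compute demand
    return math.ceil(cores_needed * headroom_factor)  # round up, never down

# The example above: 10,000 RPS peak at 500 RPS per core.
print(cores_to_provision(10_000, 500))  # 20 raw cores * 1.4 headroom -> 28
```

Rounding up matters: a fractional core requirement still needs a whole core (or a whole instance) to serve it.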
Memory#
Memory sizing depends on:
- Per-request memory — stack frames, buffers, deserialized payloads.
- In-process caches — application-level caches consume resident memory.
- Connection overhead — each open connection (HTTP, gRPC, WebSocket) carries a memory cost.
- Runtime overhead — GC pressure in JVM or Go, interpreter state in Python.
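The four components above can be summed in a back-of-the-envelope estimator. This is a sketch under stated assumptions; the function, its unit conversions, and every sample number below are illustrative:

```python
def memory_required_gib(concurrent_requests: int, mib_per_request: float,
                        open_connections: int, kib_per_connection: float,
                        cache_gib: float, runtime_gib: float) -> float:
    """Sum per-request, per-connection, cache, and runtime memory (all estimates)."""
    per_request = concurrent_requests * mib_per_request / 1024        # MiB -> GiB
    per_conn = open_connections * kib_per_connection / (1024 * 1024)  # KiB -> GiB
    return per_request + per_conn + cache_gib + runtime_gib

# Illustrative numbers only: 2,000 in-flight requests at 1 MiB each,
# 10,000 open connections at 64 KiB each, a 4 GiB cache, 1 GiB runtime overhead.
total = memory_required_gib(2_000, 1.0, 10_000, 64, 4.0, 1.0)
print(round(total, 2))  # -> 7.56
```

Profile your actual runtime to replace these placeholders; per-request and per-connection costs vary widely between languages and frameworks.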
Network#
Calculate bandwidth needs from RPS multiplied by average response size. Do not forget internal traffic: service-to-service calls, database queries, cache lookups, and replication streams often exceed external traffic.
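As a sketch, the same calculation in code, with an assumed multiplier standing in for internal traffic (the 2x factor and sample payload sizes are placeholders, not measurements):

```python
def egress_gbps(peak_rps: float, avg_response_bytes: float,
                internal_multiplier: float = 2.0) -> float:
    """Convert request rate and payload size into bandwidth, inflated
    for service-to-service traffic (internal_multiplier is a rough guess)."""
    bits_per_second = peak_rps * avg_response_bytes * 8  # bytes -> bits
    return bits_per_second * internal_multiplier / 1e9   # bits/s -> Gbps

# 10,000 RPS with 50 KB average responses, doubled for internal traffic.
print(round(egress_gbps(10_000, 50_000), 2))  # -> 8.0 Gbps
```

Measure the internal multiplier in your own environment; in chatty microservice architectures it can be far higher than 2x.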
Step 3: Storage Growth#
Storage is the resource most likely to creep up silently and cause an outage at 3 AM.
Estimating Growth#
```
daily_growth = new_records_per_day * avg_record_size
monthly_growth = daily_growth * 30
annual_storage = current_size + (monthly_growth * 12)
```
Factor in:
- Indexes — secondary indexes can double effective storage.
- Replication — a replication factor of 3 triples raw storage needs.
- Backups and snapshots — retained snapshots add to total storage cost.
- Compaction overhead — LSM-based databases need temporary space during compaction (see article #275).
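Combining the growth formula with the multipliers above gives a rough raw-storage projection. A sketch only: the function and every input below are illustrative, and the index factor in particular depends heavily on your schema:

```python
def projected_storage_tib(current_tib: float, records_per_day: float,
                          avg_record_bytes: float, months: int = 12,
                          index_factor: float = 2.0,
                          replication_factor: int = 3) -> float:
    """Project raw storage: logical growth, doubled for indexes,
    tripled for a replication factor of 3 (both assumptions)."""
    daily_growth_tib = records_per_day * avg_record_bytes / 2**40
    logical_tib = current_tib + daily_growth_tib * 30 * months
    return logical_tib * index_factor * replication_factor

# Illustrative: 2 TiB today, 10M new 1 KiB records/day, 12-month horizon.
print(round(projected_storage_tib(2.0, 10_000_000, 1024), 1))
```

Note how the multipliers dominate: a modest 3.4 TiB of logical growth becomes roughly six times that in raw provisioned storage.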
Retention Policies#
Not all data needs to live forever. Define TTLs for logs, events, and transient records. Tier cold data to cheaper storage (S3, GCS) and keep hot data on fast disks (NVMe, EBS io2).
Step 4: Database Capacity#
Databases are often the first bottleneck. Capacity planning for databases involves:
Connection Pools#
Most databases have a hard connection limit. A PostgreSQL instance defaults to 100 connections. With connection pooling (PgBouncer, ProxySQL), you can multiplex thousands of application connections into a smaller pool of database connections.
```
pool_size = (max_db_connections * 0.8) / number_of_app_instances
```
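The same formula as a small helper, assuming the 80% safety margin from the text (the function name and instance count are illustrative):

```python
def per_instance_pool_size(max_db_connections: int, app_instances: int,
                           safety_fraction: float = 0.8) -> int:
    """Split a fraction of the database's connection limit across app
    instances, leaving the rest for admin sessions and monitoring."""
    return int(max_db_connections * safety_fraction) // app_instances

# PostgreSQL's default limit of 100 connections shared by 8 app instances.
print(per_instance_pool_size(100, 8))  # -> 10 connections per instance
```

Remember that this budget must cover every process that opens connections, including background workers and cron jobs, not just request-serving instances.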
Query Throughput#
Benchmark your critical queries under load. Measure queries per second (QPS) and latency at P50, P95, and P99. Identify whether your bottleneck is CPU (complex queries), I/O (large scans), or memory (working set exceeds buffer pool).
Read Replicas#
If reads dominate, scale horizontally with read replicas. A common pattern: one primary for writes, N replicas for reads. Each replica adds read capacity linearly but introduces replication lag as a trade-off.
Step 5: Caching Layer Sizing#
Caches (Redis, Memcached) absorb read load and reduce database pressure. Sizing a cache requires:
- Working set size — the subset of data accessed frequently. Use key-access distributions (Zipfian is common) to estimate.
- Hit rate targets — a 95% hit rate means only 5% of requests fall through to the database. Model the impact on database load.
- Memory per key — key size + value size + overhead (Redis adds roughly 80-100 bytes per key for internal bookkeeping).
- Eviction headroom — provision 10-20% more memory than the working set to avoid excessive evictions.
```
cache_memory = working_set_keys * (avg_key_size + avg_value_size + overhead)
cache_memory_provisioned = cache_memory * 1.2
```
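The two formulas above, sketched in code (the 90-byte overhead default and the sample key counts and sizes are illustrative assumptions):

```python
def cache_memory_gib(working_set_keys: int, avg_key_bytes: int,
                     avg_value_bytes: int, overhead_bytes: int = 90,
                     eviction_headroom: float = 1.2) -> float:
    """Size a cache from its working set, per-key overhead, and
    a 20% eviction headroom (all inputs are estimates)."""
    per_key = avg_key_bytes + avg_value_bytes + overhead_bytes
    return working_set_keys * per_key * eviction_headroom / 2**30  # bytes -> GiB

# Illustrative: 50M hot keys, 40-byte keys, 500-byte values, ~90 B overhead.
print(round(cache_memory_gib(50_000_000, 40, 500), 1))
```

Validate the working-set estimate empirically: sample key-access logs over a representative window rather than assuming the whole dataset is hot.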
Step 6: Load Testing for Capacity#
Estimates are hypotheses. Load testing validates them.
Approaches#
- Synthetic load tests — tools like k6, Locust, or Gatling simulate traffic patterns against staging or production (with traffic shadowing).
- Stress tests — push beyond expected peak to find the breaking point.
- Soak tests — run at sustained high load for hours to catch memory leaks, connection leaks, and GC degradation.
What to Measure#
- Throughput (RPS) at target latency SLOs.
- Error rate under load.
- Resource utilization (CPU, memory, disk I/O, network) at each load tier.
- The exact RPS where latency exceeds SLO — this is your capacity ceiling.
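Finding that ceiling from load-test output can be as simple as scanning tiered results for the highest RPS whose tail latency still met the SLO. A sketch; the sample data and 200 ms SLO are hypothetical:

```python
def capacity_ceiling(samples: list[tuple[float, float]], slo_ms: float) -> float:
    """Return the highest tested RPS whose P99 latency met the SLO.

    samples: (rps, p99_latency_ms) pairs from successive load-test tiers.
    """
    passing = [rps for rps, p99 in sorted(samples) if p99 <= slo_ms]
    return max(passing) if passing else 0.0

# Hypothetical load-test tiers against a 200 ms P99 SLO.
tiers = [(1_000, 80), (2_000, 110), (4_000, 170), (8_000, 450)]
print(capacity_ceiling(tiers, slo_ms=200))  # -> 4000
```

The true ceiling lies somewhere between the last passing tier and the first failing one, so narrow the gap with finer-grained tiers around the transition.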
Step 7: Auto-Scaling Policies#
In cloud environments, auto-scaling turns capacity planning from a static exercise into a dynamic one.
Scaling Triggers#
| Metric | Scale-out threshold | Scale-in threshold |
|---|---|---|
| CPU utilization | 70% for 3 min | 30% for 10 min |
| Request latency P95 | exceeds SLO | well below SLO |
| Queue depth | growing for 2 min | near zero for 5 min |
| Memory utilization | 80% | 50% |
Scaling Policies#
- Target tracking — maintain a metric at a target value (e.g., keep average CPU at 60%).
- Step scaling — add N instances when metric crosses threshold A, add M more at threshold B.
- Scheduled scaling — pre-scale before predictable traffic events (marketing emails, product launches).
- Cooldown periods — prevent thrashing by enforcing minimum time between scaling actions (typically 3-5 minutes).
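Target tracking, the first policy above, reduces to simple proportional math: scale the fleet so the metric lands back on its target. A minimal sketch, assuming a 60% CPU target and illustrative min/max bounds:

```python
import math

def desired_instances(current_instances: int, current_cpu_pct: float,
                      target_cpu_pct: float = 60.0,
                      min_instances: int = 2, max_instances: int = 50) -> int:
    """Target tracking: pick a fleet size that brings average CPU
    back to the target, clamped to configured bounds."""
    desired = math.ceil(current_instances * current_cpu_pct / target_cpu_pct)
    return max(min_instances, min(max_instances, desired))

# 10 instances running at 84% CPU against a 60% target -> scale out to 14.
print(desired_instances(10, 84.0))  # -> 14
```

In practice the autoscaler also applies cooldowns and scale-in dampening on top of this proportional step, which is what prevents thrashing.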
Step 8: Cost Forecasting#
Capacity planning and cost planning are inseparable in the cloud.
Building a Cost Model#
- Map each service to its resource footprint (instance type, storage, bandwidth).
- Multiply by projected instance count at each growth milestone.
- Apply pricing (on-demand, reserved, spot) to each resource.
- Add data transfer, managed service fees, and support costs.
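The four steps above roll up into a simple per-service model. A sketch only: the function shape and every price below are placeholders, not real cloud rates:

```python
def monthly_cost(instances: int, instance_hourly_usd: float,
                 storage_gb: float, storage_gb_month_usd: float,
                 egress_gb: float, egress_gb_usd: float,
                 hours_per_month: int = 730) -> float:
    """Roll up compute, storage, and data-transfer cost for one service."""
    compute = instances * instance_hourly_usd * hours_per_month
    storage = storage_gb * storage_gb_month_usd
    transfer = egress_gb * egress_gb_usd
    return compute + storage + transfer

# Placeholder prices, not actual AWS rates: 12 instances at $0.17/hr,
# 5 TB of storage at $0.10/GB-month, 20 TB egress at $0.09/GB.
print(round(monthly_cost(12, 0.17, 5_000, 0.10, 20_000, 0.09), 2))
```

Run the model at each growth milestone from Step 1 to see where the cost curve bends, and which lever (compute, storage, or transfer) dominates.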
Optimization Levers#
- Reserved instances or savings plans — commit to 1-3 year terms for 30-60% savings on stable baseline load.
- Spot or preemptible instances — use for stateless, fault-tolerant workloads at 60-90% discount.
- Right-sizing — match instance types to actual utilization. A c5.xlarge at 80% CPU is better than a c5.4xlarge at 20%.
- Autoscale aggressively — do not pay for idle capacity during off-peak hours.
Step 9: Headroom Planning#
Headroom is the buffer between current utilization and maximum capacity. It absorbs unexpected spikes without triggering incidents.
How Much Headroom#
- Compute: 30-50% headroom above average utilization.
- Storage: 6 months of projected growth beyond current usage.
- Database connections: 20% of max connections kept free.
- Network: 40% headroom on link capacity.
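These targets translate into utilization ceilings you can alert on: 30% compute headroom implies alarming above ~70% utilization, 40% network headroom above ~60%, and keeping 20% of connections free above ~80%. A sketch with that illustrative mapping hard-coded:

```python
def headroom_ok(utilization_pct: float, resource: str) -> bool:
    """Check current utilization against the headroom targets above."""
    # Ceilings implied by the headroom targets (illustrative mapping).
    ceilings = {"compute": 70.0, "network": 60.0, "db_connections": 80.0}
    return utilization_pct <= ceilings[resource]

print(headroom_ok(65.0, "compute"))  # -> True
print(headroom_ok(72.0, "network"))  # -> False, headroom is exhausted
```

Storage is the exception: its headroom is measured in months of runway rather than a utilization percentage, so track projected-exhaustion date instead.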
Headroom Review Cadence#
Review capacity quarterly. Compare actuals against projections. Adjust growth rates, retire unused resources, and re-forecast costs. Treat capacity planning as a living process, not a one-time spreadsheet.
Putting It All Together#
A capacity planning checklist:
- Collect baseline traffic and resource metrics.
- Project growth for the next 6-12 months.
- Size compute, memory, storage, and network independently.
- Validate estimates with load tests.
- Configure auto-scaling with appropriate triggers and cooldowns.
- Build a cost model and identify optimization opportunities.
- Maintain headroom and review quarterly.
Capacity planning is not glamorous, but it is the difference between a system that gracefully handles 10x traffic and one that pages your on-call engineer at 2 AM. Invest the time upfront, and your future self will thank you.
That is article #276 on Codelit. Explore the full archive for deep dives on distributed systems, infrastructure, algorithms, and software engineering fundamentals. New articles every week.