# System Design Cheat Sheet: Quick Reference for Interviews and Beyond
System design interviews test your ability to think through trade-offs at scale. This cheat sheet gives you the key numbers, components, patterns, and checklists you need — all in one place.
## Numbers Every Engineer Should Know
These latency and throughput numbers help you make quick back-of-envelope estimates.
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 µs |
| HDD random read | 10 ms |
| Round trip within same datacenter | 0.5 ms |
| Round trip CA to Netherlands | 150 ms |
Throughput rules of thumb:
- A single server can handle ~10K–50K concurrent connections (with async I/O)
- A single PostgreSQL instance handles ~5K–10K transactions/second
- Redis handles ~100K operations/second on a single node
- A single SSD delivers ~100K–200K random IOPS
- 1 Gbps network = ~125 MB/s throughput
Data size estimates:
- 1 million users with 1 KB profile = 1 GB
- 1 billion rows at 100 bytes each = 100 GB
- 10 million images at 200 KB each = 2 TB
- 1 million daily active users generating 10 requests each = 10M requests/day = ~115 requests/second
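The estimates above reduce to a few lines of arithmetic. A minimal sketch, using the same illustrative figures as the list:

```python
# Back-of-envelope helpers: turn user counts and object sizes into rates and totals.

def requests_per_second(daily_active_users: int, requests_per_user: int) -> float:
    """Average request rate implied by a daily load."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return daily_active_users * requests_per_user / seconds_per_day

# 1M DAU x 10 requests each = 10M requests/day ~ 115.7 requests/second
avg_rps = requests_per_second(1_000_000, 10)
print(f"{avg_rps:.1f} req/s average")

# 10 million images at 200 KB each = 2 TB (decimal units)
storage_tb = 10_000_000 * 200_000 / 1e12
print(f"{storage_tb:.1f} TB of image storage")
```

In an interview the point is the method, not the precision: round aggressively (86,400 seconds/day becomes 100,000) and keep the arithmetic in powers of ten.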
## Common Components
### Load Balancer (LB)
Distributes traffic across multiple servers. Enables horizontal scaling and eliminates single points of failure.
- L4 (transport): Routes based on IP and port. Fast, no inspection.
- L7 (application): Routes based on HTTP headers, URL, cookies. More flexible.
- Algorithms: Round robin, least connections, weighted, IP hash, consistent hashing.
- Tools: NGINX, HAProxy, AWS ALB/NLB, Envoy.
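Two of the algorithms above can be sketched in a few lines; the server names and connection counts here are made up for illustration:

```python
import itertools

# Toy round-robin and least-connections pickers over a static server list.
servers = ["app-1", "app-2", "app-3"]

rr = itertools.cycle(servers)  # round robin: rotate through servers in order

# Hypothetical count of open connections per server, as an LB might track it.
active = {"app-1": 12, "app-2": 3, "app-3": 7}

def least_connections(active_counts: dict) -> str:
    """Pick the server currently holding the fewest open connections."""
    return min(active_counts, key=active_counts.get)

print(next(rr), next(rr))         # app-1 app-2
print(least_connections(active))  # app-2
```

Round robin assumes servers are interchangeable; least connections adapts when request costs vary, which is why it is a common default for long-lived connections.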
### Cache
Reduces latency and database load by storing frequently accessed data in memory.
- Cache-aside: App checks cache first, loads from DB on miss, writes to cache.
- Write-through: App writes to cache and DB simultaneously.
- Write-behind: App writes to cache; cache async-writes to DB.
- Eviction: LRU (most common), LFU, TTL-based.
- Tools: Redis, Memcached, Varnish (HTTP cache), CDN edge cache.
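Cache-aside, the most common of the strategies above, fits in a dozen lines. Plain dicts stand in for Redis and the database in this sketch:

```python
# Cache-aside: check the cache, fall back to the database on a miss,
# then populate the cache so the next read is a hit.

cache: dict = {}
database = {"user:1": {"name": "Ada"}}  # hypothetical backing store

def get_user(key: str):
    value = cache.get(key)
    if value is not None:        # cache hit: skip the database entirely
        return value
    value = database.get(key)    # cache miss: read the source of truth
    if value is not None:
        cache[key] = value       # populate for subsequent reads
    return value

get_user("user:1")               # miss: loads from DB, fills cache
assert "user:1" in cache         # a second read would now be a hit
```

A real deployment would also set a TTL on the cache write and decide how to invalidate on updates; stale data is the classic cache-aside failure mode.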
### Database (DB)
Choose based on access patterns, consistency needs, and scale requirements.
- Relational (SQL): Strong consistency, ACID, complex queries. PostgreSQL, MySQL.
- Document (NoSQL): Flexible schema, horizontal scale. MongoDB, DynamoDB.
- Wide-column: High write throughput, time-series. Cassandra, ScyllaDB.
- Graph: Relationship-heavy queries. Neo4j, Amazon Neptune.
- Key-value: Simple lookups, extreme speed. Redis, DynamoDB.
### Message Queue
Decouples producers from consumers. Enables async processing and absorbs traffic spikes.
- At-most-once: Fire and forget. Fast but may lose messages.
- At-least-once: Retry until acknowledged. May duplicate.
- Exactly-once: Hardest to achieve. Usually involves idempotency.
- Tools: Kafka (high throughput, log-based), RabbitMQ (flexible routing), SQS (managed), NATS (lightweight).
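At-least-once delivery means the same message can arrive twice, and the standard fix is an idempotent consumer that remembers processed message IDs. A minimal sketch (the message shape is illustrative, not any specific broker's format):

```python
# Idempotent consumer: dedupe on message ID so redeliveries are harmless.
# In production the "seen" set lives in Redis or the database, with a TTL.

processed_ids: set = set()
side_effects: list = []  # stands in for the real work (charges, emails, ...)

def handle(message: dict) -> None:
    msg_id = message["id"]
    if msg_id in processed_ids:           # duplicate delivery: safely ignore
        return
    side_effects.append(message["body"])  # the real work happens once
    processed_ids.add(msg_id)

handle({"id": "m1", "body": "charge $10"})
handle({"id": "m1", "body": "charge $10"})  # redelivered by the broker
assert side_effects == ["charge $10"]       # effect applied exactly once
```

This is why "exactly-once" in practice usually means at-least-once delivery plus idempotent processing, rather than a guarantee from the broker alone.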
### CDN (Content Delivery Network)
Caches static assets at edge locations close to users. Reduces latency for global audiences.
- Cache HTML, CSS, JS, images, videos at edge PoPs
- Typical hit ratio: 90–99% for static content
- Origin shield reduces load on your origin server
- Tools: CloudFront, Cloudflare, Fastly, Akamai
## Step-by-Step Framework
Use this framework for every system design question.
### Step 1: Clarify Requirements (3–5 minutes)
- What are the core features? (functional requirements)
- What are the scale expectations? (users, requests/sec, data volume)
- What are the non-functional requirements? (latency, availability, consistency)
- What is NOT in scope?
### Step 2: Back-of-Envelope Estimation (3–5 minutes)
- Daily active users and requests per user
- Read-to-write ratio
- Storage needs over 5 years
- Bandwidth requirements
- Peak vs average traffic (typically 3–5x average)
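A worked example of this step, with all inputs illustrative (1M DAU, 10 requests per user per day, a 3:1 read-to-write ratio, 1 KB per write, a 4x peak factor):

```python
# Worked Step 2 estimate. Every input below is an assumption you would
# state out loud in the interview, not a fact about any real system.

dau = 1_000_000
req_per_user = 10
write_fraction = 0.25     # 3:1 read-to-write ratio
bytes_per_write = 1_000   # 1 KB per write
peak_factor = 4

avg_rps = dau * req_per_user / 86_400
peak_rps = avg_rps * peak_factor
writes_per_day = dau * req_per_user * write_fraction
storage_5y_gb = writes_per_day * bytes_per_write * 365 * 5 / 1e9

print(f"avg {avg_rps:.0f} req/s, peak {peak_rps:.0f} req/s")
print(f"5-year storage ~ {storage_5y_gb:.0f} GB")
```

Note how small the result is: a few hundred requests per second and a few terabytes over five years fit comfortably on modest hardware, which is itself a useful design conclusion.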
### Step 3: High-Level Design (10–15 minutes)
- Draw the major components: clients, LB, app servers, cache, DB, queue, CDN
- Show the data flow for core operations
- Identify the primary data model and storage choices
- Call out the API endpoints
### Step 4: Deep Dive (10–15 minutes)
- Pick 2–3 components to detail based on the interviewer's interest
- Discuss trade-offs for each decision
- Address bottlenecks and how to mitigate them
- Show how the design handles failure
### Step 5: Wrap Up (3–5 minutes)
- Summarize trade-offs made
- Discuss monitoring and alerting
- Mention future improvements
## Common Patterns Checklist
Use these patterns as building blocks. Check which ones apply to your design.
- Read replicas — scale read-heavy workloads by replicating the database
- Sharding — partition data across multiple databases by a shard key
- CQRS — separate read and write models so each can be optimized independently
- Event sourcing — store state as a sequence of events, not current state
- Saga pattern — manage distributed transactions across services
- Circuit breaker — prevent cascading failures by failing fast
- Bulkhead — isolate components so one failure does not sink the ship
- Consistent hashing — distribute data across nodes with minimal redistribution on changes
- Fan-out on write — precompute feeds/timelines at write time (Twitter model)
- Fan-out on read — compute feeds at read time; avoids wasted precomputation for users who rarely read
- Rate limiting — protect services from abuse (token bucket, sliding window)
- Idempotency — make operations safe to retry without side effects
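As one concrete example from the list, a token-bucket rate limiter fits in a short class. Time is passed in explicitly so the sketch stays deterministic:

```python
# Token bucket: tokens refill at a fixed rate up to a burst capacity;
# each request spends one token or is rejected.

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = 0.0           # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2)  # 1 req/s, bursts of 2
print([bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.5)])
# -> [True, True, False, True]: burst of 2, third rejected, refill admits the fourth
```

In a distributed setting the bucket state lives in Redis (often as a Lua script) so all app servers share one limit per client.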
## Scaling Checklist
When the interviewer asks "how would you scale this?", walk through these layers:
Vertical scaling (scale up):
- Bigger machines, more RAM, faster SSDs
- Simple but has a ceiling
Horizontal scaling (scale out):
- Stateless app servers behind a load balancer
- Database read replicas for read-heavy workloads
- Sharding for write-heavy or large datasets
- Cache layer (Redis/Memcached) to reduce DB load
- CDN for static content
- Message queues to decouple and buffer writes
- Async processing for non-critical paths
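The sharding bullet above usually starts with hash-based routing on a shard key. A minimal sketch (the shard names are placeholders), which also shows the drawback that motivates consistent hashing: changing the shard count remaps most keys.

```python
import hashlib

# Route each record to a shard by hashing its shard key modulo the shard count.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Deterministically map a user ID to one shard."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always lands on the same shard...
assert shard_for("user-42") == shard_for("user-42")
# ...but adding a fifth shard would change `len(SHARDS)` and move most keys,
# which is exactly the problem consistent hashing solves.
```

Picking the shard key is the real design decision: it should spread load evenly and keep the queries you run most inside a single shard.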
Data layer scaling:
- Connection pooling (PgBouncer, ProxySQL)
- Query optimization and proper indexing
- Denormalization for read performance
- Partitioning large tables by date or range
- Archive cold data to object storage
Infrastructure:
- Multi-region deployment for availability and latency
- Auto-scaling groups based on CPU/memory/request metrics
- Blue-green or canary deployments for safe rollouts
## Monitoring Checklist
Every system design should address observability. Cover these areas:
The Four Golden Signals (Google SRE):
- Latency — response time for successful and failed requests
- Traffic — requests per second, concurrent connections
- Errors — 5xx rate, failed health checks, timeout rate
- Saturation — CPU, memory, disk, connection pool utilization
What to monitor:
- Application metrics: request rate, error rate, p50/p95/p99 latency
- Infrastructure metrics: CPU, memory, disk I/O, network
- Database metrics: query time, connection count, replication lag, cache hit ratio
- Queue metrics: depth, consumer lag, processing time
- Business metrics: signups, orders, revenue per minute
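The p50/p95/p99 latencies above are percentiles over raw samples; a nearest-rank sketch makes the definition concrete (production systems use streaming sketches such as HDRHistogram or t-digest rather than sorting every sample):

```python
# Nearest-rank percentile over a list of latency samples.

def percentile(samples: list, pct: float) -> float:
    ordered = sorted(samples)
    rank = max(0, int(len(ordered) * pct / 100) - 1)
    return ordered[rank]

latencies_ms = list(range(1, 101))   # synthetic samples: 1..100 ms
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 95))  # 95
print(percentile(latencies_ms, 99))  # 99
```

Averages hide tail pain: a 10 ms mean is consistent with a 2-second p99, and p99 is what your unluckiest one-in-a-hundred requests actually experience.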
Alerting principles:
- Alert on symptoms (high error rate), not causes (high CPU)
- Use severity levels: page for critical, ticket for warning
- Avoid alert fatigue — every alert should be actionable
Tools: Prometheus + Grafana, Datadog, New Relic, AWS CloudWatch, PagerDuty for on-call.
## Quick Reference Card
| Requirement | Component |
|---|---|
| Static content | CDN |
| Session/state | Redis |
| Async processing | Message queue (Kafka/SQS) |
| Search | Elasticsearch/OpenSearch |
| File storage | S3/GCS/Blob Storage |
| Real-time updates | WebSocket / SSE |
| Rate limiting | API Gateway / Redis |
| Auth | OAuth2 / JWT at gateway |
| Notifications | Queue + worker + push service |
| Analytics | Event stream + data warehouse |
That wraps up article #280 on Codelit.