12 Design Patterns for Building Scalable Systems#
These patterns recur in large-scale systems at companies such as Netflix, Stripe, and Uber. Each solves a specific scaling problem, so learn when to use each one and when to avoid it.
1. CQRS (Command Query Responsibility Segregation)#
Separate read and write models:
Writes: API → Command Handler → Write DB (normalized)
↓ events
Reads: API → Query Handler → Read DB (denormalized views)
When: Reads and writes have very different patterns (e.g., 1000:1 read-to-write ratio). Each can scale independently.
Example: E-commerce product catalog. Writes are rare (admin updates), reads are massive (millions of shoppers).
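To make the split concrete, here is a minimal in-memory sketch of the catalog example. The names `UpdateProduct`, `ProductCatalog`, and the two dictionaries standing in for the write and read databases are assumptions for illustration. The projection runs synchronously here, whereas a real system would typically deliver the event asynchronously, so reads would lag writes slightly.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UpdateProduct:
    """Command: an admin changes a product (hypothetical shape)."""
    product_id: str
    name: str
    price_cents: int


class ProductCatalog:
    def __init__(self):
        self.write_db = {}  # stand-in for the normalized write DB
        self.read_db = {}   # stand-in for the denormalized read DB

    def handle_command(self, cmd: UpdateProduct) -> None:
        """Write path: persist the change, then hand an event to the projector."""
        self.write_db[cmd.product_id] = {"name": cmd.name, "price_cents": cmd.price_cents}
        self._project({"type": "ProductUpdated", **asdict(cmd)})

    def _project(self, event: dict) -> None:
        """Projector: build a display-ready row so reads need no joins or formatting."""
        price = f"${event['price_cents'] / 100:.2f}"
        self.read_db[event["product_id"]] = f"{event['name']} ({price})"

    def handle_query(self, product_id: str) -> Optional[str]:
        """Read path: one key lookup against the read DB only."""
        return self.read_db.get(product_id)


catalog = ProductCatalog()
catalog.handle_command(UpdateProduct("p1", "Mug", 1250))
print(catalog.handle_query("p1"))
```

Because the query handler never touches `write_db`, the read side could be moved to a separate store and scaled on its own.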
2. Event Sourcing#
Store events, not state:
Event Store:
OrderCreated { items: [...], total: 100 }
PaymentProcessed { amount: 100 }
ItemShipped { tracking: "XY123" }
Current state = replay all events
When: You need a complete audit trail, time-travel debugging, or the ability to rebuild state from scratch.
Example: Financial systems, order processing, compliance-heavy domains.
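The "replay all events" step can be sketched as a fold over the event list. This is a rough illustration, not a production event store: the `Order` fields and the `apply` and `replay` helpers are hypothetical, and the `items` field from the events above is omitted.

```python
from dataclasses import dataclass


@dataclass
class Order:
    status: str = "new"
    total: int = 0
    paid: int = 0
    tracking: str = ""


def apply(order: Order, event: tuple) -> Order:
    """Fold one (event_type, data) pair into the order state."""
    kind, data = event
    if kind == "OrderCreated":
        order.status, order.total = "created", data["total"]
    elif kind == "PaymentProcessed":
        order.status, order.paid = "paid", order.paid + data["amount"]
    elif kind == "ItemShipped":
        order.status, order.tracking = "shipped", data["tracking"]
    return order


def replay(events: list) -> Order:
    """Current state = a fresh Order with every event applied in order."""
    order = Order()
    for event in events:
        apply(order, event)
    return order


# The event store is append-only; state is never updated in place.
EVENTS = [
    ("OrderCreated", {"total": 100}),
    ("PaymentProcessed", {"amount": 100}),
    ("ItemShipped", {"tracking": "XY123"}),
]

current = replay(EVENTS)         # state after all three events
before_ship = replay(EVENTS[:2])  # "time travel": state before shipping
```

Replaying a prefix of the log is what makes time-travel debugging possible: `before_ship` is the order exactly as it looked after payment.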
3. Saga Pattern#
Manage distributed transactions without two-phase commit (2PC):
Choreography:
Order → Payment → Inventory → Shipping
(each service reacts to the previous event)
Orchestration:
Saga Coordinator tells each service what to do
On failure → compensating actions (refund, unreserve)
When: Transactions span multiple services, and you can accept eventual consistency with compensating actions in place of an atomic rollback.
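The orchestration variant might look roughly like the sketch below. The `run_saga` function and the step tuples are assumptions for illustration; a real coordinator would also persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        log.append(f"{name}: done")
        completed.append((name, compensate))
    return True, log


# Hypothetical services, reduced to flags in a dict.
state = {"charged": False, "reserved": False}

def charge(): state["charged"] = True
def refund(): state["charged"] = False
def reserve(): state["reserved"] = True
def unreserve(): state["reserved"] = False
def ship(): raise RuntimeError("carrier unavailable")

ok, log = run_saga([
    ("payment", charge, refund),
    ("inventory", reserve, unreserve),
    ("shipping", ship, lambda: None),
])
```

Here shipping fails, so the coordinator unreserves the inventory and then refunds the payment, leaving `state` as it started.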
4. Circuit Breaker#
Fail fast instead of waiting for broken services:
Closed → errors exceed threshold → Open (fail fast for 30s)
Open → timer expires → Half-Open (try one request)
Half-Open → success → Closed | failure → Open
When: Calling external services or downstream microservices that may be unreliable.
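The three states above can be sketched as a small class. This is a simplified, single-threaded illustration: the class name, the parameter names, and the injectable `clock` are assumptions, and the breaker opens when the failure count reaches the threshold.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timer can be faked in tests
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half_open"  # timer expired: let one trial request through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial request, or too many failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

Callers wrap each downstream request, for example `breaker.call(fetch_price, sku)` where `fetch_price` is whatever function performs the remote call, and treat `CircuitOpenError` as a signal to use a fallback.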
5. Bulkhead#
Isolate failures by compartmentalizing resources:
Thread Pool A (20) → Payment Service
Thread Pool B (10) → Email Service (if slow, only Pool B affected)
Thread Pool C (5) → Analytics (non-critical)
When: Different services have different criticality levels. A slow analytics call shouldn't block payments.
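One way to sketch the thread pools above is with a separate `ThreadPoolExecutor` per dependency. The `POOLS` table and `submit` helper are hypothetical names. Note that these executors cap concurrency but queue excess work without limit, so a stricter bulkhead would also bound or reject queued calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One pool per downstream dependency, sized by criticality.
POOLS = {
    "payment": ThreadPoolExecutor(max_workers=20, thread_name_prefix="payment"),
    "email": ThreadPoolExecutor(max_workers=10, thread_name_prefix="email"),
    "analytics": ThreadPoolExecutor(max_workers=5, thread_name_prefix="analytics"),
}


def submit(dependency, func, *args):
    """Run func on the pool reserved for this dependency only."""
    return POOLS[dependency].submit(func, *args)


# Demo: flood the analytics pool with calls that block until released
# (or for 5 seconds at most).
release = threading.Event()
stuck = [submit("analytics", release.wait, 5) for _ in range(50)]

# Payments use their own threads, so this returns without waiting on analytics.
payment = submit("payment", lambda: "charged").result(timeout=2)

release.set()  # unblock the analytics calls
```

All five analytics workers are busy and 45 more calls are queued behind them, yet the payment call completes immediately because it never competes for those threads.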
6. Sidecar Pattern#
Deploy helper functionality alongside your service:
┌──────────────────────┐
│ Your Service         │
│ (business logic)     │
│                      │
│ ┌──────────────────┐ │
│ │ Sidecar (Envoy)  │ │
│ │ - mTLS           │ │
│ │ - Logging        │ │
│ │ - Metrics        │ │
│ │ - Retries        │ │
│ └──────────────────┘ │
└──────────────────────┘
When: You want cross-cutting concerns (security, observability) without changing application code. Foundation of service mesh (Istio, Linkerd).
7. Strangler Fig#
Incrementally migrate from monolith to microservices:
Phase 1: All traffic → Monolith
Phase 2: /users → New User Service | rest → Monolith
Phase 3: /orders → New Order Service | /users → User Service | rest → Monolith
Phase N: Monolith is empty → delete it
When: Migrating a legacy system. Avoid a from-scratch rewrite; strangle the monolith gradually so each phase can be shipped on its own.
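The phases above come down to a routing table that grows over time. A minimal sketch, assuming a hypothetical `route` function in front of both systems and made-up backend names:

```python
MONOLITH = "monolith"

# Phase 3 routing table: migrated path prefixes and the service that owns them.
ROUTES = {
    "/users": "user-service",
    "/orders": "order-service",
}


def route(path, routes=ROUTES):
    """Return the backend for a request path; unmigrated paths fall through."""
    for prefix, backend in routes.items():
        # Match "/users" and "/users/42", but not "/usersettings".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return MONOLITH
```

Migrating another endpoint means adding one entry to the table, and rolling back means removing it, which is what keeps each phase low-risk.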
8. API Gateway / BFF#
A single entry point for clients (API gateway), with a dedicated backend per client type (BFF) that shapes responses for that client:
Mobile → Mobile BFF → aggregates 3 services → optimized response
Web → Web BFF → aggregates 5 services → full response
When: Multiple clients need different views of the same data. Reduces client-side complexity and network calls.
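As a rough sketch of the idea, the two BFFs below call the same downstream services but return differently shaped responses. The three service functions are hypothetical stubs with fixed data, and both BFFs aggregate the same three services to keep the example short.

```python
# Hypothetical downstream services, stubbed with fixed data for illustration.
def product_service(product_id):
    return {"id": product_id, "name": "Mug", "description": "Ceramic, 350 ml",
            "price_cents": 1250}


def review_service(product_id):
    return [{"stars": 5, "text": "Great"}, {"stars": 4, "text": "Solid"}]


def inventory_service(product_id):
    return {"in_stock": 12, "warehouse": "W1"}


def mobile_bff(product_id):
    """Small payload for a small screen: name, price, average rating, availability."""
    product = product_service(product_id)
    reviews = review_service(product_id)
    stock = inventory_service(product_id)
    return {
        "name": product["name"],
        "price_cents": product["price_cents"],
        "rating": sum(r["stars"] for r in reviews) / len(reviews),
        "available": stock["in_stock"] > 0,
    }


def web_bff(product_id):
    """Full payload: the web page shows the description, every review, and stock detail."""
    return {
        "product": product_service(product_id),
        "reviews": review_service(product_id),
        "inventory": inventory_service(product_id),
    }
```

Each client makes one network call and receives only what it renders; the fan-out to the services happens server-side, where latency is lower.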
9. Database per Service#
Each microservice owns its data:
User Service → Users DB (PostgreSQL)
Order Service → Orders DB (PostgreSQL)
Search Service → Search Index (Elasticsearch)
Cache Service → Cache (Redis)
When: Services need independent scaling and deployment. With no shared database, services cannot couple to each other through a shared schema.
Trade-off: Cross-service queries require API calls or events, not JOINs.
10. Outbox Pattern#
Reliably publish events when database changes:
Transaction:
1. INSERT INTO orders (id, ...) VALUES (...)
2. INSERT INTO outbox (event_type, payload) VALUES ('OrderCreated', {...})
Background worker:
Poll outbox → publish to Kafka → mark as published
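When: You need to update a database AND publish an event atomically. Prevents the dual-write problem, where one of the two writes succeeds and the other fails. The worker can publish an event more than once if it crashes before marking the row, so consumers should be idempotent.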
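The transaction and the background worker can be sketched with Python's built-in `sqlite3` module standing in for the production database. The table layout, the `published` flag, and the `publish` callback (which stands in for a Kafka producer) are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def create_order(conn, order_id, total):
    """Write the order and its event in ONE transaction: both commit or neither."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"id": order_id, "total": total})),
        )


def relay_once(conn, publish):
    """One polling pass: publish unpublished rows in order, then mark them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A worker would call `relay_once` in a loop with a short sleep. Since the row is marked only after a successful publish, a crash between the two steps results in a duplicate event on the next pass, never a lost one.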
11. Materialized View#
Pre-compute query results for fast reads:
Write: Order created → event → Materializer
→ Updates "dashboard_summary" table:
total_orders: 1542
revenue_today: $45,230
avg_order_value: $29.33
Read: SELECT * FROM dashboard_summary (1ms, no JOINs)
When: Complex queries are too slow to run in real-time. Pre-compute and serve from a denormalized view.
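A minimal in-memory sketch of the materializer follows. The `DashboardSummary` class stands in for the "dashboard_summary" table, and the event shape (`total_cents`) is an assumption. Amounts are kept in integer cents to avoid floating-point drift in the running total.

```python
class DashboardSummary:
    """Stand-in for the "dashboard_summary" table, kept current by events."""

    def __init__(self):
        self.total_orders = 0
        self.revenue_cents = 0

    def on_order_created(self, event):
        """Materializer: do the aggregation once, at write time."""
        self.total_orders += 1
        self.revenue_cents += event["total_cents"]

    def read(self):
        """Read path: return precomputed numbers, with no scan over orders."""
        avg_cents = self.revenue_cents / self.total_orders if self.total_orders else 0
        return {
            "total_orders": self.total_orders,
            "revenue": self.revenue_cents / 100,
            "avg_order_value": round(avg_cents / 100, 2),
        }


summary = DashboardSummary()
for cents in (2000, 3000):
    summary.on_order_created({"total_cents": cents})
```

The cost of the aggregation moves from every dashboard read to every order write, which pays off when reads far outnumber writes.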
12. Backpressure#
Slow down producers when consumers can't keep up:
Without backpressure:
Producer (1000/sec) → Queue (growing!) → Consumer (100/sec) → OOM crash
With backpressure:
Producer → Queue (near capacity) → signal producer to slow down
Producer (100/sec) → Queue (stable) → Consumer (100/sec) ✓
When: Producers can generate data faster than consumers can process. Essential for streaming systems.
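The simplest form of backpressure is a bounded queue whose `put` blocks when it is full. The sketch below uses the standard library's `queue.Queue`; the `run_pipeline` function, its parameters, and the `None` sentinel are assumptions for illustration.

```python
import queue
import threading
import time


def run_pipeline(n_items, capacity=5, consume_delay=0.001):
    """Push n_items through a bounded queue; return (consumed, max queue depth)."""
    q = queue.Queue(maxsize=capacity)  # the bound is what creates backpressure
    consumed = []

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: producer is finished
                break
            time.sleep(consume_delay)  # simulate a slow consumer
            consumed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    max_depth = 0
    for i in range(n_items):
        q.put(i)  # blocks while the queue is full, pacing the producer
        max_depth = max(max_depth, q.qsize())
    q.put(None)
    worker.join()
    return consumed, max_depth


consumed, max_depth = run_pipeline(50)
```

The producer loop could run far faster than the consumer, but the blocking `put` forces it down to the consumer's rate, so the queue depth never passes `capacity` and memory stays flat.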
Pattern Decision Matrix#
| Problem | Pattern |
|---|---|
| Read/write have different needs | CQRS |
| Need complete audit trail | Event Sourcing |
| Transaction across services | Saga |
| Downstream service unreliable | Circuit Breaker |
| Isolate critical from non-critical | Bulkhead |
| Cross-cutting concerns (TLS, logging) | Sidecar |
| Migrating from monolith | Strangler Fig |
| Multiple clients, different needs | BFF / API Gateway |
| Services need independence | Database per Service |
| DB update + event atomically | Outbox |
| Slow complex queries | Materialized View |
| Producers faster than consumers | Backpressure |
Architecture: Putting It Together#
Scalable E-Commerce Platform#
Mobile App → API Gateway (BFF pattern)
Web App    → API Gateway

API Gateway → Circuit Breaker → Product Service (CQRS)
                                  → Write Model (PostgreSQL)
                                  → Read Model (Elasticsearch)
            → Circuit Breaker → Order Service (Saga + Event Sourcing)
                                  → Event Store
                                  → Outbox → Kafka
            → Circuit Breaker → Payment Service (Bulkhead isolated)

Kafka → Materialized View updater → Dashboard DB
      → Analytics pipeline → ClickHouse

Service Mesh (Sidecar): mTLS, retries, observability
Anti-Patterns#
- Using all 12 patterns at once: start simple and add a pattern when you feel the pain it solves
- Event sourcing everywhere: most services don't need it (CRUD is fine)
- Saga without compensation: if you can't roll back a step, don't use a saga
- CQRS for simple CRUD: adds complexity with no benefit for simple apps
- Premature microservices: start with a monolith and extract services when teams grow
Summary#
Start with the simplest architecture that works. Add patterns when you hit specific scaling problems:
- First scaling pain: Add caching + read replicas
- Service independence: Database per service + API gateway
- Reliability: Circuit breaker + bulkhead + retries
- Async processing: Event-driven + outbox + Kafka
- Complex domains: CQRS + event sourcing + sagas
- Migration: Strangler fig (never rewrite)