The Complete System Design Reference: 350-Article Library, Learning Roadmap & Interview Strategy
This is article number 350. What started as a handful of system design explainers has grown into a library covering the major areas of distributed systems, software architecture, and infrastructure. This capstone article organizes the collection into a structured learning roadmap, an interview strategy, and a practice plan.
The Library at a Glance#
The library holds 350 articles across 10 major categories. Below is a guided tour with top picks from each category.
1. Fundamentals of Distributed Systems#
The building blocks: CAP theorem, consistency models, failure modes, and distributed consensus.
- CAP Theorem Explained — Consistency, Availability, Partition Tolerance
- Consensus Algorithms — Raft, Paxos, and Practical BFT
- Distributed Clocks — Lamport Timestamps and Vector Clocks
- Failure Detection — Heartbeats, Phi Accrual, and SWIM
- Consistency Models — Eventual, Causal, Strong, and Linearizability
- The Two Generals Problem and Its Real-World Implications
- Split-Brain Resolution Strategies
Start here if you are new to distributed systems. These concepts come up in nearly every system design interview.
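As a taste of the distributed clocks topic listed above, here is a minimal sketch of a Lamport timestamp. The class and method names are illustrative assumptions rather than code from the linked article; the update rule itself (increment on local events, take the maximum plus one on receipt) is the standard one.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter by one."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Message receipt: jump past both our own and the sender's time."""
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
sent_at = a.tick()                 # A sends a message stamped 1
b.tick()                           # B does unrelated local work, reaching 1
received_at = b.receive(sent_at)   # max(1, 1) + 1 = 2
print(sent_at, received_at)        # prints "1 2"
```

The receive always gets a larger timestamp than the send that caused it. The converse does not hold: a smaller timestamp does not prove that one event caused another, which is the gap vector clocks close.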
2. Databases and Storage#
From relational databases to distributed key-value stores, time-series databases, and data lakes.
- SQL vs NoSQL — When to Choose What
- Database Sharding Strategies — Hash, Range, and Directory-Based
- Write-Ahead Logging and Crash Recovery
- LSM Trees vs B-Trees — Storage Engine Internals
- Distributed Transactions — Two-Phase Commit and Saga Pattern
- Time-Series Databases — InfluxDB, TimescaleDB, and Prometheus
- Data Lake Architecture — Medallion Pattern and Lakehouse
- Change Data Capture with Debezium
Key insight: most system design questions revolve around data modeling and storage tradeoffs. Master this category thoroughly.
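To make one of these storage tradeoffs concrete, the sketch below contrasts hash and range sharding from the list above. The function names, the two split points, and the key format are hypothetical choices for illustration only.

```python
import bisect
import hashlib
from typing import List


def hash_shard(key: str, num_shards: int) -> int:
    """Hash sharding: a stable digest spreads keys evenly, but scatters ranges."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, split_points: List[str]) -> int:
    """Range sharding: shard i holds keys from split_points[i-1] up to split_points[i]."""
    return bisect.bisect_right(split_points, key)


SPLIT_POINTS = ["h", "p"]  # hypothetical: shard 0 < "h" <= shard 1 < "p" <= shard 2

for name in ["alice", "henry", "zoe"]:
    print(name, range_shard(name, SPLIT_POINTS), hash_shard(name, 3))
```

Range sharding keeps neighboring keys together, so range scans touch few shards but a popular key range can become a hotspot. Hash sharding evens out load at the cost of fanning range scans out to every shard. Note that the simple modulo scheme here remaps most keys when `num_shards` changes.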
3. Caching and Performance#
Caching layers, eviction policies, cache invalidation, and CDN architecture.
- Caching Strategies — Cache-Aside, Write-Through, Write-Behind
- Redis Deep Dive — Data Structures, Persistence, and Clustering
- Cache Invalidation — The Hardest Problem in Computer Science
- CDN Architecture — Edge Caching, Origin Shielding, and Purging
- Bloom Filters for Cache Optimization
- Connection Pooling and Keep-Alive Strategies
Interview tip: always discuss caching in your design, including what you cache and how entries get invalidated. It shows you think about latency and cost.
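The cache-aside strategy from the list above can be sketched in a few lines. This is a minimal in-process version under stated assumptions: `CacheAside`, `fake_db`, and the 60-second TTL are hypothetical, and a real deployment would typically put a shared cache such as Redis where the dictionary is.

```python
import time
from typing import Callable, Dict, List, Tuple


class CacheAside:
    """Read path: check the cache first, fall back to the source of truth on a miss."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = self._load(key)                  # miss or expired entry
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Write path: drop the entry after updating the database."""
        self._entries.pop(key, None)


db_reads: List[str] = []


def fake_db(key: str) -> str:
    """Stand-in for a database query; records each call."""
    db_reads.append(key)
    return f"row-for-{key}"


cache = CacheAside(fake_db, ttl_seconds=60.0)
cache.get("user:1")         # miss: reads the database
cache.get("user:1")         # hit: served from the cache
cache.invalidate("user:1")
cache.get("user:1")         # miss again after invalidation
print(len(db_reads))        # prints 2
```

Three reads cost two database calls here. The TTL bounds how stale an entry can get if an invalidation is ever missed.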
4. Networking and Protocols#
HTTP, gRPC, WebSockets, DNS, load balancing, and service mesh.
- HTTP/2 and HTTP/3 — Multiplexing, Header Compression, and QUIC
- gRPC — Protocol Buffers, Streaming, and Deadlines
- WebSocket Architecture for Real-Time Systems
- DNS Architecture — Resolution, Caching, and GeoDNS
- Load Balancing — L4 vs L7, Consistent Hashing, and Health Checks
- Service Mesh — Istio, Linkerd, and Sidecar Proxy Pattern
- API Gateway Patterns — Rate Limiting, Auth, and Routing
5. Messaging and Event-Driven Architecture#
Queues, streams, event sourcing, and pub/sub at scale.
- Apache Kafka — Partitions, Consumer Groups, and Exactly-Once
- Event Sourcing and CQRS — When and How
- Message Queue Comparison — RabbitMQ, SQS, Kafka, and Pulsar
- Dead Letter Queues and Retry Strategies
- Saga Pattern for Distributed Transactions
- Schema Registry and Schema Evolution
- Idempotency in Event-Driven Systems
Key insight: event-driven architecture is the backbone of modern microservices. Every senior engineer should be fluent here.
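Idempotency, listed above, is what makes at-least-once delivery safe to consume. The sketch below assumes every message carries a unique ID; the `IdempotentConsumer` class and the payment-style payload are hypothetical.

```python
from typing import Set


class IdempotentConsumer:
    """Applies each message at most once, even if the broker redelivers it."""

    def __init__(self) -> None:
        self._processed_ids: Set[str] = set()
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._processed_ids:
            return False                 # redelivery: skip the side effect
        self.balance += amount           # the side effect
        self._processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", 100)
consumer.handle("msg-1", 100)   # duplicate delivery, ignored
consumer.handle("msg-2", 50)
print(consumer.balance)         # prints 150
```

In a real system the processed-ID record and the side effect would need to commit atomically, for example in the same database transaction. Otherwise a crash between the two steps reintroduces the duplicate.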
6. System Design Interviews#
Concrete system designs, from a URL shortener to global-scale social networks.
- Designing a URL Shortener — End to End
- Designing a Chat System — WhatsApp Scale
- Designing a News Feed — Facebook and Twitter
- Designing a Rate Limiter — Token Bucket to Sliding Window
- Designing a Notification System — Push, Email, and SMS
- Designing a Search Autocomplete System
- Designing a Video Streaming Platform — Netflix Architecture
- Designing a Ride-Sharing Platform — Uber and Lyft
- Designing a Distributed File Storage — Google Drive
- Designing a Payment System — Stripe Architecture
Practice plan: work through two designs per week. Sketch the architecture, identify bottlenecks, then read the article to compare.
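For the rate limiter design in the list above, a token bucket is a common starting point. The sketch below is one possible single-process version; the class name, the parameters, and the choice to pass the current time explicitly (which keeps the example deterministic) are assumptions, not the linked article's implementation.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = now

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1)
results = [
    bucket.allow(0.0),   # True: burst token 1
    bucket.allow(0.0),   # True: burst token 2
    bucket.allow(0.0),   # False: bucket is empty
    bucket.allow(1.0),   # True: one token refilled after one second
]
print(results)           # prints [True, True, False, True]
```

A distributed limiter would have to keep this state in a shared store and update it atomically, which is usually where the interview deep dive goes.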
7. Infrastructure and DevOps#
Containers, orchestration, CI/CD, infrastructure as code, and cloud-native patterns.
- Kubernetes Architecture — Pods, Services, and the Control Plane
- Container Networking — CNI, Service Discovery, and DNS
- CI/CD Pipeline Design — GitHub Actions, ArgoCD, and Flux
- Infrastructure as Code — Terraform, Pulumi, and CDK
- GitOps — Principles, Tools, and Production Patterns
- Blue-Green and Canary Deployment Strategies
- Feature Flags and Progressive Rollouts
8. Observability and Reliability#
Monitoring, tracing, logging, SLOs, incident response, and chaos engineering.
- The Three Pillars of Observability — Logs, Metrics, and Traces
- OpenTelemetry Instrumentation Guide
- SLOs, SLIs, and Error Budgets — A Practical Guide
- Distributed Tracing — Jaeger, Zipkin, and Tempo
- Chaos Engineering — Principles, Tools, and Game Days
- On-Call Best Practices and Incident Response
- Alerting Strategy — Signal vs Noise
9. Security and Authentication#
Auth protocols, encryption, zero trust, and secure system design.
- OAuth 2.0 and OpenID Connect — The Complete Guide
- JWT — Structure, Signing, Validation, and Common Pitfalls
- Zero Trust Architecture — Beyond the Perimeter
- API Security — OWASP Top 10 for APIs
- Encryption at Rest and in Transit
- Secrets Management — Vault, AWS Secrets Manager, and SOPS
- Rate Limiting and DDoS Mitigation
10. Architecture Patterns and Principles#
Microservices, monoliths, domain-driven design, and emerging patterns.
- Microservices vs Monolith — A Practical Decision Framework
- Domain-Driven Design — Bounded Contexts and Aggregates
- Strangler Fig Pattern — Incremental Migration
- CQRS — Command Query Responsibility Segregation
- Hexagonal Architecture — Ports and Adapters
- Cell-Based Architecture for Blast Radius Reduction
- Multi-Tenancy Patterns — Shared vs Isolated
The Learning Roadmap#
Phase 1: Foundations (Weeks 1-4)#
Focus on categories 1, 2, and 3. Build a solid mental model of distributed systems, understand storage tradeoffs, and learn caching patterns. These three areas form the foundation for every system design discussion.
Phase 2: Communication and Events (Weeks 5-8)#
Move to categories 4 and 5. Understand how services communicate: synchronously (HTTP, gRPC) or asynchronously (Kafka, queues). This choice is where most architectural decisions diverge.
Phase 3: Real Designs (Weeks 9-14)#
Work through category 6 systematically. For each design problem, spend 45 minutes sketching your own solution before reading the article. Then compare your approach with the article's and note what you missed.
Phase 4: Production Engineering (Weeks 15-18)#
Cover categories 7, 8, and 9. These topics separate junior from senior engineers. Understanding deployment, observability, and security shows production maturity.
Phase 5: Architecture Mastery (Weeks 19-20)#
Finish with category 10. At this point you have enough context to appreciate the tradeoffs between architectural styles.
Interview Strategy#
Before the Interview#
- Pick 10 system designs from category 6 and practice them end to end
- Build a template: requirements, estimation, API design, data model, high-level architecture, deep dive, bottlenecks
- Prepare tradeoff discussions — interviewers care more about why than what
During the Interview#
- Clarify requirements — spend the first 3-5 minutes asking questions
- Start with the happy path — get a working design on the board before optimizing
- Quantify — back-of-envelope calculations show engineering rigor
- Discuss tradeoffs explicitly — "We could use X, which gives us Y but costs Z"
- Address failure modes — what happens when a node goes down, a network partitions, or traffic spikes 10x
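The quantification step above can be rehearsed with a small calculator. Every input in this sketch is a hypothetical figure chosen for illustration (10 million daily users, 2 writes per user, 500 bytes per write, a 100:1 read-to-write ratio); only the 10x peak factor echoes the traffic spike mentioned in the list.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def estimate(
    daily_active_users: int,
    writes_per_user_per_day: float,
    bytes_per_write: int,
    read_write_ratio: float,
    peak_factor: float,
) -> Dict[str, float]:
    """Back-of-envelope load and storage figures from a handful of assumptions."""
    writes_per_day = daily_active_users * writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    return {
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": avg_write_qps * peak_factor,
        "avg_read_qps": avg_write_qps * read_write_ratio,
        "storage_gb_per_day": writes_per_day * bytes_per_write / 1e9,
    }


numbers = estimate(
    daily_active_users=10_000_000,
    writes_per_user_per_day=2,
    bytes_per_write=500,
    read_write_ratio=100,
    peak_factor=10,
)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs the average write load is about 231 requests per second, the peak is about 2,315, reads average about 23,000 per second, and storage grows by 10 GB per day. In an interview, rounding 86,400 to 100,000 seconds is usually precise enough.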
Common Mistakes#
- Jumping into the solution without gathering requirements
- Over-engineering for scale that does not exist
- Ignoring data consistency requirements
- Forgetting about operational concerns (monitoring, deployment, rollback)
Practice Plan#
| Week | Focus | Activity |
|---|---|---|
| 1-2 | Fundamentals | Read 5 articles daily from categories 1-3 |
| 3-4 | Deep dives | Pick 3 topics and write your own summaries |
| 5-8 | Design practice | 2 full system designs per week (45 min each) |
| 9-10 | Mock interviews | Practice with a partner using random problems |
| 11-12 | Weak areas | Revisit topics you struggled with |
What Comes Next#
350 articles is a milestone, not a finish line. Distributed systems continue to evolve — new consensus protocols, new database architectures, new infrastructure primitives. The library will keep growing.
If you have read even a fraction of these articles and practiced the designs, you are well-prepared — not just for interviews, but for building real systems at scale.
350 articles on system design at codelit.io/blog.