
The System Design Encyclopedia: 250 Articles Covering Every Core Topic

March 29, 2026 · 11 min read · By Codelit Team

The System Design Encyclopedia#

250 articles. What started as a single post about load balancing has grown into a comprehensive system design library. This milestone article is your organized reference — every topic grouped by category so you can find exactly what you need.

By the Numbers#

Category	Articles	Coverage
Fundamentals	18	Core concepts every engineer needs
Distributed Systems	20	Consensus, messaging, failure handling
Architecture Patterns	19	Structural approaches for large systems
Interview Questions	18	Classic design problems with solutions
Infrastructure	18	Platform, deployment, and operations
Security	12	Auth, encryption, and threat mitigation
Data	15	Storage, processing, and pipelines
AI/ML Systems	14	Production ML and LLM infrastructure

Every article includes practical examples, trade-off analysis, and production recommendations. No filler — each piece targets a specific concept you will encounter in real systems or interviews.

Fundamentals#

The building blocks every engineer should know cold.

Load Balancing — distribute traffic across servers using round-robin, least connections, consistent hashing, and L4/L7 strategies
Caching — reduce latency and database load with Redis, Memcached, CDN caching, and cache invalidation patterns
Horizontal vs Vertical Scaling — when to scale up a single machine versus scaling out to many
CAP Theorem — the fundamental trade-off between consistency, availability, and partition tolerance
Latency and Throughput — measuring, benchmarking, and optimizing system performance
DNS and Networking — how requests travel from browser to server and back
API Design — REST, GraphQL, gRPC, and WebSocket patterns for clean interfaces
Rate Limiting — protecting systems from abuse with token bucket, sliding window, and distributed rate limiters
Idempotency — designing operations that are safe to retry without side effects
Pagination — cursor-based, offset, and keyset pagination for large result sets
Back-of-the-Envelope Estimation — quick math to validate system design decisions
Proxies and Reverse Proxies — forwarding requests, SSL termination, and traffic shaping
Content Delivery Networks — edge caching, cache invalidation, and global content distribution
Hashing Algorithms — MD5, SHA, and when to use cryptographic vs non-cryptographic hashes
TCP vs UDP — reliable vs fast delivery and when each protocol matters
HTTP/2 and HTTP/3 — multiplexing, server push, QUIC, and modern protocol improvements
Serialization Formats — JSON, Protocol Buffers, Avro, MessagePack, and trade-offs
Webhooks — push-based integrations, retry logic, and security considerations
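
To make the Rate Limiting entry above concrete, here is a minimal token bucket sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are illustrative assumptions, not code taken from the linked article; a distributed limiter would also need shared state, which this single-process version leaves out.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # the bucket starts full
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Credit tokens earned since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained throughput; tuning the two independently is the main reason to choose this algorithm over a fixed window.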

Distributed Systems#

The hard problems that emerge when you add a network between components.

Consensus Algorithms — Raft, Paxos, and how distributed nodes agree on state
Service Discovery — how microservices find each other with Consul, etcd, ZooKeeper, and Kubernetes DNS
Distributed Transactions — two-phase commit, saga patterns, and eventual consistency
Event-Driven Architecture — using events to decouple services with Kafka, RabbitMQ, and SNS/SQS
Message Queues — reliable async communication between services
Leader Election — choosing a coordinator in a distributed cluster
Consistent Hashing — distributing data across nodes with minimal redistribution on changes
Vector Clocks and CRDTs — tracking causality and resolving conflicts without coordination
Gossip Protocols — how nodes share state in large decentralized clusters
Circuit Breakers — preventing cascading failures when downstream services degrade
Distributed Locking — coordinating exclusive access across multiple nodes with Redlock and ZooKeeper
Write-Ahead Logs — durability and replication through append-only log structures
Bulkhead Pattern — isolating failures to prevent system-wide outages
Backpressure — handling overload by signaling producers to slow down
Quorum Reads and Writes — tunable consistency with R + W > N guarantees
Crashing vs Byzantine Failures — failure models and what your system should tolerate
Cluster Membership — detecting joins, leaves, and failures in dynamic clusters
Partitioned Logs — Kafka-style ordered, durable, partitioned event streams
Anti-Entropy and Merkle Trees — detecting and repairing data inconsistencies between replicas
Conflict Resolution — last-write-wins, merge functions, and application-level strategies
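
As an illustration of the Consistent Hashing entry above, the sketch below builds a hash ring with virtual nodes. `HashRing`, `vnodes`, and the `node#i` labeling scheme are hypothetical choices made for this example; MD5 is used here only to spread keys around the ring, not for security.

```python
import bisect
import hashlib


def _hash(key):
    # MD5 is used only to spread keys evenly, not as a security measure.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

Removing a node only reassigns the keys that node owned; every other key keeps its placement, which is the "minimal redistribution" property the entry refers to.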

Architecture Patterns#

Structural approaches for organizing large systems.

Microservices vs Monolith — when to split and when to stay together
CQRS — separating read and write models for performance and scalability
Event Sourcing — storing state as a sequence of events instead of current snapshots
Domain-Driven Design — bounded contexts, aggregates, and ubiquitous language
Hexagonal Architecture — ports and adapters for testable, framework-independent code
Strangler Fig Pattern — incrementally migrating from monolith to microservices
Sidecar and Ambassador Patterns — extending service functionality without code changes
API Gateway — centralized entry point for routing, auth, rate limiting, and transformation
BFF (Backend for Frontend) — tailored APIs for different client types
Saga Pattern — managing distributed transactions through orchestration or choreography
Cell-Based Architecture — isolating blast radius with independent, self-contained cells
Multi-Tenancy — sharing infrastructure between tenants with proper isolation
Feature Flags — decoupling deployment from release with progressive rollouts
Clean Architecture — dependency inversion and layered boundaries for maintainable code
Modular Monolith — monolith structure with clear module boundaries as a stepping stone
Outbox Pattern — reliable event publishing from transactional databases
Throttling and Debouncing — controlling request frequency at the application layer
Plugin Architecture — extensible systems with runtime-loadable modules
Service Mesh — infrastructure-layer networking with Istio, Linkerd, and Consul Connect
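
The Event Sourcing entry above describes storing state as a sequence of events. A minimal sketch of that idea, assuming a hypothetical bank-account domain with `deposited` and `withdrawn` event types, could look like this:

```python
def apply_event(balance, event):
    """Pure transition function: current state + one event -> next state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']!r}")


def replay(events, initial=0):
    """Rebuild the current balance by folding over the full event log."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


# The log is append-only; the balance is derived, never stored directly.
log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]
current_balance = replay(log)
```

Because state is derived from the log, replaying a prefix of the events yields the balance at any earlier point in time. Real systems usually add periodic snapshots so a replay does not have to start from the first event.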

Interview Questions#

System design problems commonly asked in technical interviews.

Design a URL Shortener — hashing, base62 encoding, read-heavy optimization
Design a Chat System — WebSockets, message ordering, presence, and offline delivery
Design a Rate Limiter — algorithms, distributed coordination, and edge cases
Design a Notification System — multi-channel delivery, templating, preferences, and retries
Design a News Feed — fan-out on write vs read, ranking, and caching strategies
Design a Search Autocomplete — trie data structures, ranking, and latency optimization
Design a File Storage System — chunking, deduplication, metadata, and CDN distribution
Design a Metrics and Monitoring System — time-series storage, aggregation, and alerting
Design a Payment System — idempotency, state machines, reconciliation, and PCI compliance
Design a Video Streaming Platform — transcoding, adaptive bitrate, CDN, and DRM
Design a Ride-Sharing Service — geospatial indexing, matching, pricing, and ETA
Design a Distributed Cache — partitioning, eviction, replication, and consistency
Design a Web Crawler — politeness, deduplication, frontier management, and distributed crawling
Design a Ticket Booking System — seat locking, race conditions, overbooking prevention
Design a Social Graph — friend-of-friend queries, graph storage, and privacy controls
Design a Location-Based Service — geohashing, proximity search, and real-time tracking
Design a Collaborative Editor — operational transforms, CRDTs, and real-time sync
Design an Ad Serving System — auction mechanics, targeting, real-time bidding, and analytics
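
For the URL Shortener entry above, base62 encoding is the step that turns a numeric ID into a short code. The sketch below assumes the ID comes from some counter or database sequence; the alphabet ordering and function names are illustrative choices.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode(n):
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(code):
    """Decode a base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on bad chars
    return n
```

Seven base62 characters cover 62^7 (about 3.5 trillion) distinct IDs, which is why short codes of that length are a common starting point in this design problem.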

Infrastructure#

The platform layer that keeps everything running.

Kubernetes — container orchestration, pod networking, and autoscaling
CI/CD Pipelines — automated build, test, and deploy workflows
Infrastructure as Code — Terraform, Pulumi, and declarative infrastructure management
Container Networking — overlay networks, service mesh, and network policies
Observability — logs, metrics, and traces: the three pillars of understanding production
Chaos Engineering — intentionally breaking things to build resilience
Blue-Green and Canary Deployments — safe release strategies with instant rollback
Database Migration Strategies — zero-downtime schema changes and data migrations
Auto-Scaling — CPU, queue depth, and custom metric-based scaling policies
Connection Pooling — PgBouncer, ProxySQL, and managing database connections at scale
Edge Computing — moving compute closer to users for latency-sensitive workloads
GitOps — using Git as the single source of truth for infrastructure state
Service Level Objectives — defining SLIs, SLOs, and SLAs with error budgets
Incident Management — on-call rotations, runbooks, postmortems, and blameless culture
Load Testing — stress testing with k6, Locust, and Gatling to find breaking points
DNS and Traffic Management — weighted routing, failover, and geo-based DNS strategies
Serverless Architecture — Lambda, Cloud Functions, and event-driven compute without servers
Multi-Region Deployment — active-active, active-passive, and data replication across regions
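
The Service Level Objectives entry above mentions error budgets. The arithmetic behind one is short enough to sketch; the function names and the 30-day default window are assumptions made for this example.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unavailability for an availability SLO.

    `slo` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0 < slo <= 1:
        raise ValueError("slo must be in (0, 1]")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, bad_minutes, window_days=30):
    """Budget left after `bad_minutes` of downtime; negative means exhausted."""
    return error_budget_minutes(slo, window_days) - bad_minutes
```

A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime, so a single 50-minute incident exhausts the budget for that window.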

Security#

Protecting systems, data, and users.

Zero Trust Architecture — "never trust, always verify": identity-based security for every request
OAuth 2.0 and OIDC — modern authentication and authorization flows
API Security — protecting APIs with authentication, encryption, and input validation
Secrets Management — Vault, AWS Secrets Manager, and rotating credentials safely
DDoS Protection — rate limiting, WAF, and traffic scrubbing at scale
mTLS — mutual TLS for service-to-service encryption and authentication
RBAC and ABAC — role-based and attribute-based access control models
Supply Chain Security — securing dependencies, container images, and build pipelines
Data Encryption — at rest, in transit, and application-layer encryption patterns
CORS and CSP — browser security headers and cross-origin resource policies
Penetration Testing — methodologies, tools, and integrating security into CI/CD
JWT Security — token signing, rotation, revocation, and common pitfalls
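
To illustrate the token signing part of the JWT Security entry above, here is a standard-library sketch of HS256 signing and verification. It is meant for understanding the mechanics only: it skips expiry, audience, and revocation checks, and production code would normally rely on a maintained JWT library. The function names are assumptions for this example.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data):
    # JWTs use URL-safe base64 with the padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload, secret):
    """Return a compact HS256 token for a JSON-serializable payload."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify(token, secret):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm; never let the token choose how it is verified.
        if header.get("alg") != "HS256":
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        return json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError, AttributeError):
        return None
```

Two of the common pitfalls the entry alludes to show up here: accepting whatever `alg` the token declares, and comparing signatures with a plain `==`.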

Data#

Storage, processing, and movement of data at scale.

Data Partitioning and Sharding — hash, range, directory, and geo sharding strategies
Database Replication — leader-follower, multi-leader, and leaderless replication
SQL vs NoSQL — choosing the right data model for your access patterns
Time-Series Databases — storing and querying metrics, IoT, and financial data
Data Lakes and Warehouses — centralized analytics storage with Snowflake, BigQuery, and Delta Lake
Change Data Capture — streaming database changes with Debezium and Kafka Connect
ETL and Data Pipelines — batch and streaming data transformation workflows
Graph Databases — modeling relationships with Neo4j, Neptune, and Dgraph
Bloom Filters and Probabilistic Data Structures — space-efficient membership testing
LSM Trees and B-Trees — the storage engine foundations behind modern databases
Data Governance — lineage, cataloging, quality, and compliance at scale
Object Storage — S3, GCS, MinIO, and designing for unstructured data at petabyte scale
Full-Text Search — Elasticsearch, OpenSearch, and inverted index architectures
Data Versioning — tracking dataset changes for reproducibility and rollback
Stream Processing — Flink, Spark Streaming, and real-time event transformation
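
The Bloom Filters entry above is about space-efficient membership testing. A minimal sketch follows, assuming string items and deriving the `num_hashes` positions from salted SHA-256 digests; the sizes shown are arbitrary defaults, not tuned values.

```python
import hashlib


class BloomFilter:
    """Set membership with possible false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.k):
            # Prefix a one-byte salt to get k different hash functions.
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A typical use is to put the filter in front of an expensive lookup: a negative answer skips the disk or network read entirely, and a positive answer falls through to the real check.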

AI/ML Systems#

The infrastructure behind machine learning in production.

ML System Design — training pipelines, feature stores, model serving, and monitoring
RAG Architecture — retrieval-augmented generation for grounded LLM applications
Vector Databases — storing and querying embeddings with Pinecone, Weaviate, and pgvector
Feature Stores — centralized feature management for training and serving consistency
Model Serving — real-time inference, batching, A/B testing, and canary rollouts
LLM Infrastructure — hosting, fine-tuning, prompt management, and cost optimization
AI Gateway Patterns — routing, caching, fallback, and rate limiting for AI APIs
Embedding Pipelines — generating, storing, and indexing vector embeddings at scale
ML Observability — monitoring model performance, drift detection, and retraining triggers
GPU Infrastructure — scheduling, multi-tenancy, and cost optimization for training workloads
Data Labeling Pipelines — human-in-the-loop, active learning, and quality assurance
A/B Testing for ML — experiment design, statistical significance, and model comparison
Prompt Engineering Patterns — chain-of-thought, few-shot, and structured output techniques
AI Agent Architecture — tool use, planning loops, memory, and orchestration frameworks
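
The retrieval step shared by the RAG Architecture and Vector Databases entries above comes down to nearest-neighbor search over embeddings. The brute-force sketch below uses cosine similarity on plain Python lists; the toy two-dimensional vectors and document IDs are made up for illustration. Vector databases typically replace this linear scan with an approximate index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the IDs of the k vectors most similar to `query`."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical embeddings; real ones have hundreds of dimensions.
index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}
nearest = top_k([1.0, 0.0], index, k=2)
```

In a RAG pipeline, the documents behind the returned IDs would then be placed into the prompt as grounding context.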

How to Use This Encyclopedia#

If you are preparing for interviews: Start with Fundamentals, then work through the Interview Questions section. Use the Architecture Patterns and Distributed Systems categories to deepen your answers.

If you are building production systems: Jump to the specific topic you need. Each article includes practical code examples, trade-off analysis, and real-world recommendations.

If you are learning system design from scratch: Read Fundamentals front to back, then branch into whichever category interests you most.

Recommended Learning Paths#

Path 1: Interview Prep (4-6 weeks)#

Fundamentals (weeks 1-2) — load balancing, caching, CAP theorem, API design
Architecture Patterns (week 3) — microservices, CQRS, event sourcing
Distributed Systems (week 4) — consensus, consistent hashing, circuit breakers
Interview Questions (weeks 5-6) — practice end-to-end designs with trade-off discussions

Path 2: Production Engineering (ongoing)#

Infrastructure — Kubernetes, CI/CD, observability, auto-scaling
Security — zero trust, mTLS, secrets management
Data — partitioning, replication, CDC, stream processing
Distributed Systems — deep dive into failure modes and recovery

Path 3: AI/ML Engineering#

Fundamentals — API design, caching, rate limiting
Data — vector databases, search, stream processing
AI/ML Systems — RAG, model serving, embedding pipelines, AI gateways
Infrastructure — GPU scheduling, serverless, observability

What Comes Next#

250 articles is a milestone, not a finish line. System design evolves as infrastructure evolves — new patterns emerge, old patterns get refined, and the community keeps pushing the boundaries of what distributed systems can do.

The next 250 will go deeper: more production war stories, more code-level implementations, more diagrams, and more coverage of the AI/ML infrastructure wave reshaping how we build systems.

250 articles on system design at codelit.io/blog.
