Design a Key-Value Store — From Simple Hash Map to Distributed Database
Start simple, then distribute#
A key-value store is the simplest database: given a key, return a value. But scaling it to handle millions of requests per second across multiple data centers is one of the hardest distributed systems problems.
This is a classic system design interview question because it covers partitioning, replication, consistency, and failure handling.
Single-server key-value store#
Start with a hash map in memory:
```python
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)
```
This handles thousands of requests per second on a single machine. But it has limits:
- Memory — can't store more data than RAM
- Availability — one server crash = total data loss
- Throughput — one CPU handles all requests
Distributing across servers#
Partitioning (sharding)#
Split the key space across N servers. The naive approach is modulo hashing:

```
hash("user:123") % N → Server 2
hash("user:456") % N → Server 0
```

The problem: when N changes, almost every key maps to a different server. Consistent hashing fixes this — each server owns a range of a hash ring, so adding or removing a server moves only ~1/N of the keys.
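A minimal sketch of a consistent-hash ring (the class name, vnode count, and choice of MD5 are illustrative, not a specific library's API). Virtual nodes spread each server around the ring so load stays even:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy hash ring: each server gets several virtual nodes so keys
    spread evenly and removing a server moves only ~1/N of the keys."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash_position, server)
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                bisect.insort(self.ring, (pos, server))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key):
        pos = self._hash(key)
        # first virtual node clockwise from the key's position,
        # wrapping around the ring at the end
        i = bisect.bisect(self.ring, (pos, ""))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["server-0", "server-1", "server-2"])
print(ring.owner("user:123"))
```

Removing a server is just deleting its virtual nodes; every key it owned falls through to the next node clockwise, and no other key moves.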
Replication#
Copy each key to R replicas on different servers for durability:
- Leader-based: One primary handles writes, replicates to followers
- Leaderless: Any replica can accept writes (Dynamo-style)
Consistency models#
| Model | Guarantee | Latency | Used by |
|---|---|---|---|
| Strong | Read always returns latest write | High (quorum) | etcd, ZooKeeper |
| Eventual | Reads may be stale temporarily | Low | DynamoDB, Cassandra |
| Causal | Respects happens-before ordering | Medium | MongoDB |
Quorum reads/writes: With N replicas, a write must be acknowledged by W replicas and a read must contact R replicas:
- W + R > N → read and write quorums overlap, so every read sees the latest write (strong consistency)
- W=1, R=1 → fastest but weakest
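The overlap argument can be checked by brute force — with W + R > N, any W-replica write set and any R-replica read set must share at least one replica (function name is illustrative):

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True iff every possible read quorum intersects every possible
    write quorum — the condition behind W + R > N."""
    replicas = set(range(n))
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))

print(quorums_overlap(3, 2, 2))  # True:  2 + 2 > 3, reads see the latest write
print(quorums_overlap(3, 1, 1))  # False: a read may hit a replica the write missed
```

This is why N=3, W=2, R=2 is such a common default: it tolerates one replica failure on both paths while keeping the overlap guarantee.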
Write path#
1. Client sends `PUT(key, value)` to any node (the coordinator)
2. Coordinator hashes the key → determines the responsible partition
3. Coordinator forwards to the primary of that partition
4. Primary writes to local storage + write-ahead log
5. Primary replicates to R−1 followers
6. After W acks received → success returned to client
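The steps above can be sketched in a few lines. This is a sequential toy (a real coordinator replicates in parallel and handles timeouts); `Replica` and `put` are illustrative names, not a real client library:

```python
class Replica:
    def __init__(self):
        self.store, self.wal = {}, []

    def write(self, key, value):
        self.wal.append((key, value))   # write-ahead log first, for crash recovery
        self.store[key] = value         # then local storage
        return True                     # ack back to the coordinator

def put(replicas, key, value, w):
    """Coordinator: write to the key's replicas in turn and report
    success once W of them have acknowledged."""
    acks = 0
    for replica in replicas:
        if replica.write(key, value):
            acks += 1
        if acks >= w:
            return "success"
    return "timeout"   # fewer than W replicas acknowledged

replicas = [Replica(), Replica(), Replica()]
print(put(replicas, "user:123", {"name": "Ada"}, w=2))  # success
```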
Read path#
1. Client sends `GET(key)` to the coordinator
2. Coordinator routes to the partition owning that key
3. Reads from R replicas (or just one, depending on consistency level)
4. Returns the value (or the most recent version if reading multiple replicas)
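A sketch of step 4's "most recent version" resolution, assuming each stored value carries a monotonically increasing version number (the versioning scheme and names are illustrative):

```python
class Replica:
    def __init__(self, data):
        self.store = data

def get(replicas, key, r):
    """Quorum read: contact R replicas and return the value with the
    highest version — the freshest copy any of them has seen."""
    responses = [rep.store.get(key) for rep in replicas[:r]]
    versioned = [v for v in responses if v is not None]
    if not versioned:
        return None
    return max(versioned, key=lambda v: v["version"])["value"]

# replica 0 is stale (version 1); replicas 1–2 hold the latest write
replicas = [
    Replica({"user:123": {"version": 1, "value": "old"}}),
    Replica({"user:123": {"version": 2, "value": "new"}}),
    Replica({"user:123": {"version": 2, "value": "new"}}),
]
print(get(replicas, "user:123", r=2))  # new
```

Dynamo-style systems go one step further with read repair: after resolving the freshest value, the coordinator writes it back to any stale replicas it contacted.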
Conflict resolution#
With leaderless replication, two clients can write different values for the same key simultaneously.
Last-writer-wins (LWW): Attach a timestamp. Latest timestamp wins. Simple but can lose data.
Vector clocks: Track causal relationships. Detect conflicts and let the application resolve them.
CRDTs: Conflict-free replicated data types. Data structures that can be merged without conflicts (counters, sets, registers).
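Vector clocks reduce to a pairwise comparison of per-node counters. A minimal sketch (the function name and return strings are illustrative):

```python
def compare(a, b):
    """Compare two vector clocks, given as dicts of node id → counter.
    Missing entries count as 0."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happened-before b"   # b supersedes a
    if b_le_a:
        return "b happened-before a"   # a supersedes b
    return "concurrent"                # true conflict: application resolves

print(compare({"n1": 1}, {"n1": 2}))                    # a happened-before b
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 1}))  # concurrent
```

When one clock dominates the other, the older write can be safely discarded; only the "concurrent" case is a genuine conflict that LWW would silently lose.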
Storage engine#
Two main approaches for on-disk storage:
LSM Tree (Log-Structured Merge): Write-optimized. All writes go to an in-memory table (memtable), then flush to sorted files (SSTables). Compaction merges files periodically. Used by LevelDB, RocksDB, Cassandra.
B-Tree: Read-optimized. Traditional balanced tree on disk. In-place updates. Used by PostgreSQL, MySQL, etcd.
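The LSM write path fits in a toy class. This sketch keeps "SSTables" as in-memory sorted dicts and omits compaction; the class name and flush threshold are illustrative:

```python
class TinyLSM:
    """Toy LSM tree: writes land in a memtable, which flushes to an
    immutable sorted 'SSTable' when full. Reads check the memtable,
    then SSTables newest-first, so recent writes shadow older ones."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # newest last; each sorted by key
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # SSTables are written sorted by key so scans and compaction
        # can merge them sequentially
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest first
            if key in table:
                return table[key]
        return None

db = TinyLSM()
db.put("a", 1); db.put("b", 2)   # second put triggers a flush
db.put("a", 3)                   # newer value shadows the SSTable copy
print(db.get("a"))  # 3
```

Writes never touch old files — that is what makes LSM trees write-optimized — but each read may probe several tables, which is why real engines add bloom filters and compaction.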
Failure handling#
- Node failure — replicas serve reads; replication catches up when node recovers
- Network partition — choose availability (AP) or consistency (CP) per CAP theorem
- Data corruption — checksums on every block; replicas provide redundancy
- Hotspot keys — cache popular keys; split hot partitions
Real-world examples#
| System | Model | Consistency | Storage |
|---|---|---|---|
| Redis | In-memory | Eventual (async replication) | Hash map + RDB/AOF |
| DynamoDB | Leaderless | Configurable | LSM Tree |
| etcd | Leader-based | Strong (Raft) | B-Tree (BoltDB) |
| Cassandra | Leaderless | Configurable (quorum) | LSM Tree |
Visualize your key-value store#
See how partitioning, replication, and coordination connect — try Codelit to generate an interactive diagram of a distributed key-value store.
Key takeaways#
- Consistent hashing for partitioning — minimal key movement on scaling
- Replication factor R for durability — typically R=3
- Quorum W+R>N for strong consistency, W=R=1 for speed
- LSM Trees for write-heavy, B-Trees for read-heavy workloads
- Vector clocks or CRDTs for conflict resolution in leaderless systems
- CAP theorem forces a choice: consistency or availability during partitions