Design a Key-Value Store — From Simple Hash Map to Distributed Database
Start simple, then distribute#
A key-value store is the simplest database: given a key, return a value. But scaling it to handle millions of requests per second across multiple data centers is one of the hardest distributed systems problems.
This is a classic system design interview question because it covers partitioning, replication, consistency, and failure handling.
Single-server key-value store#
Start with a hash map in memory:
```python
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)
```
This handles thousands of requests per second on a single machine. But it has limits:
- Memory — can't store more data than RAM
- Availability — one server crash = total data loss
- Throughput — one CPU handles all requests
Distributing across servers#
Partitioning (sharding)#
Split the key space across N servers. The naive approach is modulo hashing:

```
hash("user:123") % N → Server 2
hash("user:456") % N → Server 0
```

The problem: when N changes, almost every key maps to a different server. Consistent hashing fixes this — each server owns a range of a hash ring, so adding or removing a server moves only ~1/N of the keys.
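A minimal sketch of a consistent-hash ring (the class name, vnode count, and choice of MD5 are illustrative, not a specific library's API). Virtual nodes spread each server around the ring so load stays even:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy hash ring: each server gets several virtual nodes so keys
    spread evenly and removing a server moves only ~1/N of the keys."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash_position, server)
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                bisect.insort(self.ring, (pos, server))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key):
        pos = self._hash(key)
        # first virtual node clockwise from the key's position,
        # wrapping around the ring at the end
        i = bisect.bisect(self.ring, (pos, ""))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["server-0", "server-1", "server-2"])
print(ring.owner("user:123"))
```

Removing a server is just deleting its virtual nodes; every key it owned falls through to the next node clockwise, and no other key moves.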
Replication#
Copy each key to R replicas on different servers for durability:
- Leader-based: One primary handles writes, replicates to followers
- Leaderless: Any replica can accept writes (Dynamo-style)
Consistency models#
| Model | Guarantee | Latency | Used by |
|---|---|---|---|
| Strong | Read always returns latest write | High (quorum) | etcd, ZooKeeper |
| Eventual | Reads may be stale temporarily | Low | DynamoDB, Cassandra |
| Causal | Respects happens-before ordering | Medium | MongoDB |
Quorum reads/writes: With N replicas, a write must be acknowledged by W replicas and a read must contact R replicas:
- W + R > N → read and write quorums overlap, so every read sees the latest write (strong consistency)
- W=1, R=1 → fastest but weakest
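The overlap argument can be checked by brute force — with W + R > N, any W-replica write set and any R-replica read set must share at least one replica (function name is illustrative):

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True iff every possible read quorum intersects every possible
    write quorum — the condition behind W + R > N."""
    replicas = set(range(n))
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))

print(quorums_overlap(3, 2, 2))  # True:  2 + 2 > 3, reads see the latest write
print(quorums_overlap(3, 1, 1))  # False: a read may hit a replica the write missed
```

This is why N=3, W=2, R=2 is such a common default: it tolerates one replica failure on both paths while keeping the overlap guarantee.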
Write path#
1. Client sends `PUT(key, value)` to any node (the coordinator)
2. Coordinator hashes the key → determines the responsible partition
3. Coordinator forwards to the primary of that partition
4. Primary writes to local storage + write-ahead log
5. Primary replicates to R−1 followers
6. After W acks received → success returned to client
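The steps above can be sketched in a few lines. This is a sequential toy (a real coordinator replicates in parallel and handles timeouts); `Replica` and `put` are illustrative names, not a real client library:

```python
class Replica:
    def __init__(self):
        self.store, self.wal = {}, []

    def write(self, key, value):
        self.wal.append((key, value))   # write-ahead log first, for crash recovery
        self.store[key] = value         # then local storage
        return True                     # ack back to the coordinator

def put(replicas, key, value, w):
    """Coordinator: write to the key's replicas in turn and report
    success once W of them have acknowledged."""
    acks = 0
    for replica in replicas:
        if replica.write(key, value):
            acks += 1
        if acks >= w:
            return "success"
    return "timeout"   # fewer than W replicas acknowledged

replicas = [Replica(), Replica(), Replica()]
print(put(replicas, "user:123", {"name": "Ada"}, w=2))  # success
```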
Read path#
1. Client sends `GET(key)` to the coordinator
2. Coordinator routes to the partition owning that key
3. Reads from R replicas (or just one, depending on consistency level)
4. Returns the value (or the most recent version if reading multiple replicas)
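A sketch of step 4's "most recent version" resolution, assuming each stored value carries a monotonically increasing version number (the versioning scheme and names are illustrative):

```python
class Replica:
    def __init__(self, data):
        self.store = data

def get(replicas, key, r):
    """Quorum read: contact R replicas and return the value with the
    highest version — the freshest copy any of them has seen."""
    responses = [rep.store.get(key) for rep in replicas[:r]]
    versioned = [v for v in responses if v is not None]
    if not versioned:
        return None
    return max(versioned, key=lambda v: v["version"])["value"]

# replica 0 is stale (version 1); replicas 1–2 hold the latest write
replicas = [
    Replica({"user:123": {"version": 1, "value": "old"}}),
    Replica({"user:123": {"version": 2, "value": "new"}}),
    Replica({"user:123": {"version": 2, "value": "new"}}),
]
print(get(replicas, "user:123", r=2))  # new
```

Dynamo-style systems go one step further with read repair: after resolving the freshest value, the coordinator writes it back to any stale replicas it contacted.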
Conflict resolution#
With leaderless replication, two clients can write different values for the same key simultaneously.
Last-writer-wins (LWW): Attach a timestamp. Latest timestamp wins. Simple but can lose data.
Vector clocks: Track causal relationships. Detect conflicts and let the application resolve them.
CRDTs: Conflict-free replicated data types. Data structures that can be merged without conflicts (counters, sets, registers).
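Vector clocks reduce to a pairwise comparison of per-node counters. A minimal sketch (the function name and return strings are illustrative):

```python
def compare(a, b):
    """Compare two vector clocks, given as dicts of node id → counter.
    Missing entries count as 0."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happened-before b"   # b supersedes a
    if b_le_a:
        return "b happened-before a"   # a supersedes b
    return "concurrent"                # true conflict: application resolves

print(compare({"n1": 1}, {"n1": 2}))                    # a happened-before b
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 1}))  # concurrent
```

When one clock dominates the other, the older write can be safely discarded; only the "concurrent" case is a genuine conflict that LWW would silently lose.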
Storage engine#
Two main approaches for on-disk storage:
LSM Tree (Log-Structured Merge): Write-optimized. All writes go to an in-memory table (memtable), then flush to sorted files (SSTables). Compaction merges files periodically. Used by LevelDB, RocksDB, Cassandra.
B-Tree: Read-optimized. Traditional balanced tree on disk. In-place updates. Used by PostgreSQL, MySQL, etcd.
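The LSM write path fits in a toy class. This sketch keeps "SSTables" as in-memory sorted dicts and omits compaction; the class name and flush threshold are illustrative:

```python
class TinyLSM:
    """Toy LSM tree: writes land in a memtable, which flushes to an
    immutable sorted 'SSTable' when full. Reads check the memtable,
    then SSTables newest-first, so recent writes shadow older ones."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # newest last; each sorted by key
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # SSTables are written sorted by key so scans and compaction
        # can merge them sequentially
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest first
            if key in table:
                return table[key]
        return None

db = TinyLSM()
db.put("a", 1); db.put("b", 2)   # second put triggers a flush
db.put("a", 3)                   # newer value shadows the SSTable copy
print(db.get("a"))  # 3
```

Writes never touch old files — that is what makes LSM trees write-optimized — but each read may probe several tables, which is why real engines add bloom filters and compaction.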
Failure handling#
- Node failure — replicas serve reads; replication catches up when node recovers
- Network partition — choose availability (AP) or consistency (CP) per CAP theorem
- Data corruption — checksums on every block; replicas provide redundancy
- Hotspot keys — cache popular keys; split hot partitions
Real-world examples#
| System | Model | Consistency | Storage |
|---|---|---|---|
| Redis | In-memory | Eventual (async replication) | Hash map + RDB/AOF |
| DynamoDB | Leaderless | Configurable | LSM Tree |
| etcd | Leader-based | Strong (Raft) | B-Tree (BoltDB) |
| Cassandra | Leaderless | Configurable (quorum) | LSM Tree |
Visualize your key-value store#
See how partitioning, replication, and coordination connect — try Codelit to generate an interactive diagram of a distributed key-value store.
Key takeaways#
- Consistent hashing for partitioning — minimal key movement on scaling
- Replication factor R for durability — typically R=3
- Quorum W+R>N for strong consistency, W=R=1 for speed
- LSM Trees for write-heavy, B-Trees for read-heavy workloads
- Vector clocks or CRDTs for conflict resolution in leaderless systems
- CAP theorem forces a choice: consistency or availability during partitions