Distributed Cache Architecture: From Redis Cluster to Cache Stampedes
A cache turns expensive computations or slow I/O into fast lookups. A distributed cache spreads that capability across multiple nodes so it can scale beyond a single machine's memory and survive individual node failures. Getting the architecture right means understanding topologies, data distribution, failure modes, and eviction strategies.
Redis Cluster#
Redis Cluster is among the most widely deployed distributed caches. It partitions data across multiple primary nodes using 16,384 hash slots.
┌───────────┐  ┌───────────┐  ┌───────────┐
│ Primary A │  │ Primary B │  │ Primary C │
│  slots    │  │  slots    │  │  slots    │
│  0-5460   │  │ 5461-10922│  │10923-16383│
├───────────┤  ├───────────┤  ├───────────┤
│ Replica   │  │ Replica   │  │ Replica   │
│   A1      │  │   B1      │  │   C1      │
└───────────┘  └───────────┘  └───────────┘
Key characteristics:
- Automatic sharding — keys are assigned to slots via CRC16(key) mod 16384.
- Failover — if a primary fails, its replica is promoted automatically.
- Client-side routing — clients learn the slot map and route directly to the correct node, avoiding an extra hop.
- Resharding — slots can be migrated between nodes with zero downtime.
Redis Cluster does not support multi-key operations across different slots unless you use hash tags (e.g., {user:123}:profile and {user:123}:settings land on the same slot).
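The slot rule and hash-tag behavior can be sketched in a few lines of Python. The checksum below is CRC-16/XMODEM, the variant the Redis Cluster specification names; the tag extraction mirrors the {user:123} example above:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tag} syntax."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:  # tag must be non-empty
            key = key[start + 1:end]       # only the tag is hashed
    return crc16(key.encode()) % 16384
```

Because only the text inside the braces is hashed, {user:123}:profile and {user:123}:settings land on the same slot and can participate in multi-key operations.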
Memcached#
Memcached is a simpler, multi-threaded cache focused purely on key-value storage. Unlike Redis, it has no built-in clustering — distribution is handled entirely by the client using consistent hashing.
Client ──▶ hash(key) ──▶ pick node ──▶ Memcached node
Strengths over Redis:
- Multi-threaded — uses all CPU cores out of the box (Redis is single-threaded per shard).
- Memory efficiency — slab allocator avoids fragmentation for uniform-size values.
- Simplicity — no persistence, no replication, no scripting. Just fast gets and sets.
Trade-offs:
- No data structures beyond strings.
- No built-in replication — a node failure means cache misses until repopulation.
- No persistence — restart means cold cache.
Cache Topologies#
Look-Aside (Cache-Aside)#
The application manages the cache explicitly:
- Check the cache.
- On miss, query the database.
- Write the result to the cache.
App ──▶ Cache (miss) ──▶ Database ──▶ App writes to Cache
This is the most common topology. The application has full control over what gets cached, when it expires, and how stale data is handled.
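The three steps can be sketched as follows; cache, db, and their get/set/query methods are placeholder interfaces standing in for a real client library:

```python
import json

def get_user(user_id, cache, db, ttl=300):
    """Cache-aside read: check cache, fall back to DB, repopulate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # 1. cache hit
    row = db.query(user_id)                # 2. miss: query the database
    cache.set(key, json.dumps(row), ttl)   # 3. write result back for the next reader
    return row
```

Note that the application decides the TTL and the serialization format; the cache is just a dumb key-value store here.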
Inline (Read-Through / Write-Through)#
The cache sits between the application and the database. The application always talks to the cache, which handles database interaction transparently.
App ──▶ Cache ──▶ Database
(read-through: cache fetches on miss)
(write-through: cache writes to DB on set)
Benefits: simpler application code, guaranteed consistency between cache and DB. Drawback: write latency increases because every write hits the database synchronously.
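A minimal sketch of the inline pattern, assuming store is any dict-like cache and db is a hypothetical backing store with load/save methods:

```python
class WriteThroughCache:
    """Inline cache: the application only ever talks to this object."""

    def __init__(self, store, db):
        self.store = store  # dict-like in-memory cache
        self.db = db        # backing store with load()/save()

    def get(self, key):
        if key not in self.store:              # read-through: fetch on miss
            self.store[key] = self.db.load(key)
        return self.store[key]

    def set(self, key, value):
        self.db.save(key, value)               # write-through: DB first, synchronously
        self.store[key] = value                # then the cache, so both agree
```

The write path illustrates the latency trade-off: set does not return until the database write completes.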
Write-Behind (Write-Back)#
Like write-through, but the cache buffers writes and flushes to the database asynchronously.
App ──▶ Cache ──(async)──▶ Database
This improves write latency at the cost of durability — if the cache node crashes before flushing, buffered writes are lost.
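A sketch of the buffering idea; a production implementation would flush from a background thread or timer, but a size threshold is enough to show the mechanism (the flush_every parameter is illustrative):

```python
class WriteBehindCache:
    """Buffers writes in memory and flushes them to the DB in batches."""

    def __init__(self, db, flush_every=10):
        self.db = db
        self.store = {}       # the cache itself
        self.dirty = []       # buffered, not-yet-persisted writes
        self.flush_every = flush_every

    def set(self, key, value):
        self.store[key] = value
        self.dirty.append((key, value))           # acknowledge before persisting
        if len(self.dirty) >= self.flush_every:   # flush on a size threshold
            self.flush()

    def flush(self):
        for key, value in self.dirty:
            self.db.save(key, value)              # async in production; inline here
        self.dirty.clear()
```

Everything in self.dirty is exactly the data at risk if this node crashes before flush runs.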
Embedded Cache#
The cache lives inside the application process (e.g., Caffeine in Java, lru_cache in Python). No network hop, but limited to a single instance's memory and not shared across replicas.
Use embedded caches for hot, read-heavy, small datasets — configuration values, feature flags, or compiled templates.
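In Python, functools.lru_cache gives you an embedded cache in one line; the template loader below is a stand-in for any expensive, read-heavy computation:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)            # per-process cache, no network hop
def compiled_template(name: str) -> str:
    # stand-in for an expensive load/compile step
    return f"<rendered:{name}>"
```

The cache_info() method exposes hit/miss counters, which is handy for verifying the cache is actually absorbing reads. Remember that each application replica holds its own independent copy.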
Consistent Hashing#
Distributing keys evenly across cache nodes is critical. Naive hash(key) mod N breaks catastrophically when nodes are added or removed — nearly every key remaps.
Consistent hashing arranges nodes on a virtual ring. Each key hashes to a position on the ring and is assigned to the next node clockwise.
Node A
/ \
Node D Node B
\ /
Node C
Key "user:42" ──▶ hash position ──▶ lands between A and B ──▶ assigned to B
When a node is removed, only its keys remap — to the next node on the ring. All other mappings remain stable. Virtual nodes (multiple positions per physical node) improve balance.
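A compact ring implementation using the standard library's bisect module; the node names, hash choice (MD5), and vnode count are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes for better balance."""

    def __init__(self, nodes, vnodes=100):
        # each physical node occupies `vnodes` positions on the ring
        self.ring = sorted((self._h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # walk clockwise: first vnode position at or after the key's hash
        idx = bisect.bisect_right(self.keys, self._h(key)) % len(self.keys)
        return self.ring[idx][1]
```

The stability property is easy to check empirically: remove one node and only the keys that lived on it should change owners.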
Jump Consistent Hashing#
An alternative that produces a near-perfectly balanced distribution with zero memory overhead. It uses a deterministic algorithm instead of a ring, but does not support arbitrary node removal — only appending or removing the last node.
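The algorithm, from Lamping and Veach's 2014 paper, fits in a few lines (this is a direct Python port of their reference code):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, as in the original paper
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b
```

Its defining property: growing the cluster from N to N+1 buckets moves a key either nowhere or to the new bucket, never between existing buckets.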
Cache Stampede#
A cache stampede (thundering herd) occurs when a popular key expires and hundreds of concurrent requests all miss the cache simultaneously, overwhelming the database.
Mitigation Strategies#
- Locking — the first request to miss acquires a lock; others wait or get a stale value.
value = cache.get(key)
if value is None:
    if cache.acquire_lock(key, ttl=5):
        try:
            value = database.query(key)
            cache.set(key, value, ttl=300)
        finally:
            cache.release_lock(key)  # always release, even if the query fails
    else:
        # Wait and retry, or return stale
        value = cache.get_stale(key)
- Probabilistic early expiration — each read recalculates whether to refresh before the TTL expires, with probability increasing as expiry approaches (the XFetch algorithm).
- Background refresh — a separate process refreshes popular keys before they expire, so the cache never goes cold.
- Stale-while-revalidate — serve the expired value immediately while refreshing in the background.
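The probabilistic early expiration idea can be sketched as follows. This is an XFetch-style sketch, not a library API: cache is a hypothetical client storing (value, delta, expiry) tuples, where delta is how long the last recompute took:

```python
import math
import random
import time

def xfetch(key, ttl, cache, recompute, beta=1.0):
    """Refresh early with probability that rises as expiry approaches."""
    entry = cache.get(key)
    if entry is not None:
        value, delta, expiry = entry
        # -delta * beta * log(rand) is a positive "head start"; the closer
        # we are to expiry, the more likely this pushes us past it.
        # 1.0 - random() keeps the argument in (0, 1], avoiding log(0).
        if time.time() - delta * beta * math.log(1.0 - random.random()) < expiry:
            return value
    start = time.time()
    value = recompute()
    delta = time.time() - start        # remember how expensive the refresh was
    cache.set(key, (value, delta, time.time() + ttl))
    return value
```

Because each request decides independently, typically exactly one of them recomputes early while the rest keep serving the current value — no lock required.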
Hot Key Problem#
A hot key is a single key receiving disproportionate traffic — a viral post, a flash sale item, or a global configuration value. Even a distributed cache can bottleneck because one key maps to one node.
Solutions#
- Local caching — cache hot keys in application memory with a short TTL (1-5 seconds). This absorbs the majority of reads without touching the distributed cache.
- Key replication — create multiple copies of the key with suffixes (product:123:r1, product:123:r2, ...) spread across different nodes. Clients randomly pick a replica.
- Read replicas — Redis allows reading from replicas, distributing read load across the replica set.
- Dedicated hot-key nodes — route known hot keys to beefier nodes provisioned for the traffic.
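The key replication approach is simple enough to sketch directly; cache is a placeholder client, and the replica count is illustrative:

```python
import random

def replicated_set(cache, key, value, ttl=60, n_replicas=8):
    """Write fans out to every copy of the hot key."""
    for i in range(n_replicas):
        cache.set(f"{key}:r{i}", value, ttl)   # each suffix hashes to a different slot

def replicated_get(cache, key, n_replicas=8):
    """Read one copy at random, spreading load across nodes."""
    return cache.get(f"{key}:r{random.randrange(n_replicas)}")
```

The trade-off: writes cost N operations and replicas can briefly disagree, so this fits hot keys that are read-mostly and tolerant of short staleness.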
Replication#
Redis Replication#
Redis uses asynchronous replication by default. The primary forwards its stream of write commands (the replication stream) to replicas. This means:
- Replicas may lag behind the primary by a few milliseconds.
- If the primary fails before replication, those writes are lost.
The WAIT command can enforce synchronous replication for critical writes (at the cost of latency).
Multi-Region Replication#
For global applications, cache data must be available in multiple regions. Options:
- Active-passive — one region is the writer; others are read replicas. Simple but adds cross-region write latency.
- Active-active — each region accepts writes. Requires conflict resolution (last-writer-wins or CRDTs). Redis Enterprise supports this natively.
Eviction Policies#
When cache memory is full, the eviction policy determines which keys to remove.
| Policy | Description | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evicts the key not accessed for the longest time | General-purpose workloads |
| LFU (Least Frequently Used) | Evicts the key accessed least often over time | Workloads with stable popularity |
| TTL-based | Evicts keys closest to expiration | Time-sensitive data |
| Random | Evicts a random key | When access patterns are uniform |
| allkeys-lru | LRU across all keys (Redis default recommendation) | Mixed workloads with no explicit TTL |
| volatile-lru | LRU only among keys with a TTL set | When some keys must never be evicted |
Redis implements approximated LRU/LFU — it samples a configurable number of keys (default 5) and evicts the best candidate among them. This is cheaper than true LRU while producing nearly identical results.
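The sampling idea is easy to demonstrate outside Redis. This is a toy model, not Redis internals: store holds the cached values and last_access tracks a last-read timestamp per key:

```python
import random

def evict_one(store, last_access, samples=5):
    """Approximated LRU: sample a few keys, evict the least recently used
    among the sample, instead of maintaining a global LRU order."""
    candidates = random.sample(list(store), min(samples, len(store)))
    victim = min(candidates, key=last_access.get)  # oldest access wins eviction
    del store[victim]
    del last_access[victim]
    return victim
```

With samples equal to the key count this degenerates to exact LRU; with a small sample it trades a little accuracy for O(samples) work per eviction.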
Tuning Tips#
- Set maxmemory-samples to 10 for more accurate eviction at the cost of slightly higher CPU.
- Use volatile-ttl when you want natural expiration order.
- Monitor the evicted_keys metric — if the eviction rate is high, your cache is undersized.
Key Takeaways#
Distributed caching is not just "put Redis in front of the database." The topology (look-aside vs inline), distribution strategy (consistent hashing), failure modes (stampede, hot key), replication model, and eviction policy all interact to determine whether your cache improves performance or introduces new bottlenecks.
Design your cache layer with the same rigor you apply to your database layer — because under load, the cache is your database.