Distributed Cache Architecture: From Redis Cluster to Cache Stampedes
A cache turns expensive computations or slow I/O into fast lookups. A distributed cache spreads that capability across multiple nodes so it can scale beyond a single machine's memory and survive individual node failures. Getting the architecture right means understanding topologies, data distribution, failure modes, and eviction strategies.
Redis Cluster#
Redis Cluster is among the most widely deployed distributed caches. It partitions data across multiple primary nodes using 16,384 hash slots.
┌───────────┐  ┌───────────┐  ┌───────────┐
│ Primary A │  │ Primary B │  │ Primary C │
│  slots    │  │  slots    │  │  slots    │
│  0-5460   │  │ 5461-10922│  │10923-16383│
├───────────┤  ├───────────┤  ├───────────┤
│ Replica   │  │ Replica   │  │ Replica   │
│   A1      │  │   B1      │  │   C1      │
└───────────┘  └───────────┘  └───────────┘
Key characteristics:
- Automatic sharding — keys are assigned to slots via CRC16(key) mod 16384.
- Failover — if a primary fails, its replica is promoted automatically.
- Client-side routing — clients learn the slot map and route directly to the correct node, avoiding an extra hop.
- Resharding — slots can be migrated between nodes with zero downtime.
Redis Cluster does not support multi-key operations across different slots unless you use hash tags (e.g., {user:123}:profile and {user:123}:settings land on the same slot).
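The slot rule and hash-tag behavior can be sketched in a few lines of Python. The checksum below is CRC-16/XMODEM, the variant the Redis Cluster specification names; the tag extraction mirrors the {user:123} example above:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tag} syntax."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:  # tag must be non-empty
            key = key[start + 1:end]       # only the tag is hashed
    return crc16(key.encode()) % 16384
```

Because only the text inside the braces is hashed, {user:123}:profile and {user:123}:settings land on the same slot and can participate in multi-key operations.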
Memcached#
Memcached is a simpler, multi-threaded cache focused purely on key-value storage. Unlike Redis, it has no built-in clustering — distribution is handled entirely by the client using consistent hashing.
Client ──▶ hash(key) ──▶ pick node ──▶ Memcached node
Strengths over Redis:
- Multi-threaded — uses all CPU cores out of the box (Redis is single-threaded per shard).
- Memory efficiency — slab allocator avoids fragmentation for uniform-size values.
- Simplicity — no persistence, no replication, no scripting. Just fast gets and sets.
Trade-offs:
- No data structures beyond strings.
- No built-in replication — a node failure means cache misses until repopulation.
- No persistence — restart means cold cache.
Cache Topologies#
Look-Aside (Cache-Aside)#
The application manages the cache explicitly:
- Check the cache.
- On miss, query the database.
- Write the result to the cache.
App ──▶ Cache (miss) ──▶ Database ──▶ App writes to Cache
This is the most common topology. The application has full control over what gets cached, when it expires, and how stale data is handled.
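The three steps can be sketched as follows; cache, db, and their get/set/query methods are placeholder interfaces standing in for a real client library:

```python
import json

def get_user(user_id, cache, db, ttl=300):
    """Cache-aside read: check cache, fall back to DB, repopulate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # 1. cache hit
    row = db.query(user_id)                # 2. miss: query the database
    cache.set(key, json.dumps(row), ttl)   # 3. write result back for the next reader
    return row
```

Note that the application decides the TTL and the serialization format; the cache is just a dumb key-value store here.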
Inline (Read-Through / Write-Through)#
The cache sits between the application and the database. The application always talks to the cache, which handles database interaction transparently.
App ──▶ Cache ──▶ Database
(read-through: cache fetches on miss)
(write-through: cache writes to DB on set)
Benefits: simpler application code, guaranteed consistency between cache and DB. Drawback: write latency increases because every write hits the database synchronously.
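A minimal sketch of the inline pattern, assuming store is any dict-like cache and db is a hypothetical backing store with load/save methods:

```python
class WriteThroughCache:
    """Inline cache: the application only ever talks to this object."""

    def __init__(self, store, db):
        self.store = store  # dict-like in-memory cache
        self.db = db        # backing store with load()/save()

    def get(self, key):
        if key not in self.store:              # read-through: fetch on miss
            self.store[key] = self.db.load(key)
        return self.store[key]

    def set(self, key, value):
        self.db.save(key, value)               # write-through: DB first, synchronously
        self.store[key] = value                # then the cache, so both agree
```

The write path illustrates the latency trade-off: set does not return until the database write completes.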
Write-Behind (Write-Back)#
Like write-through, but the cache buffers writes and flushes to the database asynchronously.
App ──▶ Cache ──(async)──▶ Database
This improves write latency at the cost of durability — if the cache node crashes before flushing, buffered writes are lost.
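A sketch of the buffering idea; a production implementation would flush from a background thread or timer, but a size threshold is enough to show the mechanism (the flush_every parameter is illustrative):

```python
class WriteBehindCache:
    """Buffers writes in memory and flushes them to the DB in batches."""

    def __init__(self, db, flush_every=10):
        self.db = db
        self.store = {}       # the cache itself
        self.dirty = []       # buffered, not-yet-persisted writes
        self.flush_every = flush_every

    def set(self, key, value):
        self.store[key] = value
        self.dirty.append((key, value))           # acknowledge before persisting
        if len(self.dirty) >= self.flush_every:   # flush on a size threshold
            self.flush()

    def flush(self):
        for key, value in self.dirty:
            self.db.save(key, value)              # async in production; inline here
        self.dirty.clear()
```

Everything in self.dirty is exactly the data at risk if this node crashes before flush runs.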
Embedded Cache#
The cache lives inside the application process (e.g., Caffeine in Java, lru_cache in Python). No network hop, but limited to a single instance's memory and not shared across replicas.
Use embedded caches for hot, read-heavy, small datasets — configuration values, feature flags, or compiled templates.
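In Python, functools.lru_cache gives you an embedded cache in one line; the template loader below is a stand-in for any expensive, read-heavy computation:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)            # per-process cache, no network hop
def compiled_template(name: str) -> str:
    # stand-in for an expensive load/compile step
    return f"<rendered:{name}>"
```

The cache_info() method exposes hit/miss counters, which is handy for verifying the cache is actually absorbing reads. Remember that each application replica holds its own independent copy.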
Consistent Hashing#
Distributing keys evenly across cache nodes is critical. Naive hash(key) mod N breaks catastrophically when nodes are added or removed — nearly every key remaps.
Consistent hashing arranges nodes on a virtual ring. Each key hashes to a position on the ring and is assigned to the next node clockwise.
Node A
/ \
Node D Node B
\ /
Node C
Key "user:42" ──▶ hash position ──▶ lands between A and B ──▶ assigned to B
When a node is removed, only its keys remap — to the next node on the ring. All other mappings remain stable. Virtual nodes (multiple positions per physical node) improve balance.
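A compact ring implementation using the standard library's bisect module; the node names, hash choice (MD5), and vnode count are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes for better balance."""

    def __init__(self, nodes, vnodes=100):
        # each physical node occupies `vnodes` positions on the ring
        self.ring = sorted((self._h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # walk clockwise: first vnode position at or after the key's hash
        idx = bisect.bisect_right(self.keys, self._h(key)) % len(self.keys)
        return self.ring[idx][1]
```

The stability property is easy to check empirically: remove one node and only the keys that lived on it should change owners.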
Jump Consistent Hashing#
An alternative that produces a near-perfectly balanced distribution with zero memory overhead. It uses a deterministic algorithm instead of a ring, but does not support arbitrary node removal — only appending or removing the last node.
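The algorithm, from Lamping and Veach's 2014 paper, fits in a few lines (this is a direct Python port of their reference code):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, as in the original paper
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b
```

Its defining property: growing the cluster from N to N+1 buckets moves a key either nowhere or to the new bucket, never between existing buckets.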
Cache Stampede#
A cache stampede (thundering herd) occurs when a popular key expires and hundreds of concurrent requests all miss the cache simultaneously, overwhelming the database.
Mitigation Strategies#
- Locking — the first request to miss acquires a lock; others wait or get a stale value.
value = cache.get(key)
if value is None:
    if cache.acquire_lock(key, ttl=5):
        try:
            value = database.query(key)
            cache.set(key, value, ttl=300)
        finally:
            cache.release_lock(key)  # always release, even if the query fails
    else:
        # Wait and retry, or return stale
        value = cache.get_stale(key)
- Probabilistic early expiration — each read recalculates whether to refresh before the TTL expires, with probability increasing as expiry approaches (the XFetch algorithm).
- Background refresh — a separate process refreshes popular keys before they expire, so the cache never goes cold.
- Stale-while-revalidate — serve the expired value immediately while refreshing in the background.
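The probabilistic early expiration idea can be sketched as follows. This is an XFetch-style sketch, not a library API: cache is a hypothetical client storing (value, delta, expiry) tuples, where delta is how long the last recompute took:

```python
import math
import random
import time

def xfetch(key, ttl, cache, recompute, beta=1.0):
    """Refresh early with probability that rises as expiry approaches."""
    entry = cache.get(key)
    if entry is not None:
        value, delta, expiry = entry
        # -delta * beta * log(rand) is a positive "head start"; the closer
        # we are to expiry, the more likely this pushes us past it.
        # 1.0 - random() keeps the argument in (0, 1], avoiding log(0).
        if time.time() - delta * beta * math.log(1.0 - random.random()) < expiry:
            return value
    start = time.time()
    value = recompute()
    delta = time.time() - start        # remember how expensive the refresh was
    cache.set(key, (value, delta, time.time() + ttl))
    return value
```

Because each request decides independently, typically exactly one of them recomputes early while the rest keep serving the current value — no lock required.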
Hot Key Problem#
A hot key is a single key receiving disproportionate traffic — a viral post, a flash sale item, or a global configuration value. Even a distributed cache can bottleneck because one key maps to one node.
Solutions#
- Local caching — cache hot keys in application memory with a short TTL (1-5 seconds). This absorbs the majority of reads without touching the distributed cache.
- Key replication — create multiple copies of the key with suffixes (product:123:r1, product:123:r2, ...) spread across different nodes. Clients randomly pick a replica.
- Read replicas — Redis allows reading from replicas, distributing read load across the replica set.
- Dedicated hot-key nodes — route known hot keys to beefier nodes provisioned for the traffic.
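The key replication approach is simple enough to sketch directly; cache is a placeholder client, and the replica count is illustrative:

```python
import random

def replicated_set(cache, key, value, ttl=60, n_replicas=8):
    """Write fans out to every copy of the hot key."""
    for i in range(n_replicas):
        cache.set(f"{key}:r{i}", value, ttl)   # each suffix hashes to a different slot

def replicated_get(cache, key, n_replicas=8):
    """Read one copy at random, spreading load across nodes."""
    return cache.get(f"{key}:r{random.randrange(n_replicas)}")
```

The trade-off: writes cost N operations and replicas can briefly disagree, so this fits hot keys that are read-mostly and tolerant of short staleness.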
Replication#
Redis Replication#
Redis uses asynchronous replication by default. The primary forwards its stream of write commands (the replication stream) to replicas. This means:
- Replicas may lag behind the primary by a few milliseconds.
- If the primary fails before replication, those writes are lost.
The WAIT command can enforce synchronous replication for critical writes (at the cost of latency).
Multi-Region Replication#
For global applications, cache data must be available in multiple regions. Options:
- Active-passive — one region is the writer; others are read replicas. Simple but adds cross-region write latency.
- Active-active — each region accepts writes. Requires conflict resolution (last-writer-wins or CRDTs). Redis Enterprise supports this natively.
Eviction Policies#
When cache memory is full, the eviction policy determines which keys to remove.
| Policy | Description | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evicts the key not accessed for the longest time | General-purpose workloads |
| LFU (Least Frequently Used) | Evicts the key accessed least often over time | Workloads with stable popularity |
| TTL-based | Evicts keys closest to expiration | Time-sensitive data |
| Random | Evicts a random key | When access patterns are uniform |
| allkeys-lru | LRU across all keys (Redis default recommendation) | Mixed workloads with no explicit TTL |
| volatile-lru | LRU only among keys with a TTL set | When some keys must never be evicted |
Redis implements approximated LRU/LFU — it samples a configurable number of keys (default 5) and evicts the best candidate among them. This is cheaper than true LRU while producing nearly identical results.
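The sampling idea is easy to demonstrate outside Redis. This is a toy model, not Redis internals: store holds the cached values and last_access tracks a last-read timestamp per key:

```python
import random

def evict_one(store, last_access, samples=5):
    """Approximated LRU: sample a few keys, evict the least recently used
    among the sample, instead of maintaining a global LRU order."""
    candidates = random.sample(list(store), min(samples, len(store)))
    victim = min(candidates, key=last_access.get)  # oldest access wins eviction
    del store[victim]
    del last_access[victim]
    return victim
```

With samples equal to the key count this degenerates to exact LRU; with a small sample it trades a little accuracy for O(samples) work per eviction.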
Tuning Tips#
- Set maxmemory-samples to 10 for more accurate eviction at the cost of slightly higher CPU.
- Use volatile-ttl when you want natural expiration order.
- Monitor the evicted_keys metric — if the eviction rate is high, your cache is undersized.
Key Takeaways#
Distributed caching is not just "put Redis in front of the database." The topology (look-aside vs inline), distribution strategy (consistent hashing), failure modes (stampede, hot key), replication model, and eviction policy all interact to determine whether your cache improves performance or introduces new bottlenecks.
Design your cache layer with the same rigor you apply to your database layer — because under load, the cache is your database.