# Cache Invalidation Strategies: TTL, Event-Driven, Tags & More
There are only two hard things in Computer Science: cache invalidation and naming things. Here is a practical guide to getting the first one right.
## Why Invalidation Matters
Caching without a clear invalidation strategy leads to stale data, confused users, and debugging nightmares. Every cache entry needs an answer to: when does this stop being correct?
## 1. TTL-Based Invalidation

The simplest approach: every cached value expires after a fixed duration.

```
SET user:42:profile "{...}" EX 300   # expires in 5 minutes
```

Pros: Simple, predictable, self-healing. Cons: Data can be stale for up to the full TTL window. Too short a TTL wastes the cache; too long a TTL serves outdated data.
When to use: Read-heavy data that tolerates brief staleness (product listings, public profiles).
### Adaptive TTL

Vary the TTL based on how frequently the data changes:

```python
import time

def adaptive_ttl(last_modified, base_ttl=300):
    age = time.time() - last_modified
    if age > 86400:   # not modified in 24h
        return base_ttl * 4
    elif age > 3600:  # not modified in 1h
        return base_ttl * 2
    return base_ttl
```
## 2. Event-Driven Invalidation

Invalidate the cache the moment data changes by publishing events.

```
User updates profile
  -> publish "user:42:updated"
  -> cache subscriber deletes user:42:profile
```
Pros: Near-zero staleness. Cons: Requires event infrastructure (Kafka, Redis Pub/Sub, SNS). Adds coupling between write path and cache layer.
When to use: Consistency-critical data (inventory counts, account balances).
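The flow above can be sketched with an in-process event bus standing in for Kafka or Redis Pub/Sub; `EventBus` and the plain-dict cache here are illustrative, not a real library API:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process stand-in for Kafka / Redis Pub/Sub."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload=None):
        for handler in self._subscribers[topic]:
            handler(payload)

cache = {"user:42:profile": '{"name": "old"}'}
bus = EventBus()

# The cache layer subscribes to update events and drops the stale entry.
bus.subscribe("user:42:updated", lambda _: cache.pop("user:42:profile", None))

# The write path publishes an event after committing the change.
bus.publish("user:42:updated")
```

Note the coupling the section warns about: the write path must know to publish, and the cache layer must know which topics map to which keys.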
## 3. Write-Through Invalidation

Every write updates both the database and the cache on the write path.

```python
def update_user(user_id, data):
    db.update("users", user_id, data)
    cache.set(f"user:{user_id}:profile", serialize(data), ex=3600)
```
Pros: Cache is always fresh immediately after writes. Cons: Write latency increases. If the cache write fails, you get inconsistency.
Variant — Write-Behind (Write-Back): Write to cache first, asynchronously flush to DB. Faster writes but risk of data loss.
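A minimal write-behind sketch, using an in-memory queue and plain dicts for the cache and database (in production the flush runs in a background worker, and the queue is the part you can lose on a crash):

```python
import queue

cache = {}
db = {}
write_queue = queue.Queue()

def write_behind_set(key, value):
    # Cache first: the write is acknowledged as soon as the cache holds it.
    cache[key] = value
    write_queue.put((key, value))

def flush_to_db():
    # A background worker drains the queue; if the process dies before
    # this runs, the queued writes are lost (the risk noted above).
    while not write_queue.empty():
        key, value = write_queue.get()
        db[key] = value

write_behind_set("user:42:profile", '{"name": "Ada"}')
assert "user:42:profile" not in db   # DB lags behind the cache
flush_to_db()
```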
## 4. Cache-Aside (Lazy Loading) Invalidation

The application manages the cache explicitly:

```python
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.set(f"user:{user_id}", serialize(user), ex=600)
    return user

def update_user(user_id, data):
    db.update("users", user_id, data)
    cache.delete(f"user:{user_id}")  # invalidate, don't update
```
Key insight: On write, delete the cache entry rather than updating it. The next read will repopulate from the source of truth.
## 5. Tag-Based Invalidation (Surrogate-Key)

Assign tags to cached responses so you can invalidate groups at once.

```
Cache-Tag: product, product:42, category:electronics
```
When a product updates, purge everything tagged product:42. When a category changes, purge category:electronics.
CDN support: Fastly (Surrogate-Key), Cloudflare (Cache-Tag), Varnish (xkey).
```shell
# Fastly purge by surrogate key
curl -X POST "https://api.fastly.com/service/{id}/purge/product:42" \
  -H "Fastly-Key: $TOKEN"
```
When to use: Pages or responses that aggregate multiple entities.
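The same idea works in an application cache: keep a reverse index from each tag to the keys that carry it. This is a sketch with plain dicts and sets, not a particular cache library's API:

```python
from collections import defaultdict

cache = {}
tag_index = defaultdict(set)  # tag -> cache keys carrying that tag

def cache_set(key, value, tags=()):
    cache[key] = value
    for tag in tags:
        tag_index[tag].add(key)

def purge_tag(tag):
    # Drop every key tagged with `tag`, then forget the tag itself.
    for key in tag_index.pop(tag, set()):
        cache.pop(key, None)

cache_set("page:/products/42", "<html>...",
          tags=["product:42", "category:electronics"])
cache_set("page:/category/electronics", "<html>...",
          tags=["category:electronics"])

purge_tag("product:42")           # drops only the product page
purge_tag("category:electronics") # drops everything tagged with the category
```

With Redis, the tag index maps naturally onto a set per tag (`SADD` on write, `SMEMBERS` + `DEL` on purge).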
## 6. Versioned Keys

Embed a version number in the cache key itself:

```
user:42:profile:v17
```
When data changes, increment the version. Old keys expire naturally via TTL. No explicit deletion required.
```python
def get_user(user_id):
    version = db.query("SELECT cache_version FROM users WHERE id = %s", user_id)
    key = f"user:{user_id}:profile:v{version}"
    cached = cache.get(key)
    if cached:
        return deserialize(cached)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.set(key, serialize(user), ex=600)  # old versions age out via TTL
    return user
```
Pros: No race conditions, no explicit invalidation. Cons: Stale versions linger until TTL expires (memory overhead).
## 7. Purge APIs

Expose an explicit endpoint to invalidate specific cache entries:

```
POST /api/admin/cache/purge
{ "patterns": ["user:42:*", "feed:home"] }
```
Useful for manual intervention and CI/CD pipelines (deploy triggers purge).
Guard it: Purge endpoints must be authenticated and rate-limited.
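A sketch of the purge handler's core, assuming an in-memory dict and glob-style matching; a Redis-backed version would `SCAN` keys in batches rather than listing them all, since `KEYS` blocks the server on large datasets:

```python
import fnmatch

cache = {
    "user:42:profile": "...",
    "user:42:settings": "...",
    "user:7:profile": "...",
    "feed:home": "...",
}

def purge(patterns):
    """Delete every cache key matching any of the glob patterns."""
    removed = [k for k in cache
               for p in patterns if fnmatch.fnmatch(k, p)]
    for key in removed:
        cache.pop(key, None)
    return removed

purge(["user:42:*", "feed:home"])
```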
## 8. Stale-While-Revalidate

Serve the stale cached value immediately while fetching a fresh copy in the background.

```
Cache-Control: max-age=60, stale-while-revalidate=300
```

This header means: serve from cache for 60 seconds; after that, serve stale for up to 300 more seconds while revalidating asynchronously.
```python
def get_with_swr(key, fetch_fn, ttl=60, swr_window=300):
    # Assumes a cache API that exposes entry age/expiry metadata and a
    # background_refresh helper that re-fetches asynchronously.
    entry = cache.get_with_metadata(key)
    if entry and not entry.expired:
        return entry.value
    if entry and entry.age < ttl + swr_window:
        # Serve stale, refresh in background
        background_refresh(key, fetch_fn, ttl)
        return entry.value
    # Cache miss: blocking fetch
    value = fetch_fn()
    cache.set(key, value, ex=ttl)
    return value
```
Pros: Users never wait for cache misses. Cons: Briefly serves stale data during revalidation.
## 9. The Dogpile Effect (Cache Stampede)

When a popular cache key expires, hundreds of requests simultaneously hit the database.

### Solutions

Locking: Only one request recomputes; others wait or get stale data.
```python
import time

def get_with_lock(key, fetch_fn, ttl=300):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    if cache.set(lock_key, "1", nx=True, ex=10):  # acquire lock
        try:
            value = fetch_fn()
            cache.set(key, value, ex=ttl)
            return value
        finally:
            cache.delete(lock_key)  # release even if fetch_fn raises
    # Another request holds the lock: wait briefly, then retry the cache,
    # falling back to a direct fetch if it still isn't populated.
    time.sleep(0.05)
    return cache.get(key) or fetch_fn()
```
Probabilistic early expiration: randomly refresh before the TTL expires, so refreshes spread out instead of stampeding at the expiry instant.

```python
import math
import random
import time

def should_recompute(entry, delta=1.0, beta=1.0):
    # delta: typical recompute cost in seconds; beta > 1 refreshes more eagerly.
    # -log(U) for uniform U is exponentially distributed, which spreads
    # refreshes out across the window before expiry.
    remaining = entry.expiry - time.time()
    return remaining < -delta * beta * math.log(random.random())
```
Pre-warming: Refresh popular keys on a schedule before they expire.
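A minimal pre-warming sketch: a function that refreshes any hot key whose remaining TTL has dropped below a margin, meant to run from a scheduler (cron, Celery beat, a background thread). The tuple-valued dict cache and `refresh_margin` parameter are illustrative assumptions:

```python
import time

cache = {}  # key -> (value, expiry_timestamp)

def warm(keys, fetch_fn, ttl=300, refresh_margin=60):
    """Refresh each key that is missing or within refresh_margin of expiry."""
    now = time.time()
    for key in keys:
        entry = cache.get(key)
        if entry is None or entry[1] - now < refresh_margin:
            cache[key] = (fetch_fn(key), now + ttl)

warm(["feed:home"], fetch_fn=lambda k: f"fresh:{k}")
```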
## Choosing the Right Strategy
| Strategy | Consistency | Complexity | Best For |
|---|---|---|---|
| TTL | Eventual | Low | General caching |
| Event-driven | Strong | High | Critical data |
| Write-through | Strong | Medium | Write-heavy |
| Cache-aside + delete | Eventual | Low | Read-heavy |
| Tag-based | On-demand | Medium | CDN / aggregated pages |
| Versioned keys | Strong | Low | Immutable-style data |
| Stale-while-revalidate | Eventual | Medium | User-facing latency |
## Combining Strategies

Production systems rarely use just one approach:

```
CDN layer     -> tag-based + stale-while-revalidate
App cache     -> cache-aside + event-driven invalidation
Session cache -> TTL (short, 15 min)
Config cache  -> write-through + TTL (long, 1 h)
```
## Key Takeaways
- Delete on write is safer than update on write for cache-aside.
- TTL is your safety net even when using event-driven invalidation.
- Dogpile protection is essential for high-traffic keys.
- Tag-based invalidation simplifies CDN cache management dramatically.
- Stale-while-revalidate gives users instant responses without sacrificing freshness.
This is article #266 in the Codelit engineering series. Explore more backend architecture, system design, and performance guides at codelit.io.