API Gateway Caching Strategies: From Cache Headers to CDN Integration
API latency compounds across every microservice hop. A response that takes 200ms from the origin can be served in under 5ms from a cache layer. API gateway caching sits at the edge of your backend, intercepting repeated requests before they reach your services — reducing load, cutting costs, and improving perceived performance for every client.
HTTP Cache Headers Primer#
Before configuring gateway caching, understand the headers that control it. These headers travel with every HTTP response and instruct caches on what to store and for how long.
Cache-Control#
The most important header. It accepts multiple directives:
Cache-Control: public, max-age=300, stale-while-revalidate=60
| Directive | Meaning |
|---|---|
| public | Any cache (CDN, gateway, browser) may store the response |
| private | Only the end-user's browser may cache; shared caches must not |
| max-age=N | Response is fresh for N seconds |
| s-maxage=N | Overrides max-age for shared caches (gateway, CDN) |
| no-cache | Cache may store, but must revalidate with the origin before serving |
| no-store | Do not cache at all — sensitive data |
| stale-while-revalidate=N | Serve stale content for N seconds while fetching a fresh copy in the background |
| stale-if-error=N | Serve stale content for N seconds if the origin returns a 5xx |
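As a quick illustration, the directive syntax above can be parsed in a few lines of Python (a minimal sketch; parse_cache_control is an illustrative helper, not a standard-library function):

```python
def parse_cache_control(header):
    """Parse a Cache-Control header into a directive -> value map.
    Boolean directives (public, no-store, ...) map to True."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip().lower()] = value.strip().strip('"')
        else:
            directives[part.lower()] = True
    return directives

print(parse_cache_control("public, max-age=300, stale-while-revalidate=60"))
# {'public': True, 'max-age': '300', 'stale-while-revalidate': '60'}
```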
ETag and Last-Modified#
These enable conditional requests. The gateway stores the response and the ETag. On the next request, the client sends If-None-Match: "etag-value". If the resource has not changed, the origin returns 304 Not Modified with no body — saving bandwidth and serialization time.
# Origin response
HTTP/1.1 200 OK
ETag: "a1b2c3"
Cache-Control: public, max-age=0, must-revalidate
# Client revalidation
GET /api/products/42
If-None-Match: "a1b2c3"
# Origin response (unchanged)
HTTP/1.1 304 Not Modified
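Server-side, the revalidation exchange above reduces to a single comparison. A minimal Python sketch (handle_get is a hypothetical helper, not a real framework API):

```python
import hashlib

def handle_get(body, if_none_match=None):
    """Return (status, headers, body), honoring If-None-Match revalidation."""
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:8]
    headers = {"ETag": etag, "Cache-Control": "public, max-age=0, must-revalidate"}
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: no body sent, client reuses its copy
    return 200, headers, body

status, headers, _ = handle_get(b'{"id": 42}')
status2, _, body2 = handle_get(b'{"id": 42}', headers["ETag"])
# status == 200, status2 == 304, body2 == b""
```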
Surrogate Headers#
Some caching proxies support Surrogate-Control for instructions that should not leak to the browser:
Surrogate-Control: max-age=3600
Cache-Control: no-cache
The gateway caches for one hour. The browser always revalidates. This is a powerful pattern for gateway-level caching without affecting client-side behavior.
Gateway-Level Caching#
An API gateway sits between clients and backend services. Adding a cache layer here provides a single control point for your entire API surface.
How It Works#
Client → API Gateway → Cache Layer → Backend Service

Cache hit → return the cached response (backend skipped)
Cache miss → forward to the backend, cache the response, return it
Benefits#
- Reduced backend load — Identical requests served from memory, not from your database.
- Consistent latency — Cache hits return in single-digit milliseconds regardless of backend complexity.
- Cost savings — Fewer compute cycles per request on your origin servers.
- Resilience — Stale-while-revalidate and stale-if-error keep your API available during origin outages.
What to Cache#
Not every endpoint should be cached. Use this decision framework:
| Endpoint Type | Cacheable? | TTL Guidance |
|---|---|---|
| Public catalog / product listings | Yes | 60–300s |
| User-specific data (profile, cart) | Conditional | Short TTL, vary on auth token |
| Search results | Yes | 30–120s |
| Authentication endpoints | No | Never cache tokens |
| Write operations (POST, PUT, DELETE) | No | Invalidate related caches |
| Webhooks / callbacks | No | Side-effect-driven |
Cache Keys#
The cache key determines whether two requests are considered identical. Get it wrong and you serve stale or incorrect data.
Default Key#
Most gateways default to method + URL + query string:
GET /api/products?category=electronics&page=2
Custom Key Components#
Add or remove components to fine-tune cache behavior:
import hashlib

def cache_key(request):
    # Sort query parameters so ordering differences do not fragment the cache
    params = "&".join(sorted("%s=%s" % kv for kv in request.query_params.items()))
    parts = [
        request.method,
        request.path,
        params,
        request.headers.get("Accept-Language", ""),
        request.headers.get("X-API-Version", ""),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
Common Pitfalls#
- Query parameter ordering — /api?a=1&b=2 and /api?b=2&a=1 should hit the same cache entry. Sort parameters before hashing.
- Authorization leaking — Never include raw auth tokens in the cache key. Use a user-tier or role identifier instead.
- Ignoring Accept headers — A client requesting application/xml should not receive a cached application/json response.
Vary Headers#
The Vary header tells the cache which request headers affect the response content:
Vary: Accept-Encoding, Accept-Language
This instructs the cache to store separate entries for each unique combination of Accept-Encoding and Accept-Language.
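Conceptually, the cache derives one entry per combination of the listed headers. A sketch of how that key might be built (variant_key is an illustrative name; real gateways implement this internally):

```python
def variant_key(base_key, vary_headers, request_headers):
    """Extend a base cache key with one component per header listed in Vary."""
    parts = [base_key]
    for name in vary_headers:
        # Normalize case and whitespace so trivially different values share a variant
        value = request_headers.get(name, "").strip().lower()
        parts.append("%s=%s" % (name.lower(), value))
    return "|".join(parts)

key = variant_key("GET /api/products",
                  ["Accept-Encoding", "Accept-Language"],
                  {"Accept-Encoding": "gzip", "Accept-Language": "en-US"})
# key == "GET /api/products|accept-encoding=gzip|accept-language=en-us"
```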
Best Practices#
- Keep the Vary set small. Each additional header multiplies cache entries.
- Normalize header values before caching. Accept-Encoding: gzip and Accept-Encoding: gzip, deflate should map to the same variant if your gateway always serves gzip.
- Avoid Vary: * — it effectively disables caching.
- Use Vary: Authorization only when you genuinely serve different content per user. Prefer role-based or tier-based cache segmentation.
Cache Invalidation at the Gateway#
Cache invalidation is famously one of the two hard problems in computer science. At the gateway level, you have several strategies:
TTL-Based Expiry#
The simplest approach. Set a max-age and let entries expire naturally. Works well for data that changes on a predictable schedule.
Purge API#
Most gateways expose an API to purge specific cache entries:
# Purge a single URL
curl -X PURGE https://api.example.com/api/products/42
# Purge by tag / surrogate key
curl -X POST https://gateway.example.com/purge \
-H "Surrogate-Key: product-42 category-electronics"
Tag-based purging is powerful: when a product updates, purge every cached response tagged with that product ID — including listing pages that contain it.
Event-Driven Invalidation#
Connect your cache to a message bus. When a service writes to the database, it publishes an event. The gateway cache subscribes and purges relevant entries:
Product Service → "product.updated" event → Message Bus → Gateway Cache Purge
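The flow above can be sketched with an in-memory tag index (TaggedCache and the event shape are illustrative assumptions, not a real gateway API):

```python
from collections import defaultdict

class TaggedCache:
    """In-memory cache with surrogate-key (tag) indexing for targeted purges."""
    def __init__(self):
        self.entries = {}             # cache key -> cached response
        self.tags = defaultdict(set)  # surrogate key -> set of cache keys

    def store(self, key, response, surrogate_keys):
        self.entries[key] = response
        for tag in surrogate_keys:
            self.tags[tag].add(key)

    def purge_tag(self, tag):
        for key in self.tags.pop(tag, set()):
            self.entries.pop(key, None)

cache = TaggedCache()
cache.store("/api/products/42", '{"id": 42}', ["product-42"])
cache.store("/api/products?cat=electronics", "[...]",
            ["product-42", "category-electronics"])

# Handler wired to the message bus for "product.updated" events
def on_product_updated(event):
    cache.purge_tag("product-%d" % event["id"])

on_product_updated({"id": 42})
print(len(cache.entries))  # 0: both responses were tagged product-42
```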
Stale-While-Revalidate#
Not strictly invalidation, but an effective strategy. Serve the stale response immediately and refresh asynchronously. The user sees fast responses; the cache stays reasonably fresh.
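A minimal sketch of the mechanic, assuming injected timestamps for clarity (here the refresh runs inline after the stale copy is selected; a real gateway refreshes in the background):

```python
import time

class SWRCache:
    """Serve stale entries for `swr` seconds past `max_age` while refreshing."""
    def __init__(self, fetch, max_age, swr):
        self.fetch, self.max_age, self.swr = fetch, max_age, swr
        self.store = {}  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key in self.store:
            value, stored_at = self.store[key]
            age = now - stored_at
            if age <= self.max_age:
                return value, "fresh"
            if age <= self.max_age + self.swr:
                # Refresh the entry, but still return the stale copy immediately
                self.store[key] = (self.fetch(key), now)
                return value, "stale"
        value = self.fetch(key)
        self.store[key] = (value, now)
        return value, "miss"

calls = []
def fetch(key):
    calls.append(key)
    return len(calls)  # a new "version" on every origin fetch

cache = SWRCache(fetch, max_age=300, swr=60)
print(cache.get("/api/products", now=0))    # (1, 'miss')
print(cache.get("/api/products", now=100))  # (1, 'fresh')
print(cache.get("/api/products", now=330))  # (1, 'stale'); refreshed in place
print(cache.get("/api/products", now=360))  # (2, 'fresh')
```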
CDN Integration#
A CDN extends your gateway cache to edge locations worldwide. The architecture layers naturally:
Client → CDN Edge → API Gateway Cache → Backend Service
Configuration Pattern#
Set headers at the origin so the CDN and gateway each behave correctly:
Cache-Control: public, s-maxage=300, max-age=60, stale-while-revalidate=30
Surrogate-Control: max-age=600
Surrogate-Key: product-42 category-electronics
- The CDN caches for 600s (via Surrogate-Control).
- The gateway caches for 300s (via s-maxage).
- The browser caches for 60s (via max-age).
- Everyone serves stale for 30s during revalidation.
Cache Hierarchy#
Design your cache layers with decreasing TTLs as you move closer to the origin:
CDN Edge → TTL 600s (highest, closest to user)
API Gateway → TTL 300s
Application → TTL 60s (lowest, closest to database)
This ensures the CDN absorbs the most traffic while the gateway and application layers maintain fresher data.
Tools and Implementations#
Varnish#
A high-performance HTTP accelerator designed specifically for caching:
sub vcl_recv {
if (req.method == "PURGE") {
if (req.http.X-Purge-Token == "secret") {
return (purge);
}
return (synth(403, "Forbidden"));
}
}
sub vcl_backend_response {
    # Apply a default TTL only when the origin sends no Cache-Control
    if (!beresp.http.Cache-Control) {
        set beresp.ttl = 300s;
    }
    # Grace mode: serve stale for up to 60s while a fresh fetch runs
    set beresp.grace = 60s;
}
Varnish supports VCL (Varnish Configuration Language) for fine-grained control, surrogate keys for tag-based purging, and grace mode for stale-while-revalidate behavior.
NGINX Proxy Cache#
NGINX can act as both API gateway and cache layer:
proxy_cache_path /var/cache/nginx levels=1:2
keys_zone=api_cache:50m max_size=1g
inactive=10m use_temp_path=off;
server {
location /api/ {
proxy_cache api_cache;
proxy_cache_key "$request_method$uri$is_args$args";
proxy_cache_valid 200 5m;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_background_update on;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://backend;
}
}
The X-Cache-Status header returns HIT, MISS, or STALE — invaluable for debugging.
Kong Cache Plugin#
Kong's built-in proxy cache plugin is configured declaratively:
plugins:
- name: proxy-cache
config:
strategy: memory
content_type:
- application/json
cache_ttl: 300
cache_control: true
vary_headers:
- Accept-Encoding
memory:
dictionary_name: kong_cache
Kong also supports Redis-backed caching for multi-node deployments where cache entries must be shared across gateway instances.
Cache Observability#
You cannot manage what you cannot measure. Instrument your cache layer:
- Hit rate — Target above 80% for cacheable endpoints. Low hit rates indicate poor key design or overly short TTLs.
- Latency percentiles — Compare p50/p99 for cache hits vs misses.
- Eviction rate — High evictions mean your cache is undersized.
- Stale-serve rate — How often stale-while-revalidate kicks in.
- Purge frequency — Excessive purging negates the value of caching.
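For hit rate and stale-serve rate, the arithmetic is simple; a small helper over raw counters can feed dashboards or alerts (cache_metrics is an illustrative name):

```python
def cache_metrics(hits, misses, stale_serves, evictions):
    """Derive headline cache metrics from raw request counters."""
    total = hits + misses
    return {
        "hit_rate": hits / total if total else 0.0,
        "stale_serve_rate": stale_serves / total if total else 0.0,
        "evictions": evictions,
    }

m = cache_metrics(hits=850, misses=150, stale_serves=40, evictions=12)
print(m["hit_rate"])  # 0.85, above the 80% target for cacheable endpoints
```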
Conclusion#
API gateway caching is one of the highest-leverage performance optimizations available. By mastering HTTP cache headers, designing precise cache keys, implementing targeted invalidation, and layering CDN and gateway caches, you can dramatically reduce latency and backend load while maintaining data freshness. Start with conservative TTLs, measure your hit rates, and expand caching as you gain confidence.
This is article #364 on Codelit.io — your deep-dive resource for system design, backend engineering, and infrastructure patterns. Explore more at codelit.io.