API Gateway Caching Strategies: From Cache Headers to CDN Integration
API latency compounds across every microservice hop. A response that takes 200ms from the origin can be served in under 5ms from a cache layer. API gateway caching sits at the edge of your backend, intercepting repeated requests before they reach your services — reducing load, cutting costs, and improving perceived performance for every client.
HTTP Cache Headers Primer#
Before configuring gateway caching, understand the headers that control it. These headers travel with every HTTP response and instruct caches on what to store and for how long.
Cache-Control#
The most important header. It accepts multiple directives:
Cache-Control: public, max-age=300, stale-while-revalidate=60
| Directive | Meaning |
|---|---|
| public | Any cache (CDN, gateway, browser) may store the response |
| private | Only the end-user's browser may cache; shared caches must not |
| max-age=N | Response is fresh for N seconds |
| s-maxage=N | Overrides max-age for shared caches (gateway, CDN) |
| no-cache | Cache may store, but must revalidate with the origin before serving |
| no-store | Do not cache at all — sensitive data |
| stale-while-revalidate=N | Serve stale content for N seconds while fetching a fresh copy in the background |
| stale-if-error=N | Serve stale content for N seconds if the origin returns a 5xx |
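As a quick illustration, the directive syntax above can be parsed in a few lines of Python (a minimal sketch; parse_cache_control is an illustrative helper, not a standard-library function):

```python
def parse_cache_control(header):
    """Parse a Cache-Control header into a directive -> value map.
    Boolean directives (public, no-store, ...) map to True."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip().lower()] = value.strip().strip('"')
        else:
            directives[part.lower()] = True
    return directives

print(parse_cache_control("public, max-age=300, stale-while-revalidate=60"))
# {'public': True, 'max-age': '300', 'stale-while-revalidate': '60'}
```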
ETag and Last-Modified#
These enable conditional requests. The gateway stores the response and the ETag. On the next request, the client sends If-None-Match: "etag-value". If the resource has not changed, the origin returns 304 Not Modified with no body — saving bandwidth and serialization time.
# Origin response
HTTP/1.1 200 OK
ETag: "a1b2c3"
Cache-Control: public, max-age=0, must-revalidate
# Client revalidation
GET /api/products/42
If-None-Match: "a1b2c3"
# Origin response (unchanged)
HTTP/1.1 304 Not Modified
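Server-side, the revalidation exchange above reduces to a single comparison. A minimal Python sketch (handle_get is a hypothetical helper, not a real framework API):

```python
import hashlib

def handle_get(body, if_none_match=None):
    """Return (status, headers, body), honoring If-None-Match revalidation."""
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:8]
    headers = {"ETag": etag, "Cache-Control": "public, max-age=0, must-revalidate"}
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: no body sent, client reuses its copy
    return 200, headers, body

status, headers, _ = handle_get(b'{"id": 42}')
status2, _, body2 = handle_get(b'{"id": 42}', headers["ETag"])
# status == 200, status2 == 304, body2 == b""
```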
Surrogate Headers#
Some caching proxies support Surrogate-Control for instructions that should not leak to the browser:
Surrogate-Control: max-age=3600
Cache-Control: no-cache
The gateway caches for one hour. The browser always revalidates. This is a powerful pattern for gateway-level caching without affecting client-side behavior.
Gateway-Level Caching#
An API gateway sits between clients and backend services. Adding a cache layer here provides a single control point for your entire API surface.
How It Works#
Client → API Gateway → Cache Layer → Backend Service

Cache hit → return the cached response (backend skipped)
Cache miss → forward to the backend, cache the response, return it
Benefits#
- Reduced backend load — Identical requests served from memory, not from your database.
- Consistent latency — Cache hits return in single-digit milliseconds regardless of backend complexity.
- Cost savings — Fewer compute cycles per request on your origin servers.
- Resilience — Stale-while-revalidate and stale-if-error keep your API available during origin outages.
What to Cache#
Not every endpoint should be cached. Use this decision framework:
| Endpoint Type | Cacheable? | TTL Guidance |
|---|---|---|
| Public catalog / product listings | Yes | 60–300s |
| User-specific data (profile, cart) | Conditional | Short TTL, vary on auth token |
| Search results | Yes | 30–120s |
| Authentication endpoints | No | Never cache tokens |
| Write operations (POST, PUT, DELETE) | No | Invalidate related caches |
| Webhooks / callbacks | No | Side-effect-driven |
Cache Keys#
The cache key determines whether two requests are considered identical. Get it wrong and you serve stale or incorrect data.
Default Key#
Most gateways default to method + URL + query string:
GET /api/products?category=electronics&page=2
Custom Key Components#
Add or remove components to fine-tune cache behavior:
import hashlib

def cache_key(request):
    # Sort query parameters so ordering differences do not fragment the cache
    params = "&".join(sorted("%s=%s" % kv for kv in request.query_params.items()))
    parts = [
        request.method,
        request.path,
        params,
        request.headers.get("Accept-Language", ""),
        request.headers.get("X-API-Version", ""),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
Common Pitfalls#
- Query parameter ordering — /api?a=1&b=2 and /api?b=2&a=1 should hit the same cache entry. Sort parameters before hashing.
- Authorization leaking — Never include raw auth tokens in the cache key. Use a user-tier or role identifier instead.
- Ignoring Accept headers — A client requesting application/xml should not receive a cached application/json response.
Vary Headers#
The Vary header tells the cache which request headers affect the response content:
Vary: Accept-Encoding, Accept-Language
This instructs the cache to store separate entries for each unique combination of Accept-Encoding and Accept-Language.
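Conceptually, the cache derives one entry per combination of the listed headers. A sketch of how that key might be built (variant_key is an illustrative name; real gateways implement this internally):

```python
def variant_key(base_key, vary_headers, request_headers):
    """Extend a base cache key with one component per header listed in Vary."""
    parts = [base_key]
    for name in vary_headers:
        # Normalize case and whitespace so trivially different values share a variant
        value = request_headers.get(name, "").strip().lower()
        parts.append("%s=%s" % (name.lower(), value))
    return "|".join(parts)

key = variant_key("GET /api/products",
                  ["Accept-Encoding", "Accept-Language"],
                  {"Accept-Encoding": "gzip", "Accept-Language": "en-US"})
# key == "GET /api/products|accept-encoding=gzip|accept-language=en-us"
```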
Best Practices#
- Keep the Vary set small. Each additional header multiplies cache entries.
- Normalize header values before caching. Accept-Encoding: gzip and Accept-Encoding: gzip, deflate should map to the same variant if your gateway always serves gzip.
- Avoid Vary: * — it effectively disables caching.
- Use Vary: Authorization only when you genuinely serve different content per user. Prefer role-based or tier-based cache segmentation.
Cache Invalidation at the Gateway#
Cache invalidation is famously one of the two hard problems in computer science. At the gateway level, you have several strategies:
TTL-Based Expiry#
The simplest approach. Set a max-age and let entries expire naturally. Works well for data that changes on a predictable schedule.
Purge API#
Most gateways expose an API to purge specific cache entries:
# Purge a single URL
curl -X PURGE https://api.example.com/api/products/42
# Purge by tag / surrogate key
curl -X POST https://gateway.example.com/purge \
-H "Surrogate-Key: product-42 category-electronics"
Tag-based purging is powerful: when a product updates, purge every cached response tagged with that product ID — including listing pages that contain it.
Event-Driven Invalidation#
Connect your cache to a message bus. When a service writes to the database, it publishes an event. The gateway cache subscribes and purges relevant entries:
Product Service → "product.updated" event → Message Bus → Gateway Cache Purge
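The flow above can be sketched with an in-memory tag index (TaggedCache and the event shape are illustrative assumptions, not a real gateway API):

```python
from collections import defaultdict

class TaggedCache:
    """In-memory cache with surrogate-key (tag) indexing for targeted purges."""
    def __init__(self):
        self.entries = {}             # cache key -> cached response
        self.tags = defaultdict(set)  # surrogate key -> set of cache keys

    def store(self, key, response, surrogate_keys):
        self.entries[key] = response
        for tag in surrogate_keys:
            self.tags[tag].add(key)

    def purge_tag(self, tag):
        for key in self.tags.pop(tag, set()):
            self.entries.pop(key, None)

cache = TaggedCache()
cache.store("/api/products/42", '{"id": 42}', ["product-42"])
cache.store("/api/products?cat=electronics", "[...]",
            ["product-42", "category-electronics"])

# Handler wired to the message bus for "product.updated" events
def on_product_updated(event):
    cache.purge_tag("product-%d" % event["id"])

on_product_updated({"id": 42})
print(len(cache.entries))  # 0: both responses were tagged product-42
```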
Stale-While-Revalidate#
Not strictly invalidation, but an effective strategy. Serve the stale response immediately and refresh asynchronously. The user sees fast responses; the cache stays reasonably fresh.
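A minimal sketch of the mechanic, assuming injected timestamps for clarity (here the refresh runs inline after the stale copy is selected; a real gateway refreshes in the background):

```python
import time

class SWRCache:
    """Serve stale entries for `swr` seconds past `max_age` while refreshing."""
    def __init__(self, fetch, max_age, swr):
        self.fetch, self.max_age, self.swr = fetch, max_age, swr
        self.store = {}  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key in self.store:
            value, stored_at = self.store[key]
            age = now - stored_at
            if age <= self.max_age:
                return value, "fresh"
            if age <= self.max_age + self.swr:
                # Refresh the entry, but still return the stale copy immediately
                self.store[key] = (self.fetch(key), now)
                return value, "stale"
        value = self.fetch(key)
        self.store[key] = (value, now)
        return value, "miss"

calls = []
def fetch(key):
    calls.append(key)
    return len(calls)  # a new "version" on every origin fetch

cache = SWRCache(fetch, max_age=300, swr=60)
print(cache.get("/api/products", now=0))    # (1, 'miss')
print(cache.get("/api/products", now=100))  # (1, 'fresh')
print(cache.get("/api/products", now=330))  # (1, 'stale'); refreshed in place
print(cache.get("/api/products", now=360))  # (2, 'fresh')
```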
CDN Integration#
A CDN extends your gateway cache to edge locations worldwide. The architecture layers naturally:
Client → CDN Edge → API Gateway Cache → Backend Service
Configuration Pattern#
Set headers at the origin so the CDN and gateway each behave correctly:
Cache-Control: public, s-maxage=300, max-age=60, stale-while-revalidate=30
Surrogate-Control: max-age=600
Surrogate-Key: product-42 category-electronics
- The CDN caches for 600s (via Surrogate-Control).
- The gateway caches for 300s (via s-maxage).
- The browser caches for 60s (via max-age).
- Everyone serves stale for 30s during revalidation.
Cache Hierarchy#
Design your cache layers with decreasing TTLs as you move closer to the origin:
CDN Edge → TTL 600s (highest, closest to user)
API Gateway → TTL 300s
Application → TTL 60s (lowest, closest to database)
This ensures the CDN absorbs the most traffic while the gateway and application layers maintain fresher data.
Tools and Implementations#
Varnish#
A high-performance HTTP accelerator designed specifically for caching:
sub vcl_recv {
if (req.method == "PURGE") {
if (req.http.X-Purge-Token == "secret") {
return (purge);
}
return (synth(403, "Forbidden"));
}
}
sub vcl_backend_response {
    # Apply a default TTL only when the origin sends no Cache-Control
    if (!beresp.http.Cache-Control) {
        set beresp.ttl = 300s;
    }
    # Grace mode: serve stale for up to 60s while a fresh fetch runs
    set beresp.grace = 60s;
}
Varnish supports VCL (Varnish Configuration Language) for fine-grained control, surrogate keys for tag-based purging, and grace mode for stale-while-revalidate behavior.
NGINX Proxy Cache#
NGINX can act as both API gateway and cache layer:
proxy_cache_path /var/cache/nginx levels=1:2
keys_zone=api_cache:50m max_size=1g
inactive=10m use_temp_path=off;
server {
location /api/ {
proxy_cache api_cache;
proxy_cache_key "$request_method$uri$is_args$args";
proxy_cache_valid 200 5m;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_background_update on;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://backend;
}
}
The X-Cache-Status header returns HIT, MISS, or STALE — invaluable for debugging.
Kong Cache Plugin#
Kong's built-in proxy cache plugin is configured declaratively:
plugins:
- name: proxy-cache
config:
strategy: memory
content_type:
- application/json
cache_ttl: 300
cache_control: true
vary_headers:
- Accept-Encoding
memory:
dictionary_name: kong_cache
Kong also supports Redis-backed caching for multi-node deployments where cache entries must be shared across gateway instances.
Cache Observability#
You cannot manage what you cannot measure. Instrument your cache layer:
- Hit rate — Target above 80% for cacheable endpoints. Low hit rates indicate poor key design or overly short TTLs.
- Latency percentiles — Compare p50/p99 for cache hits vs misses.
- Eviction rate — High evictions mean your cache is undersized.
- Stale-serve rate — How often stale-while-revalidate kicks in.
- Purge frequency — Excessive purging negates the value of caching.
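For hit rate and stale-serve rate, the arithmetic is simple; a small helper over raw counters can feed dashboards or alerts (cache_metrics is an illustrative name):

```python
def cache_metrics(hits, misses, stale_serves, evictions):
    """Derive headline cache metrics from raw request counters."""
    total = hits + misses
    return {
        "hit_rate": hits / total if total else 0.0,
        "stale_serve_rate": stale_serves / total if total else 0.0,
        "evictions": evictions,
    }

m = cache_metrics(hits=850, misses=150, stale_serves=40, evictions=12)
print(m["hit_rate"])  # 0.85, above the 80% target for cacheable endpoints
```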
Conclusion#
API gateway caching is one of the highest-leverage performance optimizations available. By mastering HTTP cache headers, designing precise cache keys, implementing targeted invalidation, and layering CDN and gateway caches, you can dramatically reduce latency and backend load while maintaining data freshness. Start with conservative TTLs, measure your hit rates, and expand caching as you gain confidence.
This is article #364 on Codelit.io — your deep-dive resource for system design, backend engineering, and infrastructure patterns. Explore more at codelit.io.