# API Gateway Patterns: A Deep Dive into Production Architecture
An API gateway sits between clients and your backend services. It absorbs cross-cutting concerns — routing, security, throttling, transformation — so individual services stay focused on business logic. In a microservices architecture, the gateway is often the single entry point for all external traffic.
## Request Routing
The gateway's most fundamental job is routing requests to the correct backend service based on path, headers, method, or query parameters.
```text
Client ──▶ /api/orders/*   ──▶ Order Service
       ──▶ /api/users/*    ──▶ User Service
       ──▶ /api/payments/* ──▶ Payment Service
```
### Path-Based Routing
The simplest model maps each URL prefix to a service:
```yaml
# Kong declarative config (the _format_version field is required
# by Kong's declarative loader)
_format_version: "3.0"
services:
  - name: order-service
    url: http://orders.internal:8080
    routes:
      - paths: ["/api/orders"]
  - name: user-service
    url: http://users.internal:8080
    routes:
      - paths: ["/api/users"]
```
### Header-Based and Weighted Routing
For canary deployments or A/B testing, route a percentage of traffic — or traffic with specific headers — to a different backend version:
```text
┌────────────┐
│  Gateway   │──── 95% ────▶ Service v1
│            │────  5% ────▶ Service v2
└────────────┘
```
This lets you validate new releases under real traffic before a full rollout.
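One way to implement weighted routing is to hash a stable key (such as a user ID) into a bucket, so each user consistently lands on the same backend version during the rollout. The sketch below assumes the backend names and a 95/5 split from the diagram; it is illustrative, not any particular gateway's implementation.

```python
import hashlib

# Illustrative backend pool: (name, weight), weights summing to 100.
BACKENDS = [("service-v1", 95), ("service-v2", 5)]

def pick_backend(routing_key: str) -> str:
    """Deterministically map a routing key (e.g. user ID) to a backend.

    Hashing the key means the same user always sees the same version,
    which keeps canary experiences consistent across requests.
    """
    digest = hashlib.sha256(routing_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    cumulative = 0
    for name, weight in BACKENDS:
        cumulative += weight
        if bucket < cumulative:
            return name
    return BACKENDS[-1][0]
```

Because the assignment is hash-based rather than random per request, ramping the canary from 5% to 10% only moves users whose buckets fall in the newly added range.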
## Rate Limiting
Rate limiting protects backends from abuse, ensures fair usage, and prevents cascading failures during traffic spikes.
### Common Algorithms
| Algorithm | Description | Best For |
|---|---|---|
| Token bucket | Tokens refill at a fixed rate; each request consumes one | Bursty traffic with sustained limits |
| Sliding window | Counts requests in a rolling time window | Smooth, predictable throttling |
| Fixed window | Counts requests in discrete time intervals | Simple implementation |
| Leaky bucket | Processes requests at a constant rate, queuing excess | Smoothing request flow |
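The token bucket from the table above can be sketched in a few lines. This is a single-process, unsynchronized version for clarity; a real gateway would need locking (or atomic Redis operations) across replicas.

```python
import time

class TokenBucket:
    """Token-bucket limiter: tokens refill at `rate` per second up to
    `capacity`; each request consumes one token. Bursts up to `capacity`
    are allowed, while sustained throughput converges to `rate`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity           # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `rate=1, capacity=3` admits three back-to-back requests, then rejects until roughly a second has passed.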
### Multi-Tier Limits
Production gateways often enforce multiple tiers simultaneously:
- Global limit — protect the entire platform (e.g., 100K req/s).
- Per-tenant limit — enforce plan-based quotas (free: 100 req/min, pro: 10K req/min).
- Per-endpoint limit — protect expensive operations (e.g., search: 20 req/s per user).
Rate limit state is typically stored in Redis for sub-millisecond lookups across gateway replicas.
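A minimal sketch of multi-tier checking, using fixed windows: every tier must be under quota for a request to pass. An in-memory dict stands in for Redis here (in production each key would be an `INCR` with an `EXPIRE` on the window); the plan names and quotas are the illustrative numbers from the list above.

```python
import time
from collections import defaultdict

WINDOW = 60  # seconds per fixed window
LIMITS = {
    "global": 100_000,      # whole-platform ceiling
    "tenant:free": 100,     # free plan: 100 req/min
    "tenant:pro": 10_000,   # pro plan: 10K req/min
}
_counters: dict = defaultdict(int)  # stand-in for Redis counters

def allow(tenant: str, plan: str, now: float = None) -> bool:
    """Admit the request only if every tier is under its quota."""
    now = time.time() if now is None else now
    window = int(now // WINDOW)
    checks = [
        (f"global:{window}", LIMITS["global"]),
        (f"{tenant}:{window}", LIMITS[f"tenant:{plan}"]),
    ]
    if any(_counters[key] >= limit for key, limit in checks):
        return False
    for key, _ in checks:       # charge all tiers together
        _counters[key] += 1
    return True
```

Note that the tiers are checked before any counter is incremented, so a request rejected by one tier does not consume quota at another.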
## Authentication and Authorization
The gateway centralizes identity verification so services do not each implement their own auth logic.
```text
┌────────┐     ┌─────────┐     ┌─────────────┐     ┌─────────┐
│ Client │───▶ │ Gateway │───▶ │ Auth Check  │───▶ │ Service │
│        │     │         │     │ (JWT/OAuth) │     │         │
└────────┘     └─────────┘     └─────────────┘     └─────────┘
```
Common patterns:
- JWT validation — the gateway verifies the token signature and expiry, then forwards claims as headers.
- OAuth 2.0 introspection — the gateway calls the authorization server to validate opaque tokens.
- API key lookup — the gateway checks the key against a store and attaches the associated tenant context.
- mTLS termination — the gateway verifies client certificates and passes identity downstream.
After authentication, the gateway can enforce coarse-grained authorization (does this tenant have access to this service?) while leaving fine-grained authorization to the service itself.
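The JWT-validation step can be sketched with only the standard library, assuming HS256 (a shared-secret HMAC). Real deployments should use a vetted library such as PyJWT, and typically RS256 with keys fetched from a JWKS endpoint rather than a shared secret.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(part: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt(token: str, secret: bytes) -> dict:
    """Verify an HS256 JWT's signature and expiry; return its claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims  # the gateway forwards these as headers, e.g. X-User-Id
```

After this check succeeds, the gateway strips the `Authorization` header and injects trusted identity headers, so backends never parse tokens themselves.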
## Request Transformation
Gateways modify requests and responses in flight to decouple clients from backend contracts.
### Common Transformations
- Path rewriting — strip the `/api` prefix before forwarding.
- Header injection — add correlation IDs, tenant context, or trace headers.
- Body mapping — convert between JSON and XML, or reshape payloads for legacy backends.
- Protocol translation — accept REST from clients and forward as gRPC to internal services.
- Request aggregation — combine multiple backend calls into a single client response (Backend-for-Frontend pattern).
```text
Client: GET /api/dashboard
  └─▶ Gateway fans out:
        ├─▶ GET /orders/recent
        ├─▶ GET /metrics/summary
        └─▶ GET /notifications/unread
  └─▶ Gateway merges responses into one payload
```
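The fan-out above can be sketched with `asyncio`. The three backend calls are stubbed as coroutines returning canned data; in a real gateway each would be an HTTP or gRPC call to an internal service.

```python
import asyncio

# Hypothetical backend stubs: in production these would be network calls.
async def get_recent_orders():
    return {"orders": ["o-1", "o-2"]}

async def get_metrics_summary():
    return {"metrics": {"p99_ms": 120}}

async def get_unread_notifications():
    return {"notifications": 3}

async def dashboard_handler() -> dict:
    """BFF-style aggregation: fan out concurrently, merge into one payload."""
    orders, metrics, notifs = await asyncio.gather(
        get_recent_orders(),
        get_metrics_summary(),
        get_unread_notifications(),
    )
    return {**orders, **metrics, **notifs}
```

Because the calls run concurrently under `asyncio.gather`, the client's latency is roughly the slowest backend call rather than the sum of all three.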
## Response Caching
Caching at the gateway layer reduces backend load and improves latency for repeated queries.
### Cache Strategies
- Time-based TTL — cache responses for a fixed duration (e.g., 60 seconds).
- Cache-Control aware — respect `Cache-Control`, `ETag`, and `Last-Modified` headers from backends.
- Vary-key caching — cache different variants based on `Accept`, `Authorization`, or custom headers.
- Stale-while-revalidate — serve stale content immediately while refreshing in the background.
Key rule: never cache authenticated, user-specific responses in a shared cache without proper vary keys. Leaking one user's data to another is a critical security incident.
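A sketch of a TTL cache whose key includes the vary headers, illustrating the rule above: because `Authorization` is part of the cache key, one user's cached response can never be served to another. The header set and structure are illustrative, not a specific gateway's API.

```python
import time

# Headers whose values must be part of the cache key.
VARY_HEADERS = ("accept", "authorization")

class GatewayCache:
    """Time-based TTL cache keyed on (method, path, vary-header values)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}  # key tuple -> (stored_at, response)

    def _key(self, method: str, path: str, headers: dict) -> tuple:
        # Including Authorization in the key isolates each user's entries.
        return (method, path) + tuple(headers.get(h, "") for h in VARY_HEADERS)

    def get(self, method: str, path: str, headers: dict):
        entry = self.store.get(self._key(method, path, headers))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # miss or expired

    def put(self, method: str, path: str, headers: dict, response) -> None:
        self.store[self._key(method, path, headers)] = (time.monotonic(), response)
```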
## Circuit Breaking
When a backend service becomes unhealthy, the gateway should stop sending traffic to it rather than letting requests pile up and cascade failures.
```text
┌─────────┐     ┌──────────┐     ┌─────────┐
│ Gateway │────▶│ Circuit  │────▶│ Backend │
│         │     │ Breaker  │     │ Service │
└─────────┘     └──────────┘     └─────────┘
```
States:
```text
CLOSED    ──▶ requests flow normally
OPEN      ──▶ requests fail fast (503)
HALF-OPEN ──▶ allow a probe request to test recovery
```
Configuration typically includes:
- Failure threshold — number of failures before opening the circuit (e.g., 5 in 10 seconds).
- Open duration — how long to wait before probing (e.g., 30 seconds).
- Success threshold — number of successful probes before closing the circuit.
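The three states and thresholds above can be sketched as a small state machine. Timestamps are injectable so the transitions are easy to test; the default thresholds mirror the illustrative numbers in the list.

```python
import time

class CircuitBreaker:
    """Minimal CLOSED / OPEN / HALF-OPEN state machine."""

    def __init__(self, failure_threshold=5, open_seconds=30.0,
                 success_threshold=2):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.success_threshold = success_threshold
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at >= self.open_seconds:
                self.state = "HALF-OPEN"   # let a probe through
                self.successes = 0
                return True
            return False                   # fail fast: gateway returns 503
        return True

    def record_success(self) -> None:
        if self.state == "HALF-OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures = "CLOSED", 0  # recovered
        else:
            self.failures = 0              # a success resets the streak

    def record_failure(self, now=None) -> None:
        now = time.monotonic() if now is None else now
        # Any failure while HALF-OPEN, or too many while CLOSED, opens it.
        if self.state == "HALF-OPEN" or self.failures + 1 >= self.failure_threshold:
            self.state, self.opened_at, self.failures = "OPEN", now, 0
        else:
            self.failures += 1
```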
## Load Shedding
Load shedding goes further than rate limiting: when the system is under extreme pressure, the gateway deliberately drops low-priority requests to preserve capacity for critical ones.
### Priority Classification
```text
Priority 1 (Critical):    POST /api/payments/charge
Priority 2 (Important):   GET  /api/orders/{id}
Priority 3 (Best-effort): GET  /api/recommendations
```
When CPU or queue depth crosses a threshold, the gateway sheds Priority 3 first, then Priority 2, keeping Priority 1 alive as long as possible. This is how systems survive traffic 10x beyond capacity.
### Implementation Signals
- Request queue depth — if the queue exceeds a threshold, reject new low-priority requests.
- Backend latency — if p99 latency spikes, begin shedding.
- CPU / memory — system-level signals trigger adaptive shedding.
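Putting the signals and priority classes together, a shedding decision can be sketched as a function of the route's priority and a pressure signal such as queue depth. The routes come from the classification above; the thresholds are illustrative.

```python
# Route-to-priority map from the classification above (lower = more critical).
PRIORITIES = {
    "/api/payments/charge": 1,   # critical
    "/api/orders": 2,            # important
    "/api/recommendations": 3,   # best-effort
}

def should_shed(path: str, queue_depth: int) -> bool:
    """Decide whether to reject a request given current queue pressure.

    Thresholds (500, 900) are illustrative; production systems tune them
    against measured capacity or derive them adaptively.
    """
    priority = PRIORITIES.get(path, 3)   # unknown routes are best-effort
    if queue_depth > 900:
        return priority >= 2             # only critical traffic survives
    if queue_depth > 500:
        return priority >= 3             # shed best-effort first
    return False                         # healthy: admit everything
```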
## Tooling
### Kong
Kong is an open-source, Lua/OpenResty-based gateway with a rich plugin ecosystem. It supports declarative configuration, a database-backed admin API, and Kubernetes-native ingress via the Kong Ingress Controller.
Key plugins: `rate-limiting`, `jwt`, `oauth2`, `request-transformer`, `response-ratelimiting`, `proxy-cache`, circuit breaking (via custom plugins or Kong Gateway Enterprise).
### Tyk
Tyk is a Go-based gateway with built-in analytics, a developer portal, and GraphQL support. It offers rate limiting, circuit breaking, request/response transformation, and API versioning out of the box.
### AWS API Gateway
AWS API Gateway provides a fully managed solution with two flavors:
- HTTP API — lightweight, low-latency, best for proxying to Lambda or HTTP backends.
- REST API — full-featured with request validation, WAF integration, caching, and usage plans.
Both integrate with AWS Cognito for authentication, CloudWatch for monitoring, and X-Ray for tracing.
### Other Notable Tools
- Envoy — high-performance L4/L7 proxy, often used as a data plane for service meshes (Istio).
- NGINX — widely used as a reverse proxy with API gateway capabilities via OpenResty or NGINX Plus.
- Traefik — cloud-native edge router with automatic service discovery.
- Ambassador (Emissary-Ingress) — Kubernetes-native gateway built on Envoy.
## Key Takeaways
An API gateway is not just a reverse proxy — it is the control plane for your external API surface. Routing, security, throttling, transformation, caching, circuit breaking, and load shedding are all cross-cutting concerns that belong at the edge, not scattered across services.
Choose a gateway that matches your operational maturity: managed services like AWS API Gateway for simplicity, or self-hosted solutions like Kong and Envoy for maximum control.