architecture · system-design · api

API Gateway Design Patterns — Routing, Rate Limiting, and Beyond

March 24, 2026 · 5 min read · By Mo

The front door of every modern system

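Every request to your backend passes through something, whether it's a reverse proxy, a load balancer, or a full API gateway. The difference matters: a reverse proxy or load balancer mostly forwards traffic, while a gateway also applies policy to each request.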

An API gateway sits between your clients and your services. It handles cross-cutting concerns like authentication, rate limiting, request routing, and protocol translation so your services don't have to.

If you've ever wondered why companies like Netflix, Stripe, and Uber invest heavily in their gateway layer, this post explains the reasoning.

Why not just call services directly?

Without a gateway, every client needs to:

Know the address of every service
Handle authentication on every request
Deal with different protocols (REST, gRPC, WebSocket)
Implement retry logic and circuit breaking
Manage API versioning per service

That's a lot of responsibility pushed to the client. An API gateway centralizes all of this.

Core gateway patterns

1. Request routing

The most basic pattern. The gateway maps public URLs to internal services:

GET /api/users/*     → User Service
GET /api/orders/*    → Order Service
GET /api/products/*  → Product Service

This decouples your public API from your internal service topology. You can split, merge, or migrate services without changing client code.
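A minimal sketch of that mapping as prefix-based routing. It assumes a hypothetical route table whose upstream hostnames are made up. Checking longer prefixes first means a more specific route added later, such as `/api/users/admin/`, is not shadowed by a broader one.

```python
from typing import Optional

# Hypothetical route table: public path prefix -> internal upstream base URL
ROUTES = {
    "/api/users/": "http://user-service:8080",
    "/api/orders/": "http://order-service:8080",
    "/api/products/": "http://product-service:8080",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream URL for a public path, or None if no route matches."""
    # Try longer prefixes first so the most specific route wins
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            # Forward the original path unchanged
            return routes[prefix] + path
    return None
```

The gateway would typically turn a `None` result into a 404. This sketch forwards the path as-is, and some gateways strip the matched prefix instead. Because clients only ever see the left-hand side of the table, moving `/api/orders/` to a new service is a one-line change.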

2. Authentication and authorization

The gateway validates tokens (JWTs, OAuth 2.0 access tokens) before requests reach your services. This means:

Services trust that requests are already authenticated
Token validation logic lives in one place
You can swap auth providers without touching services

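Pattern: Gateway validates the token, extracts user context, and forwards it as headers (X-User-Id, X-User-Role) to downstream services. This is only safe if the gateway strips any X-User-* headers sent by the client and services are reachable only through the gateway.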
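To make that flow concrete, here is a sketch that uses only the Python standard library. It assumes HS256 (HMAC-SHA256) tokens signed with a shared secret and a hypothetical `role` claim. Real gateways more commonly verify RS256 or ES256 tokens against the identity provider's published keys, and should rely on a maintained JWT library rather than hand-rolled parsing. The sketch drops any client-supplied `X-User-*` headers before adding its own, so a caller cannot forge its identity.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_hs256(claims: dict, secret: bytes) -> str:
    """Create an HS256 JWT (used here only to produce test tokens)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the token's claims, or raise ValueError if it is invalid."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    # Pin the expected algorithm rather than trusting the token's own header
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if not isinstance(claims, dict):
        raise ValueError("malformed claims")
    if "exp" in claims and time.time() >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def upstream_headers(incoming: dict[str, str], secret: bytes) -> dict[str, str]:
    """Authenticate a request and build the headers forwarded downstream."""
    auth = incoming.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise ValueError("missing bearer token")
    claims = verify_hs256(auth[len("Bearer "):], secret)
    if "sub" not in claims:
        raise ValueError("token has no subject")
    # Strip client-supplied identity headers so callers cannot spoof them
    headers = {k: v for k, v in incoming.items() if not k.lower().startswith("x-user-")}
    headers["X-User-Id"] = str(claims["sub"])
    if "role" in claims:  # hypothetical claim name
        headers["X-User-Role"] = str(claims["role"])
    return headers
```

A `ValueError` here would normally become a 401 response at the gateway, so an unauthenticated request never reaches a service.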

3. Rate limiting

Protect your services from abuse and overload:

Per-user limits: 100 requests/minute per API key
Per-endpoint limits: Write endpoints get stricter limits than reads
Global limits: Total system capacity protection

Rate limiting at the gateway is more effective than at individual services because excess requests are rejected once, at the edge, before they consume capacity in any downstream service.
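Per-key limits are often implemented with a token bucket. The sketch below is one way to write it. It assumes a single gateway instance holding state in memory. With several instances, the counters would need to live in a shared store, or each instance would see only part of each key's traffic. A limit of 100 requests/minute becomes a bucket of capacity 100 refilled at 100/60 tokens per second. The limit-class names (`read`, `write`) are hypothetical, and the `clock` parameter exists so tests can control time.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity  # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per (API key, limit class), created on first use."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        # Maps a limit class to (capacity, refill_per_sec)
        self.limits = limits
        self.clock = clock
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, api_key: str, limit_class: str) -> bool:
        key = (api_key, limit_class)
        if key not in self.buckets:
            capacity, rate = self.limits[limit_class]
            self.buckets[key] = TokenBucket(capacity, rate, self.clock)
        return self.buckets[key].allow()
```

For example, `RateLimiter({"read": (100, 100 / 60), "write": (20, 20 / 60)})` gives write endpoints the stricter limit. A `False` result would map to HTTP 429 Too Many Requests.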

4. Request aggregation (BFF pattern)

Request aggregation, a core job in the Backend for Frontend (BFF) pattern, uses the gateway to combine multiple service calls into one client response:

Client: GET /api/dashboard

Gateway internally calls:
  → User Service (profile)
  → Order Service (recent orders)
  → Analytics Service (stats)

Returns: Combined JSON response

This turns three client round trips into one, which matters most on mobile networks, where each round trip adds noticeable latency.
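The dashboard fan-out could be sketched with asyncio as follows. The three `fetch_*` coroutines are hypothetical stand-ins for real HTTP or gRPC clients, and `fetch_stats` simulates an Analytics Service outage. Calling them concurrently means the response takes roughly as long as the slowest call, not the sum of all three. `return_exceptions=True` lets the gateway return a partial dashboard when one service fails.

```python
import asyncio


# Hypothetical stand-ins for calls to the User, Order, and Analytics services
async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def fetch_recent_orders(user_id: str) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": "o-1", "total": 42.0}]


async def fetch_stats(user_id: str) -> dict:
    # Simulates an outage in the Analytics Service
    raise ConnectionError("analytics service unavailable")


async def get_dashboard(user_id: str, timeout: float = 1.0) -> dict:
    """Fan out to three services concurrently and merge the results."""
    # Overall deadline for the fan-out; if it expires, a TimeoutError propagates
    profile, orders, stats = await asyncio.wait_for(
        asyncio.gather(
            fetch_profile(user_id),
            fetch_recent_orders(user_id),
            fetch_stats(user_id),
            return_exceptions=True,  # keep partial results if one call fails
        ),
        timeout=timeout,
    )
    # Replace failed sections with None so the client can degrade gracefully
    return {
        "profile": None if isinstance(profile, Exception) else profile,
        "recent_orders": None if isinstance(orders, Exception) else orders,
        "stats": None if isinstance(stats, Exception) else stats,
    }
```

Running `asyncio.run(get_dashboard("u-1"))` here yields the profile and orders with `"stats": None`. The mobile client still renders most of the screen during the analytics outage.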

5. Protocol translation

Your gateway can accept REST from web clients and translate to gRPC for internal services. Clients get the simplicity of REST; services get the performance of gRPC.
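A simplified sketch of that translation follows. It uses a hypothetical `UserServiceStub` and `GetUserRequest` in place of a generated gRPC client and protobuf message, which in practice would come from `grpcio` and compiled `.proto` files. The sketch shows both halves of the job. It turns the HTTP path into a typed request, and it maps gRPC status codes back to HTTP status codes using the conventional pairings, such as NOT_FOUND to 404 and UNAVAILABLE to 503.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class GetUserRequest:  # hypothetical stand-in for a protobuf message
    user_id: int


class RpcError(Exception):
    def __init__(self, code: str, details: str = ""):
        super().__init__(details)
        self.code = code


class UserServiceStub:  # hypothetical stand-in for a generated gRPC stub
    _users = {42: {"user_id": 42, "name": "Ada"}}

    def GetUser(self, request: GetUserRequest) -> dict:
        if request.user_id not in self._users:
            raise RpcError("NOT_FOUND", "no such user")
        return self._users[request.user_id]


# Conventional gRPC status -> HTTP status pairings (subset)
GRPC_TO_HTTP = {
    "INVALID_ARGUMENT": 400,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
    "NOT_FOUND": 404,
    "RESOURCE_EXHAUSTED": 429,
    "UNAVAILABLE": 503,
    "DEADLINE_EXCEEDED": 504,
}

USER_PATH = re.compile(r"/api/users/(\d+)")


def handle_rest(method: str, path: str, stub: UserServiceStub) -> tuple[int, str]:
    """Translate a REST call into an RPC, and the RPC result into an HTTP response."""
    match = USER_PATH.fullmatch(path)
    if match is None:
        return 404, json.dumps({"error": "no route"})
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    request = GetUserRequest(user_id=int(match.group(1)))
    try:
        reply = stub.GetUser(request)
    except RpcError as err:
        # Unmapped codes fall back to 500 Internal Server Error
        return GRPC_TO_HTTP.get(err.code, 500), json.dumps({"error": str(err)})
    return 200, json.dumps(reply)
```

Web clients see ordinary JSON and HTTP status codes. The internal call stays a typed RPC.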

6. Circuit breaking

When a downstream service is failing, the gateway can short-circuit requests instead of letting them pile up:

Closed: Normal operation, requests pass through
Open: Service is down, return cached/fallback response immediately
Half-open: After a cooldown, let a trial request through; success closes the circuit, failure reopens it

Failing fast stops requests to a broken service from tying up gateway connections and threads, which prevents cascading failures across your system.
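The three states map onto a small state machine. The sketch below is one way to write it. It assumes a consecutive-failure threshold and a fixed cooldown. Both values are illustrative, and real libraries often count failures differently, for example as an error rate over a time window. The sketch is not thread-safe, and the `clock` argument is injectable so the cooldown can be tested without sleeping.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_sec:
                raise CircuitOpenError("circuit open")  # fail fast
            self.state = "half-open"  # cooldown elapsed: allow a trial request
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial in half-open reopens immediately
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        # Any success, including the half-open trial, closes the circuit
        self.state = "closed"
        self.failures = 0
```

In a gateway, the handler would wrap each upstream call in `breaker.call(...)`. On `CircuitOpenError` it would serve a cached or fallback response, or a 503, without touching the failing service.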

Gateway architectures

Single gateway

One gateway handles everything. Simple to operate, but it can become a bottleneck at scale, since every team's routes, policies, and releases funnel through one component. Works well for small-to-medium systems.

BFF gateways

Separate gateways per client type (web, mobile, IoT). Each gateway is optimized for its client's needs — different aggregation, different rate limits, different response formats.

Mesh gateway

In a service mesh (Istio, Linkerd), each service instance typically gets a sidecar proxy, so the "gateway" logic is distributed across the fleet. Better for microservices at scale, but more complex to operate.

Common mistakes

Over-engineering the gateway. Your gateway should route and protect, not contain business logic. If you're writing if/else statements about order processing in your gateway, that logic belongs in a service.

Single point of failure. Your gateway needs to be highly available. Run multiple instances behind a load balancer. Use health checks. Plan for gateway failures.

Ignoring observability. The gateway sees every request. Add structured logging, distributed tracing (correlation IDs), and metrics (latency percentiles, error rates) here.
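A minimal sketch of the correlation-ID and structured-logging part follows. It assumes a hypothetical `X-Request-Id` header name; many systems use this name, or the W3C `traceparent` header for full tracing. The gateway reuses an incoming ID or generates one, forwards it downstream, and emits one JSON log line per request with status and latency. The handler signature (headers in, status code out) is a simplification for illustration.

```python
import json
import logging
import time
import uuid
from typing import Callable

logger = logging.getLogger("gateway")

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name


def with_observability(handler: Callable[[dict], int]) -> Callable[[dict], int]:
    """Wrap a handler that takes request headers and returns an HTTP status code."""
    def wrapped(headers: dict) -> int:
        # Reuse the caller's ID so logs from every hop can be joined; otherwise mint one
        request_id = headers.get(REQUEST_ID_HEADER) or str(uuid.uuid4())
        headers = {**headers, REQUEST_ID_HEADER: request_id}
        start = time.perf_counter()
        status = 500  # reported if the handler raises
        try:
            status = handler(headers)
            return status
        finally:
            # One structured line per request; latency feeds percentile metrics
            logger.info(json.dumps({
                "request_id": request_id,
                "status": status,
                "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
    return wrapped
```

Because every downstream service receives the same `X-Request-Id`, a single ID is enough to pull the gateway's log line and each service's log lines for one failing request.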

Not versioning. API versioning at the gateway (/v1/users, /v2/users) lets you evolve your API without breaking existing clients.

Popular API gateways

Gateway	Best for	Protocol support
Kong	Plugin ecosystem	REST, gRPC, WebSocket, TCP
AWS API Gateway	Serverless/AWS	REST and HTTP APIs, WebSocket
Envoy	Service mesh	HTTP/1.1, HTTP/2, gRPC, TCP
Nginx	Raw performance	HTTP, gRPC, TCP, UDP
Traefik	Docker/K8s	HTTP, gRPC, TCP, UDP

When you don't need a gateway

Monolithic apps — if you have one service, a reverse proxy (Nginx) is enough
Internal tools — low traffic, trusted clients, no need for rate limiting
Prototypes — add the gateway when you have actual traffic to manage

Visualize your gateway architecture

The best way to understand how a gateway fits into your system is to visualize it. Try describing your architecture in Codelit — it will generate an interactive diagram showing how your gateway connects to services, databases, and external APIs.

Key takeaways

Centralize cross-cutting concerns at the gateway — auth, rate limiting, logging
Keep the gateway thin — route and protect, don't embed business logic
Plan for availability — the gateway is on the critical path for every request
Use the BFF pattern when different clients need different API shapes
Add observability — the gateway is the best place to measure your API's health
