Rate Limiting Strategies — Token Bucket, Sliding Window, and How to Not Get Hacked
## Why rate limiting isn't optional
Without rate limiting, one bad actor can:
- DDoS your API with millions of requests
- Brute-force passwords by trying thousands per second
- Scrape your entire database through your public API
- Run up your cloud bill by triggering expensive operations
Rate limiting is security, reliability, and cost control in one feature.
## The algorithms

### Fixed window
Count requests in fixed time windows (e.g., per minute).
How it works: At the start of each minute, reset the counter to 0. Increment on each request. Reject when the counter hits the limit.
Problem: Burst at window boundaries. A user can send 100 requests at :59 and 100 more at :00 — 200 requests in 2 seconds despite a "100 per minute" limit.
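The counter-reset behavior is easy to see in code. A minimal single-process sketch (class and method names are illustrative; a shared deployment would keep the counter in Redis or similar):

```typescript
// Fixed window counter: one counter per window, reset at each boundary
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Align the timestamp to the start of its window
    const windowStartForNow = Math.floor(now / this.windowMs) * this.windowMs;
    if (windowStartForNow !== this.windowStart) {
      // New window: reset the counter
      this.windowStart = windowStartForNow;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

The boundary burst falls straight out of the reset: a full limit spent at the end of one window and another full limit at the start of the next are both allowed.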
### Sliding window log
Track the timestamp of every request. Count requests in the last N seconds.
How it works: On each request, remove timestamps older than the window, count remaining, reject if over limit.
Accurate but expensive — storing every timestamp uses a lot of memory at scale.
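A minimal in-memory version (names are illustrative; at scale the log would live in Redis or similar, as shown later in this article):

```typescript
// Sliding window log: store every request timestamp, prune on each check
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

The memory cost is visible here: one array entry per request inside the window, per user.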
### Sliding window counter
Combine fixed window with weighted overlap. The best balance of accuracy and efficiency.
How it works: Keep counts for the current and previous fixed windows, and weight the previous count by how much of that window still overlaps the sliding window. Example:
- Current window: 40 requests (we are 60% of the way through it)
- Previous window: 80 requests (40% of it still overlaps the sliding window)
- Estimated rate: 40 + (80 × 0.4) = 72
- Limit: 100 → allow
Used by: Most production rate limiters. Good accuracy, constant memory.
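A single-process sketch of the weighted estimate (names are illustrative):

```typescript
// Sliding window counter: two fixed-window counts, previous weighted by overlap
class SlidingWindowCounter {
  private currCount = 0;
  private prevCount = 0;
  private currWindowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    if (windowStart !== this.currWindowStart) {
      // Roll windows forward; if more than one window passed, previous is empty
      this.prevCount =
        windowStart - this.currWindowStart === this.windowMs ? this.currCount : 0;
      this.currCount = 0;
      this.currWindowStart = windowStart;
    }
    // Fraction of the previous window that still overlaps the sliding window
    const prevWeight = 1 - (now - windowStart) / this.windowMs;
    const estimated = this.currCount + this.prevCount * prevWeight;
    if (estimated >= this.limit) return false;
    this.currCount++;
    return true;
  }
}
```

Only two counters per key, regardless of traffic volume, which is where the constant-memory claim comes from.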
### Token bucket
A bucket holds tokens. Each request takes one. Tokens refill at a fixed rate.
How it works:
- Bucket capacity: 100 tokens
- Refill rate: 10 tokens/second
- Each request consumes 1 token
- If bucket is empty → reject
Allows bursts: A full bucket allows 100 rapid requests (burst), then settles to 10/second (sustained rate).
Used by: AWS API Gateway, Stripe, most cloud APIs.
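A minimal sketch with continuous refill (names are illustrative; a real implementation also needs per-key storage and atomicity):

```typescript
// Token bucket: refill continuously over time, spend one token per request
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // Start full, so an initial burst is allowed
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Add tokens for the elapsed time, capped at bucket capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The burst-then-sustain behavior is encoded in two numbers: capacity bounds the burst, refill rate bounds the sustained throughput.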
### Leaky bucket
Requests enter a queue (bucket). The queue drains at a fixed rate.
How it works: Incoming requests join the queue. If the queue is full → reject. Requests are processed at a constant rate regardless of arrival pattern.
Smooths traffic: Unlike token bucket, leaky bucket enforces a constant output rate. Good for protecting downstream services from bursts.
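The "bucket as a meter" variant can be sketched without an actual queue, by tracking queue depth and draining it on each check (names are illustrative; the queueing variant would additionally hold requests and release them at the leak rate):

```typescript
// Leaky bucket (meter variant): a depth counter drained at a fixed rate
class LeakyBucket {
  private level = 0; // Current queue depth
  private lastLeak: number;

  constructor(
    private capacity: number,
    private leakPerSec: number,
    now: number = Date.now(),
  ) {
    this.lastLeak = now;
  }

  tryAdd(now: number = Date.now()): boolean {
    // Drain the bucket for the elapsed time
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.level + 1 > this.capacity) return false; // Bucket full → reject
    this.level += 1;
    return true;
  }
}
```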
## Implementation with Redis

A sliding window log maps naturally onto a Redis sorted set: each request is a member scored by its timestamp. This sketch assumes an ioredis client instance named `redis`:

```typescript
// Sliding window log with Redis sorted sets
async function checkRateLimit(userId: string): Promise<boolean> {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  const limit = 100;
  const windowMs = 60_000;

  const result = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowMs)  // Remove entries older than the window
    .zcard(key)                                // Count requests still in the window
    .zadd(key, now, `${now}:${Math.random()}`) // Record this request (unique member)
    .expire(key, Math.ceil(windowMs / 1000))   // Let idle keys expire
    .exec();

  // ioredis returns an [error, value] pair for each queued command
  const count = result![1][1] as number;
  return count < limit; // This request is the (count + 1)th in the window
}
```

One caveat: the request is recorded even when it is rejected, so it still counts against the window. If rejected requests should be free, move the check and the `zadd` into a Lua script so they execute conditionally and atomically.
## What to rate limit by
| Target | When |
|---|---|
| IP address | Anonymous users, public APIs |
| API key | Authenticated API consumers |
| User ID | Logged-in users |
| Endpoint | Protect expensive operations specifically |
| Combination | IP + endpoint for login pages |
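For the combination case, one approach is to fold the dimensions into a single limiter key (a sketch; the function name and key format are illustrative):

```typescript
// Build a composite rate-limit key, e.g. per IP per endpoint for login pages
function rateLimitKey(parts: Record<string, string>): string {
  return (
    "ratelimit:" +
    Object.entries(parts)
      .sort(([a], [b]) => a.localeCompare(b)) // Stable order regardless of call site
      .map(([k, v]) => `${k}=${v}`)
      .join(":")
  );
}
```

Sorting the dimensions keeps the key deterministic no matter which order the caller supplies them, so the same client always maps to the same counter.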
## Response headers
Tell clients their rate limit status:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1679616000
Retry-After: 30
```
Return `429 Too Many Requests` with a `Retry-After` header so well-behaved clients can back off automatically.
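On the client side, note that `Retry-After` may carry either delta-seconds or an HTTP-date (both forms are allowed by RFC 9110). A small parser sketch (the function name is illustrative):

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Handles both forms the spec allows: delta-seconds and an HTTP-date.
function retryAfterMs(value: string, now: number = Date.now()): number {
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  if (Number.isNaN(date)) return 0; // Unparseable → caller applies its own default
  return Math.max(0, date - now);
}
```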
## See rate limiting in your architecture
On Codelit, search "rate limiter" in ⌘K to see a complete distributed rate limiting architecture — Redis counters, rules config, analytics, and API gateway integration.