Rate Limiting: Token Bucket, Sliding Window, and Fixed Window#
Rate limiting protects your API from abuse, ensures fair usage, and prevents cascade failures from traffic spikes. But which algorithm should you use?
Why Rate Limit?#
Without rate limiting:
Malicious user → 10,000 req/sec → your API → database overwhelmed → all users affected
With rate limiting:
Malicious user → 10,000 req/sec → rate limiter → 429 Too Many Requests (after 100/min)
Normal users → unaffected
The Five Algorithms#
1. Fixed Window#
Count requests in fixed time intervals (e.g., per minute):
Window: 12:00 - 12:01 → count: 0
Request at 12:00:15 → count: 1 ✓
Request at 12:00:30 → count: 2 ✓
...
Request at 12:00:55 → count: 100 ✓ (limit reached)
Request at 12:00:56 → REJECTED (429)
Window: 12:01 - 12:02 → count resets to 0
Redis implementation:
key = "rate:{user_id}:{minute}"
count = INCR key
if count == 1: EXPIRE key 60
if count > limit: reject
Pros: Simple, low memory (one counter per window)
Cons: Bursts at window boundaries: 100 requests at 12:00:59 plus 100 at 12:01:00 means 200 requests in 2 seconds
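The same counter logic can be sketched as a single-process class (illustrative names; a production deployment would use the Redis commands above so all servers share state):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per fixed time window (e.g. per minute)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (user, window index) -> count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # resets every window
        key = (user_id, window_index)
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Note how the counter resets implicitly: a new window index means a fresh key, which is exactly why the boundary burst is possible.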
2. Sliding Window Log#
Track timestamp of every request, count within rolling window:
Requests log: [12:00:15, 12:00:30, 12:00:45, 12:01:10, ...]
At 12:01:20, check: how many in last 60s?
Remove entries before 12:00:20
Count remaining: 3 → under limit ✓
Redis implementation:
ZADD rate:{user_id} {timestamp} {request_id}
ZREMRANGEBYSCORE rate:{user_id} 0 {timestamp - window}
count = ZCARD rate:{user_id}
if count > limit: reject
Pros: Perfectly accurate, no boundary burst
Cons: High memory (stores every request timestamp)
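An in-memory sketch of the same idea, with a sorted list standing in for the Redis sorted set (the deletion step mirrors ZREMRANGEBYSCORE):

```python
import bisect
from collections import defaultdict

class SlidingWindowLog:
    """Stores every request timestamp; counts those inside the rolling window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(list)  # user -> sorted request timestamps

    def allow(self, user_id, now):
        log = self.logs[user_id]
        # Drop timestamps that have aged out of the rolling window.
        cutoff = now - self.window
        del log[:bisect.bisect_right(log, cutoff)]
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: one list entry per allowed request, per user, kept for a full window.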
3. Sliding Window Counter#
Hybrid: count the current window in full, plus the previous window weighted by how much of it still overlaps the rolling window:
Previous window (12:00-12:01): 80 requests
Current window (12:01-12:02): 30 requests
Current position: 12:01:15 (25% into window)
Weighted count = 80 × 0.75 + 30 = 90
Limit: 100 → allowed ✓
Pros: Low memory (two counters), no boundary burst
Cons: Approximate (not exact), but close enough for most use cases
This is the best default. Good accuracy with minimal resources.
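A minimal sketch of the weighted-count logic above (single process; window-rollover bookkeeping that Redis key expiry would normally handle is done inline):

```python
class SlidingWindowCounter:
    """Two counters per user: previous and current window, weighted by overlap."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.state = {}  # user -> (window index, prev count, curr count)

    def allow(self, user_id, now):
        idx = int(now // self.window)
        win_idx, prev, curr = self.state.get(user_id, (idx, 0, 0))
        if idx == win_idx + 1:       # rolled into the next window
            prev, curr = curr, 0
        elif idx > win_idx + 1:      # idle for more than a full window
            prev, curr = 0, 0
        # Fraction of the current window elapsed; the previous window
        # contributes the complementary share of the rolling 60s.
        elapsed = (now % self.window) / self.window
        weighted = prev * (1 - elapsed) + curr
        if weighted >= self.limit:
            self.state[user_id] = (idx, prev, curr)
            return False
        self.state[user_id] = (idx, prev, curr + 1)
        return True
```

With the article's numbers (80 in the previous window, 30 so far, 25% into the current window) the weighted count is 80 × 0.75 + 30 = 90, under a limit of 100.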
4. Token Bucket#
Bucket fills with tokens at a steady rate. Each request consumes one token:
Bucket capacity: 10 tokens
Refill rate: 1 token/second
Time 0: 10 tokens (full)
Burst: 10 requests → 0 tokens (all allowed)
Time 1: 1 token (refilled)
Time 2: 2 tokens
...
Time 10: 10 tokens (full again)
Pros: Allows controlled bursts, smooth rate limiting
Cons: Slightly more complex to implement
Redis implementation:
last_refill = GET rate:{user_id}:time
tokens = GET rate:{user_id}:tokens

// Refill tokens based on time elapsed
elapsed = now - last_refill
tokens = min(capacity, tokens + elapsed * refill_rate)

if tokens >= 1:
    tokens -= 1
    SET rate:{user_id}:tokens {tokens}
    SET rate:{user_id}:time {now}
    allow
else:
    reject (429)
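A single-process version of the same refill-then-spend logic (illustrative names). Note that the Redis GET/SET sequence above is not atomic: two servers can read the same token count concurrently, so in production the whole check-and-update is usually wrapped in a Lua script run via EVAL.

```python
import time

class TokenBucket:
    """Refills refill_rate tokens/sec up to capacity; each request costs one."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}  # user -> (tokens, last refill timestamp)

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill based on elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False
```

A new user starts with a full bucket, which is what permits the initial burst of `capacity` requests.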
5. Leaky Bucket#
Requests queue and process at a fixed rate (like water leaking from a bucket):
Requests arrive at variable rate → enter bucket (queue)
Bucket processes at fixed rate (10/sec)
If bucket is full → overflow → reject
Pros: Perfectly smooth output rate
Cons: Adds latency (requests wait in queue)
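A sketch of the admission side of a leaky bucket, assuming requests drain at a fixed rate (actual processing of queued requests would happen elsewhere, e.g. in a worker loop):

```python
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and drain at a fixed rate; overflow is rejected."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = 0.0

    def _leak(self, now):
        # Remove requests that have drained since the last check.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def offer(self, request, now):
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False  # bucket full: overflow, reject
        self.queue.append(request)
        return True
```

The latency cost is implicit here: an accepted request may sit behind `capacity - 1` others before it drains.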
Comparison#
| Algorithm | Accuracy | Memory | Burst | Complexity |
|---|---|---|---|---|
| Fixed Window | Low (boundary burst) | Very Low | Allows 2x at boundary | Simple |
| Sliding Log | Perfect | High | None | Medium |
| Sliding Counter | Good (~99%) | Low | Minimal | Medium |
| Token Bucket | Good | Low | Controlled burst | Medium |
| Leaky Bucket | Perfect | Medium | None (queued) | Medium |
Recommendation: Sliding window counter or token bucket for most APIs.
What to Rate Limit By#
| Identifier | When |
|---|---|
| API key | Public APIs (Stripe, Twilio) |
| User ID | Authenticated endpoints |
| IP address | Unauthenticated endpoints, login |
| Endpoint | Expensive operations (search, export) |
| Combination | User + endpoint for fine-grained control |
Distributed Rate Limiting#
Single-server rate limiting is easy. Distributed is hard:
Centralized (Redis)#
Server 1 → Redis (shared counter) ← Server 2
Server 3 ↗ ↖ Server 4
All servers check/increment the same Redis key. Accurate but adds Redis latency (~1ms).
Local + Sync#
Server 1: local counter (approximate)
Server 2: local counter (approximate)
Periodic sync: servers share counts every 5s
Less accurate but no Redis dependency. Good for very high throughput.
Sticky Sessions#
Route same user to same server → local rate limiting works. But breaks if server dies. Not recommended.
Response Headers#
Always tell clients their rate limit status:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1711616400
Retry-After: 30 (only on 429)
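A small helper for assembling these headers (illustrative; the X-RateLimit-* names are a widespread convention rather than a formal standard):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build conventional rate limit response headers as a dict."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds of window reset
    }
    if retry_after is not None:  # include only on a 429 response
        headers["Retry-After"] = str(retry_after)
    return headers
```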
Architecture Example#
API Rate Limiting#
Client → Load Balancer → Rate Limiter (Redis)
  allowed  → API Gateway → Service A, Service B
  rejected → 429 Too Many Requests
Multi-Tier Limits#
Global: 10,000 req/min (all users combined)
Per-user: 100 req/min (per API key)
Per-endpoint: 10 req/min (expensive operations like /export)
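One way to compose the tiers: a request must pass every applicable limit, and a rejection reports which tier tripped. This sketch uses plain counters for brevity (window reset is omitted; in practice each tier would wrap one of the limiters above):

```python
class MultiTierLimiter:
    """Check global, per-user, and per-endpoint limits; all must pass."""

    def __init__(self, global_limit, user_limit, endpoint_limit):
        self.limits = {
            "global": global_limit,
            "per_user": user_limit,
            "per_endpoint": endpoint_limit,
        }
        self.counts = {}  # (tier, key) -> count

    def allow(self, user_id, endpoint):
        keys = [
            ("global", "all"),
            ("per_user", user_id),
            ("per_endpoint", f"{user_id}:{endpoint}"),
        ]
        # Check every tier before counting, so a rejection leaves no partial state.
        for tier, key in keys:
            if self.counts.get((tier, key), 0) >= self.limits[tier]:
                return False, tier  # report which tier tripped
        for tier, key in keys:
            self.counts[(tier, key)] = self.counts.get((tier, key), 0) + 1
        return True, None
```

Returning the tripped tier makes it easy to set an accurate Retry-After and to log which limit is actually doing the work.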
Tools#
| Tool | Type | Best For |
|---|---|---|
| Redis | Custom implementation | Full control, any algorithm |
| Upstash Ratelimit | Managed SDK | Serverless, edge-compatible |
| Kong | API gateway plugin | Drop-in rate limiting |
| Nginx | Reverse proxy | Simple req/sec limiting |
| Cloudflare | Edge rate limiting | DDoS protection + rate limits |
Best Practices#
- Use sliding window counter as default — good accuracy, low resources
- Token bucket when you want controlled bursts
- Rate limit by API key for public APIs, by user ID for authenticated
- Always return rate limit headers — clients can self-throttle
- Separate limits for expensive operations (search, export, AI generation)
- Graceful degradation — consider returning cached/stale data instead of 429
- Log rate limit events — detect abuse patterns
Summary#
| Need | Algorithm |
|---|---|
| Simple, low memory | Fixed window |
| No boundary bursts | Sliding window counter |
| Allow controlled bursts | Token bucket |
| Perfectly smooth output | Leaky bucket |
| Perfect accuracy | Sliding window log |
Design rate limiting into your architecture at codelit.io — generate interactive diagrams with security audits and infrastructure exports.