Rate Limiting: Token Bucket, Sliding Window, and Fixed Window#
Rate limiting protects your API from abuse, ensures fair usage, and prevents cascade failures from traffic spikes. But which algorithm should you use?
Why Rate Limit?#
Without rate limiting:
Malicious user → 10,000 req/sec → your API → database overwhelmed → all users affected
With rate limiting:
Malicious user → 10,000 req/sec → rate limiter → 429 Too Many Requests (after 100/min)
Normal users → unaffected
The Five Algorithms#
1. Fixed Window#
Count requests in fixed time intervals (e.g., per minute):
Window: 12:00 - 12:01 → count: 0
Request at 12:00:15 → count: 1 ✓
Request at 12:00:30 → count: 2 ✓
...
Request at 12:00:55 → count: 100 ✓ (limit reached)
Request at 12:00:56 → REJECTED (429)
Window: 12:01 - 12:02 → count resets to 0
Redis implementation:
key = "rate:{user_id}:{minute}"
count = INCR key
if count == 1: EXPIRE key 60
if count > limit: reject
Pros: Simple, low memory (one counter per window)
Cons: Bursts at window boundaries: 100 requests at 12:00:59 plus 100 at 12:01:00 means 200 requests in 2 seconds
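The same counter logic can be sketched as a single-process class (illustrative names; a production deployment would use the Redis commands above so all servers share state):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per fixed time window (e.g. per minute)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (user, window index) -> count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # resets every window
        key = (user_id, window_index)
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Note how the counter resets implicitly: a new window index means a fresh key, which is exactly why the boundary burst is possible.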
2. Sliding Window Log#
Track timestamp of every request, count within rolling window:
Requests log: [12:00:15, 12:00:30, 12:00:45, 12:01:10, ...]
At 12:01:20, check: how many in last 60s?
Remove entries before 12:00:20
Count remaining: 3 → under limit ✓
Redis implementation:
ZADD rate:{user_id} {timestamp} {request_id}
ZREMRANGEBYSCORE rate:{user_id} 0 {timestamp - window}
count = ZCARD rate:{user_id}
if count > limit: reject
Pros: Perfectly accurate, no boundary burst
Cons: High memory (stores every request timestamp)
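An in-memory sketch of the same idea, with a sorted list standing in for the Redis sorted set (the deletion step mirrors ZREMRANGEBYSCORE):

```python
import bisect
from collections import defaultdict

class SlidingWindowLog:
    """Stores every request timestamp; counts those inside the rolling window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(list)  # user -> sorted request timestamps

    def allow(self, user_id, now):
        log = self.logs[user_id]
        # Drop timestamps that have aged out of the rolling window.
        cutoff = now - self.window
        del log[:bisect.bisect_right(log, cutoff)]
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: one list entry per allowed request, per user, kept for a full window.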
3. Sliding Window Counter#
Hybrid: count the current window in full, plus the previous window weighted by how much of it still overlaps the rolling window:
Previous window (12:00-12:01): 80 requests
Current window (12:01-12:02): 30 requests
Current position: 12:01:15 (25% into window)
Weighted count = 80 × 0.75 + 30 = 90
Limit: 100 → allowed ✓
Pros: Low memory (two counters), no boundary burst
Cons: Approximate (not exact), but close enough for most use cases
This is the best default. Good accuracy with minimal resources.
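A minimal sketch of the weighted-count logic above (single process; window-rollover bookkeeping that Redis key expiry would normally handle is done inline):

```python
class SlidingWindowCounter:
    """Two counters per user: previous and current window, weighted by overlap."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.state = {}  # user -> (window index, prev count, curr count)

    def allow(self, user_id, now):
        idx = int(now // self.window)
        win_idx, prev, curr = self.state.get(user_id, (idx, 0, 0))
        if idx == win_idx + 1:       # rolled into the next window
            prev, curr = curr, 0
        elif idx > win_idx + 1:      # idle for more than a full window
            prev, curr = 0, 0
        # Fraction of the current window elapsed; the previous window
        # contributes the complementary share of the rolling 60s.
        elapsed = (now % self.window) / self.window
        weighted = prev * (1 - elapsed) + curr
        if weighted >= self.limit:
            self.state[user_id] = (idx, prev, curr)
            return False
        self.state[user_id] = (idx, prev, curr + 1)
        return True
```

With the article's numbers (80 in the previous window, 30 so far, 25% into the current window) the weighted count is 80 × 0.75 + 30 = 90, under a limit of 100.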
4. Token Bucket#
Bucket fills with tokens at a steady rate. Each request consumes one token:
Bucket capacity: 10 tokens
Refill rate: 1 token/second
Time 0: 10 tokens (full)
Burst: 10 requests → 0 tokens (all allowed)
Time 1: 1 token (refilled)
Time 2: 2 tokens
...
Time 10: 10 tokens (full again)
Pros: Allows controlled bursts, smooth rate limiting
Cons: Slightly more complex to implement
Redis implementation:
last_refill = GET rate:{user_id}:time
tokens = GET rate:{user_id}:tokens

// Refill tokens based on time elapsed
elapsed = now - last_refill
tokens = min(capacity, tokens + elapsed * refill_rate)

if tokens >= 1:
    tokens -= 1
    SET rate:{user_id}:tokens {tokens}
    SET rate:{user_id}:time {now}
    allow
else:
    reject (429)
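A single-process version of the same refill-then-spend logic (illustrative names). Note that the Redis GET/SET sequence above is not atomic: two servers can read the same token count concurrently, so in production the whole check-and-update is usually wrapped in a Lua script run via EVAL.

```python
import time

class TokenBucket:
    """Refills refill_rate tokens/sec up to capacity; each request costs one."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}  # user -> (tokens, last refill timestamp)

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill based on elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False
```

A new user starts with a full bucket, which is what permits the initial burst of `capacity` requests.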
5. Leaky Bucket#
Requests queue and process at a fixed rate (like water leaking from a bucket):
Requests arrive at variable rate → enter bucket (queue)
Bucket processes at fixed rate (10/sec)
If bucket is full → overflow → reject
Pros: Perfectly smooth output rate
Cons: Adds latency (requests wait in queue)
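A sketch of the admission side of a leaky bucket, assuming requests drain at a fixed rate (actual processing of queued requests would happen elsewhere, e.g. in a worker loop):

```python
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and drain at a fixed rate; overflow is rejected."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = 0.0

    def _leak(self, now):
        # Remove requests that have drained since the last check.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def offer(self, request, now):
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False  # bucket full: overflow, reject
        self.queue.append(request)
        return True
```

The latency cost is implicit here: an accepted request may sit behind `capacity - 1` others before it drains.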
Comparison#
| Algorithm | Accuracy | Memory | Burst | Complexity |
|---|---|---|---|---|
| Fixed Window | Low (boundary burst) | Very Low | Allows 2x at boundary | Simple |
| Sliding Log | Perfect | High | None | Medium |
| Sliding Counter | Good (~99%) | Low | Minimal | Medium |
| Token Bucket | Good | Low | Controlled burst | Medium |
| Leaky Bucket | Perfect | Medium | None (queued) | Medium |
Recommendation: Sliding window counter or token bucket for most APIs.
What to Rate Limit By#
| Identifier | When |
|---|---|
| API key | Public APIs (Stripe, Twilio) |
| User ID | Authenticated endpoints |
| IP address | Unauthenticated endpoints, login |
| Endpoint | Expensive operations (search, export) |
| Combination | User + endpoint for fine-grained control |
Distributed Rate Limiting#
Single-server rate limiting is easy. Distributed is hard:
Centralized (Redis)#
Server 1 → Redis (shared counter) ← Server 2
Server 3 ↗ ↖ Server 4
All servers check/increment the same Redis key. Accurate but adds Redis latency (~1ms).
Local + Sync#
Server 1: local counter (approximate)
Server 2: local counter (approximate)
Periodic sync: servers share counts every 5s
Less accurate but no Redis dependency. Good for very high throughput.
Sticky Sessions#
Route same user to same server → local rate limiting works. But breaks if server dies. Not recommended.
Response Headers#
Always tell clients their rate limit status:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1711616400
Retry-After: 30 (only on 429)
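A small helper for assembling these headers (illustrative; the X-RateLimit-* names are a widespread convention rather than a formal standard):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build conventional rate limit response headers as a dict."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds of window reset
    }
    if retry_after is not None:  # include only on a 429 response
        headers["Retry-After"] = str(retry_after)
    return headers
```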
Architecture Example#
API Rate Limiting#
Client → Load Balancer → Rate Limiter (Redis)
  allowed  → API Gateway → Service A, Service B
  rejected → 429 Too Many Requests
Multi-Tier Limits#
Global: 10,000 req/min (all users combined)
Per-user: 100 req/min (per API key)
Per-endpoint: 10 req/min (expensive operations like /export)
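One way to compose the tiers: a request must pass every applicable limit, and a rejection reports which tier tripped. This sketch uses plain counters for brevity (window reset is omitted; in practice each tier would wrap one of the limiters above):

```python
class MultiTierLimiter:
    """Check global, per-user, and per-endpoint limits; all must pass."""

    def __init__(self, global_limit, user_limit, endpoint_limit):
        self.limits = {
            "global": global_limit,
            "per_user": user_limit,
            "per_endpoint": endpoint_limit,
        }
        self.counts = {}  # (tier, key) -> count

    def allow(self, user_id, endpoint):
        keys = [
            ("global", "all"),
            ("per_user", user_id),
            ("per_endpoint", f"{user_id}:{endpoint}"),
        ]
        # Check every tier before counting, so a rejection leaves no partial state.
        for tier, key in keys:
            if self.counts.get((tier, key), 0) >= self.limits[tier]:
                return False, tier  # report which tier tripped
        for tier, key in keys:
            self.counts[(tier, key)] = self.counts.get((tier, key), 0) + 1
        return True, None
```

Returning the tripped tier makes it easy to set an accurate Retry-After and to log which limit is actually doing the work.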
Tools#
| Tool | Type | Best For |
|---|---|---|
| Redis | Custom implementation | Full control, any algorithm |
| Upstash Ratelimit | Managed SDK | Serverless, edge-compatible |
| Kong | API gateway plugin | Drop-in rate limiting |
| Nginx | Reverse proxy | Simple req/sec limiting |
| Cloudflare | Edge rate limiting | DDoS protection + rate limits |
Best Practices#
- Use sliding window counter as default — good accuracy, low resources
- Token bucket when you want controlled bursts
- Rate limit by API key for public APIs, by user ID for authenticated
- Always return rate limit headers — clients can self-throttle
- Separate limits for expensive operations (search, export, AI generation)
- Graceful degradation — consider returning cached/stale data instead of 429
- Log rate limit events — detect abuse patterns
Summary#
| Need | Algorithm |
|---|---|
| Simple, low memory | Fixed window |
| No boundary bursts | Sliding window counter |
| Allow controlled bursts | Token bucket |
| Perfectly smooth output | Leaky bucket |
| Perfect accuracy | Sliding window log |
Design rate limiting into your architecture at codelit.io — generate interactive diagrams with security audits and infrastructure exports.