Rate Limiting Algorithms Compared: Token Bucket, Sliding Window & More
Every production API needs rate limiting. Without it, a single misbehaving client can exhaust your resources, degrade service for everyone, and run up your infrastructure bill. The question is not whether to rate limit — it is which algorithm to use. Each makes different trade-offs between accuracy, memory, and implementation complexity.
Fixed Window
The simplest algorithm. Divide time into fixed intervals (e.g., 1-minute windows) and count requests per window:
Window: 12:00:00 - 12:00:59 → count: 47/100
Window: 12:01:00 - 12:01:59 → count: 12/100
Implementation: A single counter per key, reset at the start of each window.
def is_allowed(key, limit, window_seconds):
    # `redis` is assumed to be a connected redis-py client
    window = int(time.time() / window_seconds)
    counter_key = f"{key}:{window}"
    count = redis.incr(counter_key)
    if count == 1:
        # First request in this window: set a TTL so stale counters expire
        redis.expire(counter_key, window_seconds)
    return count <= limit
Pros: Minimal memory (one counter per key). Simple to implement.
Cons: Boundary burst problem. A client can send 100 requests at 12:00:59 and 100 more at 12:01:00 — 200 requests in 2 seconds while respecting the 100/minute limit in both windows.
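The boundary burst is easy to demonstrate with an in-memory version of the same counter (no Redis; timestamps are passed in explicitly so the run is deterministic):

```python
# In-memory sketch of the fixed-window counter, for demonstration only
counters = {}

def is_allowed(key, limit, window_seconds, now):
    window = int(now / window_seconds)
    counters[(key, window)] = counters.get((key, window), 0) + 1
    return counters[(key, window)] <= limit

# 100 requests at 12:00:59 and 100 more at 12:01:00 (epoch seconds 59 and 60)
burst = [is_allowed("client", 100, 60, 59) for _ in range(100)]
burst += [is_allowed("client", 100, 60, 60) for _ in range(100)]
# All 200 requests land inside 2 seconds, yet every one is allowed
assert all(burst)
```

Each window sees exactly 100 requests, so neither counter trips, even though the instantaneous rate is double the limit.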
Sliding Window Log
Track the exact timestamp of every request. To check the limit, count timestamps within the last N seconds:
def is_allowed(key, limit, window_seconds):
    now = time.time()
    cutoff = now - window_seconds
    # Remove expired entries
    redis.zremrangebyscore(key, 0, cutoff)
    # Check count
    count = redis.zcard(key)
    if count < limit:
        redis.zadd(key, {str(now): now})
        redis.expire(key, window_seconds)
        return True
    return False
Pros: Perfectly accurate. No boundary burst problem.
Cons: High memory usage. Storing every request timestamp is expensive at scale. If you allow 10,000 requests/minute, you store 10,000 entries per key.
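As an illustration, the same log can be kept in process memory with a deque instead of a Redis sorted set (`now` is a parameter here so the behaviour is deterministic):

```python
from collections import deque

# One deque of request timestamps per key
logs = {}

def is_allowed(key, limit, window_seconds, now):
    log = logs.setdefault(key, deque())
    cutoff = now - window_seconds
    # Drop timestamps that have slid out of the window
    while log and log[0] <= cutoff:
        log.popleft()
    if len(log) < limit:
        log.append(now)
        return True
    return False

assert all(is_allowed("k", 3, 10, now=t) for t in (0, 1, 2))
assert not is_allowed("k", 3, 10, now=3)   # window full
assert is_allowed("k", 3, 10, now=10.5)    # the request at t=0 has expired
```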
Sliding Window Counter
A hybrid that combines fixed window efficiency with sliding window accuracy. Estimate the count using a weighted average of the current and previous window:
Previous window count: 84 (weight: 0.4 — 60% of the current window has elapsed, so 40% of the previous window still overlaps)
Current window count: 30 (weight: 1.0)
Estimated rate = 84 × 0.4 + 30 = 63.6
Implementation:
def is_allowed(key, limit, window_seconds):
    now = time.time()
    current_window = int(now / window_seconds)
    previous_window = current_window - 1
    elapsed = (now % window_seconds) / window_seconds
    prev_count = int(redis.get(f"{key}:{previous_window}") or 0)
    curr_count = int(redis.get(f"{key}:{current_window}") or 0)
    estimated = prev_count * (1 - elapsed) + curr_count
    if estimated < limit:
        redis.incr(f"{key}:{current_window}")
        # Keep counters for two windows so each can serve as "previous"
        redis.expire(f"{key}:{current_window}", window_seconds * 2)
        return True
    return False
Pros: Low memory (two counters per key). Smooths the boundary burst problem. Good enough for most use cases.
Cons: Approximate — not perfectly accurate, but the error is small and bounded.
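The estimate itself is pure arithmetic and easy to sanity-check in isolation:

```python
# The previous window's weight is the fraction of it still covered by the
# sliding window, i.e. 1 - elapsed
def estimate(prev_count, curr_count, elapsed):
    return prev_count * (1 - elapsed) + curr_count

# Reproduces the worked example: weight 0.4 on the previous window
assert abs(estimate(84, 30, 0.6) - 63.6) < 1e-9
# A fully elapsed previous window contributes nothing
assert estimate(84, 30, 1.0) == 30
```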
Token Bucket
Imagine a bucket that holds tokens. Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected:
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
Current tokens: 5
Request arrives → 5 ≥ 1 → allowed → tokens = 4
The token bucket allows bursts up to the bucket capacity while enforcing an average rate equal to the refill rate.
def is_allowed(key, capacity, refill_rate):
    # Assumes a redis-py client created with decode_responses=True,
    # so hgetall returns str keys/values rather than bytes
    now = time.time()
    bucket = redis.hgetall(key)
    tokens = float(bucket.get('tokens', capacity))
    last_refill = float(bucket.get('last_refill', now))
    # Refill for the time elapsed since the last request, capped at capacity
    elapsed = now - last_refill
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= 1:
        tokens -= 1
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        # Expire idle buckets once they would have fully refilled
        redis.expire(key, int(capacity / refill_rate) + 1)
        return True
    else:
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        return False
Pros: Allows controlled bursts. Intuitive to configure (capacity = burst size, refill rate = sustained rate). Used by AWS, Stripe, and most major APIs.
Cons: Slightly more complex state (two values per key). Burst behavior can be surprising if not documented.
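For experimentation, the same logic works in process memory; this sketch makes `now` a parameter so the refill can be tested deterministically:

```python
# Per-key state: (tokens, last_refill)
buckets = {}

def is_allowed(key, capacity, refill_rate, now):
    tokens, last_refill = buckets.get(key, (capacity, now))
    # Refill for the elapsed time, capped at capacity
    tokens = min(capacity, tokens + (now - last_refill) * refill_rate)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    buckets[key] = (tokens, now)
    return allowed

# Capacity 10, refill 2/s: a burst of 10 succeeds, then the sustained
# rate is pinned to the refill rate
assert all(is_allowed("k", 10, 2, now=0) for _ in range(10))
assert not is_allowed("k", 10, 2, now=0)  # bucket empty
assert is_allowed("k", 10, 2, now=0.5)    # 0.5 s at 2 tokens/s = 1 token back
```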
Leaky Bucket
The leaky bucket is the inverse of the token bucket. Requests enter a queue (the bucket). The queue drains at a fixed rate. If the queue is full, new requests are rejected:
Queue capacity: 10
Drain rate: 2 requests/second
Current queue: 7
Request arrives → 7 < 10 → enqueued → queue = 8
Pros: Produces a perfectly smooth output rate. No bursts. Ideal for APIs that need strict rate control.
Cons: Does not allow any burst, even small ones. Adds latency because requests wait in the queue. Less commonly used than token bucket.
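Here is a minimal in-memory sketch of the "leaky bucket as a meter" variant, which models only the accept/reject decision (a full implementation would also queue accepted requests and release them at the drain rate):

```python
# Per-key state: (virtual queue depth, last update time)
state = {}

def is_allowed(key, capacity, drain_rate, now):
    depth, last = state.get(key, (0.0, now))
    # The queue drains continuously at drain_rate, never below empty
    depth = max(0.0, depth - (now - last) * drain_rate)
    if depth < capacity:
        state[key] = (depth + 1, now)
        return True
    state[key] = (depth, now)
    return False

assert all(is_allowed("k", 10, 2, now=0) for _ in range(10))
assert not is_allowed("k", 10, 2, now=0)  # queue full
assert is_allowed("k", 10, 2, now=0.5)    # one slot has drained
```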
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Fixed Window | Very low | Low (boundary bursts) | Allows 2x burst at boundary | Trivial |
| Sliding Window Log | High | Perfect | No bursts | Low |
| Sliding Window Counter | Low | Good (approximate) | Minor boundary smoothing | Low |
| Token Bucket | Low | Good | Controlled bursts | Medium |
| Leaky Bucket | Low | Perfect | No bursts (smooth output) | Medium |
Recommendation: Start with sliding window counter for simplicity. Use token bucket when you want to allow controlled bursts. Use leaky bucket when you need smooth, predictable output.
Distributed Rate Limiting
All examples above assume a single Redis instance. In a distributed system, you face additional challenges:
- Race conditions — Two servers may read the same count, both allow, and exceed the limit.
- Clock skew — Different servers may disagree on the current window.
- Network latency — Round-trips to Redis add latency to every request.
Redis Lua Scripts
Lua scripts execute atomically in Redis, eliminating race conditions. Here is a token bucket implemented as a Lua script:
-- KEYS[1] = rate limit key
-- ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)
return allowed
Call it from your application:
allowed = redis.eval(LUA_SCRIPT, 1, key, capacity, refill_rate, time.time())
The entire read-modify-write cycle happens in a single atomic operation. No race conditions, no distributed locks.
Multi-Region Rate Limiting
For globally distributed APIs, you have two options:
- Centralized Redis — All regions talk to one Redis cluster. Simple but adds cross-region latency.
- Local counters with sync — Each region maintains local counters and periodically syncs to a global store. Allows slight over-limit but eliminates cross-region latency on the hot path.
Most systems choose local counters with async sync and accept a small margin of error.
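A minimal sketch of the local-counters approach (an assumed design for illustration, not any specific system's API): each region admits requests against its last-synced view of the global count, so the over-limit margin is bounded by what all regions admit between syncs.

```python
class RegionalLimiter:
    def __init__(self, limit):
        self.limit = limit
        self.global_base = 0  # global count as of the last sync
        self.local = 0        # requests admitted locally since then

    def is_allowed(self):
        if self.global_base + self.local < self.limit:
            self.local += 1
            return True
        return False

    def sync(self, global_total):
        # Called periodically with the merged count from the global store
        self.global_base = global_total
        self.local = 0

us, eu = RegionalLimiter(100), RegionalLimiter(100)
admitted = sum(us.is_allowed() for _ in range(80)) + sum(eu.is_allowed() for _ in range(80))
# 160 admitted globally before the next sync -- the accepted margin of error
us.sync(admitted)
eu.sync(admitted)
# After syncing the merged total, both regions reject until the window resets
```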
Response Headers
Always communicate rate limit status to clients via headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1711756800
Retry-After: 30
Return HTTP 429 Too Many Requests when the limit is exceeded. Include Retry-After so clients know when to retry.
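A small helper can compute these headers from the limiter's state. This is a hedged sketch: the function name and the `window_reset_epoch` parameter (the epoch second at which the window resets) are illustrative, and real APIs vary in field names.

```python
import math
import time

def rate_limit_headers(limit, remaining, window_reset_epoch, now=None):
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(window_reset_epoch),
    }
    if remaining <= 0:
        # Tell a throttled client how many seconds to back off
        headers["Retry-After"] = str(max(0, math.ceil(window_reset_epoch - now)))
    return headers

# Reproduces the example above: 30 seconds before the window resets
assert rate_limit_headers(100, 0, 1711756800, now=1711756770)["Retry-After"] == "30"
```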
Wrapping Up
Rate limiting is a foundational building block for any production API. Fixed window is too naive for most use cases. Sliding window counter gives you the best balance of simplicity and accuracy. Token bucket is the industry standard when you want burst tolerance. Redis Lua scripts solve the distributed atomicity problem cleanly. Choose the algorithm that matches your traffic pattern, implement it atomically, and always tell your clients what the limits are.