Rate Limiting Algorithms Compared: Token Bucket, Sliding Window & More
Every production API needs rate limiting. Without it, a single misbehaving client can exhaust your resources, degrade service for everyone, and run up your infrastructure bill. The question is not whether to rate limit — it is which algorithm to use. Each makes different trade-offs between accuracy, memory, and implementation complexity.
Fixed Window
The simplest algorithm. Divide time into fixed intervals (e.g., 1-minute windows) and count requests per window:
Window: 12:00:00 - 12:00:59 → count: 47/100
Window: 12:01:00 - 12:01:59 → count: 12/100
Implementation: A single counter per key, reset at the start of each window.
def is_allowed(key, limit, window_seconds):
    # `redis` is assumed to be a connected redis-py client
    window = int(time.time() / window_seconds)
    counter_key = f"{key}:{window}"
    count = redis.incr(counter_key)
    if count == 1:
        # First request in this window: set a TTL so stale counters expire
        redis.expire(counter_key, window_seconds)
    return count <= limit
Pros: Minimal memory (one counter per key). Simple to implement.
Cons: Boundary burst problem. A client can send 100 requests at 12:00:59 and 100 more at 12:01:00 — 200 requests in 2 seconds while respecting the 100/minute limit in both windows.
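The boundary burst is easy to demonstrate with an in-memory version of the same counter (no Redis; timestamps are passed in explicitly so the run is deterministic):

```python
# In-memory sketch of the fixed-window counter, for demonstration only
counters = {}

def is_allowed(key, limit, window_seconds, now):
    window = int(now / window_seconds)
    counters[(key, window)] = counters.get((key, window), 0) + 1
    return counters[(key, window)] <= limit

# 100 requests at 12:00:59 and 100 more at 12:01:00 (epoch seconds 59 and 60)
burst = [is_allowed("client", 100, 60, 59) for _ in range(100)]
burst += [is_allowed("client", 100, 60, 60) for _ in range(100)]
# All 200 requests land inside 2 seconds, yet every one is allowed
assert all(burst)
```

Each window sees exactly 100 requests, so neither counter trips, even though the instantaneous rate is double the limit.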
Sliding Window Log
Track the exact timestamp of every request. To check the limit, count timestamps within the last N seconds:
def is_allowed(key, limit, window_seconds):
    now = time.time()
    cutoff = now - window_seconds
    # Remove expired entries
    redis.zremrangebyscore(key, 0, cutoff)
    # Check count
    count = redis.zcard(key)
    if count < limit:
        redis.zadd(key, {str(now): now})
        redis.expire(key, window_seconds)
        return True
    return False
Pros: Perfectly accurate. No boundary burst problem.
Cons: High memory usage. Storing every request timestamp is expensive at scale. If you allow 10,000 requests/minute, you store 10,000 entries per key.
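As an illustration, the same log can be kept in process memory with a deque instead of a Redis sorted set (`now` is a parameter here so the behaviour is deterministic):

```python
from collections import deque

# One deque of request timestamps per key
logs = {}

def is_allowed(key, limit, window_seconds, now):
    log = logs.setdefault(key, deque())
    cutoff = now - window_seconds
    # Drop timestamps that have slid out of the window
    while log and log[0] <= cutoff:
        log.popleft()
    if len(log) < limit:
        log.append(now)
        return True
    return False

assert all(is_allowed("k", 3, 10, now=t) for t in (0, 1, 2))
assert not is_allowed("k", 3, 10, now=3)   # window full
assert is_allowed("k", 3, 10, now=10.5)    # the request at t=0 has expired
```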
Sliding Window Counter
A hybrid that combines fixed window efficiency with sliding window accuracy. Estimate the count using a weighted average of the current and previous window:
Previous window count: 84 (weight: 0.4 — 60% of the current window has elapsed, so 40% of the previous window still overlaps)
Current window count: 30 (weight: 1.0)
Estimated rate = 84 × 0.4 + 30 = 63.6
Implementation:
def is_allowed(key, limit, window_seconds):
    now = time.time()
    current_window = int(now / window_seconds)
    previous_window = current_window - 1
    elapsed = (now % window_seconds) / window_seconds
    prev_count = int(redis.get(f"{key}:{previous_window}") or 0)
    curr_count = int(redis.get(f"{key}:{current_window}") or 0)
    estimated = prev_count * (1 - elapsed) + curr_count
    if estimated < limit:
        redis.incr(f"{key}:{current_window}")
        # Keep counters for two windows so each can serve as "previous"
        redis.expire(f"{key}:{current_window}", window_seconds * 2)
        return True
    return False
Pros: Low memory (two counters per key). Smooths the boundary burst problem. Good enough for most use cases.
Cons: Approximate — not perfectly accurate, but the error is small and bounded.
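The estimate itself is pure arithmetic and easy to sanity-check in isolation:

```python
# The previous window's weight is the fraction of it still covered by the
# sliding window, i.e. 1 - elapsed
def estimate(prev_count, curr_count, elapsed):
    return prev_count * (1 - elapsed) + curr_count

# Reproduces the worked example: weight 0.4 on the previous window
assert abs(estimate(84, 30, 0.6) - 63.6) < 1e-9
# A fully elapsed previous window contributes nothing
assert estimate(84, 30, 1.0) == 30
```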
Token Bucket
Imagine a bucket that holds tokens. Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected:
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
Current tokens: 5
Request arrives → 5 ≥ 1 → allowed → tokens = 4
The token bucket allows bursts up to the bucket capacity while enforcing an average rate equal to the refill rate.
def is_allowed(key, capacity, refill_rate):
    # Assumes a redis-py client created with decode_responses=True,
    # so hgetall returns str keys/values rather than bytes
    now = time.time()
    bucket = redis.hgetall(key)
    tokens = float(bucket.get('tokens', capacity))
    last_refill = float(bucket.get('last_refill', now))
    # Refill for the time elapsed since the last request, capped at capacity
    elapsed = now - last_refill
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= 1:
        tokens -= 1
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        # Expire idle buckets once they would have fully refilled
        redis.expire(key, int(capacity / refill_rate) + 1)
        return True
    else:
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        return False
Pros: Allows controlled bursts. Intuitive to configure (capacity = burst size, refill rate = sustained rate). Used by AWS, Stripe, and most major APIs.
Cons: Slightly more complex state (two values per key). Burst behavior can be surprising if not documented.
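For experimentation, the same logic works in process memory; this sketch makes `now` a parameter so the refill can be tested deterministically:

```python
# Per-key state: (tokens, last_refill)
buckets = {}

def is_allowed(key, capacity, refill_rate, now):
    tokens, last_refill = buckets.get(key, (capacity, now))
    # Refill for the elapsed time, capped at capacity
    tokens = min(capacity, tokens + (now - last_refill) * refill_rate)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    buckets[key] = (tokens, now)
    return allowed

# Capacity 10, refill 2/s: a burst of 10 succeeds, then the sustained
# rate is pinned to the refill rate
assert all(is_allowed("k", 10, 2, now=0) for _ in range(10))
assert not is_allowed("k", 10, 2, now=0)  # bucket empty
assert is_allowed("k", 10, 2, now=0.5)    # 0.5 s at 2 tokens/s = 1 token back
```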
Leaky Bucket
The leaky bucket is the inverse of the token bucket. Requests enter a queue (the bucket). The queue drains at a fixed rate. If the queue is full, new requests are rejected:
Queue capacity: 10
Drain rate: 2 requests/second
Current queue: 7
Request arrives → 7 < 10 → enqueued → queue = 8
Pros: Produces a perfectly smooth output rate. No bursts. Ideal for APIs that need strict rate control.
Cons: Does not allow any burst, even small ones. Adds latency because requests wait in the queue. Less commonly used than token bucket.
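Here is a minimal in-memory sketch of the "leaky bucket as a meter" variant, which models only the accept/reject decision (a full implementation would also queue accepted requests and release them at the drain rate):

```python
# Per-key state: (virtual queue depth, last update time)
state = {}

def is_allowed(key, capacity, drain_rate, now):
    depth, last = state.get(key, (0.0, now))
    # The queue drains continuously at drain_rate, never below empty
    depth = max(0.0, depth - (now - last) * drain_rate)
    if depth < capacity:
        state[key] = (depth + 1, now)
        return True
    state[key] = (depth, now)
    return False

assert all(is_allowed("k", 10, 2, now=0) for _ in range(10))
assert not is_allowed("k", 10, 2, now=0)  # queue full
assert is_allowed("k", 10, 2, now=0.5)    # one slot has drained
```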
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Fixed Window | Very low | Low (boundary bursts) | Allows 2x burst at boundary | Trivial |
| Sliding Window Log | High | Perfect | No bursts | Low |
| Sliding Window Counter | Low | Good (approximate) | Minor boundary smoothing | Low |
| Token Bucket | Low | Good | Controlled bursts | Medium |
| Leaky Bucket | Low | Perfect | No bursts (smooth output) | Medium |
Recommendation: Start with sliding window counter for simplicity. Use token bucket when you want to allow controlled bursts. Use leaky bucket when you need smooth, predictable output.
Distributed Rate Limiting
All examples above assume a single Redis instance. In a distributed system, you face additional challenges:
- Race conditions — Two servers may read the same count, both allow, and exceed the limit.
- Clock skew — Different servers may disagree on the current window.
- Network latency — Round-trips to Redis add latency to every request.
Redis Lua Scripts
Lua scripts execute atomically in Redis, eliminating race conditions. Here is a token bucket implemented as a Lua script:
-- KEYS[1] = rate limit key
-- ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)
return allowed
Call it from your application:
allowed = redis.eval(LUA_SCRIPT, 1, key, capacity, refill_rate, time.time())
The entire read-modify-write cycle happens in a single atomic operation. No race conditions, no distributed locks.
Multi-Region Rate Limiting
For globally distributed APIs, you have two options:
- Centralized Redis — All regions talk to one Redis cluster. Simple but adds cross-region latency.
- Local counters with sync — Each region maintains local counters and periodically syncs to a global store. Allows slight over-limit but eliminates cross-region latency on the hot path.
Most systems choose local counters with async sync and accept a small margin of error.
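A minimal sketch of the local-counters approach (an assumed design for illustration, not any specific system's API): each region admits requests against its last-synced view of the global count, so the over-limit margin is bounded by what all regions admit between syncs.

```python
class RegionalLimiter:
    def __init__(self, limit):
        self.limit = limit
        self.global_base = 0  # global count as of the last sync
        self.local = 0        # requests admitted locally since then

    def is_allowed(self):
        if self.global_base + self.local < self.limit:
            self.local += 1
            return True
        return False

    def sync(self, global_total):
        # Called periodically with the merged count from the global store
        self.global_base = global_total
        self.local = 0

us, eu = RegionalLimiter(100), RegionalLimiter(100)
admitted = sum(us.is_allowed() for _ in range(80)) + sum(eu.is_allowed() for _ in range(80))
# 160 admitted globally before the next sync -- the accepted margin of error
us.sync(admitted)
eu.sync(admitted)
# After syncing the merged total, both regions reject until the window resets
```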
Response Headers
Always communicate rate limit status to clients via headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1711756800
Retry-After: 30
Return HTTP 429 Too Many Requests when the limit is exceeded. Include Retry-After so clients know when to retry.
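A small helper can compute these headers from the limiter's state. This is a hedged sketch: the function name and the `window_reset_epoch` parameter (the epoch second at which the window resets) are illustrative, and real APIs vary in field names.

```python
import math
import time

def rate_limit_headers(limit, remaining, window_reset_epoch, now=None):
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(window_reset_epoch),
    }
    if remaining <= 0:
        # Tell a throttled client how many seconds to back off
        headers["Retry-After"] = str(max(0, math.ceil(window_reset_epoch - now)))
    return headers

# Reproduces the example above: 30 seconds before the window resets
assert rate_limit_headers(100, 0, 1711756800, now=1711756770)["Retry-After"] == "30"
```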
Wrapping Up
Rate limiting is a foundational building block for any production API. Fixed window is too naive for most use cases. Sliding window counter gives you the best balance of simplicity and accuracy. Token bucket is the industry standard when you want burst tolerance. Redis Lua scripts solve the distributed atomicity problem cleanly. Choose the algorithm that matches your traffic pattern, implement it atomically, and always tell your clients what the limits are.