Rate Limiter System Design: Algorithms, Distributed Redis, and Scale#
A rate limiter controls how many requests a client can send in a given time window. It protects services from abuse, prevents resource starvation, and manages cost — every major API (Stripe, GitHub, Twitter) enforces rate limits.
Functional Requirements#
- Limit requests per client/IP/API key within a configurable time window
- Return clear feedback when a client is throttled (HTTP 429 + headers)
- Support multiple rules — e.g., 100 req/min for /api/search, 1000 req/min for /api/read
Non-Functional Requirements#
- Low latency — the limiter sits on the hot path; must add < 1 ms overhead
- Highly available — if the limiter goes down, traffic should not be blocked
- Distributed — works across multiple servers behind a load balancer
- Accurate — minimal over-counting or under-counting
Scale Estimation#
- DAU: 50 M
- Avg requests per user per day: 20
- Total daily requests: 1 B
- Avg QPS: ~12,000 (1 B / 86,400 s); peak QPS: ~30,000
- Rate limit check per request: 1 Redis call ≈ 0.5 ms
Each rate limit check must be atomic and sub-millisecond.
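The estimates above can be sanity-checked with a few lines of arithmetic (the ~2.5× peak-to-average factor is an assumption, not a measurement):

```python
DAU = 50_000_000
requests_per_user = 20

total = DAU * requests_per_user    # 1 B requests/day
avg_qps = total / 86_400           # seconds in a day
peak_qps = avg_qps * 2.5           # assumed peak-to-average ratio

print(round(avg_qps), round(peak_qps))   # 11574 28935 — roughly the ~30 K peak above
```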
Where to Place the Rate Limiter#
Client → API Gateway / Rate Limiter Middleware → Application Server → DB
Three options:
| Placement | Pros | Cons |
|---|---|---|
| Client-side | Zero server load | Easily bypassed |
| Middleware / Gateway | Centralized, language-agnostic | Extra hop |
| Application layer | Fine-grained per-endpoint rules | Coupled to app |
Recommended: Rate limit at the API gateway or middleware layer. Most cloud gateways (AWS API Gateway, Kong, Envoy) have built-in rate limiting.
Rate Limiting Algorithms#
1. Token Bucket#
The most widely used algorithm (used by AWS, Stripe).
- Bucket capacity: B tokens
- Refill rate: R tokens per second
- Each request consumes 1 token
- If tokens ≥ 1: allow and decrement; otherwise: reject (429)
Pros: Allows short bursts up to B, smooth long-term rate of R/s. Cons: Two parameters to tune per rule.
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate   # tokens per second
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        elapsed = now - self.last_refill
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
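To see the burst-then-throttle behavior, here is a self-contained variant with an injectable clock (the `clock` parameter is our addition, purely for deterministic demonstration):

```python
class TokenBucket:
    def __init__(self, capacity, refill_rate, clock):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate   # tokens per second
        self.clock = clock               # injectable time source
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Simulated clock: capacity 5, refill 1 token/s
t = [0.0]
bucket = TokenBucket(5, 1.0, lambda: t[0])

burst = sum(bucket.allow() for _ in range(10))          # 10 requests at t=0
print(burst)                                            # 5 — burst capped at capacity

t[0] = 3.0                                              # 3 s later: 3 tokens refilled
after_refill = sum(bucket.allow() for _ in range(10))
print(after_refill)                                     # 3
```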
2. Sliding Window Log#
Track the exact timestamp of every request in a sorted set.
On each request:
1. Remove all entries older than (now - window_size)
2. Count remaining entries
3. If count < limit: allow, add timestamp
4. Else: reject
Pros: Perfectly accurate — no boundary issues. Cons: Memory-intensive (stores every timestamp). For 1 M users at 100 req/min, that's 100 M entries.
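The four steps above can be sketched in memory with a deque per client (a toy illustration; a production implementation keeps this log in a Redis sorted set):

```python
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()               # timestamps, oldest first

    def allow(self, now):
        # 1. Remove entries older than (now - window_size)
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        # 2-3. Allow only if the remaining count is under the limit
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False                     # 4. Reject

limiter = SlidingWindowLog(limit=3, window_seconds=60)
results = [limiter.allow(t) for t in (0, 10, 20, 30)]
print(results)                 # [True, True, True, False]
late = limiter.allow(61)
print(late)                    # True — the t=0 entry has expired
```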
3. Sliding Window Counter#
A memory-efficient approximation combining fixed windows:
estimated_count = current_window_count + previous_window_count × (1 - weight)
weight = elapsed_time_in_current_window / window_size
Example with a 1-minute window, limit = 100:
Previous minute: 80 requests
Current minute (40 s in): 30 requests
Estimated count = 30 + 80 × (20/60) = 30 + 26.7 ≈ 57
57 < 100 → allow
Pros: Low memory (two counters per key), reasonably accurate. Cons: Approximate — can allow ~1% more than the limit.
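A minimal sketch of the two-counter scheme, using the weighted estimate above (window boundaries aligned to wall-clock multiples of the window size; names are illustrative):

```python
class SlidingWindowCounter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_start = 0.0   # start of the current fixed window
        self.current = 0
        self.previous = 0

    def allow(self, now):
        # Roll the windows forward if a boundary has been crossed
        elapsed_windows = int((now - self.current_start) // self.window)
        if elapsed_windows == 1:
            self.previous, self.current = self.current, 0
            self.current_start += self.window
        elif elapsed_windows > 1:
            self.previous, self.current = 0, 0
            self.current_start = now - (now % self.window)

        weight = (now - self.current_start) / self.window
        estimated = self.current + self.previous * (1 - weight)
        if estimated < self.limit:
            self.current += 1
            return True
        return False

# Reproduce the worked example: previous = 80, current = 30, 40 s into the window
lim = SlidingWindowCounter(limit=100, window_seconds=60)
lim.previous, lim.current = 80, 30
print(lim.allow(40))   # True — estimated ≈ 57 < 100
```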
Algorithm Comparison#
| Algorithm | Memory | Accuracy | Burst Handling |
|---|---|---|---|
| Token Bucket | Low | Good | Allows controlled bursts |
| Sliding Window Log | High | Exact | No bursts beyond limit |
| Sliding Window Counter | Low | ~99% | Smooth approximation |
Distributed Rate Limiting with Redis#
In a distributed system, multiple servers must share rate limit state. Redis is the standard choice — single-threaded, atomic operations, sub-millisecond latency.
Token Bucket in Redis (Lua Script)#
-- KEYS[1] = rate limit key
-- ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)
if tokens >= 1 then
    tokens = tokens - 1
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
    return 1 -- allowed
else
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
    return 0 -- rejected
end
The Lua script executes atomically in Redis — no race conditions.
Sliding Window Log in Redis#
import time
import uuid

def is_allowed(redis, key, limit, window_seconds):
    now = time.time()
    window_start = now - window_seconds
    pipe = redis.pipeline()                      # transaction=True → MULTI/EXEC
    pipe.zremrangebyscore(key, 0, window_start)  # prune entries outside window
    pipe.zcard(key)                              # count what remains
    member = f"{now}-{uuid.uuid4()}"             # unique member per request
    pipe.zadd(key, {member: now})                # record this request
    pipe.expire(key, window_seconds)
    results = pipe.execute()
    count = results[1]                           # zcard ran before our zadd
    return count < limit
This is the sliding window log variant (exact but memory-heavy). Note that the timestamp is recorded even for rejected requests, so a client that keeps hammering the limit keeps pushing its own window forward.
Handling Race Conditions#
The Read-Then-Write Problem#
Server A: reads count = 99 (limit 100)
Server B: reads count = 99
Server A: increments to 100, allows
Server B: increments to 100, allows ← should have been rejected!
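The interleaving above is easy to reproduce deterministically by separating the read and write steps (a toy illustration, not real Redis calls):

```python
limit = 100
count = 99                      # 99 requests already admitted this window
allowed_a = allowed_b = False

read_a = count                  # Server A reads
read_b = count                  # Server B reads the same, soon-to-be-stale value

if read_a < limit:              # A: check passes
    count = read_a + 1
    allowed_a = True

if read_b < limit:              # B: check also passes on the stale read
    count = read_b + 1
    allowed_b = True

print(allowed_a, allowed_b, count)   # True True 100 — one request too many admitted
```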
Solutions#
- Redis Lua scripts — atomic read+write in a single command (recommended)
- Redis MULTI/EXEC — transaction block, but watch/retry on contention
- Redis INCR with TTL — for simple fixed-window counters:
-- Atomic fixed-window counter
MULTI
INCR rate:user123:1711234560
EXPIRE rate:user123:1711234560 60
EXEC
-- Check if INCR result > limit
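The same fixed-window pattern, sketched in plain Python (a dict stands in for Redis keys with TTL; the window key is the floored timestamp, as in the `rate:user123:1711234560` key above):

```python
from collections import defaultdict

counters = defaultdict(int)   # stands in for Redis INCR-able keys

def fixed_window_allow(user, now, limit=100, window=60):
    window_key = (user, int(now // window) * window)   # e.g. ("user123", 1711234560)
    counters[window_key] += 1                          # the INCR step
    return counters[window_key] <= limit               # check result against limit

# 150 requests within one 60 s window: only the first 100 pass
allowed = sum(fixed_window_allow("user123", 1711234560 + i * 0.1) for i in range(150))
print(allowed)   # 100
```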
Rate Limit Headers#
Follow standard conventions so clients can adapt:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711234620
Retry-After: 47
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Max requests in the window |
| X-RateLimit-Remaining | Requests left in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until the client should retry |
Always return these headers on both successful and rejected responses.
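A small helper for emitting these headers (names follow the de-facto X-RateLimit convention shown above; `reset_at` is a Unix timestamp and the function signature is illustrative):

```python
import math

def rate_limit_headers(limit, remaining, reset_at, now):
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if remaining <= 0:
        # Only throttled responses carry Retry-After
        headers["Retry-After"] = str(math.ceil(reset_at - now))
    return headers

h = rate_limit_headers(limit=100, remaining=0, reset_at=1711234620, now=1711234573)
print(h["Retry-After"])   # 47 — matches the example response above
```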
Client-Side Handling#
Well-behaved clients respect rate limits:
async function fetchWithRateLimit(url, options = {}, retries = 3) {
  const res = await fetch(url, options);
  if (res.status === 429 && retries > 0) {
    const retryAfter = parseInt(res.headers.get('Retry-After') || '1', 10);
    await new Promise(r => setTimeout(r, retryAfter * 1000));
    return fetchWithRateLimit(url, options, retries - 1); // bounded retry
  }
  return res;
}
Best practices:
- Exponential backoff with jitter on 429s
- Read headers proactively — slow down before hitting the limit
- Queue requests client-side to stay under the limit
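Exponential backoff with full jitter can be sketched as follows (the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full jitter: wait a uniform random amount up to the exponential ceiling."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)

random.seed(7)  # seeded only to make the demo reproducible
delays = [backoff_delay(a) for a in range(6)]
print([round(d, 2) for d in delays])
```

The randomness spreads retries out so a fleet of throttled clients does not stampede back in lockstep the moment the window resets.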
Failure Modes#
| Scenario | Behaviour |
|---|---|
| Redis down | Fail open — allow traffic (availability > accuracy) |
| Clock skew between servers | Use Redis server time (TIME command), not local clocks |
| Hot key (celebrity user) | Shard by user + endpoint, or use local counters with periodic sync |
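Failing open can be sketched as a thin wrapper around the limiter call (the `check` callable stands in for whatever Redis-backed check is in use):

```python
def allow_request(check, *args):
    """Fail open: if the limiter backend errors out, admit the request."""
    try:
        return check(*args)
    except ConnectionError:
        # Availability over accuracy: never block traffic on infra failure
        return True

def redis_down(*_):
    raise ConnectionError("redis unreachable")

print(allow_request(redis_down, "user123"))          # True — failed open
print(allow_request(lambda key: False, "user123"))   # False — normal rejection
```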
Architecture Overview#
┌───────────────┐
Client ────►│ API Gateway │
│ (rate check) │
└───────┬───────┘
│ Lua script
┌───────▼───────┐
│ Redis Cluster │
│ (rate state) │
└───────────────┘
│
┌───────▼───────┐
│ Rules Config │
│ (per endpoint │
│ per tier) │
└───────────────┘
Rules are stored in a config service and cached at the gateway. Changes propagate via pub/sub.
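A cached rules lookup at the gateway might look like this (the endpoints, tiers, and numbers are illustrative, not from a real config):

```python
# Cached snapshot of the rules config, keyed by (tier, endpoint)
RULES = {
    ("free", "/api/search"): {"limit": 100,  "window": 60},
    ("free", "/api/read"):   {"limit": 1000, "window": 60},
    ("pro",  "/api/search"): {"limit": 1000, "window": 60},
}
DEFAULT_RULE = {"limit": 60, "window": 60}   # fallback for unmatched routes

def rule_for(tier, endpoint):
    return RULES.get((tier, endpoint), DEFAULT_RULE)

print(rule_for("free", "/api/search")["limit"])   # 100
print(rule_for("free", "/api/unknown")["limit"])  # 60 — falls back to default
```

On a pub/sub config change, the gateway would swap in a fresh `RULES` snapshot rather than mutating it in place.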
Key Takeaways#
- Token bucket is the go-to algorithm — simple, allows bursts, low memory
- Sliding window counter is the best approximation when you need smooth limiting
- Redis Lua scripts solve the distributed race condition problem atomically
- Always return rate limit headers — good API citizenship
- Fail open when the rate limiter is unavailable — never block legitimate traffic because of infrastructure failure
- Separate rate limits by user tier, endpoint, and method for fine-grained control
This is article #196 in the Codelit engineering blog.