Rate Limiting Strategies — Token Bucket, Sliding Window, and How to Not Get Hacked
## Why rate limiting isn't optional
Without rate limiting, one bad actor can:
- DDoS your API with millions of requests
- Brute-force passwords by trying thousands per second
- Scrape your entire database through your public API
- Run up your cloud bill by triggering expensive operations
Rate limiting is security, reliability, and cost control in one feature.
## The algorithms

### Fixed window
Count requests in fixed time windows (e.g., per minute).
How it works: At the start of each minute, reset the counter to 0. Increment on each request. Reject when the counter hits the limit.
Problem: Burst at window boundaries. A user can send 100 requests at :59 and 100 more at :00 — 200 requests in 2 seconds despite a "100 per minute" limit.
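The counter-reset behavior is easy to see in code. A minimal single-process sketch (class and method names are illustrative; a shared deployment would keep the counter in Redis or similar):

```typescript
// Fixed window counter: one counter per window, reset at each boundary
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Align the timestamp to the start of its window
    const windowStartForNow = Math.floor(now / this.windowMs) * this.windowMs;
    if (windowStartForNow !== this.windowStart) {
      // New window: reset the counter
      this.windowStart = windowStartForNow;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

The boundary burst falls straight out of the reset: a full limit spent at the end of one window and another full limit at the start of the next are both allowed.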
### Sliding window log
Track the timestamp of every request. Count requests in the last N seconds.
How it works: On each request, remove timestamps older than the window, count remaining, reject if over limit.
Accurate but expensive — storing every timestamp uses a lot of memory at scale.
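A minimal in-memory version (names are illustrative; at scale the log would live in Redis or similar, as shown later in this article):

```typescript
// Sliding window log: store every request timestamp, prune on each check
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

The memory cost is visible here: one array entry per request inside the window, per user.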
### Sliding window counter
Combine fixed window with weighted overlap. The best balance of accuracy and efficiency.
How it works: Keep counts for the current and previous fixed windows, and weight the previous count by how much of that window still overlaps the sliding window. Example:
- Current window: 40 requests (we are 60% of the way through it)
- Previous window: 80 requests (40% of it still overlaps the sliding window)
- Estimated rate: 40 + (80 × 0.4) = 72
- Limit: 100 → allow
Used by: Most production rate limiters. Good accuracy, constant memory.
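A single-process sketch of the weighted estimate (names are illustrative):

```typescript
// Sliding window counter: two fixed-window counts, previous weighted by overlap
class SlidingWindowCounter {
  private currCount = 0;
  private prevCount = 0;
  private currWindowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    if (windowStart !== this.currWindowStart) {
      // Roll windows forward; if more than one window passed, previous is empty
      this.prevCount =
        windowStart - this.currWindowStart === this.windowMs ? this.currCount : 0;
      this.currCount = 0;
      this.currWindowStart = windowStart;
    }
    // Fraction of the previous window that still overlaps the sliding window
    const prevWeight = 1 - (now - windowStart) / this.windowMs;
    const estimated = this.currCount + this.prevCount * prevWeight;
    if (estimated >= this.limit) return false;
    this.currCount++;
    return true;
  }
}
```

Only two counters per key, regardless of traffic volume, which is where the constant-memory claim comes from.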
### Token bucket
A bucket holds tokens. Each request takes one. Tokens refill at a fixed rate.
How it works:
- Bucket capacity: 100 tokens
- Refill rate: 10 tokens/second
- Each request consumes 1 token
- If bucket is empty → reject
Allows bursts: A full bucket allows 100 rapid requests (burst), then settles to 10/second (sustained rate).
Used by: AWS API Gateway, Stripe, most cloud APIs.
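A minimal sketch with continuous refill (names are illustrative; a real implementation also needs per-key storage and atomicity):

```typescript
// Token bucket: refill continuously over time, spend one token per request
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // Start full, so an initial burst is allowed
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Add tokens for the elapsed time, capped at bucket capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The burst-then-sustain behavior is encoded in two numbers: capacity bounds the burst, refill rate bounds the sustained throughput.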
### Leaky bucket
Requests enter a queue (bucket). The queue drains at a fixed rate.
How it works: Incoming requests join the queue. If the queue is full → reject. Requests are processed at a constant rate regardless of arrival pattern.
Smooths traffic: Unlike token bucket, leaky bucket enforces a constant output rate. Good for protecting downstream services from bursts.
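The "bucket as a meter" variant can be sketched without an actual queue, by tracking queue depth and draining it on each check (names are illustrative; the queueing variant would additionally hold requests and release them at the leak rate):

```typescript
// Leaky bucket (meter variant): a depth counter drained at a fixed rate
class LeakyBucket {
  private level = 0; // Current queue depth
  private lastLeak: number;

  constructor(
    private capacity: number,
    private leakPerSec: number,
    now: number = Date.now(),
  ) {
    this.lastLeak = now;
  }

  tryAdd(now: number = Date.now()): boolean {
    // Drain the bucket for the elapsed time
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.level + 1 > this.capacity) return false; // Bucket full → reject
    this.level += 1;
    return true;
  }
}
```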
## Implementation with Redis

A sliding window log maps naturally onto a Redis sorted set: each request is a member scored by its timestamp. This sketch assumes an ioredis client instance named `redis`:

```typescript
// Sliding window log with Redis sorted sets
async function checkRateLimit(userId: string): Promise<boolean> {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  const limit = 100;
  const windowMs = 60_000;

  const result = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowMs)  // Remove entries older than the window
    .zcard(key)                                // Count requests still in the window
    .zadd(key, now, `${now}:${Math.random()}`) // Record this request (unique member)
    .expire(key, Math.ceil(windowMs / 1000))   // Let idle keys expire
    .exec();

  // ioredis returns an [error, value] pair for each queued command
  const count = result![1][1] as number;
  return count < limit; // This request is the (count + 1)th in the window
}
```

One caveat: the request is recorded even when it is rejected, so it still counts against the window. If rejected requests should be free, move the check and the `zadd` into a Lua script so they execute conditionally and atomically.
## What to rate limit by
| Target | When |
|---|---|
| IP address | Anonymous users, public APIs |
| API key | Authenticated API consumers |
| User ID | Logged-in users |
| Endpoint | Protect expensive operations specifically |
| Combination | IP + endpoint for login pages |
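For the combination case, one approach is to fold the dimensions into a single limiter key (a sketch; the function name and key format are illustrative):

```typescript
// Build a composite rate-limit key, e.g. per IP per endpoint for login pages
function rateLimitKey(parts: Record<string, string>): string {
  return (
    "ratelimit:" +
    Object.entries(parts)
      .sort(([a], [b]) => a.localeCompare(b)) // Stable order regardless of call site
      .map(([k, v]) => `${k}=${v}`)
      .join(":")
  );
}
```

Sorting the dimensions keeps the key deterministic no matter which order the caller supplies them, so the same client always maps to the same counter.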
## Response headers
Tell clients their rate limit status:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1679616000
Retry-After: 30
```
Return `429 Too Many Requests` with a `Retry-After` header so well-behaved clients can back off automatically.
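On the client side, note that `Retry-After` may carry either delta-seconds or an HTTP-date (both forms are allowed by RFC 9110). A small parser sketch (the function name is illustrative):

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Handles both forms the spec allows: delta-seconds and an HTTP-date.
function retryAfterMs(value: string, now: number = Date.now()): number {
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  if (Number.isNaN(date)) return 0; // Unparseable → caller applies its own default
  return Math.max(0, date - now);
}
```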
## See rate limiting in your architecture
On Codelit, search "rate limiter" in ⌘K to see a complete distributed rate limiting architecture — Redis counters, rules config, analytics, and API gateway integration.