Rate Limiting & Throttling
TL;DR
Rate limiting: Limit requests per time window. Algorithms: Token bucket (flexible), leaky bucket (smooth), fixed/sliding window. Use cases: API protection, DDoS prevention, fair usage.
Algorithms
1. Token Bucket
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request arrives → consume 1 token
If tokens available → allow
If empty → reject (429 Too Many Requests)
Pros: Handles bursts (100 requests at once if bucket full)
Cons: Large buckets let bursts through that can momentarily overload downstream services
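The refill-and-consume logic above can be sketched as a minimal in-process class (illustrative only; the class name, parameters, and monotonic-clock refill are assumptions, not a specific library's API):

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: refill continuously, spend 1 token per request."""

    def __init__(self, capacity=100, refill_rate=10):
        self.capacity = capacity          # max tokens the bucket holds
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start full (allows an initial burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False          # empty bucket → caller returns 429
```

With a full bucket of capacity 100, the first 100 calls succeed at once (the burst), after which requests are admitted at the refill rate.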
2. Leaky Bucket
Bucket capacity: 100 requests
Process rate: 10 requests/second
Requests queue in bucket
Process at constant rate (smooth traffic)
If full → reject
Pros: Smooth traffic (no bursts)
Cons: No burst tolerance; legitimate traffic spikes are queued (added latency) or dropped
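A minimal sketch of the "leaky bucket as meter" variant, which makes the same admit/reject decisions as the queue described above without modeling the queue itself (class name and parameters are illustrative assumptions):

```python
import time

class LeakyBucket:
    """Leaky-bucket-as-meter sketch: the bucket drains at a constant rate;
    each request adds one unit; a full bucket rejects."""

    def __init__(self, capacity=100, leak_rate=10):
        self.capacity = capacity        # max outstanding requests
        self.leak_rate = leak_rate      # requests drained per second
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Drain the bucket at the constant leak rate
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level < self.capacity:
            self.level += 1
            return True
        return False        # bucket full → reject
```

The constant drain rate is what smooths traffic: no matter how fast requests arrive, the downstream sees at most `leak_rate` per second on average.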
3. Fixed Window
Window: 1 minute (00:00-01:00)
Limit: 100 requests
Count requests in current minute
Reset counter at 01:00
Pros: Simple
Cons: Burst at window boundary (100 at 00:59, 100 at 01:00 = 200 in 1 second)
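The count-and-reset scheme above can be sketched in-process (illustrative; the class name and the `now` parameter are assumptions added for testability — production versions typically keep the counter in Redis with INCR/EXPIRE):

```python
import time

class FixedWindowCounter:
    """Fixed-window sketch: one counter per (user, window) pair."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window       # window length in seconds
        self.counts = {}           # (user_id, window_start) -> count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        # Integer division snaps the timestamp to its window
        window_start = int(now // self.window)
        key = (user_id, window_start)
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False
        self.counts[key] = count + 1
        return True
```

The boundary problem is visible here: a request at `now=59` and one at `now=61` land in different windows, so both counters start fresh even though the requests are two seconds apart.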
4. Sliding Window
Track requests with timestamps
Count requests in last 60 seconds (rolling window)
Pros: Accurate; no window-boundary bursts
Cons: Memory overhead (store timestamps)
Implementation (Redis)
import time
import uuid

import redis  # assumes a reachable Redis server

r = redis.Redis()

class RateLimitExceeded(Exception):
    pass

def rate_limit_sliding_window(user_id, limit=100, window=60):
    now = time.time()
    key = f"rate_limit:{user_id}"
    # Remove entries that fell outside the window
    r.zremrangebyscore(key, 0, now - window)
    # Count requests remaining in the window
    count = r.zcard(key)
    if count >= limit:
        raise RateLimitExceeded()
    # Record the current request (unique member, score = timestamp)
    r.zadd(key, {str(uuid.uuid4()): now})
    r.expire(key, window)
    # Note: check-then-add is not atomic; use a Lua script under heavy concurrency
Common Interview Questions
Q1: "How would you implement rate limiting?"
Answer:
- Algorithm: Token bucket or sliding window
- Storage: Redis (distributed, fast)
- Key: user_id:api_endpoint or ip_address
- Response: 429 Too Many Requests with a Retry-After header
Q2: "Token bucket vs leaky bucket?"
Answer:
- Token bucket: Allows bursts (good for UX)
- Leaky bucket: Smooth traffic (good for backend protection)
- Most common: Token bucket (better user experience)
Q3: "How do you rate limit across multiple servers?"
Answer:
- Centralized: Redis stores counters (all servers check Redis)
- Distributed: Each server tracks quota, sync periodically (eventual consistency)
- Prefer: Centralized with Redis (accurate)
Quick Reference
Algorithms:
- Token bucket: Flexible, allows bursts (most common)
- Leaky bucket: Smooth, no bursts
- Sliding window: Accurate, memory overhead
Implementation: Redis with TTL
Response: 429 + Retry-After header
Next: Geo-Distribution.