Rate Limiting & Throttling

TL;DR

Rate limiting: Limit requests per time window. Algorithms: Token bucket (flexible), leaky bucket (smooth), fixed/sliding window. Use cases: API protection, DDoS prevention, fair usage.

Algorithms

1. Token Bucket

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

Request arrives → consume 1 token
If tokens available → allow
If empty → reject (429 Too Many Requests)

Pros: Absorbs bursts (up to 100 requests at once if the bucket is full)
Cons: Those same bursts can still overload downstream services
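The refill-and-spend logic above can be sketched in a few lines of Python (the class name and defaults are illustrative, not from any specific library):

```python
import time

class TokenBucket:
    """Token bucket: refill at a fixed rate, spend one token per request."""

    def __init__(self, capacity=100, refill_rate=10):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full, so bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a quiet client can immediately burst up to `capacity` requests; the refill rate then enforces the average rate.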

2. Leaky Bucket

Bucket capacity: 100 requests
Process rate: 10 requests/second

Requests queue in bucket
Process at constant rate (smooth traffic)
If full → reject

Pros: Smooths traffic to a constant output rate (protects downstream services)
Cons: No bursts allowed; legitimate spikes queue up (adding latency) or get dropped
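A minimal in-memory sketch of the queue-and-drain behavior, modeling the bucket as a deque drained at a constant rate (names and defaults are illustrative):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: requests queue up and leak out at a constant rate."""

    def __init__(self, capacity=100, rate=10):
        self.capacity = capacity   # max queued requests
        self.rate = rate           # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Drain however many requests would have been processed since last check
        leaked = int((now - self.last_leak) * self.rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) >= self.capacity:
            return False  # bucket full: reject
        self.queue.append(now)
        return True
```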

3. Fixed Window

Window: 1 minute (00:00-01:00)
Limit: 100 requests

Count requests in current minute
Reset counter at 01:00

Pros: Simple
Cons: Bursts at window boundaries (100 requests at 00:59 plus 100 at 01:00 = 200 within ~1 second)
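A sketch of the counter logic, assuming an in-memory store (a real deployment would keep the counters in something shared like Redis). Note how two windows straddling a boundary each admit a full quota:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Fixed window: one counter per window, reset when the window rolls over."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # window index -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        window_start = int(now // self.window)  # which window are we in?
        if self.counts[window_start] >= self.limit:
            return False
        self.counts[window_start] += 1
        return True
```

The boundary-burst con is visible here: requests at 00:59 and 01:00 fall into different window indices, so both are counted against fresh quotas.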

4. Sliding Window

Track requests with timestamps
Count requests in last 60 seconds (rolling window)

Pros: Accurate, no bursts
Cons: Memory overhead (store timestamps)
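A single-process sketch that makes the memory overhead concrete: one stored timestamp per request in the window (the Redis implementation below is the distributed equivalent; names here are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep a timestamp per request, count the recent ones."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.log = deque()  # request timestamps, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```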

Implementation (Redis)

import time
import uuid

# Assumes `redis` is a connected redis-py client and RateLimitExceeded
# is an application-defined exception.

def rate_limit_sliding_window(user_id, limit=100, window=60):
    now = time.time()
    key = f"rate_limit:{user_id}"

    # Remove entries that have fallen outside the window
    redis.zremrangebyscore(key, 0, now - window)

    # Count requests still in the window
    count = redis.zcard(key)

    if count >= limit:
        raise RateLimitExceeded()

    # Record the current request (unique member, timestamp as score)
    redis.zadd(key, {str(uuid.uuid4()): now})
    redis.expire(key, window)

    # Note: the check and the add are separate commands, so concurrent
    # requests can race; for strict limits, run them atomically in a
    # Lua script or a MULTI/EXEC pipeline.

Common Interview Questions

Q1: "How would you implement rate limiting?"

Answer:

  1. Algorithm: Token bucket or sliding window
  2. Storage: Redis (distributed, fast)
  3. Key: user_id:api_endpoint or ip_address
  4. Response: 429 Too Many Requests with Retry-After header
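The response shape from step 4 can be sketched as a small helper (the function name `respond` and the 30-second default are illustrative):

```python
def respond(allowed, retry_after=30):
    """Map a rate-limit decision to an (HTTP status, headers) pair."""
    if allowed:
        return 200, {}
    # 429 Too Many Requests, with Retry-After so well-behaved clients back off
    return 429, {"Retry-After": str(retry_after)}
```

In a real service this would feed into the framework's response object; the key point is pairing the 429 status with a Retry-After header.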

Q2: "Token bucket vs leaky bucket?"

Answer:

  • Token bucket: Allows bursts (good for UX)
  • Leaky bucket: Smooth traffic (good for backend protection)
  • Most common: Token bucket (better user experience)

Q3: "How do you rate limit across multiple servers?"

Answer:

  • Centralized: Redis stores counters (all servers check Redis)
  • Distributed: Each server tracks quota, sync periodically (eventual consistency)
  • Prefer: Centralized with Redis (accurate)

Quick Reference

Algorithms:

  • Token bucket: Flexible, allows bursts (most common)
  • Leaky bucket: Smooth, no bursts
  • Sliding window: Accurate, memory overhead

Implementation: Redis with TTL
Response: 429 + Retry-After header


Next: Geo-Distribution.