Design URL Shortener (TinyURL)
Problem Statement
Design a service like TinyURL or bit.ly that converts long URLs into short links.
Example: https://example.com/very/long/url → https://short.ly/abc123
Requirements
Functional
- Generate short URL from long URL
- Redirect short URL to original long URL
- Optional: Custom aliases, expiration, analytics
Non-Functional
- High availability: 99.9% uptime
- Low latency: < 100ms redirect
- Scalability: 100M new URLs/month (~100 writes/sec at peak, ~1K reads/sec; see Estimation)
- URL length: 7 characters (base62)
Out of Scope
- User accounts, dashboards, detailed analytics
Estimation
URLs created: 100M/month = 3.3M/day = 38 URLs/second
Peak: 38 × 3 = ~100 writes/second
Redirects: 10:1 read:write ratio → 100 × 10 = ~1K redirects/second
Storage (5 years):
- 100M URLs/month × 60 months = 6B URLs
- Each record: original URL (~2 KB) + short code (7 bytes) + metadata ≈ 2 KB
- Total: 6B × 2 KB = 12 TB
Bandwidth:
- Write: 100 QPS × 2 KB = 200 KB/s
- Read: 1K QPS × 2 KB = 2 MB/s
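The back-of-envelope numbers above can be sanity-checked in a few lines (a sketch; 30 days/month and 2 KB/record are the assumptions used in this section):

```python
SECONDS_PER_DAY = 24 * 60 * 60

urls_per_month = 100_000_000
writes_per_sec = urls_per_month / 30 / SECONDS_PER_DAY   # average write QPS
peak_writes = writes_per_sec * 3                         # assume 3x peak factor
reads_per_sec = peak_writes * 10                         # 10:1 read:write ratio

total_urls = urls_per_month * 60                         # 5 years of URLs
storage_tb = total_urls * 2 / 1e9                        # 2 KB each, KB -> TB

print(f"{writes_per_sec:.0f} avg w/s, {peak_writes:.0f} peak w/s, "
      f"{reads_per_sec:.0f} r/s, {storage_tb:.0f} TB")
```

The exact values land slightly above the rounded figures in the text (~39 avg writes/sec, ~116 peak), which is fine for an estimate.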
System Interface
POST /api/v1/shorten
{
    "long_url": "https://example.com/very/long/url",
    "custom_alias": "my-link",          // optional
    "expiration_date": "2025-12-31"     // optional
}
Response:
{
    "short_url": "https://short.ly/abc123",
    "created_at": "2024-01-15T10:30:00Z"
}
GET /abc123
Response: 301 Redirect to original URL
High-Level Design
Flow:
- Shorten: Generate unique ID → convert to base62 → store mapping → return short URL
- Redirect: Lookup short URL in cache → if miss, query DB → redirect (301)
Data Model
CREATE TABLE url_mappings (
    id          BIGINT PRIMARY KEY,
    short_code  VARCHAR(7) NOT NULL UNIQUE,  -- UNIQUE also creates the lookup index
    long_url    TEXT NOT NULL,
    user_id     BIGINT,
    created_at  TIMESTAMP,
    expires_at  TIMESTAMP,
    click_count INT DEFAULT 0
);
Key Design Decisions
1. Generating Short URLs
Option A: Hash (MD5, SHA256)
hash = md5(long_url)           # 128-bit digest
short_code = base62(hash)[:7]  # truncate to 7 chars
Pros: Fast, deterministic
Cons: Collisions (need to check + retry)
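Option A can be sketched as follows; `existing_codes` is a stand-in for the database uniqueness check, and the salt-and-retry loop handles collisions:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hash_short_code(long_url, existing_codes, length=7):
    """Hash-based codes: deterministic, but collisions need a retry loop."""
    salt = 0
    while True:
        salted = long_url + ("" if salt == 0 else str(salt))
        digest = hashlib.md5(salted.encode()).hexdigest()
        num = int(digest, 16)
        # Map the digest into `length` base62 characters
        code = "".join(ALPHABET[(num // 62**i) % 62] for i in range(length))
        if code not in existing_codes:
            existing_codes.add(code)
            return code
        salt += 1  # collision: salt the input and re-hash
```

The retry loop is exactly the cost noted above: every insert needs at least one existence check, and collisions need more.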
Option B: Auto-increment Counter
id = counter.increment() # Atomic counter
short_code = base62(id) # Convert to base62
Pros: No collisions, predictable length
Cons: A single counter is a bottleneck and single point of failure; distributing it adds complexity
Chosen: Counter (predictable, no collisions)
2. Base62 Encoding
Why base62? Case-sensitive (a-z, A-Z, 0-9) = 62 characters
def base62_encode(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if num == 0:
        return chars[0]  # edge case: the loop below would return "" for 0
    result = []
    while num > 0:
        result.append(chars[num % 62])
        num //= 62
    return ''.join(reversed(result))
# Example:
base62_encode(125) → "21" # 2×62 + 1 = 125
7 characters → 62^7 ≈ 3.5 trillion possible codes
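A decode counterpart (a sketch, self-contained with its own copy of the encoder) makes the mapping reversible, which is useful for tests and for recovering a row ID from a short code:

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num):
    if num == 0:
        return CHARS[0]            # avoid returning "" for 0
    result = []
    while num > 0:
        result.append(CHARS[num % 62])
        num //= 62
    return "".join(reversed(result))

def base62_decode(code):
    """Inverse of base62_encode: fold each character back into the integer."""
    num = 0
    for ch in code:
        num = num * 62 + CHARS.index(ch)
    return num

base62_encode(125)   # → "21"
base62_decode("21")  # → 125
```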
3. Distributed ID Generator
Option A: Database Auto-Increment
- Simple, but becomes bottleneck at scale
Option B: Redis INCR
id = redis.incr("url_counter") # Atomic
Option C: Snowflake (Twitter's approach)
64-bit ID:
- 1 bit: unused
- 41 bits: timestamp (milliseconds)
- 10 bits: machine ID
- 12 bits: sequence number
Total: 4096 IDs per millisecond per machine
Chosen: Redis INCR (simple and fast, but needs replication to avoid becoming a single point of failure itself) or Snowflake (fully distributed, no central counter)
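Option C can be sketched as a single-threaded generator. This is a simplification: the custom epoch below is an arbitrary assumption, clock rollback is not handled, and a lock would be needed for multi-threaded use.

```python
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)

class SnowflakeGenerator:
    """64-bit IDs: 41-bit timestamp | 10-bit machine ID | 12-bit sequence."""

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024        # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self):
        now = int(time.time() * 1000) - EPOCH_MS
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF   # 4096 IDs per ms
            if self.sequence == 0:
                # Sequence exhausted in this millisecond: wait for the next one
                while now <= self.last_ms:
                    now = int(time.time() * 1000) - EPOCH_MS
        else:
            self.sequence = 0
        self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs are roughly time-ordered, which also keeps B-tree inserts append-mostly.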
4. Caching Strategy
def redirect(short_code):
    # 1. Check cache (LRU, 80/20 rule)
    long_url = cache.get(short_code)
    if long_url:
        return redirect_301(long_url)  # Cache hit

    # 2. Cache miss - query database
    long_url = db.query("SELECT long_url FROM url_mappings WHERE short_code = ?", short_code)
    if not long_url:
        return 404  # Not found

    # 3. Store in cache (TTL = 24 hours)
    cache.set(short_code, long_url, ttl=86400)
    return redirect_301(long_url)
Cache hit rate: ~90% (hot URLs)
5. Handling Custom Aliases
def shorten_url(long_url, custom_alias=None):
    if custom_alias:
        # Check if alias is available (the UNIQUE constraint on short_code
        # still guards against the check-then-insert race)
        if db.exists("SELECT 1 FROM url_mappings WHERE short_code = ?", custom_alias):
            raise AliasAlreadyTaken()
        short_code = custom_alias
    else:
        # Generate short code
        id = redis.incr("url_counter")
        short_code = base62_encode(id)

    db.insert("url_mappings", {
        "short_code": short_code,
        "long_url": long_url,
        "created_at": now()
    })
    return f"https://short.ly/{short_code}"
Scalability
Sharding
Shard by short_code (hash-based):
shard = hash(short_code) % num_shards
Pros: Even distribution
Cons: Range queries don't work (not needed here)
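One detail worth noting for the one-liner above: Python's built-in `hash()` is salted per process, so routing needs a hash that is stable across machines and restarts. A sketch (`NUM_SHARDS = 4` is an assumption for illustration):

```python
import hashlib

NUM_SHARDS = 4  # assumption for illustration

def shard_for(short_code):
    """Route a short code to a shard using a process-stable hash."""
    digest = hashlib.md5(short_code.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

With plain modulo sharding, changing the shard count reshuffles most keys; consistent hashing reduces that churn if resharding is expected.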
Caching
- Redis cluster: Distributed cache (handle 100K QPS)
- LRU eviction: Keep hot URLs in cache
- TTL: 24 hours (prevent stale data)
Trade-Offs
| Decision | Chosen | Alternative | Reason |
|---|---|---|---|
| ID Generation | Counter | Hash | No collisions, predictable |
| Encoding | Base62 | Base10 | Shorter URLs (7 vs 10 chars) |
| Redirect | 301 | 302 | Permanent, so browsers cache (302 if per-click analytics matter) |
| Consistency | Strong | Eventual | Critical for URL mappings |
Follow-Up Questions
Q: How do you handle URL expiration?
Answer: Background job (cron) deletes expired URLs daily. Or check expiration on redirect (lazy deletion).
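The lazy-deletion variant can be sketched with an in-memory dict standing in for the database (`db` maps short codes to `(long_url, expires_at)` rows; names are illustrative):

```python
from datetime import datetime, timedelta

db = {}  # short_code -> (long_url, expires_at or None); stand-in for the real table

def redirect_lazy(short_code, now):
    """Expired rows are deleted on first access after their expiry time."""
    row = db.get(short_code)
    if row is None:
        return None                # 404 Not Found
    long_url, expires_at = row
    if expires_at is not None and now >= expires_at:
        del db[short_code]         # lazy delete on access
        return None                # treat as gone
    return long_url
```

In practice both approaches combine well: lazy deletion keeps redirects correct, and the nightly cron reclaims storage for URLs nobody visits again.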
Q: How do you prevent abuse (spam URLs)?
Answer: Rate limiting (100 URLs/day per IP), captcha, blocklist malicious domains.
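The 100-URLs/day limit can be sketched as a fixed-window counter; an in-memory dict stands in for Redis `INCR` with `EXPIRE`, and the window/limit values come from this answer:

```python
from collections import defaultdict

WINDOW_SECONDS = 86_400       # one day
LIMIT = 100                   # URL creations per IP per window

counters = defaultdict(int)   # (ip, window) -> count; stand-in for Redis

def allow_create(ip, now):
    """Return True if this IP may create another short URL in the current window."""
    window = int(now // WINDOW_SECONDS)
    counters[(ip, window)] += 1
    return counters[(ip, window)] <= LIMIT
```

Fixed windows allow a burst of up to 2× the limit across a window boundary; a sliding-window or token-bucket limiter smooths that out at the cost of more state.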
Q: Analytics (click tracking)?
Answer: Async write to analytics DB (don't block redirect). Use message queue.
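The async click-tracking idea can be sketched with Python's standard `queue` standing in for a real message queue (e.g. Kafka): the redirect path only enqueues and never blocks on the analytics store.

```python
import queue

click_events = queue.Queue()   # stand-in for a message queue

def record_click(short_code):
    """Called from the redirect path: enqueue and return immediately."""
    click_events.put(short_code)

def drain_clicks(counts):
    """Consumer side: batch-update click counts off the hot path."""
    while not click_events.empty():
        code = click_events.get()
        counts[code] = counts.get(code, 0) + 1
```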
Q: Custom domains (https://my-brand.com/abc123)?
Answer: Store domain in url_mappings table, configure DNS CNAME to point to service.
Quick Reference
Algorithm:
- Generate ID (Redis INCR or Snowflake)
- Encode to base62 (7 chars)
- Store mapping in DB
- Cache hot URLs (Redis)
Scale: 100M URLs, 12 TB storage, 1K redirects/sec
Next: Design Twitter - Timeline, fanout, celebrity problem.