Design URL Shortener (TinyURL)

Problem Statement

Design a service like TinyURL or bit.ly that converts long URLs into short links.

Example: https://example.com/very/long/url → https://short.ly/abc123

Requirements

Functional

  • Generate short URL from long URL
  • Redirect short URL to original long URL
  • Optional: Custom aliases, expiration, analytics

Non-Functional

  • High availability: 99.9% uptime
  • Low latency: < 100ms redirect
  • Scalability: 100M new URLs/month (~100 writes/sec peak, ~1K redirects/sec)
  • URL length: 7 characters (base62)

Out of Scope

  • User accounts, dashboards, detailed analytics

Estimation

URLs created: 100M/month = 3.3M/day = 38 URLs/second
Peak: 38 × 3 ≈ 115, call it ~100 writes/second

Redirects: 10:1 read:write ratio = 1K redirects/second

Storage (5 years):
- 100M URLs/month × 60 months = 6B URLs
- Each record: long URL (~2 KB) + short code (7 bytes) + metadata ≈ 2 KB
- Total: 6B × 2 KB = 12 TB

Bandwidth:
- Write: 100 QPS × 2 KB = 200 KB/s
- Read: 1K QPS × 2 KB = 2 MB/s
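These back-of-envelope numbers can be double-checked with a few lines (the ~2 KB record size is the assumed average from above):

```python
SECONDS_PER_DAY = 86_400

urls_per_month = 100_000_000
urls_per_day = urls_per_month / 30                 # ~3.3M/day
writes_per_sec = urls_per_day / SECONDS_PER_DAY    # ~38/sec
peak_writes = writes_per_sec * 3                   # ~115/sec, round to ~100

reads_per_sec = peak_writes * 10                   # 10:1 read:write ratio

total_urls = urls_per_month * 60                   # 5 years of retention
bytes_per_url = 2_000                              # assumed ~2 KB per record
storage_tb = total_urls * bytes_per_url / 1e12

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s, "
      f"{total_urls:,} URLs, {storage_tb:.1f} TB")
```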

System Interface

POST /api/v1/shorten
{
  "long_url": "https://example.com/very/long/url",
  "custom_alias": "my-link",        // optional
  "expiration_date": "2025-12-31"   // optional
}

Response:
{
  "short_url": "https://short.ly/abc123",
  "created_at": "2024-01-15T10:30:00Z"
}

GET /abc123
Response: 301 Redirect to original URL

High-Level Design

Flow:

  1. Shorten: Generate unique ID → convert to base62 → store mapping → return short URL
  2. Redirect: Lookup short URL in cache → if miss, query DB → redirect (301)

Data Model

CREATE TABLE url_mappings (
  id          BIGINT PRIMARY KEY,
  short_code  VARCHAR(7) UNIQUE,   -- UNIQUE also creates the lookup index
  long_url    TEXT,
  user_id     BIGINT,
  created_at  TIMESTAMP,
  expires_at  TIMESTAMP,
  click_count INT DEFAULT 0
);

Key Design Decisions

1. Generating Short URLs

Option A: Hash (MD5, SHA256)

digest = md5(long_url)            # 128-bit digest
short_code = base62(digest)[:7]   # keep the first 7 base62 characters

Pros: Fast, deterministic
Cons: Collisions (need to check + retry)
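A runnable sketch of Option A; the salt parameter drives the check-and-retry loop on collision (the `db.exists` call in the comment is an assumed interface, not a real library):

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hash_short_code(long_url: str, length: int = 7, salt: str = "") -> str:
    """Derive a short code from the URL's MD5 digest (prefix only)."""
    digest = hashlib.md5((long_url + salt).encode()).digest()
    num = int.from_bytes(digest[:8], "big")
    code = []
    for _ in range(length):
        code.append(BASE62[num % 62])
        num //= 62
    return "".join(code)

# On collision, re-hash with a salt and retry (assumed db interface):
#   code = hash_short_code(url)
#   attempt = 0
#   while db.exists(code):
#       attempt += 1
#       code = hash_short_code(url, salt=str(attempt))
```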

Option B: Auto-increment Counter

id = counter.increment()  # Atomic counter
short_code = base62(id) # Convert to base62

Pros: No collisions, predictable length
Cons: Needs a coordinated counter (potential single point of failure); sequential IDs are guessable

Chosen: Counter (predictable, no collisions)

2. Base62 Encoding

Why base62? The URL-safe alphanumeric characters (0-9, a-z, A-Z) give 62 symbols, so each character carries more information than base10 or hex.

def base62_encode(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if num == 0:
        return chars[0]
    result = []
    while num > 0:
        result.append(chars[num % 62])
        num //= 62
    return ''.join(reversed(result))

# Example:
base62_encode(125)  # → "21"  (2 × 62 + 1 = 125)

7 characters → 62^7 ≈ 3.5 trillion unique codes
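For completeness, the inverse decoder, which maps a short code back to its numeric id under the counter scheme:

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_decode(code: str) -> int:
    """Invert base62_encode: interpret the code as a base-62 number."""
    num = 0
    for ch in code:
        num = num * 62 + CHARS.index(ch)
    return num

base62_decode("21")  # → 125, matching the example above
```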

3. Distributed ID Generator

Option A: Database Auto-Increment

  • Simple, but becomes bottleneck at scale

Option B: Redis INCR

id = redis.incr("url_counter")  # Atomic

Option C: Snowflake (Twitter's approach)

64-bit ID:
- 1 bit: unused
- 41 bits: timestamp (milliseconds)
- 10 bits: machine ID
- 12 bits: sequence number

Total: 4096 IDs per millisecond per machine

Chosen: Redis INCR (simple, fast) or Snowflake (distributed, no single point of failure)
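The 64-bit Snowflake layout above can be sketched as a small generator class (the custom epoch and class name are assumptions; clock skew is not handled in this sketch):

```python
import threading
import time

class SnowflakeGenerator:
    """Snowflake-style 64-bit ID: 41-bit ms timestamp | 10-bit machine
    ID | 12-bit per-millisecond sequence. Clocks moving backwards are
    not handled here."""

    EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 UTC

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:      # 4096 IDs used: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.sequence
```

IDs are roughly time-ordered, which also makes them sortable by creation time for free.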

4. Caching Strategy

def redirect(short_code):
    # 1. Check cache (LRU; per the 80/20 rule, a few hot URLs get most traffic)
    long_url = cache.get(short_code)
    if long_url:
        return redirect_301(long_url)  # Cache hit

    # 2. Cache miss - query the database
    long_url = db.query(
        "SELECT long_url FROM url_mappings WHERE short_code = ?", short_code)
    if not long_url:
        return 404  # Not found

    # 3. Populate the cache (TTL = 24 hours)
    cache.set(short_code, long_url, ttl=86400)
    return redirect_301(long_url)

Cache hit rate: ~90% (hot URLs)

5. Handling Custom Aliases

def shorten_url(long_url, custom_alias=None):
    if custom_alias:
        # Check availability (the UNIQUE constraint is the real guard:
        # two requests can pass this check concurrently, and the insert
        # below must fail for the loser)
        if db.exists("SELECT 1 FROM url_mappings WHERE short_code = ?", custom_alias):
            raise AliasAlreadyTaken()
        short_code = custom_alias
    else:
        # Generate a short code from the counter
        id = redis.incr("url_counter")
        short_code = base62_encode(id)

    db.insert("url_mappings", {
        "short_code": short_code,
        "long_url": long_url,
        "created_at": now(),
    })
    return f"https://short.ly/{short_code}"

Scalability

Sharding

Shard by short_code (hash-based):

shard = hash(short_code) % num_shards

Pros: Even distribution
Cons: Range queries don't work (not needed here)
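One pitfall worth noting: the hash in the routing formula must be stable across processes and servers. Python's built-in `hash()` is randomized per process for strings, so a sketch would use something like CRC32 instead (the function name is illustrative):

```python
import zlib

def shard_for(short_code: str, num_shards: int = 16) -> int:
    """Route a short code to a shard with a stable hash.
    Built-in hash() is randomized per process for strings and must
    NOT be used for routing across servers."""
    return zlib.crc32(short_code.encode()) % num_shards
```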

Caching

  • Redis cluster: Distributed cache (handle 100K QPS)
  • LRU eviction: Keep hot URLs in cache
  • TTL: 24 hours (prevent stale data)

Trade-Offs

Decision        Chosen    Alternative  Reason
ID generation   Counter   Hash         No collisions, predictable length
Encoding        Base62    Base10       Shorter URLs (7 vs 10 chars)
Redirect        301       302          Permanent; browsers cache it (use 302 for per-click analytics)
Consistency     Strong    Eventual     Critical for URL mappings

Follow-Up Questions

Q: How do you handle URL expiration?

Answer: Background job (cron) deletes expired URLs daily. Or check expiration on redirect (lazy deletion).
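Lazy deletion can be sketched as a check on the redirect path (the `resolve` helper and the row shape are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

def resolve(row: dict):
    """Lazy expiration: treat an expired mapping as missing, so the
    caller returns 404 and may delete the row opportunistically."""
    expires_at = row.get("expires_at")
    if expires_at and expires_at <= datetime.now(timezone.utc):
        return None
    return row["long_url"]
```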

Q: How do you prevent abuse (spam URLs)?

Answer: Rate limiting (100 URLs/day per IP), captcha, blocklist malicious domains.
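A minimal in-memory fixed-window limiter sketch (a production version would use Redis INCR plus EXPIRE so counters are shared across app servers; the class name is illustrative):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Per-key fixed-window rate limiter, e.g. 100 URLs/day per IP.
    In-memory sketch only: counters are lost on restart and old
    windows are never evicted."""

    def __init__(self, limit: int = 100, window_s: int = 86_400):
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)  # (key, window) -> count

    def allow(self, key: str) -> bool:
        window = int(time.time()) // self.window_s
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```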

Q: Analytics (click tracking)?

Answer: Async write to analytics DB (don't block redirect). Use message queue.
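The decoupling can be sketched with a local queue standing in for the message broker; the names are illustrative:

```python
import queue
import threading

clicks = queue.Queue()  # stand-in for a message queue (e.g. Kafka)

def record_click(short_code: str) -> None:
    """Called on the redirect path: enqueue and return immediately."""
    clicks.put(short_code)

def analytics_worker(store: dict) -> None:
    """Background consumer: aggregates click counts off the hot path."""
    while True:
        code = clicks.get()
        if code is None:  # shutdown sentinel
            break
        store[code] = store.get(code, 0) + 1
```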

Q: Custom domains (https://my-brand.com/abc123)?

Answer: Store domain in url_mappings table, configure DNS CNAME to point to service.

Quick Reference

Algorithm:

  1. Generate ID (Redis INCR or Snowflake)
  2. Encode to base62 (7 chars)
  3. Store mapping in DB
  4. Cache hot URLs (Redis)

Scale: 100M URLs/month, 12 TB storage over 5 years, 1K redirects/sec


Next: Design Twitter - Timeline, fanout, celebrity problem.