Design URL Shortener (TinyURL)
Problem Statement
Design a service like TinyURL or bit.ly that converts long URLs into short links.
Example: https://example.com/very/long/url → https://short.ly/abc123
Requirements
Functional
- Generate short URL from long URL
- Redirect short URL to original long URL
- Optional: Custom aliases, expiration, analytics
Non-Functional
- High availability: 99.9% uptime
- Low latency: < 100ms redirect
- Scalability: 100M new URLs/month (~100 writes/sec at peak, ~1K reads/sec; see Estimation)
- URL length: 7 characters (base62)
Out of Scope
- User accounts, dashboards, detailed analytics
Estimation
URLs created: 100M/month = 3.3M/day = 38 URLs/second
Peak: 38 × 3 = ~100 writes/second
Redirects: 10:1 read:write ratio → 100 × 10 = ~1K redirects/second
Storage (5 years):
- 100M URLs/month × 60 months = 6B URLs
- Each record: original URL (~2 KB) + short code (7 bytes) + metadata ≈ 2 KB
- Total: 6B × 2 KB = 12 TB
Bandwidth:
- Write: 100 QPS × 2 KB = 200 KB/s
- Read: 1K QPS × 2 KB = 2 MB/s
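The back-of-envelope numbers above can be sanity-checked in a few lines (a sketch; 30 days/month and 2 KB/record are the assumptions used in this section):

```python
SECONDS_PER_DAY = 24 * 60 * 60

urls_per_month = 100_000_000
writes_per_sec = urls_per_month / 30 / SECONDS_PER_DAY   # average write QPS
peak_writes = writes_per_sec * 3                         # assume 3x peak factor
reads_per_sec = peak_writes * 10                         # 10:1 read:write ratio

total_urls = urls_per_month * 60                         # 5 years of URLs
storage_tb = total_urls * 2 / 1e9                        # 2 KB each, KB -> TB

print(f"{writes_per_sec:.0f} avg w/s, {peak_writes:.0f} peak w/s, "
      f"{reads_per_sec:.0f} r/s, {storage_tb:.0f} TB")
```

The exact values land slightly above the rounded figures in the text (~39 avg writes/sec, ~116 peak), which is fine for an estimate.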
System Interface
POST /api/v1/shorten
{
    "long_url": "https://example.com/very/long/url",
    "custom_alias": "my-link",          // optional
    "expiration_date": "2025-12-31"     // optional
}
Response:
{
    "short_url": "https://short.ly/abc123",
    "created_at": "2024-01-15T10:30:00Z"
}
GET /abc123
Response: 301 Redirect to original URL
High-Level Design
Flow:
- Shorten: Generate unique ID → convert to base62 → store mapping → return short URL
- Redirect: Lookup short URL in cache → if miss, query DB → redirect (301)
Data Model
CREATE TABLE url_mappings (
    id          BIGINT PRIMARY KEY,
    short_code  VARCHAR(7) NOT NULL UNIQUE,  -- UNIQUE also creates the lookup index
    long_url    TEXT NOT NULL,
    user_id     BIGINT,
    created_at  TIMESTAMP,
    expires_at  TIMESTAMP,
    click_count INT DEFAULT 0
);
Key Design Decisions
1. Generating Short URLs
Option A: Hash (MD5, SHA256)
hash = md5(long_url)           # 128-bit digest
short_code = base62(hash)[:7]  # truncate to 7 chars
Pros: Fast, deterministic
Cons: Collisions (need to check + retry)
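Option A can be sketched as follows; `existing_codes` is a stand-in for the database uniqueness check, and the salt-and-retry loop handles collisions:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hash_short_code(long_url, existing_codes, length=7):
    """Hash-based codes: deterministic, but collisions need a retry loop."""
    salt = 0
    while True:
        salted = long_url + ("" if salt == 0 else str(salt))
        digest = hashlib.md5(salted.encode()).hexdigest()
        num = int(digest, 16)
        # Map the digest into `length` base62 characters
        code = "".join(ALPHABET[(num // 62**i) % 62] for i in range(length))
        if code not in existing_codes:
            existing_codes.add(code)
            return code
        salt += 1  # collision: salt the input and re-hash
```

The retry loop is exactly the cost noted above: every insert needs at least one existence check, and collisions need more.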
Option B: Auto-increment Counter
id = counter.increment() # Atomic counter
short_code = base62(id) # Convert to base62
Pros: No collisions, predictable length
Cons: A single counter is a bottleneck and single point of failure; distributing it adds complexity
Chosen: Counter (predictable, no collisions)
2. Base62 Encoding
Why base62? Case-sensitive (a-z, A-Z, 0-9) = 62 characters
def base62_encode(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if num == 0:
        return chars[0]  # edge case: the loop below would return "" for 0
    result = []
    while num > 0:
        result.append(chars[num % 62])
        num //= 62
    return ''.join(reversed(result))
# Example:
base62_encode(125) → "21" # 2×62 + 1 = 125
7 characters → 62^7 ≈ 3.5 trillion possible codes
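A decode counterpart (a sketch, self-contained with its own copy of the encoder) makes the mapping reversible, which is useful for tests and for recovering a row ID from a short code:

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num):
    if num == 0:
        return CHARS[0]            # avoid returning "" for 0
    result = []
    while num > 0:
        result.append(CHARS[num % 62])
        num //= 62
    return "".join(reversed(result))

def base62_decode(code):
    """Inverse of base62_encode: fold each character back into the integer."""
    num = 0
    for ch in code:
        num = num * 62 + CHARS.index(ch)
    return num

base62_encode(125)   # → "21"
base62_decode("21")  # → 125
```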
3. Distributed ID Generator
Option A: Database Auto-Increment
- Simple, but becomes bottleneck at scale
Option B: Redis INCR
id = redis.incr("url_counter") # Atomic
Option C: Snowflake (Twitter's approach)
64-bit ID:
- 1 bit: unused
- 41 bits: timestamp (milliseconds)
- 10 bits: machine ID
- 12 bits: sequence number
Total: 4096 IDs per millisecond per machine
Chosen: Redis INCR (simple and fast, but needs replication to avoid becoming a single point of failure itself) or Snowflake (fully distributed, no central counter)
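Option C can be sketched as a single-threaded generator. This is a simplification: the custom epoch below is an arbitrary assumption, clock rollback is not handled, and a lock would be needed for multi-threaded use.

```python
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)

class SnowflakeGenerator:
    """64-bit IDs: 41-bit timestamp | 10-bit machine ID | 12-bit sequence."""

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024        # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self):
        now = int(time.time() * 1000) - EPOCH_MS
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF   # 4096 IDs per ms
            if self.sequence == 0:
                # Sequence exhausted in this millisecond: wait for the next one
                while now <= self.last_ms:
                    now = int(time.time() * 1000) - EPOCH_MS
        else:
            self.sequence = 0
        self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs are roughly time-ordered, which also keeps B-tree inserts append-mostly.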
4. Caching Strategy
def redirect(short_code):
    # 1. Check cache (LRU, 80/20 rule)
    long_url = cache.get(short_code)
    if long_url:
        return redirect_301(long_url)  # Cache hit

    # 2. Cache miss - query database
    long_url = db.query("SELECT long_url FROM url_mappings WHERE short_code = ?", short_code)
    if not long_url:
        return 404  # Not found

    # 3. Store in cache (TTL = 24 hours)
    cache.set(short_code, long_url, ttl=86400)
    return redirect_301(long_url)
Cache hit rate: ~90% (hot URLs)
5. Handling Custom Aliases
def shorten_url(long_url, custom_alias=None):
    if custom_alias:
        # Check if alias is available (the UNIQUE constraint on short_code
        # still guards against the check-then-insert race)
        if db.exists("SELECT 1 FROM url_mappings WHERE short_code = ?", custom_alias):
            raise AliasAlreadyTaken()
        short_code = custom_alias
    else:
        # Generate short code
        id = redis.incr("url_counter")
        short_code = base62_encode(id)

    db.insert("url_mappings", {
        "short_code": short_code,
        "long_url": long_url,
        "created_at": now()
    })
    return f"https://short.ly/{short_code}"
Scalability
Sharding
Shard by short_code (hash-based):
shard = hash(short_code) % num_shards
Pros: Even distribution
Cons: Range queries don't work (not needed here)
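One detail worth noting for the one-liner above: Python's built-in `hash()` is salted per process, so routing needs a hash that is stable across machines and restarts. A sketch (`NUM_SHARDS = 4` is an assumption for illustration):

```python
import hashlib

NUM_SHARDS = 4  # assumption for illustration

def shard_for(short_code):
    """Route a short code to a shard using a process-stable hash."""
    digest = hashlib.md5(short_code.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

With plain modulo sharding, changing the shard count reshuffles most keys; consistent hashing reduces that churn if resharding is expected.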
Caching
- Redis cluster: Distributed cache (handle 100K QPS)
- LRU eviction: Keep hot URLs in cache
- TTL: 24 hours (prevent stale data)
Trade-Offs
| Decision | Chosen | Alternative | Reason |
|---|---|---|---|
| ID Generation | Counter | Hash | No collisions, predictable |
| Encoding | Base62 | Base10 | Shorter URLs (7 vs 10 chars) |
| Redirect | 301 | 302 | Permanent, so browsers cache (302 if per-click analytics matter) |
| Consistency | Strong | Eventual | Critical for URL mappings |
Follow-Up Questions
Q: How do you handle URL expiration?
Answer: Background job (cron) deletes expired URLs daily. Or check expiration on redirect (lazy deletion).
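The lazy-deletion variant can be sketched with an in-memory dict standing in for the database (`db` maps short codes to `(long_url, expires_at)` rows; names are illustrative):

```python
from datetime import datetime, timedelta

db = {}  # short_code -> (long_url, expires_at or None); stand-in for the real table

def redirect_lazy(short_code, now):
    """Expired rows are deleted on first access after their expiry time."""
    row = db.get(short_code)
    if row is None:
        return None                # 404 Not Found
    long_url, expires_at = row
    if expires_at is not None and now >= expires_at:
        del db[short_code]         # lazy delete on access
        return None                # treat as gone
    return long_url
```

In practice both approaches combine well: lazy deletion keeps redirects correct, and the nightly cron reclaims storage for URLs nobody visits again.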
Q: How do you prevent abuse (spam URLs)?
Answer: Rate limiting (100 URLs/day per IP), captcha, blocklist malicious domains.
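The 100-URLs/day limit can be sketched as a fixed-window counter; an in-memory dict stands in for Redis `INCR` with `EXPIRE`, and the window/limit values come from this answer:

```python
from collections import defaultdict

WINDOW_SECONDS = 86_400       # one day
LIMIT = 100                   # URL creations per IP per window

counters = defaultdict(int)   # (ip, window) -> count; stand-in for Redis

def allow_create(ip, now):
    """Return True if this IP may create another short URL in the current window."""
    window = int(now // WINDOW_SECONDS)
    counters[(ip, window)] += 1
    return counters[(ip, window)] <= LIMIT
```

Fixed windows allow a burst of up to 2× the limit across a window boundary; a sliding-window or token-bucket limiter smooths that out at the cost of more state.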
Q: Analytics (click tracking)?
Answer: Async write to analytics DB (don't block redirect). Use message queue.
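The async click-tracking idea can be sketched with Python's standard `queue` standing in for a real message queue (e.g. Kafka): the redirect path only enqueues and never blocks on the analytics store.

```python
import queue

click_events = queue.Queue()   # stand-in for a message queue

def record_click(short_code):
    """Called from the redirect path: enqueue and return immediately."""
    click_events.put(short_code)

def drain_clicks(counts):
    """Consumer side: batch-update click counts off the hot path."""
    while not click_events.empty():
        code = click_events.get()
        counts[code] = counts.get(code, 0) + 1
```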
Q: Custom domains (https://my-brand.com/abc123)?
Answer: Store domain in url_mappings table, configure DNS CNAME to point to service.
Quick Reference
Algorithm:
- Generate ID (Redis INCR or Snowflake)
- Encode to base62 (7 chars)
- Store mapping in DB
- Cache hot URLs (Redis)
Scale: 100M URLs, 12 TB storage, 1K redirects/sec
Next: Design Twitter - Timeline, fanout, celebrity problem.