Design Google Drive
Problem
Design cloud storage with file sync, sharing, collaboration.
Requirements
- Upload/download files
- Sync across devices
- Share files (public link, permissions)
- Real-time collaboration (like Google Docs)
- Scale: 1B users, 1 PB storage
Key Challenges
- File sync: Detect changes, sync efficiently
- Conflict resolution: Two users edit same file offline
- Large files: Chunking, resume uploads
- Real-time collab: Operational Transformation (OT) or CRDT
High-Level Design
File Sync (Dropbox Model)
Chunking
Split file into chunks (4 MB each):
file.txt (20 MB) →
chunk1 (4 MB) + chunk2 (4 MB) + ... + chunk5 (4 MB)
Benefits:
- Resume uploads (re-upload failed chunks only)
- De-duplication (same chunk across files)
- Efficient sync (only changed chunks)
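The chunking step above can be sketched as follows. This is a minimal illustration, assuming fixed-size chunks and SHA-256 digests as chunk identifiers; `iter_chunks` and `build_manifest` are hypothetical helper names, not part of any real client.

```python
import hashlib
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the chunk size used above


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (index, SHA-256 hex digest, chunk bytes) for each fixed-size chunk."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield index, hashlib.sha256(chunk).hexdigest(), chunk


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return the ordered list of chunk digests that describes a file."""
    return [digest for _, digest, _ in iter_chunks(data, chunk_size)]
```

A 20 MB file yields a five-entry manifest. Because chunks are identified by their digest, identical chunks (within one file or across files) map to the same key, which is what makes de-duplication and per-chunk retry possible.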
Change Detection
Option A: Polling (check every 30s)
Option B: File system watcher (inotify on Linux, FSEvents on macOS)
Chosen: File system watcher (near real-time, no wasted polling cycles)
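For comparison, Option A (polling) can be sketched with the standard library alone. This is an illustrative sketch, assuming a change is detected by comparing modification time and size between two directory scans; `take_snapshot` and `diff_snapshots` are hypothetical names. A watcher-based client would receive these events from the OS instead of computing them by rescanning.

```python
import os
from typing import Dict, Set, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # path -> (mtime, size in bytes)


def take_snapshot(root: str) -> Snapshot:
    """Record the modification time and size of every file under root."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            snapshot[path] = (stat.st_mtime, stat.st_size)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, Set[str]]:
    """Classify paths as created, deleted, or modified between two scans."""
    old_paths, new_paths = set(old), set(new)
    return {
        "created": new_paths - old_paths,
        "deleted": old_paths - new_paths,
        "modified": {p for p in old_paths & new_paths if old[p] != new[p]},
    }
```

Calling `take_snapshot` every 30 seconds and diffing against the previous result gives the polling behavior of Option A; the cost is a full directory walk per cycle, which is the main reason the watcher is preferred.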
Delta Sync
# Upload only the chunks whose checksums the server does not already have
local_chunks = set(compute_checksums(file))
remote_chunks = set(fetch_remote_checksums(file_id))
changed_chunks = local_chunks - remote_chunks  # set difference
upload_chunks(changed_chunks)
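A position-aware variant of the delta computation is sketched below. It assumes both sides keep an ordered list of SHA-256 chunk digests; `chunk_checksums` and `changed_chunk_indexes` are hypothetical helpers, and the tiny chunk size in the usage note is only for readability.

```python
import hashlib
from typing import List


def chunk_checksums(data: bytes, chunk_size: int) -> List[str]:
    """Return the SHA-256 digest of each fixed-size chunk, in file order."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def changed_chunk_indexes(local: List[str], remote: List[str]) -> List[int]:
    """Indexes of local chunks that are new or differ from the remote chunk at the same position."""
    return [
        index
        for index, digest in enumerate(local)
        if index >= len(remote) or remote[index] != digest
    ]
```

With a chunk size of 4, changing `b"aaaabbbbcccc"` to `b"aaaaBBBBccccdd"` marks chunks 1 and 3 for upload. Note that with fixed-size chunks, inserting bytes near the start of a file shifts every later chunk boundary, so all subsequent chunks appear changed; content-defined chunking is the usual way to limit that effect.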
Conflict Resolution
Scenario: User A and B edit same file offline, then sync.
Strategy 1: Last-write-wins (simple, may lose data)
Strategy 2: Keep both versions (rename to "file (conflict copy)")
Strategy 3: Merge (for text files, use diff/patch)
Chosen: Keep both (user decides)
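One way to detect the conflict is to have each client remember the server version its local copy was based on. The sketch below assumes integer version numbers and the "(conflict copy)" naming from Strategy 2; `SyncDecision` and `resolve` are hypothetical names for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncDecision:
    action: str       # "noop", "download", "upload", or "conflict_copy"
    target_name: str  # name under which the local content ends up


def conflict_copy_name(filename: str) -> str:
    """Insert the conflict marker before the file extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem} (conflict copy){ext}"


def resolve(filename: str, base_version: int, server_version: int, local_modified: bool) -> SyncDecision:
    """Decide what to do when a device reconnects and syncs one file."""
    if not local_modified:
        # Nothing to push; pull only if the server moved ahead while offline.
        action = "download" if server_version > base_version else "noop"
        return SyncDecision(action, filename)
    if server_version == base_version:
        # Nobody else changed the file, so the local edit is a plain update.
        return SyncDecision("upload", filename)
    # Both sides changed since the common base: keep both versions.
    return SyncDecision("conflict_copy", conflict_copy_name(filename))
```

In the scenario above, User A syncs first and bumps the server version; when User B syncs, `base_version` no longer matches `server_version`, so B's edit is saved as "file (conflict copy)" and neither edit is lost.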
Real-Time Collaboration (Google Docs)
Operational Transformation (OT)
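Doc: "Hello" (positions are 0-based)
User A inserts "!" at position 5: "Hello!"
User B deletes "l" at position 3: "Helo"
OT transforms each operation against the concurrent one so both replicas reach the same state.
After B's delete is applied, A's insert shifts from position 5 to position 4:
Final: "Helo!"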
Complex but proven (Google Docs uses this)
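The transformation in the "Hello" example can be sketched for single-character deletes and string inserts. This is a simplified illustration, not how Google Docs implements OT; the `Insert`, `Delete`, `apply_op`, and `transform` names and the `site` tie-breaker field are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int = 0  # tie-breaker when two sites insert at the same position


@dataclass(frozen=True)
class Delete:
    pos: int  # removes the single character at this 0-based index


Op = Union[Insert, Delete]


def apply_op(doc: str, op: Optional[Op]) -> str:
    """Apply one operation; None means the operation was cancelled out."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, against: Op) -> Optional[Op]:
    """Adjust op so it keeps its intent after `against` has been applied."""
    if isinstance(against, Insert):
        if isinstance(op, Insert):
            # Equal positions are ordered by site id so both replicas agree.
            shifts = against.pos < op.pos or (against.pos == op.pos and against.site < op.site)
        else:
            shifts = against.pos <= op.pos
        return replace(op, pos=op.pos + len(against.text)) if shifts else op
    # `against` is a Delete.
    if isinstance(op, Delete) and op.pos == against.pos:
        return None  # both sites deleted the same character
    return replace(op, pos=op.pos - 1) if against.pos < op.pos else op
```

With `a = Insert(5, "!")` and `b = Delete(3)`, `transform(a, b)` gives `Insert(4, "!")`. Applying `b` then the transformed `a`, or `a` then the transformed `b`, both produce "Helo!", which is the convergence property OT is meant to guarantee.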
Alternative: CRDT
Conflict-free replicated data types (automatic conflict resolution)
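The core CRDT idea is a merge function that is commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange state. The sketch below shows one of the simplest CRDTs, a last-writer-wins register (for example, a document title). It is only an illustration of the merge properties; sequence CRDTs used for collaborative text editing are considerably more involved. `LWWRegister` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int   # logical clock value of the write
    replica_id: str  # breaks ties between writes with equal timestamps

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        """Keep the write with the highest (timestamp, replica_id) pair."""
        return max(self, other, key=lambda reg: (reg.timestamp, reg.replica_id))
```

Merging in any order or any number of times gives the same result, so no coordination between replicas is needed. The trade-off is the same as Strategy 1 above: the losing write is discarded.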
Sharing
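CREATE TABLE shared_files (
    file_id UUID NOT NULL,
    user_id BIGINT,
    -- CHECK constraint instead of an inline ENUM, which is MySQL-only syntax
    -- and does not combine with the UUID column type used above
    permission VARCHAR(5) NOT NULL CHECK (permission IN ('read', 'write', 'owner')),
    shared_link VARCHAR(255) UNIQUE,
    expiration_date TIMESTAMP
);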
Public link: Generate signed URL (expires after 7 days)
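A signed URL of this kind can be sketched with an HMAC over the file ID and expiry time. This is a minimal sketch under stated assumptions: `SECRET_KEY`, `BASE_URL`, the `expires`/`sig` query parameter names, and both function names are hypothetical, and a real deployment would load the key from a secret store.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; never hard-code in production
BASE_URL = "https://drive.example.com/s"    # hypothetical share endpoint
SEVEN_DAYS = 7 * 24 * 60 * 60               # link lifetime in seconds


def _signature(file_id: str, expires: int) -> str:
    """HMAC-SHA256 over the file ID and expiry, so neither can be altered."""
    message = f"{file_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(file_id: str, now: Optional[float] = None, ttl: int = SEVEN_DAYS) -> str:
    """Build a share link that stops validating after ttl seconds."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl
    query = urlencode({"expires": expires, "sig": _signature(file_id, expires)})
    return f"{BASE_URL}/{file_id}?{query}"


def verify_signed_url(url: str, now: Optional[float] = None) -> bool:
    """Return True only if the signature matches and the link has not expired."""
    parsed = urlparse(url)
    file_id = parsed.path.rsplit("/", 1)[-1]
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(file_id, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Because the expiry is part of the signed message, the server can validate a link without a database lookup; the `shared_link` and `expiration_date` columns are still useful if links must be revocable before they expire.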
Storage Optimization
- Deduplication: Same file uploaded by multiple users → store once
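- Compression: gzip before upload (little benefit for already-compressed formats such as JPEG or ZIP)
- Cold storage: Move rarely accessed files to an archival tier (e.g., Amazon S3 Glacier)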
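Deduplication and compression can be combined in a content-addressed store, sketched below with an in-memory dictionary standing in for object storage. `BlobStore` and its methods are hypothetical names; the sketch hashes the uncompressed content so that identical files deduplicate regardless of compression settings.

```python
import gzip
import hashlib
from typing import Dict


class BlobStore:
    """In-memory stand-in for a content-addressed object store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}     # digest -> gzip-compressed content
        self._ref_counts: Dict[str, int] = {}  # digest -> number of files referencing it

    def put(self, content: bytes) -> str:
        """Store content once per unique digest and return its key."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = gzip.compress(content)
        self._ref_counts[digest] = self._ref_counts.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Return the original, decompressed content."""
        return gzip.decompress(self._blobs[digest])

    def stored_bytes(self) -> int:
        """Total bytes actually held after dedup and compression."""
        return sum(len(blob) for blob in self._blobs.values())
```

Uploading the same payload twice stores one compressed blob with a reference count of 2. The reference count is what lets a blob be deleted safely only after the last file pointing at it is removed.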