Design Google Drive

Problem

Design a cloud storage service with file sync, sharing, and real-time collaboration.

Requirements

  • Upload/download files
  • Sync across devices
  • Share files (public link, permissions)
  • Real-time collaboration (like Google Docs)
  • Scale: 1B users, 1 PB storage

Key Challenges

  1. File sync: Detect changes, sync efficiently
  2. Conflict resolution: Two users edit same file offline
  3. Large files: Chunking, resume uploads
  4. Real-time collab: Operational Transformation (OT) or CRDT

High-Level Design

File Sync (Dropbox Model)

Chunking

Split file into chunks (4 MB each):

file.txt (20 MB) → 
chunk1 (4 MB) + chunk2 (4 MB) + ... + chunk5 (4 MB)

Benefits:

  • Resume uploads (re-upload failed chunks only)
  • De-duplication (same chunk across files)
  • Efficient sync (only changed chunks)
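A minimal sketch of the chunking step, assuming fixed-size 4 MB chunks identified by a SHA-256 checksum. The names `CHUNK_SIZE` and `iter_chunks` are hypothetical and chosen for illustration only.

```python
import hashlib
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as in the text above


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (index, sha256 hex digest, data) for each fixed-size chunk."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of file
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1
```

Because each chunk is identified by its checksum, a failed upload can resume at the first missing index, and two chunks with the same digest only need to be stored once.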

Change Detection

Option A: Polling (check every 30s)
Option B: File system watcher (inotify, FSEvents)

Chosen: File system watcher (real-time)
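Watcher APIs such as inotify and FSEvents are platform-specific, so the sketch below illustrates Option A instead: a polling pass that compares two directory snapshots. A pass like this could also act as a fallback if watcher events are missed. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (mtime in ns, size in bytes)


def snapshot(root: str) -> Snapshot:
    """Record modification time and size for every file under root."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (created, modified, deleted) paths between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, modified, deleted
```

Calling `snapshot` every 30 seconds and diffing against the previous result gives the polling behaviour described above, at the cost of a full directory walk per pass.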

Delta Sync

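# Only upload chunks whose checksum the server does not already have
local_chunks = set(compute_checksums(file))            # checksums of the local file's chunks
remote_chunks = set(fetch_remote_checksums(file_id))   # checksums of the last synced version

changed_chunks = local_chunks - remote_chunks          # set difference: new or modified chunks
upload_chunks(changed_chunks)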
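As a runnable illustration of the same idea, the sketch below compares ordered lists of chunk checksums and skips chunks whose content the server already holds. The function `plan_delta_sync` and the `known_digests` parameter are assumptions made for this example, not part of any real sync API.

```python
from typing import FrozenSet, List, Tuple


def plan_delta_sync(
    local: List[str],
    remote: List[str],
    known_digests: FrozenSet[str] = frozenset(),
) -> Tuple[List[int], List[int]]:
    """Return (changed chunk indices, indices that actually need uploading).

    local and remote are checksum lists ordered by chunk index.
    known_digests holds checksums the server already stores for other files.
    """
    # A chunk changed if it is new (beyond the remote length) or its checksum differs.
    changed = [
        i for i, digest in enumerate(local)
        if i >= len(remote) or remote[i] != digest
    ]
    to_upload: List[int] = []
    seen = set(known_digests)
    for i in changed:
        if local[i] not in seen:  # skip content the server already has
            to_upload.append(i)
            seen.add(local[i])
    return changed, to_upload
```

For example, with `local = ["a", "b", "c", "b"]`, `remote = ["a", "x"]`, and the server already holding `"c"`, three chunks changed but only chunk 1 needs to be uploaded.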

Conflict Resolution

Scenario: User A and B edit same file offline, then sync.

Strategy 1: Last-write-wins (simple, may lose data)
Strategy 2: Keep both versions (rename to "file (conflict copy)")
Strategy 3: Merge (for text files, use diff/patch)

Chosen: Keep both (user decides)
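One way to sketch the "keep both" strategy is to compare each side against the last version both devices agreed on. The version identifiers could be checksums or revision numbers. The names `decide_sync_action` and `conflict_copy_name` are hypothetical.

```python
import os


def decide_sync_action(base: str, local: str, remote: str) -> str:
    """Compare version identifiers against the last synced (base) version."""
    if local == remote:
        return "in_sync"    # both sides already hold the same content
    if local == base:
        return "download"   # only the remote side changed
    if remote == base:
        return "upload"     # only the local side changed
    return "keep_both"      # both changed since the last sync: conflict


def conflict_copy_name(filename: str) -> str:
    """Build the name for the second version, keeping the file extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem} (conflict copy){ext}"
```

When `decide_sync_action` returns `"keep_both"`, the client would store the incoming version under `conflict_copy_name(filename)` and leave the choice to the user.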

Real-Time Collaboration (Google Docs)

Operational Transformation (OT)

Doc: "Hello"

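User A inserts "!" at position 5: "Hello!"
User B deletes "l" at position 3: "Helo"

(Positions are 0-based.) OT transforms each operation against the concurrent one so both replicas reach the same state. B's delete removed a character before A's insertion point, so A's insert shifts from position 5 to 4 on B's copy; B's delete stays at position 3 on A's copy.
Final: "Helo!"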

Complex but proven (Google Docs uses this)
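The example above can be sketched in code. This is a deliberately partial transform that covers only the insert-versus-delete pair from the example. Insert/insert ties and delete/delete pairs need extra rules (such as site-ID tie-breaking) and are left out. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int  # deletes exactly one character at this 0-based position


Op = Union[Insert, Delete]


def apply_op(doc: str, op: Op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, against: Op) -> Op:
    """Rewrite op so it can be applied after the concurrent op `against`."""
    if isinstance(op, Insert) and isinstance(against, Delete):
        # A character before the insertion point disappeared: shift left.
        return Insert(op.pos - 1, op.text) if op.pos > against.pos else op
    if isinstance(op, Delete) and isinstance(against, Insert):
        # Text was inserted at or before the deleted character: shift right.
        return Delete(op.pos + len(against.text)) if op.pos >= against.pos else op
    raise NotImplementedError("only insert/delete pairs are covered in this sketch")


doc = "Hello"
a, b = Insert(5, "!"), Delete(3)
replica_a = apply_op(apply_op(doc, a), transform(b, a))  # A's op first, then B's
replica_b = apply_op(apply_op(doc, b), transform(a, b))  # B's op first, then A's
```

Both `replica_a` and `replica_b` end up as "Helo!", even though the two replicas applied the operations in opposite orders.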

Alternative: CRDT

Conflict-free replicated data types (automatic conflict resolution)
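To make the idea concrete, here is a small sequence CRDT sketch in the style of a replicated growable array. Every character gets a unique ID, records the character it was inserted after, and deletions are kept as tombstones. Merging is a plain union, so replicas converge regardless of merge order. The class `SeqCRDT` is an illustrative assumption, not a production design, and its recursive traversal suits only short documents.

```python
from typing import Dict, List, Optional, Set, Tuple

NodeId = Tuple[int, str]  # (logical clock, site name), unique per insert


class SeqCRDT:
    def __init__(self, site: str) -> None:
        self.site = site
        self.clock = 0
        self.nodes: Dict[NodeId, Tuple[Optional[NodeId], str]] = {}  # id -> (parent id, char)
        self.tombstones: Set[NodeId] = set()

    def _visible(self) -> List[NodeId]:
        children: Dict[Optional[NodeId], List[NodeId]] = {}
        for node_id, (parent, _char) in self.nodes.items():
            children.setdefault(parent, []).append(node_id)
        ordered: List[NodeId] = []

        def visit(parent: Optional[NodeId]) -> None:
            # Newer siblings come first, so every replica orders them identically.
            for node_id in sorted(children.get(parent, []), reverse=True):
                ordered.append(node_id)
                visit(node_id)

        visit(None)
        return [i for i in ordered if i not in self.tombstones]

    def text(self) -> str:
        return "".join(self.nodes[i][1] for i in self._visible())

    def insert(self, index: int, char: str) -> None:
        visible = self._visible()
        parent = visible[index - 1] if index > 0 else None
        self.clock += 1
        self.nodes[(self.clock, self.site)] = (parent, char)

    def delete(self, index: int) -> None:
        self.tombstones.add(self._visible()[index])

    def merge(self, other: "SeqCRDT") -> None:
        self.nodes.update(other.nodes)          # union of inserted characters
        self.tombstones |= other.tombstones     # union of deletions
        self.clock = max(self.clock, other.clock)
```

Replaying the earlier example: if replica A inserts "!" at the end of "Hello" while replica B deletes the second "l", merging in either direction leaves both replicas with "Helo!" without any transformation step.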

Sharing

CREATE TABLE shared_files (
    file_id UUID,
    user_id BIGINT,
    permission ENUM('read', 'write', 'owner'),
    shared_link VARCHAR(255) UNIQUE,
    expiration_date TIMESTAMP
);
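A sketch of how a row from this table might be checked at request time. It assumes the permission levels are ordered read < write < owner and that a NULL `expiration_date` means the share never expires. Neither assumption is stated in the schema itself, and `is_allowed` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed ordering: each level includes the rights of the levels below it.
LEVELS = {"read": 1, "write": 2, "owner": 3}


def is_allowed(
    granted: str,
    required: str,
    expiration_date: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the granted permission covers the required one and has not expired."""
    current = now if now is not None else datetime.now(timezone.utc)
    if expiration_date is not None and current >= expiration_date:
        return False
    return LEVELS[granted] >= LEVELS[required]
```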

Public link: Generate signed URL (expires after 7 days)
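A minimal sketch of a signed link with a 7-day lifetime, using an HMAC-SHA256 signature over the file ID and expiry time. The host `drive.example.com`, the URL layout, the placeholder secret, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"   # placeholder: load from a secret store
LINK_TTL_SECONDS = 7 * 24 * 3600             # 7 days, as in the text above


def _signature(file_id: str, expires: int, key: bytes) -> str:
    return hmac.new(key, f"{file_id}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_link(file_id: str, now: Optional[float] = None, key: bytes = SECRET_KEY) -> str:
    """Build a share URL carrying its expiry time and a signature over both."""
    issued = now if now is not None else time.time()
    expires = int(issued) + LINK_TTL_SECONDS
    query = urlencode({"expires": expires, "sig": _signature(file_id, expires, key)})
    return f"https://drive.example.com/share/{file_id}?{query}"


def verify_link(url: str, now: Optional[float] = None, key: bytes = SECRET_KEY) -> bool:
    """Accept the link only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    file_id = parsed.path.rsplit("/", 1)[-1]
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    expected = _signature(file_id, expires, key)
    current = now if now is not None else time.time()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode()) and current < expires
```

Because the expiry is part of the signed payload, a client cannot extend a link by editing the `expires` parameter: the signature would no longer match.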

Storage Optimization

  • Deduplication: Same file uploaded by multiple users → store once
  • Compression: gzip before upload
  • Cold storage: Move rarely accessed files to a cheaper archival tier (e.g., Amazon S3 Glacier)
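Deduplication and compression can be combined in a content-addressed store, sketched below with an in-memory dictionary standing in for the real storage backend. The class `BlobStore` and its reference counting are illustrative assumptions. Here the checksum is computed on the uncompressed bytes so that identical files map to the same key.

```python
import gzip
import hashlib
from typing import Dict


class BlobStore:
    """Stores each distinct content once, gzip-compressed, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}    # digest -> compressed bytes
        self.refcount: Dict[str, int] = {}   # digest -> number of files using it

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = gzip.compress(data)  # first upload: store it
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return gzip.decompress(self.blobs[digest])

    def release(self, digest: str) -> None:
        """Drop one reference and delete the blob when nobody uses it anymore."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]
```

Two users uploading the same file result in one stored blob with a reference count of 2, and the blob is deleted only after both references are released.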
