Blob Storage

TL;DR

Blob storage: Store unstructured data (files, images, videos). S3: Amazon's object storage (99.999999999% durability). CDN: Cache blobs close to users.

Core Concepts

1. Object Storage vs Block Storage

| Type | Use Case | Example |
|---|---|---|
| Object | Files, images, backups | S3, Azure Blob, GCS |
| Block | Databases, VMs (low-level) | EBS, SAN |
| File | Shared file systems | EFS, NFS |

Object storage = Best for web apps (scalable, HTTP access)

2. S3 Architecture

Key features:

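  • Durability: 99.999999999% (11 nines) design target, so losing a stored object is extremely unlikely
  • Availability: 99.99% design target for S3 Standard, so data is almost always reachable
  • Scalability: No preset limit on total storage or number of objects
  • Versioning: Keep old versions of objects, so overwrites and deletes can be undone
  • Lifecycle policies: Automatically expire old objects or move them to cheaper storage classes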
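
As a rough illustration of versioning plus a lifecycle policy, the sketch below builds the rule document that boto3's put_bucket_lifecycle_configuration expects. The helper names, the prefix, and the day counts are assumptions chosen for the example, not recommended values.

```python
def lifecycle_rules(prefix: str, archive_after_days: int,
                    delete_after_days: int, noncurrent_days: int) -> dict:
    """Build a lifecycle configuration for objects under one prefix."""
    if archive_after_days >= delete_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move current versions to a cheaper storage class first
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "STANDARD_IA"}
                ],
                # Then delete them
                "Expiration": {"Days": delete_after_days},
                # Clean up old versions kept by versioning
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


def apply_to_bucket(bucket: str, config: dict) -> None:
    """Enable versioning and attach the lifecycle rules (needs AWS credentials)."""
    import boto3  # imported here so lifecycle_rules works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

For example, lifecycle_rules("logs/", 30, 365, 90) describes a rule that moves objects under logs/ to infrequent-access storage after 30 days, deletes them after a year, and drops superseded versions after 90 days.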

3. Signed URLs

Problem: Don't want S3 buckets public (security risk)

Solution: Generate temporary signed URL

import boto3

s3 = boto3.client('s3')

# Server generates signed URL (valid for 1 hour)
signed_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'photo.jpg'},
    ExpiresIn=3600,  # lifetime in seconds
)

# Client downloads directly from S3 (no server bandwidth)
# https://my-bucket.s3.amazonaws.com/photo.jpg?signature=...
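
To make the "temporary" part concrete, here is a minimal sketch that reads the expiry time back out of a signed URL. It assumes the URL uses Signature Version 4 query parameters (X-Amz-Date and X-Amz-Expires); older-style signed URLs carry different parameters and would need different parsing. The function names and the example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4-style signed URL stops working."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time, e.g. 20240101T120000Z
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the lifetime in seconds (the ExpiresIn value)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


def is_expired(url: str, now: datetime) -> bool:
    """True once the URL's lifetime has elapsed."""
    return now >= presigned_url_expiry(url)


# Hypothetical URL for illustration; the signature value is fake
example_url = (
    "https://my-bucket.s3.amazonaws.com/photo.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T120000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-Signature=deadbeef"
)
```

S3 enforces the expiry itself, so a check like this is only useful for client-side decisions such as when to request a fresh URL.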

4. Upload Strategies

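Small files: Direct upload to S3 in a single PUT request (one PUT handles objects up to 5GB)

Large files (roughly 100MB and up): Multipart upload, which splits the file into parts of at least 5MB each (the last part may be smaller)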

Benefits: Resume failed uploads, parallel uploads (faster)
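
The splitting step of a multipart upload can be sketched as below. It assumes the S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload; plan_parts and its default part size are illustrative choices, not part of any SDK.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Split a file into (part_number, offset, length) tuples."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need too many parts
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts
```

Each tuple describes a byte range that can be uploaded independently, which is what makes parallel uploads and per-part retries possible. In practice boto3's upload_file performs this splitting automatically once a file crosses its multipart threshold.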

Common Interview Questions

Q1: "How would you design a photo sharing service (Instagram)?"

Answer:

  1. Upload: Client → Server → S3 (original photo)
  2. Process: Resize to multiple resolutions (thumbnail, medium, large)
  3. Store: Upload resized photos to S3
  4. Serve: CDN caches photos, serve from edge locations
  5. Database: Store metadata (user_id, photo_id, S3 keys)
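
One possible shape for the metadata and S3 keys in this design is sketched below. The key layout, rendition widths, and CDN hostname are all hypothetical; the point is only that the database row holds keys while the image bytes stay in S3.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical target widths in pixels for each resized copy
RENDITIONS = {"thumbnail": 150, "medium": 640, "large": 1280}


@dataclass
class PhotoRecord:
    """Metadata row stored in the database; image bytes live in S3."""
    user_id: str
    photo_id: str
    original_key: str
    rendition_keys: dict = field(default_factory=dict)


def build_photo_record(user_id, extension="jpg", photo_id=None):
    """Choose S3 keys for the original photo and each resized copy."""
    photo_id = photo_id or uuid.uuid4().hex
    base = f"photos/{user_id}/{photo_id}"
    return PhotoRecord(
        user_id=user_id,
        photo_id=photo_id,
        original_key=f"{base}/original.{extension}",
        rendition_keys={name: f"{base}/{name}.{extension}" for name in RENDITIONS},
    )


def cdn_url(record, rendition, cdn_host="cdn.example.com"):
    """URL the client requests; the CDN fetches from S3 on a cache miss."""
    return f"https://{cdn_host}/{record.rendition_keys[rendition]}"
```

A feed view would then call cdn_url(record, "thumbnail") and only request the large rendition when a photo is opened.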

Q2: "How do you secure S3 buckets?"

Answer:

  • Private by default: Block public access
  • Signed URLs: Temporary access (expire after a set time, e.g. 1 hour)
  • IAM policies: Control who can access
  • Encryption: At rest (AES-256) and in transit (HTTPS)
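
The bullets above could translate into bucket settings roughly as follows. This is a sketch, assuming a single bucket and boto3; the helper names are made up, and real deployments usually add narrower IAM policies on top.

```python
import json


def public_access_block() -> dict:
    """Settings that turn off every form of public access."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def https_only_policy(bucket: str) -> dict:
    """Bucket policy that rejects any request not made over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def apply_bucket_security(bucket: str) -> None:
    """Apply the settings to a real bucket (needs AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=public_access_block()
    )
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(https_only_policy(bucket)))
    # Default encryption at rest with S3-managed AES-256 keys
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )
```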

Q3: "How do you handle millions of files in S3?"

Answer:

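  • Key design: Spread keys across prefixes for parallelism
    • Bad: photo1.jpg, photo2.jpg (every key under one prefix)
    • Good: ab/cd/ef/photo1.jpg (hash-derived prefix)
  • Reason: S3 partitions by key prefix, and request rate limits apply per prefix (at least 3,500 writes and 5,500 reads per second each), so more prefixes means more aggregate throughput. Since 2018 S3 scales prefixes automatically, so hashed prefixes only matter at very high request rates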
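
A key in the ab/cd/ef/photo1.jpg style can be derived from a hash of the filename, as in this sketch. The choice of SHA-256 and three two-character levels is an assumption made for illustration; any stable hash that spreads keys evenly would do.

```python
import hashlib


def hashed_key(filename: str, levels: int = 3) -> str:
    """Prefix a filename with directory-like segments taken from its hash."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Take two hex characters per level: "ab", "cd", "ef", ...
    segments = [digest[i * 2:(i + 1) * 2] for i in range(levels)]
    return "/".join(segments + [filename])
```

Because the prefix is computed from the filename, the full key can be rebuilt at read time without a lookup. The trade-off is that listing objects in upload order or by name range is no longer possible by prefix alone.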

Quick Reference

S3 features:

  • 99.999999999% durability (data loss is extremely unlikely)
  • Unlimited storage
  • Signed URLs (temporary access)
  • Multipart upload (large files)

Use cases: File storage, backups, static websites, data lakes

