System Design Cheat Sheets

Quick reference for interviews - memorize these!

Numbers Every Engineer Should Know

| Operation | Latency | Humanized |
|---|---|---|
| L1 cache | 0.5 ns | - |
| L2 cache | 7 ns | - |
| RAM access | 100 ns | ~0.1 µs |
| SSD read | 16 µs | ~10 µs |
| Network (same datacenter) | 0.5 ms | ~1 ms |
| HDD seek | 10 ms | ~10 ms |
| Network (cross-continent) | 150 ms | ~150 ms |

Key takeaways:

  • RAM is roughly 100x faster than an SSD read (100 ns vs. 16 µs)
  • Network latency dominates in distributed systems
  • 1 day ≈ 10^5 seconds (100K)

Time/Data Conversions

| Unit | Value | Approximation |
|---|---|---|
| 1 million | 10^6 | 1M |
| 1 billion | 10^9 | 1B |
| 1 day | 86,400 sec | ~10^5 (100K) |
| 1 week | 604,800 sec | ~0.6M |
| 1 month | 2.6M sec | ~2.5M |
| 1 year | 31.5M sec | ~30M |
| 1 KB | 1,000 bytes | 10^3 |
| 1 MB | 1,000,000 bytes | 10^6 |
| 1 GB | 1 billion bytes | 10^9 |
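
These conversions are mostly used for back-of-envelope estimates. A minimal sketch, assuming a hypothetical workload of 100M requests per day at 1 KB stored per request (both figures are invented for illustration):

```python
SECONDS_PER_DAY = 86_400        # exact value
SECONDS_PER_DAY_APPROX = 10**5  # cheat-sheet approximation


def average_qps(requests_per_day: int, seconds_per_day: int = SECONDS_PER_DAY_APPROX) -> float:
    """Average queries per second for a given daily request volume."""
    return requests_per_day / seconds_per_day


def daily_storage_bytes(requests_per_day: int, bytes_per_request: int) -> int:
    """Bytes written per day if every request stores a fixed-size record."""
    return requests_per_day * bytes_per_request


# Hypothetical workload (assumption, not from the cheat sheet)
requests_per_day = 100_000_000

approx_qps = average_qps(requests_per_day)                  # 1000.0 with the 10^5 shortcut
exact_qps = average_qps(requests_per_day, SECONDS_PER_DAY)  # about 1157 with the exact value
storage_gb_per_day = daily_storage_bytes(requests_per_day, 1_000) / 10**9  # 100.0 GB
```

The shortcut underestimates QPS by roughly 15%, which is usually well within the error of the input assumptions.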

Availability (Nines)

| Availability | Downtime/Year | Downtime/Month | Use Case |
|---|---|---|---|
| 99% (2 nines) | 3.65 days | 7.3 hours | Internal tools |
| 99.9% (3 nines) | 8.77 hours | 43.8 minutes | Most web apps |
| 99.99% (4 nines) | 52.6 minutes | 4.4 minutes | Payment systems |
| 99.999% (5 nines) | 5.26 minutes | 26 seconds | Critical infra |
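
The downtime figures follow directly from the availability percentage. A short sketch that reproduces them, assuming a 365.25-day year and a month defined as one twelfth of a year:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours (assumes an average year length)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in hours, for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def downtime_minutes_per_month(availability_percent: float) -> float:
    """Allowed downtime per month, in minutes (month = year / 12)."""
    return downtime_hours_per_year(availability_percent) * 60 / 12


three_nines_year_hours = downtime_hours_per_year(99.9)         # about 8.77 hours
three_nines_month_minutes = downtime_minutes_per_month(99.9)   # about 43.8 minutes
```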

Common System Components

Load Balancers

  • L4: Fast, routes by IP/port (databases)
  • L7: Smarter, routes by URL/headers (web apps)
  • Algorithms: Round-robin, least connections, IP hash
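
The three algorithms above can be sketched in a few lines each. This is an illustrative, single-threaded sketch; the class and function names are hypothetical and do not correspond to any particular load balancer's API:

```python
import itertools
import zlib


class RoundRobin:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` closes.
        self.active[server] -= 1


def ip_hash(client_ip: str, servers: list) -> str:
    """Maps a client IP to the same server every time (sticky routing)."""
    return servers[zlib.crc32(client_ip.encode("utf-8")) % len(servers)]
```

Round-robin ignores load, least connections adapts to slow requests, and IP hash keeps a client on one server as long as the server list does not change.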

Caching

  • Where: Browser, CDN, Application, Redis
  • Eviction: LRU (most common), LFU; TTL for time-based expiry
  • Patterns: Cache-aside, write-through, write-behind
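
A minimal sketch of LRU eviction combined with the cache-aside pattern. The `db` argument is a plain dict standing in for a real database, and `get_user` is a hypothetical helper used only for illustration:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


def get_user(user_id, cache: LRUCache, db: dict):
    """Cache-aside: check the cache first; on a miss, read the DB and populate the cache."""
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]
        cache.put(user_id, user)
    return user
```

In cache-aside the application owns both the read path and the cache fill; write-through and write-behind instead route writes through the cache.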

Databases

  • SQL: Transactions, relationships, complex queries
  • NoSQL: Massive scale, flexible schema, simple queries
  • Indexing: B-tree (range), hash (exact match)
  • Sharding: Range, hash, geographic
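
Hash and range sharding can each be expressed as a small routing function. A sketch under simple assumptions (string keys for hash sharding, integer IDs with sorted boundaries for range sharding; both function names are hypothetical):

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash sharding: spread keys evenly using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(record_id: int, boundaries: list) -> int:
    """Range sharding: `boundaries` are sorted, exclusive upper bounds per shard.

    With boundaries [1000, 2000]: ids below 1000 go to shard 0,
    1000..1999 to shard 1, and everything else to shard 2.
    """
    return bisect.bisect_right(boundaries, record_id)
```

Plain modulo hashing remaps most keys when `num_shards` changes; consistent hashing is a common way to limit that movement. Range sharding keeps range scans on one shard but can create hot spots.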

Message Queues

  • Kafka: Event streaming, high throughput
  • RabbitMQ: Traditional MQ, complex routing
  • SQS: Managed, serverless
  • Guarantees: At-most-once, at-least-once, exactly-once
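
With at-least-once delivery a message may arrive more than once, so consumers commonly deduplicate by message ID to get effectively-once processing. A minimal in-memory sketch; the message shape (`id`, `amount`) is an assumption made for illustration:

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the queue redelivers it."""

    def __init__(self):
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Returns True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.processed_ids:
            return False
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
deposit = {"id": "msg-1", "amount": 50}
consumer.handle(deposit)
consumer.handle(deposit)  # redelivery is ignored; balance stays at 50
```

In a real system the processed-ID set would live in durable storage and be updated in the same transaction as the side effect.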

Design Patterns Cheat Sheet

Scalability

  • Horizontal scaling: Add more servers (preferred)
  • Stateless services: Store session in Redis
  • Sharding: Split data by key
  • Caching: Reduce database load (e.g. a 90% cache hit rate means only about 10% of reads reach the database)
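
The effect of a cache hit rate on database load is simple arithmetic. A sketch with hypothetical numbers (10,000 QPS, 1 ms cache reads, 20 ms database reads):

```python
def db_qps(total_qps: float, hit_rate: float) -> float:
    """Requests per second that miss the cache and reach the database."""
    return total_qps * (1 - hit_rate)


def average_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Expected read latency, weighting cache hits and misses by their share."""
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms


# Hypothetical workload (assumption, not from the cheat sheet)
load_on_db = db_qps(10_000, 0.90)                 # about 1,000 QPS reach the database
avg_latency = average_latency_ms(0.90, 1.0, 20.0)  # about 2.9 ms on average
```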

Reliability

  • Circuit breaker: Fast-fail when service down
  • Retry: Exponential backoff + jitter
  • Timeout: Always set (don't hang forever)
  • Bulkhead: Isolate resources
  • Graceful degradation: Partial > total failure
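
Two of these patterns are easy to sketch: retry with exponential backoff plus jitter, and a circuit breaker. This is a simplified, single-threaded illustration; the names, defaults, and the "full jitter" variant are assumptions, not a reference implementation:

```python
import random
import time


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def retry(operation, max_attempts=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Calls `operation` until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap))


class CircuitBreaker:
    """Opens after consecutive failures, then fails fast until a timeout elapses."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Jitter spreads retries out so that many clients do not hammer a recovering service at the same instant; the breaker stops calls entirely while the dependency is known to be down.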

Consistency

  • Strong: All nodes see same data (banking)
  • Eventual: Nodes converge over time (social media)
  • CAP theorem: During a network partition, choose CP (consistent) or AP (available)

Data Patterns

  • Replication: Master-slave (one leader), multi-master (conflicts)
  • Saga: Compensating transactions (eventual consistency)
  • Outbox: Reliable event publishing (transactional)
  • CQRS: Separate read/write models
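
The outbox pattern is the least obvious of these, so here is a minimal sketch using SQLite from the Python standard library. The table names, event payload, and the `publish` callback are hypothetical; the key idea is that the business row and the event row are written in one transaction:

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, item: str) -> int:
    """Writes the order and its event atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        order_id = cursor.lastrowid
        payload = json.dumps({"type": "order_placed", "order_id": order_id})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    return order_id


def publish_pending(conn: sqlite3.Connection, publish) -> int:
    """Relay step: publish unpublished events, then mark each one as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run, so this gives at-least-once publishing and consumers should be idempotent.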

HTTP Status Codes

| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Success |
| 201 | Created | Resource created |
| 301 | Moved Permanently | Permanent redirect (SEO) |
| 302 | Found | Temporary redirect |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Not authenticated |
| 403 | Forbidden | Not authorized |
| 404 | Not Found | Resource doesn't exist |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Server overloaded/maintenance |

API Design

REST endpoints:

GET    /api/v1/users        → List
GET    /api/v1/users/123    → Get one
POST   /api/v1/users        → Create
PUT    /api/v1/users/123    → Update (full)
PATCH  /api/v1/users/123    → Update (partial)
DELETE /api/v1/users/123    → Delete

Versioning: /api/v1/ in URL (most common)

Rate limiting headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1642345600
Retry-After: 60
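
One simple way to produce these headers is a fixed-window counter per client. A sketch under that assumption (the class name is hypothetical, state is in memory only, and real deployments often prefer token-bucket or sliding-window limiters backed by a shared store):

```python
import time


class FixedWindowLimiter:
    """Allows up to `limit` requests per client in each fixed time window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.windows = {}  # client_id -> (window_start, requests_used)

    def check(self, client_id: str):
        """Returns (allowed, headers) for one incoming request."""
        now = int(self.clock())
        window_start = now - now % self.window_seconds
        reset_at = window_start + self.window_seconds

        start, used = self.windows.get(client_id, (window_start, 0))
        if start != window_start:
            used = 0  # a new window has begun, so the count starts over

        allowed = used < self.limit
        if allowed:
            used += 1
        self.windows[client_id] = (window_start, used)

        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - used),
            "X-RateLimit-Reset": str(reset_at),  # Unix timestamp of the next window
        }
        if not allowed:
            # Sent alongside a 429 response: seconds until the client may retry.
            headers["Retry-After"] = str(reset_at - now)
        return allowed, headers
```

A rejected request would be answered with 429 Too Many Requests plus these headers.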

Interview Framework (RESHADED)

  1. Requirements: Clarify functional, non-functional, out-of-scope (5 min)
  2. Estimation: Back-of-envelope (QPS, storage, bandwidth) (5 min)
  3. System Interface: Define APIs (2 min)
  4. High-Level Design: Draw boxes and arrows (10 min)
  5. API Design: Detail critical APIs (3 min)
  6. Data Model: Database schema (5 min)
  7. Elaborate: Deep dive 1-2 components (10 min)
  8. Discuss: Trade-offs, bottlenecks, failures (5 min)

Common Interview Questions and Answers

"How do you scale to 1M users?"

  1. Vertical scaling (bigger server)
  2. Horizontal scaling (more servers + load balancer)
  3. Database replication (read replicas)
  4. Caching (Redis)
  5. CDN (static assets)

"SQL or NoSQL?"

  • SQL when: Transactions, relationships, complex queries
  • NoSQL when: Massive scale, flexible schema, simple queries
  • Both: Hybrid (SQL for users, NoSQL for logs)

"How do you handle a database bottleneck?"

  1. Indexing (fast queries)
  2. Caching (Redis; a 90% hit rate means only about 10% of reads reach the database)
  3. Read replicas (scale reads)
  4. Sharding (scale writes)

"CAP theorem?"

During a network partition you can't have all three; you must give up either consistency or availability:

  • Consistency: All nodes see same data
  • Availability: Every request gets response
  • Partition tolerance: System works despite network split

Choose: CP (banking) or AP (social media)

"Microservices vs monolith?"

Monolith when:

  • Small team (less than 10)
  • Simple domain
  • Startup (iterate fast)

Microservices when:

  • Large team (greater than 20)
  • Need independent deployment
  • Scale parts independently

Key Trade-Offs

| Decision | Option A | Option B |
|---|---|---|
| Consistency | Strong (slow, CP) | Eventual (fast, AP) |
| Scaling | Vertical (simple) | Horizontal (scalable) |
| Storage | SQL (structured) | NoSQL (flexible) |
| Communication | Sync (simple) | Async (decoupled) |
| Caching | More cache (fast, expensive) | Less cache (slow, cheap) |

Final Checklist

Before ending the interview, ensure you've covered:

  • ✅ Scalability (how to handle 10x traffic)
  • ✅ Reliability (what if X fails?)
  • ✅ Trade-offs (why choose A over B?)
  • ✅ Bottlenecks (where will it break?)
  • ✅ Monitoring (how to detect issues?)
  • ✅ Security (auth, rate limiting, encryption)

Almost done! For advanced scenario-based problems (migrations, zero-downtime, cascading failures), see:

Next: Real-World Problems - Senior/Staff level interview scenarios.


Practice: Do mock interviews, draw diagrams, time yourself (45 minutes).

Good luck! 🚀