System Design Cheat Sheets
Quick reference for interviews - memorize these!
Numbers Every Engineer Should Know
| Operation | Latency | Humanized |
|---|---|---|
| L1 cache | 0.5 ns | - |
| L2 cache | 7 ns | - |
| RAM access | 100 ns | ~0.1 µs |
| SSD read | 16 µs | ~10 µs |
| Network (same datacenter) | 0.5 ms | ~1 ms |
| HDD seek | 10 ms | ~10 ms |
| Network (cross-continent) | 150 ms | ~150 ms |
Key takeaways:
- RAM access is about 160x faster than an SSD read (100 ns vs 16 µs), roughly two orders of magnitude
- Network latency dominates in distributed systems
- 1 day ≈ 10^5 seconds (100K)
Time/Data Conversions
| Unit | Value | Approximation |
|---|---|---|
| 1 million | 10^6 | 1M |
| 1 billion | 10^9 | 1B |
| 1 day | 86,400 sec | ~10^5 (100K) |
| 1 week | 604,800 sec | ~0.6M |
| 1 month | 2.6M sec | ~2.5M |
| 1 year | 31.5M sec | ~30M |
| 1 KB | 1,000 bytes | 10^3 |
| 1 MB | 1,000,000 bytes | 10^6 |
| 1 GB | 1 billion bytes | 10^9 |
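
The approximations above make back-of-envelope estimates quick. A minimal sketch, assuming a hypothetical service with 100 million requests per day and 1 KB stored per request (both numbers are invented for illustration, as are the function names):

```python
SECONDS_PER_DAY = 86_400          # exact value from the table
SECONDS_PER_DAY_APPROX = 10**5    # rounded value for mental math


def average_qps(requests_per_day, seconds_per_day=SECONDS_PER_DAY):
    """Average queries per second for a given daily request volume."""
    return requests_per_day / seconds_per_day


def yearly_storage_bytes(requests_per_day, bytes_per_request):
    """Raw bytes written per year, ignoring replication and compression."""
    return requests_per_day * bytes_per_request * 365


# Hypothetical workload: 100M requests/day, 1 KB (1,000 bytes) each.
exact = average_qps(100_000_000)                            # about 1,157 QPS
approx = average_qps(100_000_000, SECONDS_PER_DAY_APPROX)   # 1,000 QPS
storage_tb = yearly_storage_bytes(100_000_000, 1_000) / 10**12  # 36.5 TB
```

The rounded figure is about 14% below the exact one, which is usually close enough for an interview estimate. Peak traffic is often assumed to be a small multiple of the average.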
Availability (Nines)
| Availability | Downtime/Year | Downtime/Month | Use Case |
|---|---|---|---|
| 99% (2 nines) | 3.65 days | 7.3 hours | Internal tools |
| 99.9% (3 nines) | 8.77 hours | 43.8 minutes | Most web apps |
| 99.99% (4 nines) | 52.6 minutes | 4.4 minutes | Payment systems |
| 99.999% (5 nines) | 5.26 minutes | 26 seconds | Critical infra |
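
The downtime figures follow directly from the availability percentage. A small sketch, assuming a 365.25-day year and a month defined as one twelfth of a year (the table appears to use similar conventions, but that is an assumption):

```python
YEAR_SECONDS = 365.25 * 24 * 3600
MONTH_SECONDS = YEAR_SECONDS / 12


def downtime_seconds(availability_percent, period_seconds):
    """Allowed downtime in seconds over a period at a given availability."""
    return (1 - availability_percent / 100) * period_seconds


for nines in (99.0, 99.9, 99.99, 99.999):
    per_year_min = downtime_seconds(nines, YEAR_SECONDS) / 60
    per_month_min = downtime_seconds(nines, MONTH_SECONDS) / 60
    print(f"{nines}%: {per_year_min:.1f} min/year, {per_month_min:.2f} min/month")
```

Each extra nine cuts the downtime budget by a factor of ten.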
Common System Components
Load Balancers
- L4: Fast, routes by IP/port (databases)
- L7: Smarter, routes by URL/headers (web apps)
- Algorithms: Round-robin, least connections, IP hash
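
The three algorithms can be sketched in a few lines each. This is an illustrative sketch only, not how any particular load balancer implements them; the class and function names are hypothetical:

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash_pick(servers, client_ip):
    """Maps a client IP to the same server as long as the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Round-robin suits uniform requests, least connections suits requests with uneven duration, and IP hash gives session stickiness without shared state.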
Caching
- Where: Browser, CDN, Application, Redis
- Eviction: LRU (most common), LFU, TTL
- Patterns: Cache-aside, write-through, write-behind
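
As an illustration of LRU eviction combined with the cache-aside pattern, here is a minimal in-process sketch. `get_user` and `load_from_db` are hypothetical names; a real deployment would more likely use Redis or a similar store than a Python dictionary:

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: check the cache first, fall back to the database on a miss."""
    value = cache.get(user_id)
    if value is None:  # a stored None is treated as a miss in this sketch
        value = load_from_db(user_id)
        cache.put(user_id, value)
    return value
```

With cache-aside the application owns the cache logic; write-through and write-behind instead route writes through the cache.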
Databases
- SQL: Transactions, relationships, complex queries
- NoSQL: Massive scale, flexible schema, simple queries
- Indexing: B-tree (range), hash (exact match)
- Sharding: Range, hash, geographic
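
Hash and range sharding differ mainly in how a key maps to a shard. A rough sketch, with hypothetical function names and boundaries chosen only for illustration:

```python
import bisect
import hashlib


def hash_shard(key, num_shards):
    """Hash sharding: spreads keys evenly but scatters range scans.

    Uses hashlib rather than the built-in hash(), which is randomized
    per process for strings and so is not stable across servers.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key, upper_bounds):
    """Range sharding: upper_bounds is a sorted list of exclusive upper
    bounds, one per shard except the last, which takes everything else."""
    return bisect.bisect_right(upper_bounds, key)


# Assumed boundaries: shard 0 holds keys below "h", shard 1 holds
# "h" up to (not including) "q", shard 2 holds the rest.
bounds = ["h", "q"]
```

Note that plain modulo hashing remaps most keys when `num_shards` changes; consistent hashing is the usual answer to that problem.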
Message Queues
- Kafka: Event streaming, high throughput
- RabbitMQ: Traditional MQ, complex routing
- SQS: Managed, serverless
- Guarantees: At-most-once, at-least-once, exactly-once
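
At-least-once delivery means a consumer may see the same message twice, so it is commonly paired with idempotent processing. A minimal sketch, assuming each message carries a unique ID and that an in-memory set is an acceptable stand-in for a durable deduplication store (the class name is hypothetical):

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in production this would need to be durable

    def handle(self, message_id, payload):
        """Returns True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a crash
        # mid-processing leads to a retry rather than a lost message.
        self._seen.add(message_id)
        return True
```

If the handler raises, the ID is not recorded and a redelivery will be processed again, which is the at-least-once behavior.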
Design Patterns Cheat Sheet
Scalability
- Horizontal scaling: Add more servers (preferred)
- Stateless services: Store session in Redis
- Sharding: Split data by key
- Caching: Reduce database load (target a ~90% cache hit rate)
Reliability
- Circuit breaker: Fast-fail when service down
- Retry: Exponential backoff + jitter
- Timeout: Always set (don't hang forever)
- Bulkhead: Isolate resources
- Graceful degradation: Partial > total failure
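
Retry with backoff and the circuit breaker are the two patterns most often asked about in code form. The sketch below is one simple way to write them, with default thresholds and timeouts picked arbitrarily; the `sleep` and `clock` parameters exist only to make the code testable:

```python
import random
import time


def retry(operation, attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Retries with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Random delay in [0, min(cap, base * 2^attempt)] avoids
            # synchronized retry storms from many clients.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


class CircuitBreaker:
    """Fails fast after repeated failures, then allows a trial call."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        self._failures = 0
        self._opened_at = None
        return result
```

Retries should only wrap idempotent operations, and the breaker typically sits outside the retry loop so that an open circuit stops retries as well.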
Consistency
- Strong: All nodes see same data (banking)
- Eventual: Nodes converge over time (social media)
- CAP theorem: During a network partition, choose CP (stay consistent) or AP (stay available)
Data Patterns
- Replication: Master-slave (one leader), multi-master (conflicts)
- Saga: Compensating transactions (eventual consistency)
- Outbox: Reliable event publishing (transactional)
- CQRS: Separate read/write models
HTTP Status Codes
| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Success |
| 201 | Created | Resource created |
| 301 | Moved Permanently | Permanent redirect (SEO) |
| 302 | Found | Temporary redirect |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Not authenticated |
| 403 | Forbidden | Not authorized |
| 404 | Not Found | Resource doesn't exist |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Server overloaded/maintenance |
API Design
REST endpoints:
GET /api/v1/users → List
GET /api/v1/users/123 → Get one
POST /api/v1/users → Create
PUT /api/v1/users/123 → Update (full)
PATCH /api/v1/users/123 → Update (partial)
DELETE /api/v1/users/123 → Delete
Versioning: /api/v1/ in URL (most common)
Rate limiting headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1642345600
Retry-After: 60
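
To show where these header values come from, here is a sketch of a fixed-window rate limiter that returns a status code and the headers above. The `X-RateLimit-*` names are a widely used convention rather than a formal standard, and the class is a hypothetical in-memory example, not a production design:

```python
import time


class FixedWindowLimiter:
    """Allows `limit` requests per client in each fixed time window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counts = {}  # client_id -> (window_start, request_count)

    def check(self, client_id):
        """Returns (status_code, headers) for one incoming request."""
        now = int(self._clock())
        window_start = now - now % self.window
        start, count = self._counts.get(client_id, (window_start, 0))
        if start != window_start:  # a new window has begun: reset the count
            start, count = window_start, 0
        reset_at = start + self.window
        allowed = count < self.limit
        if allowed:
            count += 1
        self._counts[client_id] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(reset_at),  # Unix timestamp
        }
        if not allowed:
            headers["Retry-After"] = str(reset_at - now)  # seconds to wait
        return (200 if allowed else 429), headers
```

Fixed windows allow bursts at window boundaries; sliding-window and token-bucket limiters smooth that out at the cost of more state.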
Interview Framework (RESHADED)
- Requirements: Clarify functional, non-functional, out-of-scope (5 min)
- Estimation: Back-of-envelope (QPS, storage, bandwidth) (5 min)
- System Interface: Define APIs (2 min)
- High-Level Design: Draw boxes and arrows (10 min)
- API Design: Detail critical APIs (3 min)
- Data Model: Database schema (5 min)
- Elaborate: Deep dive 1-2 components (10 min)
- Discuss: Trade-offs, bottlenecks, failures (5 min)
Common Interview Questions and Answers
"How do you scale to 1M users?"
- Vertical scaling (bigger server)
- Horizontal scaling (more servers + load balancer)
- Database replication (read replicas)
- Caching (Redis)
- CDN (static assets)
"SQL or NoSQL?"
- SQL when: Transactions, relationships, complex queries
- NoSQL when: Massive scale, flexible schema, simple queries
- Both: Hybrid (SQL for users, NoSQL for logs)
"How do you handle a database bottleneck?"
- Indexing (fast queries)
- Caching (Redis, targeting a ~90% cache hit rate)
- Read replicas (scale reads)
- Sharding (scale writes)
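
The effect of cache hit rate on database load is simple arithmetic and worth being able to quote. A short sketch, assuming a hypothetical 10,000 read QPS and that every cache miss becomes exactly one database query:

```python
def db_read_qps(total_read_qps, cache_hit_rate):
    """Reads that still reach the database after the cache absorbs the hits."""
    return total_read_qps * (1 - cache_hit_rate)


# Hypothetical workload of 10,000 read QPS.
at_90 = db_read_qps(10_000, 0.90)   # about 1,000 QPS reach the database
at_99 = db_read_qps(10_000, 0.99)   # about 100 QPS reach the database
```

Moving from a 90% to a 99% hit rate cuts database reads by roughly another factor of ten, which is why hit rate is often the first number to improve.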
"CAP theorem?"
You can't have all three at once. When a network partition occurs, you must give up either consistency or availability:
- Consistency: All nodes see same data
- Availability: Every request gets response
- Partition tolerance: System works despite network split
Choose: CP (banking) or AP (social media)
"Microservices vs monolith?"
Monolith when:
- Small team (fewer than 10 people)
- Simple domain
- Startup (iterate fast)
Microservices when:
- Large team (more than 20 people)
- Need independent deployment
- Scale parts independently
Key Trade-Offs
| Decision | Option A | Option B |
|---|---|---|
| Consistency | Strong (slow, CP) | Eventual (fast, AP) |
| Scaling | Vertical (simple) | Horizontal (scalable) |
| Storage | SQL (structured) | NoSQL (flexible) |
| Communication | Sync (simple) | Async (decoupled) |
| Caching | More cache (fast, expensive) | Less cache (slow, cheap) |
Final Checklist
Before ending the interview, make sure you've covered:
- Scalability (how to handle 10x traffic)
- Reliability (what if X fails?)
- Trade-offs (why choose A over B?)
- Bottlenecks (where will it break?)
- Monitoring (how to detect issues?)
- Security (auth, rate limiting, encryption)
Practice: Do mock interviews, draw diagrams, time yourself (45 minutes).
Good luck!