Storage Fundamentals
TL;DR (30-second summary)
SQL (relational): Structured data, ACID transactions, complex queries. NoSQL: Flexible schema, horizontal scaling, eventual consistency. ACID = Strong guarantees. BASE = Availability over consistency.
Choose SQL when you need transactions and relationships. Choose NoSQL when you need massive scale and flexibility.
Why This Matters
In interviews: Database choice is often the first major design decision. Shows you understand trade-offs between consistency, scalability, and flexibility.
At work: Wrong database choice costs millions in rewrites and downtime.
Core Concepts
1. SQL vs NoSQL
| Feature | SQL (Relational) | NoSQL |
|---|---|---|
| Schema | Fixed, predefined schema | Flexible, schemaless |
| Data Model | Tables with rows and columns | Documents, key-value, graphs, etc. |
| Relationships | Foreign keys, JOINs | Denormalized, embedded docs |
| Transactions | ACID (strong guarantees) | BASE (eventual consistency) |
| Scaling | Vertical (harder to scale out) | Horizontal (easy to scale out) |
| Query Language | SQL (standardized) | Varies by database |
| Use Case | Financial, ERP, e-commerce orders | Social media, logs, real-time analytics |
2. ACID Properties
ACID = Strong consistency guarantees for SQL databases.
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All or nothing - transaction either completes fully or rolls back | Bank transfer: debit AND credit happen, or neither |
| Consistency | Database goes from one valid state to another | Account balance can't go negative (constraint enforced) |
| Isolation | Concurrent transactions don't interfere | Two users booking last seat don't both succeed |
| Durability | Committed data survives crashes | After "payment confirmed", data persists even if server crashes |
Example (Bank Transfer):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 'Alice';
UPDATE accounts SET balance = balance + 100 WHERE id = 'Bob';
COMMIT;
-- If either fails, both roll back (Atomicity)
-- Balance constraints enforced (Consistency)
-- Other transactions see intermediate state (Isolation)
-- After COMMIT, survives crash (Durability)
3. BASE Properties
BASE = Eventual consistency model for NoSQL at scale.
| Property | Meaning |
|---|---|
| Basically Available | System appears to work most of the time (prioritizes availability) |
| Soft state | State may change over time without input (due to eventual consistency) |
| Eventually consistent | System will become consistent over time (given no new updates) |
Example (Social Media Likes):
User A likes post → Write to US datacenter
User B in EU views post → Reads from EU datacenter (might not see like yet)
After replication (1-2 seconds) → User B sees the like
Result: Availability (both users can interact immediately)
Trade-off: Consistency (brief period where like not visible)
Say: "For this banking system, we need ACID guarantees - SQL is the right choice. For this social media feed, eventual consistency is fine - NoSQL allows us to scale horizontally."
4. SQL Databases (Relational)
When to use:
- ✅ Structured data with relationships
- ✅ Need ACID transactions
- ✅ Complex queries with JOINs
- ✅ Data integrity is critical
Popular choices:
| Database | Strengths | Use Case |
|---|---|---|
| PostgreSQL | Full-featured, JSONB support, extensions | General purpose, GIS data, analytics |
| MySQL | Fast, widely adopted, simple | Web apps, WordPress, e-commerce |
| SQL Server | Enterprise features, integrates with Microsoft stack | Corporate applications, .NET ecosystem |
| Oracle | High performance, advanced features | Large enterprises, mission-critical systems |
Limitations:
- ❌ Vertical scaling limits (expensive to upgrade single server)
- ❌ Sharding is complex (need manual partition logic)
- ❌ Schema changes require migrations (downtime risk)
5. NoSQL Databases
Document Databases (MongoDB, CouchDB)
Data model: JSON-like documents
{
"_id": "user123",
"name": "John Doe",
"email": "john@example.com",
"addresses": [
{"type": "home", "city": "NYC"},
{"type": "work", "city": "SF"}
],
"orders": [...] // Embedded data
}
When to use:
- ✅ Flexible schema (product catalogs, content management)
- ✅ Nested/hierarchical data
- ✅ Rapid prototyping (no migrations)
Example: E-commerce product catalog (different products have different attributes)
Key-Value Stores (Redis, DynamoDB)
Data model: Simple key → value
key: "user:123:session"
value: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
key: "product:456"
value: {"name": "Laptop", "price": 999}
When to use:
- ✅ Caching (most common use case)
- ✅ Session storage
- ✅ Real-time leaderboards
- ✅ Simple access patterns (get by ID)
Example: Redis for caching API responses, DynamoDB for user sessions
Wide-Column Stores (Cassandra, HBase)
Data model: Rows with dynamic columns
Row key: "user123"
Columns: {timestamp1: data1, timestamp2: data2, ...}
When to use:
- ✅ Time-series data (logs, metrics, IoT)
- ✅ Massive write throughput
- ✅ Distributed across many nodes
Example: Cassandra for Netflix viewing history, HBase for Facebook messages
Graph Databases (Neo4j, Amazon Neptune)
Data model: Nodes and relationships
(Person: John)-[:FRIENDS_WITH]->(Person: Jane)
(Person: John)-[:LIKES]->(Movie: Inception)
When to use:
- ✅ Highly connected data (social networks)
- ✅ Recommendation engines
- ✅ Fraud detection (pattern matching)
Example: LinkedIn connections, fraud detection in banking
6. Storage Types by Use Case
| Use Case | Storage Type | Example DB | Reason |
|---|---|---|---|
| Financial transactions | SQL | PostgreSQL | Need ACID guarantees |
| User sessions | Key-Value | Redis | Fast reads/writes, TTL support |
| Product catalog | Document | MongoDB | Flexible schema per product |
| Time-series logs | Wide-Column | Cassandra | Massive write throughput |
| Social graph | Graph | Neo4j | Efficient relationship queries |
| Blob storage | Object Store | S3 | Files, images, videos |
| Search | Search Engine | Elasticsearch | Full-text search, faceted queries |
Common Interview Questions
Q1: "SQL or NoSQL for Twitter?"
Answer: Hybrid approach
- SQL (PostgreSQL): User accounts, relationships (followers) - need consistency
- NoSQL (Cassandra): Tweets, timeline - need massive write throughput
- Cache (Redis): Hot tweets, trending topics - need low latency
- Search (Elasticsearch): Tweet search - need full-text search
Reason: Different data has different requirements. Use the right tool for each.
Q2: "Explain ACID with a concrete example."
Answer (E-commerce order):
BEGIN TRANSACTION;
-- Atomicity: All these happen or none
INSERT INTO orders (user_id, total) VALUES (123, 99.99);
UPDATE inventory SET stock = stock - 1 WHERE product_id = 456;
INSERT INTO payments (order_id, amount) VALUES (789, 99.99);
-- Consistency: Stock can't go negative (constraint)
-- Isolation: Two orders don't both grab last item
-- Durability: After confirmation, data persists
COMMIT;
If payment fails → entire transaction rolls back (Atomicity)
Q3: "When would you choose MongoDB over PostgreSQL?"
Answer: Choose MongoDB when:
- Schema evolves frequently (e.g., product attributes vary)
- Need horizontal scaling without sharding complexity
- Working with nested/hierarchical data
- Rapid prototyping (no migrations)
Choose PostgreSQL when:
- Need strong consistency and transactions
- Complex queries with JOINs
- Data integrity is critical
- Team familiar with SQL
Q4: "What's eventual consistency? Give an example."
Answer: Data becomes consistent over time, but may be temporarily inconsistent.
Example (Instagram likes):
1. User in US likes post → written to US datacenter (0ms)
2. User in EU views same post → reads from EU datacenter
→ Might not see like immediately (replication lag ~1-2 seconds)
3. After replication → EU datacenter has the like (eventually consistent)
Trade-off: Availability (both operations succeed immediately) over consistency (brief stale reads)
Trade-offs
| Decision | SQL | NoSQL | Consider |
|---|---|---|---|
| Consistency | Strong (ACID) | Eventual (BASE) | Business requirements |
| Scalability | Vertical (limited) | Horizontal (unlimited) | Expected growth |
| Flexibility | Fixed schema | Flexible schema | Data structure stability |
| Query Complexity | JOINs, aggregations | Simple lookups | Query patterns |
| Transactions | Multi-row ACID | Limited or none | Data integrity needs |
Real-World Examples
Uber (Polyglot Persistence)
- PostgreSQL: User accounts, driver profiles (transactional data)
- Redis: Geospatial queries (nearby drivers), real-time location
- Cassandra: Trip history (high write throughput)
- Elasticsearch: Search for places, past trips
Netflix
- MySQL: Billing, subscription data (ACID required)
- Cassandra: Viewing history (billions of writes/day)
- Elasticsearch: Search for movies, content discovery
- EVCache (Redis): Caching layer for all reads
Facebook
- MySQL (sharded): User data, relationships (custom sharding)
- Cassandra: Messages, timeline posts
- Memcached: Cache layer (read-heavy workload)
- TAO (Graph): Social graph queries
Quick Reference Card
SQL Databases:
- PostgreSQL: General purpose, feature-rich
- MySQL: Simple, fast, widely adopted
- Use when: Transactions, relationships, complex queries
NoSQL Databases:
- MongoDB: Flexible schema, document store
- Redis: Caching, sessions, key-value
- Cassandra: Time-series, high writes, wide-column
- Neo4j: Social graphs, recommendations
- Use when: Massive scale, flexible schema, simple queries
ACID vs BASE:
- ACID: Strong consistency, transactions (SQL)
- BASE: Eventual consistency, availability (NoSQL)
Choosing:
- Need transactions? → SQL
- Massive scale (>1TB)? → NoSQL
- Complex queries? → SQL
- Flexible schema? → NoSQL
- When in doubt? → Start with PostgreSQL (can scale further than you think)
Further Reading
- Database of Databases - Comprehensive database catalog
- MongoDB vs PostgreSQL
- Martin Kleppmann - Designing Data-Intensive Applications (Chapter 2, 3)
- CAP Theorem Explained
Next: Database Deep Dive - Indexing, sharding, replication, partitioning.