Storage Fundamentals

TL;DR (30-second summary)

SQL (relational): Structured data, ACID transactions, complex queries. NoSQL: Flexible schema, horizontal scaling, eventual consistency. ACID = Strong guarantees. BASE = Availability over consistency.

Choose SQL when you need transactions and relationships. Choose NoSQL when you need massive scale and flexibility.

Why This Matters

In interviews: Database choice is often the first major design decision. Shows you understand trade-offs between consistency, scalability, and flexibility.

At work: Wrong database choice costs millions in rewrites and downtime.

Core Concepts

1. SQL vs NoSQL

Feature	SQL (Relational)	NoSQL
Schema	Fixed, predefined schema	Flexible, schemaless
Data Model	Tables with rows and columns	Documents, key-value, graphs, etc.
Relationships	Foreign keys, JOINs	Denormalized, embedded docs
Transactions	ACID (strong guarantees)	BASE (eventual consistency)
Scaling	Vertical (harder to scale out)	Horizontal (easy to scale out)
Query Language	SQL (standardized)	Varies by database
Use Case	Financial, ERP, e-commerce orders	Social media, logs, real-time analytics

2. ACID Properties

ACID = Strong consistency guarantees for SQL databases.

Property	Meaning	Example
Atomicity	All or nothing - transaction either completes fully or rolls back	Bank transfer: debit AND credit happen, or neither
Consistency	Database goes from one valid state to another	Account balance can't go negative (constraint enforced)
Isolation	Concurrent transactions don't interfere	Two users booking last seat don't both succeed
Durability	Committed data survives crashes	After "payment confirmed", data persists even if server crashes

Example (Bank Transfer):

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 'Alice';
  UPDATE accounts SET balance = balance + 100 WHERE id = 'Bob';
COMMIT;
-- If either fails, both roll back (Atomicity)
-- Balance constraints enforced (Consistency)
-- Other transactions see intermediate state (Isolation)
-- After COMMIT, survives crash (Durability)

3. BASE Properties

BASE = Eventual consistency model for NoSQL at scale.

Property	Meaning
Basically Available	System appears to work most of the time (prioritizes availability)
Soft state	State may change over time without input (due to eventual consistency)
Eventually consistent	System will become consistent over time (given no new updates)

Example (Social Media Likes):

User A likes post → Write to US datacenter
User B in EU views post → Reads from EU datacenter (might not see like yet)
After replication (1-2 seconds) → User B sees the like

Result: Availability (both users can interact immediately)
Trade-off: Consistency (brief period where like not visible)

Interview Tip

Say: "For this banking system, we need ACID guarantees - SQL is the right choice. For this social media feed, eventual consistency is fine - NoSQL allows us to scale horizontally."

4. SQL Databases (Relational)

When to use:

✅ Structured data with relationships
✅ Need ACID transactions
✅ Complex queries with JOINs
✅ Data integrity is critical

Popular choices:

Database	Strengths	Use Case
PostgreSQL	Full-featured, JSONB support, extensions	General purpose, GIS data, analytics
MySQL	Fast, widely adopted, simple	Web apps, WordPress, e-commerce
SQL Server	Enterprise features, integrates with Microsoft stack	Corporate applications, .NET ecosystem
Oracle	High performance, advanced features	Large enterprises, mission-critical systems

Limitations:

❌ Vertical scaling limits (expensive to upgrade single server)
❌ Sharding is complex (need manual partition logic)
❌ Schema changes require migrations (downtime risk)

5. NoSQL Databases

Document Databases (MongoDB, CouchDB)

Data model: JSON-like documents

{
  "_id": "user123",
  "name": "John Doe",
  "email": "john@example.com",
  "addresses": [
    {"type": "home", "city": "NYC"},
    {"type": "work", "city": "SF"}
  ],
  "orders": [...]  // Embedded data
}

When to use:

✅ Flexible schema (product catalogs, content management)
✅ Nested/hierarchical data
✅ Rapid prototyping (no migrations)

Example: E-commerce product catalog (different products have different attributes)

Key-Value Stores (Redis, DynamoDB)

Data model: Simple key → value

key: "user:123:session"
value: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

key: "product:456"
value: {"name": "Laptop", "price": 999}

When to use:

✅ Caching (most common use case)
✅ Session storage
✅ Real-time leaderboards
✅ Simple access patterns (get by ID)

Example: Redis for caching API responses, DynamoDB for user sessions

Wide-Column Stores (Cassandra, HBase)

Data model: Rows with dynamic columns

Row key: "user123"
Columns: {timestamp1: data1, timestamp2: data2, ...}

When to use:

✅ Time-series data (logs, metrics, IoT)
✅ Massive write throughput
✅ Distributed across many nodes

Example: Cassandra for Netflix viewing history, HBase for Facebook messages

Graph Databases (Neo4j, Amazon Neptune)

Data model: Nodes and relationships

(Person: John)-[:FRIENDS_WITH]->(Person: Jane)
(Person: John)-[:LIKES]->(Movie: Inception)

When to use:

✅ Highly connected data (social networks)
✅ Recommendation engines
✅ Fraud detection (pattern matching)

Example: LinkedIn connections, fraud detection in banking

6. Storage Types by Use Case

Use Case	Storage Type	Example DB	Reason
Financial transactions	SQL	PostgreSQL	Need ACID guarantees
User sessions	Key-Value	Redis	Fast reads/writes, TTL support
Product catalog	Document	MongoDB	Flexible schema per product
Time-series logs	Wide-Column	Cassandra	Massive write throughput
Social graph	Graph	Neo4j	Efficient relationship queries
Blob storage	Object Store	S3	Files, images, videos
Search	Search Engine	Elasticsearch	Full-text search, faceted queries

Common Interview Questions

Q1: "SQL or NoSQL for Twitter?"

Answer: Hybrid approach

SQL (PostgreSQL): User accounts, relationships (followers) - need consistency
NoSQL (Cassandra): Tweets, timeline - need massive write throughput
Cache (Redis): Hot tweets, trending topics - need low latency
Search (Elasticsearch): Tweet search - need full-text search

Reason: Different data has different requirements. Use the right tool for each.

Q2: "Explain ACID with a concrete example."

Answer (E-commerce order):

BEGIN TRANSACTION;
  -- Atomicity: All these happen or none
  INSERT INTO orders (user_id, total) VALUES (123, 99.99);
  UPDATE inventory SET stock = stock - 1 WHERE product_id = 456;
  INSERT INTO payments (order_id, amount) VALUES (789, 99.99);
  
  -- Consistency: Stock can't go negative (constraint)
  -- Isolation: Two orders don't both grab last item
  -- Durability: After confirmation, data persists
COMMIT;

If payment fails → entire transaction rolls back (Atomicity)

Q3: "When would you choose MongoDB over PostgreSQL?"

Answer: Choose MongoDB when:

Schema evolves frequently (e.g., product attributes vary)
Need horizontal scaling without sharding complexity
Working with nested/hierarchical data
Rapid prototyping (no migrations)

Choose PostgreSQL when:

Need strong consistency and transactions
Complex queries with JOINs
Data integrity is critical
Team familiar with SQL

Q4: "What's eventual consistency? Give an example."

Answer: Data becomes consistent over time, but may be temporarily inconsistent.

Example (Instagram likes):

User in US likes post → written to US datacenter (0ms)
User in EU views same post → reads from EU datacenter
   → Might not see like immediately (replication lag ~1-2 seconds)
After replication → EU datacenter has the like (eventually consistent)

Trade-off: Availability (both operations succeed immediately) over consistency (brief stale reads)

Trade-offs

Decision	SQL	NoSQL	Consider
Consistency	Strong (ACID)	Eventual (BASE)	Business requirements
Scalability	Vertical (limited)	Horizontal (unlimited)	Expected growth
Flexibility	Fixed schema	Flexible schema	Data structure stability
Query Complexity	JOINs, aggregations	Simple lookups	Query patterns
Transactions	Multi-row ACID	Limited or none	Data integrity needs

Real-World Examples

Uber (Polyglot Persistence)

PostgreSQL: User accounts, driver profiles (transactional data)
Redis: Geospatial queries (nearby drivers), real-time location
Cassandra: Trip history (high write throughput)
Elasticsearch: Search for places, past trips

Netflix

MySQL: Billing, subscription data (ACID required)
Cassandra: Viewing history (billions of writes/day)
Elasticsearch: Search for movies, content discovery
EVCache (Redis): Caching layer for all reads

Facebook

MySQL (sharded): User data, relationships (custom sharding)
Cassandra: Messages, timeline posts
Memcached: Cache layer (read-heavy workload)
TAO (Graph): Social graph queries

Quick Reference Card

SQL Databases:

PostgreSQL: General purpose, feature-rich
MySQL: Simple, fast, widely adopted
Use when: Transactions, relationships, complex queries

NoSQL Databases:

MongoDB: Flexible schema, document store
Redis: Caching, sessions, key-value
Cassandra: Time-series, high writes, wide-column
Neo4j: Social graphs, recommendations
Use when: Massive scale, flexible schema, simple queries

ACID vs BASE:

ACID: Strong consistency, transactions (SQL)
BASE: Eventual consistency, availability (NoSQL)

Choosing:

Need transactions? → SQL
Massive scale (>1TB)? → NoSQL
Complex queries? → SQL
Flexible schema? → NoSQL
When in doubt? → Start with PostgreSQL (can scale further than you think)

TL;DR (30-second summary)​

Why This Matters​

Core Concepts​

1. SQL vs NoSQL​

2. ACID Properties​

3. BASE Properties​

4. SQL Databases (Relational)​

5. NoSQL Databases​

Document Databases (MongoDB, CouchDB)​

Key-Value Stores (Redis, DynamoDB)​

Wide-Column Stores (Cassandra, HBase)​

Graph Databases (Neo4j, Amazon Neptune)​

6. Storage Types by Use Case​

Common Interview Questions​

Q1: "SQL or NoSQL for Twitter?"​

Q2: "Explain ACID with a concrete example."​

Q3: "When would you choose MongoDB over PostgreSQL?"​

Q4: "What's eventual consistency? Give an example."​

Trade-offs​

Real-World Examples​

Uber (Polyglot Persistence)​

Netflix​

Facebook​

Quick Reference Card​

Further Reading​

TL;DR (30-second summary)

Why This Matters

Core Concepts

1. SQL vs NoSQL

2. ACID Properties

3. BASE Properties

4. SQL Databases (Relational)

5. NoSQL Databases

Document Databases (MongoDB, CouchDB)

Key-Value Stores (Redis, DynamoDB)

Wide-Column Stores (Cassandra, HBase)

Graph Databases (Neo4j, Amazon Neptune)

6. Storage Types by Use Case

Common Interview Questions

Q1: "SQL or NoSQL for Twitter?"

Q2: "Explain ACID with a concrete example."

Q3: "When would you choose MongoDB over PostgreSQL?"

Q4: "What's eventual consistency? Give an example."

Trade-offs

Real-World Examples

Uber (Polyglot Persistence)

Netflix

Facebook

Quick Reference Card

Further Reading