Skip to main content

Storage Fundamentals

TL;DR (30-second summary)

SQL (relational): Structured data, ACID transactions, complex queries. NoSQL: Flexible schema, horizontal scaling, eventual consistency. ACID = Strong guarantees. BASE = Availability over consistency.

Choose SQL when you need transactions and relationships. Choose NoSQL when you need massive scale and flexibility.

Why This Matters

In interviews: Database choice is often the first major design decision. Shows you understand trade-offs between consistency, scalability, and flexibility.

At work: Wrong database choice costs millions in rewrites and downtime.

Core Concepts

1. SQL vs NoSQL

FeatureSQL (Relational)NoSQL
SchemaFixed, predefined schemaFlexible, schemaless
Data ModelTables with rows and columnsDocuments, key-value, graphs, etc.
RelationshipsForeign keys, JOINsDenormalized, embedded docs
TransactionsACID (strong guarantees)BASE (eventual consistency)
ScalingVertical (harder to scale out)Horizontal (easy to scale out)
Query LanguageSQL (standardized)Varies by database
Use CaseFinancial, ERP, e-commerce ordersSocial media, logs, real-time analytics

2. ACID Properties

ACID = Strong consistency guarantees for SQL databases.

PropertyMeaningExample
AtomicityAll or nothing - transaction either completes fully or rolls backBank transfer: debit AND credit happen, or neither
ConsistencyDatabase goes from one valid state to anotherAccount balance can't go negative (constraint enforced)
IsolationConcurrent transactions don't interfereTwo users booking last seat don't both succeed
DurabilityCommitted data survives crashesAfter "payment confirmed", data persists even if server crashes

Example (Bank Transfer):

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 'Alice';
UPDATE accounts SET balance = balance + 100 WHERE id = 'Bob';
COMMIT;
-- If either fails, both roll back (Atomicity)
-- Balance constraints enforced (Consistency)
-- Other transactions see intermediate state (Isolation)
-- After COMMIT, survives crash (Durability)

3. BASE Properties

BASE = Eventual consistency model for NoSQL at scale.

PropertyMeaning
Basically AvailableSystem appears to work most of the time (prioritizes availability)
Soft stateState may change over time without input (due to eventual consistency)
Eventually consistentSystem will become consistent over time (given no new updates)

Example (Social Media Likes):

User A likes post → Write to US datacenter
User B in EU views post → Reads from EU datacenter (might not see like yet)
After replication (1-2 seconds) → User B sees the like

Result: Availability (both users can interact immediately)
Trade-off: Consistency (brief period where like not visible)
Interview Tip

Say: "For this banking system, we need ACID guarantees - SQL is the right choice. For this social media feed, eventual consistency is fine - NoSQL allows us to scale horizontally."

4. SQL Databases (Relational)

When to use:

  • ✅ Structured data with relationships
  • ✅ Need ACID transactions
  • ✅ Complex queries with JOINs
  • ✅ Data integrity is critical

Popular choices:

DatabaseStrengthsUse Case
PostgreSQLFull-featured, JSONB support, extensionsGeneral purpose, GIS data, analytics
MySQLFast, widely adopted, simpleWeb apps, WordPress, e-commerce
SQL ServerEnterprise features, integrates with Microsoft stackCorporate applications, .NET ecosystem
OracleHigh performance, advanced featuresLarge enterprises, mission-critical systems

Limitations:

  • ❌ Vertical scaling limits (expensive to upgrade single server)
  • ❌ Sharding is complex (need manual partition logic)
  • ❌ Schema changes require migrations (downtime risk)

5. NoSQL Databases

Document Databases (MongoDB, CouchDB)

Data model: JSON-like documents

{
"_id": "user123",
"name": "John Doe",
"email": "john@example.com",
"addresses": [
{"type": "home", "city": "NYC"},
{"type": "work", "city": "SF"}
],
"orders": [...] // Embedded data
}

When to use:

  • ✅ Flexible schema (product catalogs, content management)
  • ✅ Nested/hierarchical data
  • ✅ Rapid prototyping (no migrations)

Example: E-commerce product catalog (different products have different attributes)

Key-Value Stores (Redis, DynamoDB)

Data model: Simple key → value

key: "user:123:session"
value: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

key: "product:456"
value: {"name": "Laptop", "price": 999}

When to use:

  • ✅ Caching (most common use case)
  • ✅ Session storage
  • ✅ Real-time leaderboards
  • ✅ Simple access patterns (get by ID)

Example: Redis for caching API responses, DynamoDB for user sessions

Wide-Column Stores (Cassandra, HBase)

Data model: Rows with dynamic columns

Row key: "user123"
Columns: {timestamp1: data1, timestamp2: data2, ...}

When to use:

  • ✅ Time-series data (logs, metrics, IoT)
  • ✅ Massive write throughput
  • ✅ Distributed across many nodes

Example: Cassandra for Netflix viewing history, HBase for Facebook messages

Graph Databases (Neo4j, Amazon Neptune)

Data model: Nodes and relationships

(Person: John)-[:FRIENDS_WITH]->(Person: Jane)
(Person: John)-[:LIKES]->(Movie: Inception)

When to use:

  • ✅ Highly connected data (social networks)
  • ✅ Recommendation engines
  • ✅ Fraud detection (pattern matching)

Example: LinkedIn connections, fraud detection in banking

6. Storage Types by Use Case

Use CaseStorage TypeExample DBReason
Financial transactionsSQLPostgreSQLNeed ACID guarantees
User sessionsKey-ValueRedisFast reads/writes, TTL support
Product catalogDocumentMongoDBFlexible schema per product
Time-series logsWide-ColumnCassandraMassive write throughput
Social graphGraphNeo4jEfficient relationship queries
Blob storageObject StoreS3Files, images, videos
SearchSearch EngineElasticsearchFull-text search, faceted queries

Common Interview Questions

Q1: "SQL or NoSQL for Twitter?"

Answer: Hybrid approach

  • SQL (PostgreSQL): User accounts, relationships (followers) - need consistency
  • NoSQL (Cassandra): Tweets, timeline - need massive write throughput
  • Cache (Redis): Hot tweets, trending topics - need low latency
  • Search (Elasticsearch): Tweet search - need full-text search

Reason: Different data has different requirements. Use the right tool for each.

Q2: "Explain ACID with a concrete example."

Answer (E-commerce order):

BEGIN TRANSACTION;
-- Atomicity: All these happen or none
INSERT INTO orders (user_id, total) VALUES (123, 99.99);
UPDATE inventory SET stock = stock - 1 WHERE product_id = 456;
INSERT INTO payments (order_id, amount) VALUES (789, 99.99);

-- Consistency: Stock can't go negative (constraint)
-- Isolation: Two orders don't both grab last item
-- Durability: After confirmation, data persists
COMMIT;

If payment fails → entire transaction rolls back (Atomicity)

Q3: "When would you choose MongoDB over PostgreSQL?"

Answer: Choose MongoDB when:

  • Schema evolves frequently (e.g., product attributes vary)
  • Need horizontal scaling without sharding complexity
  • Working with nested/hierarchical data
  • Rapid prototyping (no migrations)

Choose PostgreSQL when:

  • Need strong consistency and transactions
  • Complex queries with JOINs
  • Data integrity is critical
  • Team familiar with SQL

Q4: "What's eventual consistency? Give an example."

Answer: Data becomes consistent over time, but may be temporarily inconsistent.

Example (Instagram likes):

1. User in US likes post → written to US datacenter (0ms)
2. User in EU views same post → reads from EU datacenter
→ Might not see like immediately (replication lag ~1-2 seconds)
3. After replication → EU datacenter has the like (eventually consistent)

Trade-off: Availability (both operations succeed immediately) over consistency (brief stale reads)

Trade-offs

DecisionSQLNoSQLConsider
ConsistencyStrong (ACID)Eventual (BASE)Business requirements
ScalabilityVertical (limited)Horizontal (unlimited)Expected growth
FlexibilityFixed schemaFlexible schemaData structure stability
Query ComplexityJOINs, aggregationsSimple lookupsQuery patterns
TransactionsMulti-row ACIDLimited or noneData integrity needs

Real-World Examples

Uber (Polyglot Persistence)

  • PostgreSQL: User accounts, driver profiles (transactional data)
  • Redis: Geospatial queries (nearby drivers), real-time location
  • Cassandra: Trip history (high write throughput)
  • Elasticsearch: Search for places, past trips

Netflix

  • MySQL: Billing, subscription data (ACID required)
  • Cassandra: Viewing history (billions of writes/day)
  • Elasticsearch: Search for movies, content discovery
  • EVCache (Redis): Caching layer for all reads

Facebook

  • MySQL (sharded): User data, relationships (custom sharding)
  • Cassandra: Messages, timeline posts
  • Memcached: Cache layer (read-heavy workload)
  • TAO (Graph): Social graph queries

Quick Reference Card

SQL Databases:

  • PostgreSQL: General purpose, feature-rich
  • MySQL: Simple, fast, widely adopted
  • Use when: Transactions, relationships, complex queries

NoSQL Databases:

  • MongoDB: Flexible schema, document store
  • Redis: Caching, sessions, key-value
  • Cassandra: Time-series, high writes, wide-column
  • Neo4j: Social graphs, recommendations
  • Use when: Massive scale, flexible schema, simple queries

ACID vs BASE:

  • ACID: Strong consistency, transactions (SQL)
  • BASE: Eventual consistency, availability (NoSQL)

Choosing:

  1. Need transactions? → SQL
  2. Massive scale (>1TB)? → NoSQL
  3. Complex queries? → SQL
  4. Flexible schema? → NoSQL
  5. When in doubt? → Start with PostgreSQL (can scale further than you think)

Further Reading


Next: Database Deep Dive - Indexing, sharding, replication, partitioning.