Shared Database Split

The Interview Question

"We have 5 microservices that all share a single PostgreSQL database. This creates tight coupling - schema changes affect all teams, and we can't deploy independently. How do we split this into database-per-service with zero downtime?"

Asked at: Google, Amazon, Microsoft, Stripe

Time to solve: 35-40 minutes

Difficulty: ⭐⭐⭐ (Senior)


Clarifying Questions to Ask

Before diving in, ask:

  1. "How many tables and what's the data volume?" → Affects migration time
  2. "Are there foreign keys between services' tables?" → Affects complexity
  3. "What's the current write QPS?" → Affects dual-write performance
  4. "Is there a timeline/deadline?" → Affects approach aggressiveness
  5. "Can we have brief read inconsistency?" → Affects sync strategy

The Problem Visualized

Why this is bad:

  • Team A changes users table → Team B's queries break
  • Can't scale Order DB independently from User DB
  • One bad query affects all services
  • Can't use different databases for different needs (SQL vs NoSQL)

Solution Approach

Phase 1: Data Domain Analysis (Week 1)

First, map which tables belong to which service:

# domain_mapping.yaml
user_service:
  owned_tables:
    - users
    - user_preferences
    - user_sessions
  shared_tables:
    - none

order_service:
  owned_tables:
    - orders
    - order_items
    - shipping_addresses
  references:
    - users.id  # foreign key

payment_service:
  owned_tables:
    - payments
    - refunds
    - payment_methods
  references:
    - orders.id
    - users.id

inventory_service:
  owned_tables:
    - products
    - inventory
    - warehouses
  references:
    - none

notification_service:
  owned_tables:
    - notifications
    - notification_templates
  references:
    - users.id
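Rather than maintaining this mapping purely by hand, the cross-service references can be found mechanically: pull foreign-key pairs from `information_schema` and check each one against the ownership map. A minimal sketch (the `OWNERSHIP` dict mirrors the YAML above; `cross_service_fks` is a hypothetical helper):

```python
# Table -> owning service, taken from the domain mapping above
OWNERSHIP = {
    "users": "user_service", "user_preferences": "user_service",
    "user_sessions": "user_service",
    "orders": "order_service", "order_items": "order_service",
    "shipping_addresses": "order_service",
    "payments": "payment_service", "refunds": "payment_service",
    "payment_methods": "payment_service",
    "products": "inventory_service", "inventory": "inventory_service",
    "warehouses": "inventory_service",
    "notifications": "notification_service",
    "notification_templates": "notification_service",
}

def cross_service_fks(foreign_keys):
    """Return (child_table, parent_table) pairs owned by different services.

    These are the FKs that must become API-level checks after the split.
    """
    return [
        (child, parent)
        for child, parent in foreign_keys
        if OWNERSHIP.get(child) != OWNERSHIP.get(parent)
    ]

# FK pairs as (child_table, referenced_table), e.g. queried from
# information_schema.table_constraints / constraint_column_usage
fks = [("orders", "users"), ("order_items", "orders"), ("payments", "orders")]
print(cross_service_fks(fks))  # [('orders', 'users'), ('payments', 'orders')]
```

Every pair this returns is a constraint that has to be dropped and re-enforced at the application level (Phase 2 and "Handling Foreign Keys" below cover how).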

Phase 2: Create Service APIs (Weeks 2-3)

Replace direct database access with API calls:

# BEFORE: Direct database access
class OrderService:
    def get_order_with_user(self, order_id):
        return db.query("""
            SELECT o.*, u.name, u.email
            FROM orders o
            JOIN users u ON o.user_id = u.id
            WHERE o.id = ?
        """, order_id)

# AFTER: API composition
class OrderService:
    def get_order_with_user(self, order_id):
        order = self.order_repo.get(order_id)
        user = self.user_client.get_user(order.user_id)  # HTTP call
        return OrderWithUser(order=order, user=user)

Key principle: each service owns its data and exposes it only via its API.

Phase 3: Create Separate Database (Week 3)

-- Create new database for User Service
CREATE DATABASE user_service_db;

# Copy schema (shell, not SQL)
pg_dump shared_db --schema-only --table=users --table='user_*' | psql user_service_db

# Initial data copy, during low traffic. PostgreSQL can't SELECT across
# databases, so copy via pg_dump rather than INSERT ... SELECT
pg_dump shared_db --data-only --table=users --table='user_*' | psql user_service_db

Phase 4: Dual Write (Weeks 4-5)

Write to both databases during transition:

class UserRepository:
    def __init__(self):
        self.old_db = SharedDatabase()
        self.new_db = UserServiceDatabase()
        self.dual_write_enabled = True

    def create_user(self, user_data):
        # Always write to new DB first
        new_user = self.new_db.insert("users", user_data)

        # Also write to old DB (for other services still reading)
        if self.dual_write_enabled:
            self.old_db.insert("users", user_data)

        return new_user

    def update_user(self, user_id, data):
        self.new_db.update("users", user_id, data)

        if self.dual_write_enabled:
            self.old_db.update("users", user_id, data)
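One detail the snippet above glosses over: what happens when the old-DB write fails after the new-DB write succeeded? Since the new DB is the source of truth, the usual choice is to make the old-DB write best-effort — log and count the failure, then let a repair job re-sync later rather than failing the request. A minimal sketch of that policy (the callables stand in for the repository calls above):

```python
import logging

log = logging.getLogger("migration")

def dual_write(new_db_write, old_db_write, dual_write_enabled=True):
    """Write to the new DB (source of truth); the old-DB write is best-effort.

    new_db_write / old_db_write are zero-arg callables wrapping the actual
    repository calls, e.g. lambda: new_db.insert("users", data).
    """
    result = new_db_write()   # a failure here fails the whole request
    if dual_write_enabled:
        try:
            old_db_write()    # a failure here must NOT fail the request
        except Exception:
            # Log and count it; a background repair job re-syncs later
            log.exception("dual-write to old DB failed")
    return result
```

The inverse ordering (old DB first) would instead risk losing writes from the new DB's point of view, which is worse once reads have cut over.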

Phase 5: Shadow Read Verification (Week 5)

Compare reads from both databases to ensure consistency:

def get_user_with_verification(user_id):
    new_result = new_db.get_user(user_id)
    old_result = old_db.get_user(user_id)

    if new_result != old_result:
        log.error(f"Data mismatch for user {user_id}",
                  new=new_result, old=old_result)
        metrics.increment("migration.data_mismatch")

        # Alert if mismatch rate exceeds threshold
        if get_mismatch_rate() > 0.001:  # 0.1%
            alert_oncall("High data mismatch rate during migration")

    return new_result  # Return new DB result
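The `get_mismatch_rate()` call above is left abstract. One simple way to implement it is a rolling window over the most recent comparisons — a sketch (the window size is an assumption, not from the source):

```python
from collections import deque

class MismatchRate:
    """Rolling mismatch rate over the last N shadow-read comparisons."""

    def __init__(self, window=10_000):
        # deque with maxlen drops the oldest result automatically
        self.results = deque(maxlen=window)

    def record(self, matched: bool):
        self.results.append(matched)

    def rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)
```

A windowed rate recovers quickly after a backfill fixes old mismatches, whereas an all-time counter would keep alerting on stale data.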

Phase 6: Cutover (Week 6)

# Feature flag for gradual cutover
def get_user(user_id):
    if feature_flag.is_enabled("use_new_user_db", user_id):
        return new_db.get_user(user_id)
    else:
        return old_db.get_user(user_id)

# Rollout: 1% → 10% → 50% → 100%
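The `feature_flag.is_enabled` check is left abstract above. For a percentage rollout like this, a common implementation hashes the flag name plus user ID into a stable bucket, so each user consistently sees the same database as the percentage ramps up — a sketch, not any particular flag library's API:

```python
import hashlib

def is_enabled(flag_name: str, user_id: int, percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) for gradual rollout.

    The same (flag, user) pair always lands in the same bucket, so a user
    never flips back and forth between the old and new DB mid-rollout.
    """
    key = f"{flag_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < percent
```

Raising `percent` from 1 to 10 to 50 to 100 only ever adds users to the new DB; nobody who has cut over moves back unless the percentage is lowered for a rollback.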

Phase 7: Cleanup (Week 7)

# After 100% on new DB for 1 week with no issues:

# 1. Disable dual writes
dual_write_enabled = False

# 2. Remove old code paths
# 3. Drop tables from shared DB (after backup!)
# 4. Update documentation

Architecture After Migration


Handling Foreign Keys

Problem: Orders table has user_id foreign key to users table.

Solution: Remove FK, enforce at application level.

-- Before: database enforced
ALTER TABLE orders
  ADD CONSTRAINT fk_user
  FOREIGN KEY (user_id) REFERENCES users(id);

-- After: remove the FK constraint
ALTER TABLE orders DROP CONSTRAINT fk_user;

# The application validates instead
def create_order(order_data):
    # Verify the user exists via the User Service API
    user = user_service.get_user(order_data.user_id)
    if not user:
        raise ValidationError("User not found")

    return order_repo.create(order_data)

Trade-offs Discussion

| Approach | Pros | Cons |
|---|---|---|
| Big Bang | Fast, simple | High risk, downtime |
| Strangler Fig | Low risk, reversible | Slow (6-8 weeks) |
| Dual Write | Zero downtime | Complexity, consistency risk |
| CDC (Debezium) | Real-time sync | Infrastructure overhead |

Recommended: Strangler Fig + Dual Write (what we described)


Follow-up Questions

"What if a service needs to join data from multiple services?"

Use API composition or CQRS (materialized view that combines data)

"How do you handle transactions across services?"

Saga pattern with compensating actions. See Distributed Transactions
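The core of the saga pattern fits in a few lines: each step pairs an action with a compensating action, and when a later step fails, the compensations for the completed steps run in reverse order. A minimal sketch (step names are illustrative, not from the source):

```python
def run_saga(steps):
    """Execute (action, compensate) pairs; on failure, undo completed steps.

    steps is a list of (action, compensate) callables. Compensations run
    in reverse order, mirroring how a transaction would roll back.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True
```

For the order/payment split above, a "create order" saga might pair `charge_payment` with `refund_payment` and `reserve_inventory` with `release_inventory`; if the reservation fails, the refund runs automatically.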

"What about reporting that needs data from all services?"

Create a separate analytics database that receives events from all services


Key Takeaways

  1. Map data ownership first - Know which service owns what
  2. APIs before database split - Decouple access patterns first
  3. Dual write for safety - Maintain consistency during transition
  4. Feature flags for cutover - Easy rollback if issues
  5. Shadow reads for verification - Catch mismatches before they matter

Estimated timeline: 6-8 weeks for a team of 2-3 engineers

Risk level: Medium (with proper dual-write and verification)