Shared Database Split

The Interview Question

"We have 5 microservices that all share a single PostgreSQL database. This creates tight coupling - schema changes affect all teams, and we can't deploy independently. How do we split this into database-per-service with zero downtime?"

Asked at: Google, Amazon, Microsoft, Stripe

Time to solve: 35-40 minutes

Difficulty: ⭐⭐⭐ (Senior)


Clarifying Questions to Ask

Before diving in, ask:

  1. "How many tables and what's the data volume?" → Affects migration time
  2. "Are there foreign keys between services' tables?" → Affects complexity
  3. "What's the current write QPS?" → Affects dual-write performance
  4. "Is there a timeline/deadline?" → Affects approach aggressiveness
  5. "Can we have brief read inconsistency?" → Affects sync strategy

The Problem Visualized

Why this is bad:

  • Team A changes users table → Team B's queries break
  • Can't scale Order DB independently from User DB
  • One bad query affects all services
  • Can't use different databases for different needs (SQL vs NoSQL)

Solution Approach

Phase 1: Data Domain Analysis (Week 1)

First, map which tables belong to which service:

# domain_mapping.yaml
user_service:
  owned_tables:
    - users
    - user_preferences
    - user_sessions
  shared_tables:
    - none

order_service:
  owned_tables:
    - orders
    - order_items
    - shipping_addresses
  references:
    - users.id  # foreign key

payment_service:
  owned_tables:
    - payments
    - refunds
    - payment_methods
  references:
    - orders.id
    - users.id

inventory_service:
  owned_tables:
    - products
    - inventory
    - warehouses
  references:
    - none

notification_service:
  owned_tables:
    - notifications
    - notification_templates
  references:
    - users.id
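Rather than maintaining this mapping purely by hand, the cross-service references can be found mechanically: pull foreign-key pairs from `information_schema` and check each one against the ownership map. A minimal sketch (the `OWNERSHIP` dict mirrors the YAML above; `cross_service_fks` is a hypothetical helper):

```python
# Table -> owning service, taken from the domain mapping above
OWNERSHIP = {
    "users": "user_service", "user_preferences": "user_service",
    "user_sessions": "user_service",
    "orders": "order_service", "order_items": "order_service",
    "shipping_addresses": "order_service",
    "payments": "payment_service", "refunds": "payment_service",
    "payment_methods": "payment_service",
    "products": "inventory_service", "inventory": "inventory_service",
    "warehouses": "inventory_service",
    "notifications": "notification_service",
    "notification_templates": "notification_service",
}

def cross_service_fks(foreign_keys):
    """Return (child_table, parent_table) pairs owned by different services.

    These are the FKs that must become API-level checks after the split.
    """
    return [
        (child, parent)
        for child, parent in foreign_keys
        if OWNERSHIP.get(child) != OWNERSHIP.get(parent)
    ]

# FK pairs as (child_table, referenced_table), e.g. queried from
# information_schema.table_constraints / constraint_column_usage
fks = [("orders", "users"), ("order_items", "orders"), ("payments", "orders")]
print(cross_service_fks(fks))  # [('orders', 'users'), ('payments', 'orders')]
```

Every pair this returns is a constraint that has to be dropped and re-enforced at the application level (Phase 2 and "Handling Foreign Keys" below cover how).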

Phase 2: Create Service APIs (Weeks 2-3)

Replace direct database access with API calls:

# BEFORE: Direct database access
class OrderService:
    def get_order_with_user(self, order_id):
        return db.query("""
            SELECT o.*, u.name, u.email
            FROM orders o
            JOIN users u ON o.user_id = u.id
            WHERE o.id = ?
        """, order_id)

# AFTER: API composition
class OrderService:
    def get_order_with_user(self, order_id):
        order = self.order_repo.get(order_id)
        user = self.user_client.get_user(order.user_id)  # HTTP call
        return OrderWithUser(order=order, user=user)

Key principle: each service owns its data and exposes it only via its API.

Phase 3: Create Separate Database (Week 3)

-- Create new database for User Service
CREATE DATABASE user_service_db;

# Copy schema (shell, not SQL)
pg_dump shared_db --schema-only --table=users --table='user_*' | psql user_service_db

# Initial data copy, during low traffic. PostgreSQL can't SELECT across
# databases, so copy via pg_dump rather than INSERT ... SELECT
pg_dump shared_db --data-only --table=users --table='user_*' | psql user_service_db

Phase 4: Dual Write (Weeks 4-5)

Write to both databases during transition:

class UserRepository:
    def __init__(self):
        self.old_db = SharedDatabase()
        self.new_db = UserServiceDatabase()
        self.dual_write_enabled = True

    def create_user(self, user_data):
        # Always write to new DB first
        new_user = self.new_db.insert("users", user_data)

        # Also write to old DB (for other services still reading)
        if self.dual_write_enabled:
            self.old_db.insert("users", user_data)

        return new_user

    def update_user(self, user_id, data):
        self.new_db.update("users", user_id, data)

        if self.dual_write_enabled:
            self.old_db.update("users", user_id, data)
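One detail the snippet above glosses over: what happens when the old-DB write fails after the new-DB write succeeded? Since the new DB is the source of truth, the usual choice is to make the old-DB write best-effort — log and count the failure, then let a repair job re-sync later rather than failing the request. A minimal sketch of that policy (the callables stand in for the repository calls above):

```python
import logging

log = logging.getLogger("migration")

def dual_write(new_db_write, old_db_write, dual_write_enabled=True):
    """Write to the new DB (source of truth); the old-DB write is best-effort.

    new_db_write / old_db_write are zero-arg callables wrapping the actual
    repository calls, e.g. lambda: new_db.insert("users", data).
    """
    result = new_db_write()   # a failure here fails the whole request
    if dual_write_enabled:
        try:
            old_db_write()    # a failure here must NOT fail the request
        except Exception:
            # Log and count it; a background repair job re-syncs later
            log.exception("dual-write to old DB failed")
    return result
```

The inverse ordering (old DB first) would instead risk losing writes from the new DB's point of view, which is worse once reads have cut over.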

Phase 5: Shadow Read Verification (Week 5)

Compare reads from both databases to ensure consistency:

def get_user_with_verification(user_id):
    new_result = new_db.get_user(user_id)
    old_result = old_db.get_user(user_id)

    if new_result != old_result:
        log.error(f"Data mismatch for user {user_id}",
                  new=new_result, old=old_result)
        metrics.increment("migration.data_mismatch")

        # Alert if mismatch rate exceeds threshold
        if get_mismatch_rate() > 0.001:  # 0.1%
            alert_oncall("High data mismatch rate during migration")

    return new_result  # Return new DB result
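The `get_mismatch_rate()` call above is left abstract. One simple way to implement it is a rolling window over the most recent comparisons — a sketch (the window size is an assumption, not from the source):

```python
from collections import deque

class MismatchRate:
    """Rolling mismatch rate over the last N shadow-read comparisons."""

    def __init__(self, window=10_000):
        # deque with maxlen drops the oldest result automatically
        self.results = deque(maxlen=window)

    def record(self, matched: bool):
        self.results.append(matched)

    def rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)
```

A windowed rate recovers quickly after a backfill fixes old mismatches, whereas an all-time counter would keep alerting on stale data.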

Phase 6: Cutover (Week 6)

# Feature flag for gradual cutover
def get_user(user_id):
    if feature_flag.is_enabled("use_new_user_db", user_id):
        return new_db.get_user(user_id)
    else:
        return old_db.get_user(user_id)

# Rollout: 1% → 10% → 50% → 100%
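The `feature_flag.is_enabled` check is left abstract above. For a percentage rollout like this, a common implementation hashes the flag name plus user ID into a stable bucket, so each user consistently sees the same database as the percentage ramps up — a sketch, not any particular flag library's API:

```python
import hashlib

def is_enabled(flag_name: str, user_id: int, percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) for gradual rollout.

    The same (flag, user) pair always lands in the same bucket, so a user
    never flips back and forth between the old and new DB mid-rollout.
    """
    key = f"{flag_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < percent
```

Raising `percent` from 1 to 10 to 50 to 100 only ever adds users to the new DB; nobody who has cut over moves back unless the percentage is lowered for a rollback.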

Phase 7: Cleanup (Week 7)

# After 100% on new DB for 1 week with no issues:

# 1. Disable dual writes
dual_write_enabled = False

# 2. Remove old code paths
# 3. Drop tables from shared DB (after backup!)
# 4. Update documentation

Architecture After Migration


Handling Foreign Keys

Problem: Orders table has user_id foreign key to users table.

Solution: Remove FK, enforce at application level.

-- Before: database enforced
ALTER TABLE orders
  ADD CONSTRAINT fk_user
  FOREIGN KEY (user_id) REFERENCES users(id);

-- After: remove the FK constraint
ALTER TABLE orders DROP CONSTRAINT fk_user;

# The application validates instead
def create_order(order_data):
    # Verify the user exists via the User Service API
    user = user_service.get_user(order_data.user_id)
    if not user:
        raise ValidationError("User not found")

    return order_repo.create(order_data)

Trade-offs Discussion

| Approach | Pros | Cons |
|---|---|---|
| Big Bang | Fast, simple | High risk, downtime |
| Strangler Fig | Low risk, reversible | Slow (6-8 weeks) |
| Dual Write | Zero downtime | Complexity, consistency risk |
| CDC (Debezium) | Real-time sync | Infrastructure overhead |

Recommended: Strangler Fig + Dual Write (what we described)


Follow-up Questions

"What if a service needs to join data from multiple services?"

Use API composition or CQRS (materialized view that combines data)

"How do you handle transactions across services?"

Saga pattern with compensating actions. See Distributed Transactions
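The core of the saga pattern fits in a few lines: each step pairs an action with a compensating action, and when a later step fails, the compensations for the completed steps run in reverse order. A minimal sketch (step names are illustrative, not from the source):

```python
def run_saga(steps):
    """Execute (action, compensate) pairs; on failure, undo completed steps.

    steps is a list of (action, compensate) callables. Compensations run
    in reverse order, mirroring how a transaction would roll back.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True
```

For the order/payment split above, a "create order" saga might pair `charge_payment` with `refund_payment` and `reserve_inventory` with `release_inventory`; if the reservation fails, the refund runs automatically.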

"What about reporting that needs data from all services?"

Create a separate analytics database that receives events from all services


Key Takeaways

  1. Map data ownership first - Know which service owns what
  2. APIs before database split - Decouple access patterns first
  3. Dual write for safety - Maintain consistency during transition
  4. Feature flags for cutover - Easy rollback if issues
  5. Shadow reads for verification - Catch mismatches before they matter

Estimated timeline: 6-8 weeks for a team of 2-3 engineers

Risk level: Medium (with proper dual-write and verification)