Shared Database Split
The Interview Question
"We have 5 microservices that all share a single PostgreSQL database. This creates tight coupling - schema changes affect all teams, and we can't deploy independently. How do we split this into database-per-service with zero downtime?"
Asked at: Google, Amazon, Microsoft, Stripe
Time to solve: 35-40 minutes
Difficulty: ⭐⭐⭐ (Senior)
Clarifying Questions to Ask
Before diving in, ask:
- "How many tables and what's the data volume?" → Affects migration time
- "Are there foreign keys between services' tables?" → Affects complexity
- "What's the current write QPS?" → Affects dual-write performance
- "Is there a timeline/deadline?" → Affects approach aggressiveness
- "Can we have brief read inconsistency?" → Affects sync strategy
The Problem Visualized
Why this is bad:
- Team A changes the `users` table → Team B's queries break
- Can't scale the Order DB independently from the User DB
- One bad query affects all services
- Can't use different databases for different needs (SQL vs NoSQL)
Solution Approach
Phase 1: Data Domain Analysis (Week 1)
First, map which tables belong to which service:
```yaml
# domain_mapping.yaml
user_service:
  owned_tables:
    - users
    - user_preferences
    - user_sessions
  shared_tables:
    - none

order_service:
  owned_tables:
    - orders
    - order_items
    - shipping_addresses
  references:
    - users.id  # foreign key

payment_service:
  owned_tables:
    - payments
    - refunds
    - payment_methods
  references:
    - orders.id
    - users.id

inventory_service:
  owned_tables:
    - products
    - inventory
    - warehouses
  references:
    - none

notification_service:
  owned_tables:
    - notifications
    - notification_templates
  references:
    - users.id
```
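The mapping is also worth sanity-checking in code: every reference that crosses a service boundary becomes an API call (or event) after the split, so it pays to enumerate them mechanically. A minimal sketch, assuming the mapping is loaded into a dict mirroring `domain_mapping.yaml` (the `DOMAIN_MAPPING` constant and function name here are illustrative):

```python
# Mirrors domain_mapping.yaml; in practice you would load the YAML file.
DOMAIN_MAPPING = {
    "user_service": {
        "owned_tables": ["users", "user_preferences", "user_sessions"],
        "references": [],
    },
    "order_service": {
        "owned_tables": ["orders", "order_items", "shipping_addresses"],
        "references": ["users.id"],
    },
    "payment_service": {
        "owned_tables": ["payments", "refunds", "payment_methods"],
        "references": ["orders.id", "users.id"],
    },
    "inventory_service": {
        "owned_tables": ["products", "inventory", "warehouses"],
        "references": [],
    },
    "notification_service": {
        "owned_tables": ["notifications", "notification_templates"],
        "references": ["users.id"],
    },
}

def cross_service_refs(mapping):
    """Return (service, referenced_table, owning_service) triples for
    every reference pointing at a table owned by a different service.
    Each triple is a future API call once the database is split."""
    owner_of = {t: svc for svc, spec in mapping.items()
                for t in spec["owned_tables"]}
    refs = []
    for svc, spec in mapping.items():
        for ref in spec["references"]:
            table = ref.split(".")[0]
            owner = owner_of.get(table)
            if owner and owner != svc:
                refs.append((svc, table, owner))
    return refs
```

For this mapping the function yields four cross-service edges, which matches the four FK-style references listed above.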
Phase 2: Create Service APIs (Weeks 2-3)
Replace direct database access with API calls:
```python
# BEFORE: Direct database access
class OrderService:
    def get_order_with_user(self, order_id):
        return db.query("""
            SELECT o.*, u.name, u.email
            FROM orders o
            JOIN users u ON o.user_id = u.id
            WHERE o.id = ?
        """, order_id)

# AFTER: API composition
class OrderService:
    def get_order_with_user(self, order_id):
        order = self.order_repo.get(order_id)
        user = self.user_client.get_user(order.user_id)  # HTTP call
        return OrderWithUser(order=order, user=user)
```
Key principle: Service owns its data, exposes via API.
Phase 3: Create Separate Database (Week 3)
```sql
-- Create new database for User Service
CREATE DATABASE user_service_db;
```

```bash
# Copy schema for the user-owned tables
pg_dump shared_db --schema-only --table=users --table='user_*' | psql user_service_db

# Initial data copy (run during low traffic)
pg_dump shared_db --data-only --table=users --table='user_*' | psql user_service_db
```

Note that `pg_dump` runs from the shell, not inside `psql`, and PostgreSQL can't `INSERT INTO` a table in another database directly; pipe a dump across (as above) or use `postgres_fdw` for the copy.
Phase 4: Dual Write (Weeks 4-5)
Write to both databases during transition:
```python
class UserRepository:
    def __init__(self):
        self.old_db = SharedDatabase()
        self.new_db = UserServiceDatabase()
        self.dual_write_enabled = True

    def create_user(self, user_data):
        # Always write to new DB first
        new_user = self.new_db.insert("users", user_data)

        # Also write to old DB (for other services still reading)
        if self.dual_write_enabled:
            self.old_db.insert("users", user_data)

        return new_user

    def update_user(self, user_id, data):
        self.new_db.update("users", user_id, data)
        if self.dual_write_enabled:
            self.old_db.update("users", user_id, data)
```
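The repository above glosses over the failure mode an interviewer will probe: the new-DB write succeeds, then the old-DB write fails. One common answer is to treat the new DB as the source of truth and queue the missed legacy write for asynchronous replay rather than failing the request. A minimal sketch under that assumption (`DualWriteUserRepository` and the queue-based reconciliation are illustrative, not the only option):

```python
import logging

log = logging.getLogger("migration")

class DualWriteUserRepository:
    """Dual write with failure isolation: the new DB is authoritative;
    a failed legacy write is queued for a background worker to replay
    instead of failing the user's request."""

    def __init__(self, new_db, old_db, reconcile_queue):
        self.new_db = new_db
        self.old_db = old_db
        # Replayed by a background reconciliation worker (not shown).
        self.reconcile_queue = reconcile_queue

    def create_user(self, user_data):
        # The new-DB write must succeed; let its exceptions propagate.
        new_user = self.new_db.insert("users", user_data)
        try:
            self.old_db.insert("users", user_data)
        except Exception:
            # Don't fail the request; record the miss for replay.
            log.exception("legacy dual-write failed, queued for reconciliation")
            self.reconcile_queue.append(("insert", "users", user_data))
        return new_user
```

The trade-off: other services reading the old DB may briefly miss the row until the worker replays it, which is exactly the "brief read inconsistency" question from the clarifying list.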
Phase 5: Shadow Read Verification (Week 5)
Compare reads from both databases to ensure consistency:
```python
def get_user_with_verification(user_id):
    new_result = new_db.get_user(user_id)
    old_result = old_db.get_user(user_id)

    if new_result != old_result:
        log.error(f"Data mismatch for user {user_id}",
                  new=new_result, old=old_result)
        metrics.increment("migration.data_mismatch")

        # Alert if mismatch rate exceeds threshold
        if get_mismatch_rate() > 0.001:  # 0.1%
            alert_oncall("High data mismatch rate during migration")

    return new_result  # Return new DB result
```
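`get_mismatch_rate()` is left abstract above. One simple way to implement it is a sliding window over the most recent shadow reads; a sketch, where the window size of 10,000 comparisons is an assumption to tune against your read QPS:

```python
from collections import deque

class MismatchTracker:
    """Mismatch rate over the last `window` shadow-read comparisons."""

    def __init__(self, window=10_000):
        # deque(maxlen=...) drops the oldest entry automatically.
        self.results = deque(maxlen=window)  # True = mismatch

    def record(self, mismatched):
        self.results.append(bool(mismatched))

    def rate(self):
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)
```

Each shadow read calls `record(new_result != old_result)`, and the alerting check becomes `tracker.rate() > 0.001`.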
Phase 6: Cutover (Week 6)
```python
# Feature flag for gradual cutover
def get_user(user_id):
    if feature_flag.is_enabled("use_new_user_db", user_id):
        return new_db.get_user(user_id)
    else:
        return old_db.get_user(user_id)

# Rollout: 1% → 10% → 50% → 100%
```
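For the rollout to be safe, `feature_flag.is_enabled(...)` must be sticky per user: the same user should always hit the same database at a given percentage, and a user enabled at 10% must stay enabled at 50%. A common way to get both properties is deterministic hashing into 100 buckets; a sketch (the flag name used as a salt is illustrative):

```python
import hashlib

def in_rollout(user_id, percent, salt="use_new_user_db"):
    """Deterministically place a user in a bucket [0, 100).
    Stable across requests, and monotone: once a user's bucket is
    below `percent`, raising `percent` never excludes them."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Salting by flag name keeps rollout populations independent across different flags, so the same 1% of users isn't always the guinea pig.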
Phase 7: Cleanup (Week 7)
```python
# After 100% on new DB for 1 week with no issues:

# 1. Disable dual writes
dual_write_enabled = False

# 2. Remove old code paths
# 3. Drop tables from shared DB (after backup!)
# 4. Update documentation
```
Architecture After Migration
Handling Foreign Keys
Problem: Orders table has user_id foreign key to users table.
Solution: Remove FK, enforce at application level.
```sql
-- Before: database enforced
ALTER TABLE orders
    ADD CONSTRAINT fk_user
    FOREIGN KEY (user_id) REFERENCES users(id);

-- After: remove the FK constraint
ALTER TABLE orders DROP CONSTRAINT fk_user;
```

```python
# Application validates the reference instead
def create_order(order_data):
    # Verify user exists via API
    user = user_service.get_user(order_data.user_id)
    if not user:
        raise ValidationError("User not found")
    return order_repo.create(order_data)
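Application-level validation has a race the database FK never did: the user can be deleted between the check and the insert, or any time afterward. Teams typically accept eventual consistency and clean up via events; a minimal sketch of a compensating handler, assuming the user service publishes a `UserDeleted`-style event (the event and `OrderStore` here are illustrative):

```python
class OrderStore:
    """In-memory stand-in for the order service's own database."""

    def __init__(self):
        self.orders = {}  # order_id -> {"user_id": ..., "status": ...}

    def create(self, order_id, user_id):
        self.orders[order_id] = {"user_id": user_id, "status": "active"}

    def on_user_deleted(self, user_id):
        # Compensating handler: flag (don't delete) orders for a removed
        # user so downstream processes can decide how to resolve them.
        for order in self.orders.values():
            if order["user_id"] == user_id:
                order["status"] = "orphaned"
```

Flagging rather than cascading a delete preserves audit history, which is usually what you want for orders and payments.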
Trade-offs Discussion
| Approach | Pros | Cons |
|---|---|---|
| Big Bang | Fast, simple | High risk, downtime |
| Strangler Fig | Low risk, reversible | Slow (6-8 weeks) |
| Dual Write | Zero downtime | Complexity, consistency risk |
| CDC (Debezium) | Real-time sync | Infrastructure overhead |
Recommended: Strangler Fig + Dual Write (the phased approach described above)
Follow-up Questions
"What if a service needs to join data from multiple services?"
Use API composition or CQRS (materialized view that combines data)
"How do you handle transactions across services?"
Saga pattern with compensation. See Distributed Transactions
"What about reporting that needs data from all services?"
Create a separate analytics database that receives events from all services
Key Takeaways
- Map data ownership first - Know which service owns what
- APIs before database split - Decouple access patterns first
- Dual write for safety - Maintain consistency during transition
- Feature flags for cutover - Easy rollback if issues
- Shadow reads for verification - Catch mismatches before they matter
Estimated timeline: 6-8 weeks for a team of 2-3 engineers
Risk level: Medium (with proper dual-write and verification)