
Monolith to Microservices Migration

The Interview Question

"We have a 10-year-old monolithic e-commerce application. 500K lines of code, single PostgreSQL database, 50 developers stepping on each other. Management wants microservices. How do you approach this?"

Asked at: Amazon, Netflix, any company modernizing legacy systems

Time to solve: 40-45 minutes

Difficulty: ⭐⭐⭐⭐ (Staff level)


Clarifying Questions to Ask

  1. "What's driving the change?" → Scale? Team autonomy? Both?
  2. "What's the timeline/budget?" → 1 year vs 3 years changes approach
  3. "How coupled is the codebase?" → Spaghetti vs some modularity?
  4. "Can we have downtime?" → Affects migration strategy
  5. "What's the team structure?" → Conway's Law applies

The Anti-Patterns to Avoid

❌ Big Bang Rewrite

"Let's rewrite everything in microservices!"

Month 1-6: Parallel development, no new features
Month 6-12: Still rewriting, bugs accumulating in monolith
Month 12-18: Almost ready! Just a few more months...
Month 18-24: Project cancelled, morale destroyed

❌ Distributed Monolith

"We have 50 microservices!"
"But they all call each other synchronously..."
"And share a database..."
"And deploy together..."

Result: All the complexity of microservices, none of the benefits

The Strangler Fig Pattern

Gradually replace the monolith piece by piece: route all traffic through a facade, carve out one capability at a time behind it, and let the monolith shrink until the last route can be retired. At every step the system stays shippable, and rollback is a routing change.


Step-by-Step Migration Plan

Phase 0: Preparation (Month 1-2)

preparation_tasks:
  codebase_analysis:
    - Map all modules and dependencies
    - Identify domain boundaries
    - Find "seams" - natural separation points
    - Inventory database tables by domain

  infrastructure_setup:
    - Set up Kubernetes cluster
    - Implement CI/CD pipelines
    - Deploy API gateway (Kong, Envoy)
    - Set up observability (Prometheus, Jaeger)

  team_preparation:
    - Train on microservices patterns
    - Define service ownership
    - Establish coding standards
    - Create service template repository
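The "map all modules and dependencies" task can be bootstrapped with a short script. A sketch, assuming a Python codebase; grouping imports by top-level package name is an illustrative simplification:

```python
import ast
from collections import defaultdict
from pathlib import Path

def map_dependencies(root: str) -> dict:
    """Build a module -> imported-packages map by parsing import statements."""
    deps = defaultdict(set)
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # 'import a.b' depends on top-level package 'a'
                deps[path.stem].update(a.name.split(".")[0] for a in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # 'from a.b import c' also depends on 'a'
                deps[path.stem].add(node.module.split(".")[0])
    return dict(deps)
```

Modules with many inbound edges in this map are poor first extraction candidates; leaf modules are the natural seams.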

Phase 1: API Gateway & Feature Flags (Month 2-3)

# Add a routing layer in front of the monolith;
# all traffic flows through here
from fnmatch import fnmatch

class APIGateway:
    def __init__(self):
        self.routes = {
            # Initially everything goes to the monolith
            '/api/users/*': 'http://monolith:8080',
            '/api/orders/*': 'http://monolith:8080',
            '/api/products/*': 'http://monolith:8080',
        }
        self.feature_flags = FeatureFlagService()

    def route(self, request):
        for pattern, target in self.routes.items():
            if fnmatch(request.path, pattern):
                # Feature flag enables gradual, per-route migration
                if self.feature_flags.is_enabled(
                    f"route_to_new_service:{pattern}",
                    user_id=request.user_id,
                ):
                    target = self.get_new_service_target(pattern)
                return self.forward(request, target)

Phase 2: Identify Service Boundaries (Month 3-4)

# Domain-Driven Design to find boundaries
bounded_contexts:
  user_management:
    entities: [User, UserProfile, Authentication, Session]
    tables: [users, user_profiles, sessions, auth_tokens]
    apis: [/users, /auth, /profile]
    team: "Identity Team"

  product_catalog:
    entities: [Product, Category, Inventory, Price]
    tables: [products, categories, inventory, prices]
    apis: [/products, /categories, /search]
    team: "Catalog Team"

  order_management:
    entities: [Order, OrderItem, Cart, Checkout]
    tables: [orders, order_items, carts]
    apis: [/orders, /cart, /checkout]
    team: "Commerce Team"

  notification:
    entities: [Notification, EmailTemplate, PushSubscription]
    tables: [notifications, email_logs]
    apis: [/notifications]
    team: "Platform Team"
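One quick sanity check on these boundaries: no table should be claimed by more than one context, or the database split in Phase 4 will stall. A minimal sketch (the mapping mirrors the YAML above; `find_shared_tables` is my own helper name):

```python
from collections import defaultdict

def find_shared_tables(contexts: dict) -> dict:
    """Return tables claimed by more than one bounded context."""
    owners = defaultdict(list)
    for context, tables in contexts.items():
        for table in tables:
            owners[table].append(context)
    return {t: cs for t, cs in owners.items() if len(cs) > 1}

contexts = {
    "user_management": ["users", "user_profiles", "sessions", "auth_tokens"],
    "order_management": ["orders", "order_items", "carts", "users"],  # 'users' conflict
}
```

Any table this flags needs a single owner; other contexts get the data through that owner's API or via events.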

Phase 3: Extract First Service (Month 4-6)

Choose wisely - start with:

  • Least coupled module
  • Highest change frequency
  • Clear boundaries
  • Not critical path (lower risk)
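These criteria can be folded into a rough scoring function to rank extraction candidates. A sketch; the weights and the example numbers are illustrative assumptions, not measurements:

```python
def extraction_score(module: dict) -> float:
    """Higher score = better first extraction candidate."""
    return (
        module["change_frequency"]        # commits/month: extract what changes often
        - 2.0 * module["coupling"]        # inbound + outbound deps: penalize heavily
        + (3.0 if module["clear_boundary"] else 0.0)
        - (5.0 if module["critical_path"] else 0.0)  # lower risk off the critical path
    )

candidates = [
    {"name": "notification", "change_frequency": 8, "coupling": 2,
     "clear_boundary": True, "critical_path": False},
    {"name": "checkout", "change_frequency": 12, "coupling": 9,
     "clear_boundary": False, "critical_path": True},
]
best = max(candidates, key=extraction_score)
```

With these weights, notification wins easily, which matches the example that follows.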

Example: Extract Notification Service

# Step 1: Create new service with the SAME interface
# notification_service/main.py
from fastapi import FastAPI

app = FastAPI()

@app.post("/api/notifications/send")
async def send_notification(request: NotificationRequest):
    # Same logic as the monolith, now in its own service
    await send_email(request.email, request.template, request.data)
    await send_push(request.user_id, request.message)
    return {"status": "sent"}


# Step 2: In the monolith, add a feature flag to route to the new service
class NotificationModule:
    def send_notification(self, user_id, message):
        if feature_flags.is_enabled('use_notification_service'):
            # Call the new service
            return http_client.post(
                'http://notification-service/api/notifications/send',
                json={'user_id': user_id, 'message': message},
            )
        else:
            # Old monolith code path
            return self._legacy_send(user_id, message)


# Step 3: Gradual rollout
# 1% -> 10% -> 50% -> 100% of users
feature_flags.set_rollout('use_notification_service', percentage=1)

# Monitor for issues; roll back instantly if the new path misbehaves
if error_rate > threshold:
    feature_flags.set_rollout('use_notification_service', percentage=0)
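How `is_enabled` can honor a percentage rollout while keeping each user's experience stable: hash the user id into a fixed bucket, so the same users stay in the cohort as the percentage grows. A sketch of one common approach; this `FeatureFlagService` is a minimal stand-in I wrote for illustration, not a specific library:

```python
import hashlib

class FeatureFlagService:
    def __init__(self):
        self.rollouts = {}  # flag name -> rollout percentage (0-100)

    def set_rollout(self, flag: str, percentage: int):
        self.rollouts[flag] = percentage

    def is_enabled(self, flag: str, user_id) -> bool:
        # Hash (flag, user) to a stable bucket in 0-99. Deterministic:
        # a user enabled at 10% stays enabled at 50%, so nobody flaps
        # between old and new code paths as the rollout widens.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollouts.get(flag, 0)
```

Keying the hash on the flag name as well as the user means different flags sample different user cohorts.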

Phase 4: Database Separation (Month 6-9)

The hardest part: separating the data.

-- Step 1: Identify notification tables
-- notifications, email_logs, push_subscriptions

-- Step 2: Create new database
CREATE DATABASE notification_service_db;

-- Step 3: Set up replication from monolith
-- Using Debezium/CDC to sync data

-- Step 4: Service reads from new DB, writes to both
-- Double-write pattern during transition
class NotificationRepository:
    def __init__(self, migration_mode: str):
        self.monolith_db = MonolithDatabase()
        self.new_db = NotificationDatabase()
        self.mode = migration_mode  # 'read_old', 'read_new', 'write_both'

    def get_notification(self, id):
        if self.mode in ('read_new', 'write_both'):
            return self.new_db.get(id)
        return self.monolith_db.get(id)

    def save_notification(self, notification):
        if self.mode == 'write_both':
            # Write to both during migration
            self.new_db.save(notification)
            self.monolith_db.save(notification)
        elif self.mode == 'read_new':
            self.new_db.save(notification)
        else:
            self.monolith_db.save(notification)
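During the double-write window it pays to continuously verify that the two databases agree before flipping reads to the new one. A minimal sketch; the id-keyed row dictionaries are an assumption about how rows are fetched from each store:

```python
def verify_sync(old_rows: dict, new_rows: dict) -> dict:
    """Compare id-keyed rows from both databases and report drift.

    The migration cannot be declared done while 'missing' or
    'mismatched' is non-empty.
    """
    missing = set(old_rows) - set(new_rows)
    mismatched = {
        row_id for row_id in set(old_rows) & set(new_rows)
        if old_rows[row_id] != new_rows[row_id]
    }
    return {
        "missing": missing,
        "mismatched": mismatched,
        "in_sync": not missing and not mismatched,
    }
```

Run this as a scheduled job over sampled id ranges; a single failed double-write or a CDC lag spike shows up as drift long before users notice.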

Phase 5: Continue Extraction (Month 9-18)

extraction_order:
  - service: notification
    complexity: low
    timeline: 2 months

  - service: user_management
    complexity: medium
    timeline: 3 months
    dependencies: [notification]

  - service: product_catalog
    complexity: medium
    timeline: 3 months
    dependencies: []

  - service: search
    complexity: high
    timeline: 4 months
    dependencies: [product_catalog]

  - service: order_management
    complexity: very_high
    timeline: 6 months
    dependencies: [user_management, product_catalog, notification]

Handling Cross-Cutting Concerns

Authentication Across Services

# Centralized auth with JWT; every service validates the same token
import jwt  # PyJWT

class AuthMiddleware:
    def __init__(self, jwt_secret: str):
        self.secret = jwt_secret

    def authenticate(self, request):
        token = request.headers.get('Authorization', '').replace('Bearer ', '')
        try:
            payload = jwt.decode(token, self.secret, algorithms=['HS256'])
            request.user_id = payload['user_id']
            request.roles = payload['roles']
        except jwt.InvalidTokenError:
            raise Unauthorized()
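The property that matters here is that any service holding the shared secret can validate tokens locally, with no call to a central auth service on each request. A stdlib-only sketch of that idea (a simplified stand-in for the real JWT format, for illustration only):

```python
import base64
import hashlib
import hmac
import json

def sign_token(payload: dict, secret: str) -> str:
    # Encode the claims, then sign them with the shared secret
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str, secret: str) -> dict:
    # Recompute the signature; constant-time compare defeats timing attacks
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    return json.loads(base64.urlsafe_b64decode(body))
```

In production, use PyJWT (as above) rather than hand-rolling this; the sketch only shows why a shared secret removes the per-request auth hop.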

Distributed Transactions

# Saga pattern for order creation
class CreateOrderSaga:
    def execute(self, order_data):
        # Step 1: Reserve inventory
        inventory_reservation = inventory_service.reserve(order_data.items)

        try:
            # Step 2: Process payment
            payment = payment_service.charge(
                order_data.user_id,
                order_data.total,
            )
        except PaymentFailedError:
            # Compensate: release inventory
            inventory_service.release(inventory_reservation.id)
            raise

        try:
            # Step 3: Create order
            order = order_service.create(order_data)
        except OrderCreationError:
            # Compensate: refund payment and release inventory
            payment_service.refund(payment.id)
            inventory_service.release(inventory_reservation.id)
            raise

        # Step 4: Confirm inventory (commit)
        inventory_service.confirm(inventory_reservation.id)

        return order
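Compensation logic is easy to get wrong and rarely exercised in production, so test it with stubbed services: force a payment failure and assert the reservation was released. A self-contained sketch (the stub classes and `run_saga` are mine, condensing the saga above):

```python
class PaymentFailedError(Exception):
    pass

class StubInventory:
    def __init__(self):
        self.released = []
        self._next_id = 0

    def reserve(self, items):
        reservation_id = self._next_id
        self._next_id += 1
        return reservation_id

    def release(self, reservation_id):
        self.released.append(reservation_id)

class FailingPayments:
    def charge(self, user_id, total):
        raise PaymentFailedError()

def run_saga(inventory, payments, order):
    reservation_id = inventory.reserve(order["items"])
    try:
        payments.charge(order["user_id"], order["total"])
    except PaymentFailedError:
        inventory.release(reservation_id)  # compensate before re-raising
        raise
```

The same harness extends naturally: a `FailingOrders` stub should leave both a refund and a release behind.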

Data Consistency

# Event-driven sync between services
from datetime import datetime

class OrderService:
    def create_order(self, order_data):
        order = self.repository.save(order_data)

        # Publish event for other services
        self.event_bus.publish(OrderCreatedEvent(
            order_id=order.id,
            user_id=order.user_id,
            items=order.items,
            total=order.total,
            timestamp=datetime.utcnow(),
        ))

        return order


# Other services subscribe and update their own views
class AnalyticsService:
    @event_handler(OrderCreatedEvent)
    def handle_order_created(self, event):
        self.update_user_stats(event.user_id, event.total)
        self.update_product_stats(event.items)
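One hazard in publish-after-save: if the process dies between `repository.save` and `event_bus.publish`, the event is lost and subscribers silently drift. The standard fix is the transactional outbox pattern: write the event to an outbox table in the same database transaction as the order, and let a relay publish from that table. A minimal in-memory sketch of the idea (class and method names are mine):

```python
class Outbox:
    """Events saved atomically with business data, published by a relay."""

    def __init__(self):
        self.pending = []

    def save_with_event(self, save_row, event):
        # In a real system both writes share one DB transaction, so an
        # order can never be committed without its event (or vice versa).
        save_row()
        self.pending.append(event)

    def relay(self, publish):
        # Publish and drain. A crash mid-relay causes re-delivery, never
        # loss, which is why consumers must be idempotent.
        while self.pending:
            publish(self.pending.pop(0))
```

In practice the relay is a poller or a CDC pipeline (e.g. Debezium, already mentioned in Phase 4) tailing the outbox table.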

Success Metrics

metrics_to_track:
  technical:
    - Deployment frequency per service
    - Lead time for changes
    - Mean time to recovery (MTTR)
    - Change failure rate

  business:
    - Feature delivery speed
    - Team autonomy (independent deploys)
    - System availability
    - P99 latency

  migration_progress:
    - Percentage of traffic through new services
    - Number of monolith dependencies remaining
    - Database coupling score

Key Takeaways

  1. Strangler, don't rewrite - Incremental is safer
  2. Start with the edges - Less coupled, lower risk
  3. Database last - Hardest part, do it carefully
  4. Feature flags everywhere - Enable easy rollback
  5. Event-driven for consistency - Avoid distributed transactions
  6. Measure progress - Track monolith shrinkage

Timeline reality: A 500K LOC monolith to microservices takes 2-3 years minimum. Plan for a marathon, not a sprint.