Notification Systems
TL;DR
Notification system: Deliver messages to users (push, email, SMS). Channels: Mobile push, web push, email, SMS. Challenges: Scale, reliability, preferences.
Core Concepts
1. Notification Channels
| Channel | Latency | Cost | Use Case |
|---|---|---|---|
| Push notification | Seconds | Low | Real-time alerts (Uber driver nearby) |
| Email | Seconds-minutes | Low | Receipts, newsletters |
| SMS | Seconds | High | OTP codes, critical alerts |
| In-app | Instant | Free | Non-urgent (new message) |
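2. System Architecture
Event producers → message queue → notification service (routing, preferences, templates) → external providers (FCM/APNS, SendGrid/SES, Twilio) → user devices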
3. Key Components
Message Queue
- Async processing: Don't block main flow
- Buffering: Handle traffic spikes
- Retry: Failed notifications are re-queued and retried
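A minimal sketch of the queue's role, assuming an in-process `queue.Queue` as a stand-in for Kafka or a similar broker. `enqueue_event`, `drain`, `MAX_ATTEMPTS`, and the flaky sender are hypothetical names for illustration: the request path only enqueues, and a worker drains the queue, re-queuing failures until a retry budget runs out.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Notification:
    user_id: str
    body: str
    attempts: int = 0


MAX_ATTEMPTS = 3  # assumed retry budget per notification


def enqueue_event(q: "queue.Queue[Notification]", user_id: str, body: str) -> None:
    # The caller's request path only enqueues and returns (async processing).
    q.put(Notification(user_id, body))


def drain(q: "queue.Queue[Notification]", send: Callable[[Notification], bool]):
    """Worker loop: deliver, re-queue on failure, give up after MAX_ATTEMPTS."""
    delivered, dead = [], []
    while not q.empty():
        n = q.get()
        n.attempts += 1
        if send(n):
            delivered.append(n)
        elif n.attempts < MAX_ATTEMPTS:
            q.put(n)  # retry later
        else:
            dead.append(n)  # out of retries: dead-letter it
    return delivered, dead


# Demo: "b" fails once then succeeds; "c" always fails.
remaining_failures = {"b": 1, "c": 99}


def flaky_send(n: Notification) -> bool:
    if remaining_failures.get(n.user_id, 0) > 0:
        remaining_failures[n.user_id] -= 1
        return False
    return True


q: "queue.Queue[Notification]" = queue.Queue()
for uid in ["a", "b", "c"]:
    enqueue_event(q, uid, "Your order has shipped")
delivered, dead = drain(q, flaky_send)
```

With a real broker, failed messages often go to a separate delay or retry topic rather than straight back onto the same queue, so a failing message does not spin in a tight loop.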
Template Engine
```
Hi {{user.name}},
Your order #{{order.id}} has shipped!
Track: {{tracking_url}}
```
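The template uses `{{path}}` placeholders in the Mustache/Jinja style. The sketch below is a minimal, hypothetical renderer (`render` is not a library function) that resolves dotted paths against a context dict; a production system would more likely use an established engine that also handles escaping and localization.

```python
import re
from typing import Any

# Matches {{ user.name }}, {{order.id}}, {{tracking_url}}, ...
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict[str, Any]) -> str:
    def lookup(match: re.Match) -> str:
        value: Any = context
        for key in match.group(1).split("."):
            value = value[key]  # a missing variable raises KeyError loudly
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


template = (
    "Hi {{user.name}},\n"
    "Your order #{{order.id}} has shipped!\n"
    "Track: {{tracking_url}}"
)
message = render(template, {
    "user": {"name": "Ana"},
    "order": {"id": 4521},
    "tracking_url": "https://example.com/t/4521",
})
```

Failing loudly on a missing variable is a deliberate choice here: sending "Hi ," to millions of users is worse than a rejected job.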
User Preferences
```json
{
  "user_id": "123",
  "email": true,
  "push": true,
  "sms": false,
  "quiet_hours": "22:00-08:00"
}
```
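A sketch of how a notification service might consult the preference record before sending. It assumes `now` is already converted to the user's local time zone, that the `quiet_hours` window can wrap past midnight, and that critical alerts skip quiet hours but never override a channel opt-out; `should_send` and `in_quiet_hours` are hypothetical names.

```python
from datetime import time

# The preference record from above, as a Python dict.
prefs = {
    "user_id": "123",
    "email": True,
    "push": True,
    "sms": False,
    "quiet_hours": "22:00-08:00",
}


def in_quiet_hours(now: time, window: str) -> bool:
    start_s, end_s = window.split("-")
    start, end = time.fromisoformat(start_s), time.fromisoformat(end_s)
    if start <= end:  # same-day window, e.g. "13:00-15:00"
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


def should_send(prefs: dict, channel: str, now: time, critical: bool = False) -> bool:
    if not prefs.get(channel, False):
        return False  # opted out of this channel: never override
    if critical:
        return True  # assumption: critical alerts ignore quiet hours
    window = prefs.get("quiet_hours")
    return not (window and in_quiet_hours(now, window))
```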
4. Push Notifications
iOS: APNS (Apple Push Notification Service)
Android: FCM (Firebase Cloud Messaging)
Web: Web Push API
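Flow: the app registers with APNS/FCM (or the browser's push service) and receives a device token → the app sends the token to your backend, which stores it per user → to notify, the backend sends the payload and token to APNS/FCM → the provider delivers it to the device.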
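Because each platform has its own provider, the service typically stores device tokens per user and routes each token to APNS, FCM, or Web Push. The sketch below uses stub senders (hypothetical; real code would call APNS over HTTP/2, the FCM HTTP v1 API, or a Web Push library) and an in-memory registry in place of a database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    token: str     # token issued by APNS, FCM, or the browser's push service
    platform: str  # "ios", "android", or "web"


sent: list[tuple[str, str]] = []  # (provider, token) log for the demo


def make_stub(provider: str) -> Callable[[str, dict], bool]:
    # Hypothetical stand-in for a real provider client; always succeeds.
    def send(token: str, payload: dict) -> bool:
        sent.append((provider, token))
        return True
    return send


PROVIDERS: dict[str, Callable[[str, dict], bool]] = {
    "ios": make_stub("apns"),
    "android": make_stub("fcm"),
    "web": make_stub("webpush"),
}

registry: dict[str, list[Device]] = {}  # user_id -> registered devices


def register(user_id: str, device: Device) -> None:
    # Called when the app hands its device token to the backend.
    registry.setdefault(user_id, []).append(device)


def push_to_user(user_id: str, payload: dict) -> int:
    """Fan out to every registered device; return the number of successful sends."""
    ok = 0
    for device in registry.get(user_id, []):
        if PROVIDERS[device.platform](device.token, payload):
            ok += 1
    return ok


register("123", Device("ios-tok-1", "ios"))
register("123", Device("fcm-tok-9", "android"))
count = push_to_user("123", {"title": "Driver nearby"})
```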
5. Rate Limiting & Throttling
Problem: Too many notifications annoy users; a flood of them (e.g., 100/day) drives people to disable notifications or uninstall the app.
Solution:
- Frequency cap: Max 5 push notifications/day per user
- Priority: Critical alerts bypass limit
- Batching: Group notifications ("5 new messages" vs 5 separate)
- Quiet hours: Hold non-critical notifications during the user's quiet hours (no 3 AM pushes)
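A minimal in-memory sketch of the frequency cap, priority bypass, and batching rules. `PushThrottle` and `batch_messages` are hypothetical; in a multi-worker deployment the counters would need to live in a shared store rather than in process memory.

```python
from collections import defaultdict
from datetime import date

DAILY_PUSH_CAP = 5  # frequency cap: max 5 push notifications/day per user


class PushThrottle:
    def __init__(self, cap: int = DAILY_PUSH_CAP):
        self.cap = cap
        self.counts: dict[tuple[str, date], int] = defaultdict(int)

    def allow(self, user_id: str, day: date, critical: bool = False) -> bool:
        if critical:
            return True  # critical alerts bypass the cap and do not consume it
        key = (user_id, day)
        if self.counts[key] >= self.cap:
            return False
        self.counts[key] += 1
        return True


def batch_messages(senders: list[str]) -> str:
    # Collapse several pending "new message" events into a single push.
    if len(senders) == 1:
        return f"New message from {senders[0]}"
    return f"{len(senders)} new messages"
```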
Common Interview Questions
Q1: "Design a notification system for Uber."
Answer:
- Events: Driver nearby, trip started, trip completed
- Queue: Kafka for async processing
- Service: Reads queue, checks user preferences, sends via FCM/APNS
- Rate limiting: Max 10 notifications/day per user
- Fallback: If push fails, try SMS (critical events only)
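The routing in this answer (check preferences, try push, fall back to SMS only for critical events) can be sketched as below. Which Uber events count as critical is an assumption here (`driver_nearby` only), and `push`/`sms` are injected stand-ins for FCM/APNS and an SMS provider; `notify` is a hypothetical name.

```python
from typing import Callable

Sender = Callable[[str, str], bool]  # (user_id, message) -> delivered?

# Assumption for illustration: only this event justifies the cost of SMS.
CRITICAL_EVENTS = {"driver_nearby"}

MESSAGES = {
    "driver_nearby": "Your driver is nearby",
    "trip_started": "Your trip has started",
    "trip_completed": "Trip completed. Your receipt is ready",
}


def notify(event: str, user_id: str, prefs: dict, push: Sender, sms: Sender) -> str:
    """Return the channel that delivered the event, or "deferred"."""
    message = MESSAGES[event]
    if prefs.get("push") and push(user_id, message):
        return "push"
    if event in CRITICAL_EVENTS and prefs.get("sms") and sms(user_id, message):
        return "sms"  # push failed (or is off) and the event is critical
    # Non-critical events are not escalated; leave them for a later retry.
    return "deferred"
```

Note that the SMS fallback still respects the user's SMS opt-out.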
Q2: "How do you handle millions of notifications/second?"
Answer:
- Message queue: Kafka buffers messages
- Horizontal scaling: 100+ notification workers
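- Batch sending: Send notifications in batches (e.g., 1000 per API call) where the provider supports batching; per-call limits vary by provider
- Partitioning: Partition the queue by user_id so partitions are processed in parallel while each user's notifications stay in order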
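Partitioning by `user_id` needs a stable hash so a given user always maps to the same partition. This sketch uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process; it illustrates the idea and is not Kafka's own partitioner (which hashes keys with murmur2). The partition count, batch size, and the helper names `partition_for` and `batches` are assumptions.

```python
import zlib
from typing import Iterator, List

NUM_PARTITIONS = 12  # assumed partition count for the notification topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable across processes, so one user's events always hit one partition
    # and are consumed in order by whichever worker owns that partition.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    # Split device tokens into per-API-call batches; `size` must respect the
    # provider's per-request limit, which varies by provider.
    for start in range(0, len(items), size):
        yield items[start:start + size]


tokens = [f"token-{i}" for i in range(2500)]
sizes = [len(b) for b in batches(tokens, 1000)]
```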
Q3: "What if push notification provider (FCM) is down?"
Answer:
- Retry with exponential backoff: Retry 3 times (1s, 2s, 4s)
- Fallback: Send SMS for critical notifications
- Queue: Messages stay in the durable queue until delivered, so nothing is lost while the provider is down
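A sketch of the retry policy from this answer: one initial attempt, then up to three retries waiting 1s, 2s, and 4s. `send_with_backoff` is a hypothetical helper; `sleep` is injectable so the schedule can be checked without actually waiting.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], bool],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry up to `retries` times with doubling delays."""
    if send():
        return True
    for attempt in range(retries):
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
        if send():
            return True
    return False
```

If it returns `False`, the caller can leave the message unacknowledged in the queue and, for critical notifications, fall back to SMS. Many production retry policies also add random jitter to the delays so that workers do not retry in lockstep.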
Quick Reference
Channels:
- Push: Real-time, low cost (FCM, APNS)
- Email: Receipts, newsletters (SendGrid, SES)
- SMS: OTP, critical (Twilio, expensive)
Architecture:
- Message queue (async, buffering)
- Notification service (routing, preferences)
- External providers (FCM, SendGrid, Twilio)
Best practices:
- Rate limiting (don't spam)
- User preferences (respect opt-out)
- Retry logic (handle failures)
- Templates (consistency, i18n)
Part 2 Complete! You now understand the building blocks of system design.
Next: Part 3: Advanced Patterns - Distributed systems, consistency, reliability.