
Notification Systems

TL;DR

Notification system: Delivers messages to users across multiple channels. Channels: Mobile push, web push, email, SMS, in-app. Challenges: Scale, reliability, user preferences.

Core Concepts

1. Notification Channels

Channel           | Latency         | Cost | Use Case
Push notification | Seconds         | Low  | Real-time alerts (Uber driver nearby)
Email             | Seconds-minutes | Low  | Receipts, newsletters
SMS               | Seconds         | High | OTP codes, critical alerts
In-app            | Instant         | Free | Non-urgent (new message)

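2. System Architecture

A notification passes through three stages: a message queue that accepts events asynchronously and buffers them, a notification service that routes each message and applies user preferences, and external providers (FCM/APNS, SendGrid, Twilio) that perform the actual delivery.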

3. Key Components

Message Queue

  • Async processing: Don't block main flow
  • Buffering: Handle traffic spikes
  • Retry: Failed notifications are put back on the queue and retried
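
The three properties above can be sketched with Python's standard-library queue. This is a minimal in-process illustration, not a production design: a real system would use a durable broker, and the names enqueue, run_worker, and MAX_ATTEMPTS are assumptions made for this example.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumed retry budget per notification


def enqueue(jobs, payload):
    """Producer side: returns immediately, so the main flow is not blocked."""
    jobs.put({"payload": payload, "attempts": 0})


def run_worker(jobs, send, dead_letters):
    """Consumer side: drain the queue, re-enqueueing failed jobs until attempts run out."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        try:
            send(job["payload"])
        except Exception:
            job["attempts"] += 1
            if job["attempts"] < MAX_ATTEMPTS:
                jobs.put(job)  # retry later, behind any buffered work
            else:
                dead_letters.append(job)  # give up and keep for inspection
        finally:
            jobs.task_done()
```

The caller starts run_worker in a threading.Thread, calls enqueue from the request path, and puts None on the queue to shut the worker down.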

Template Engine

Hi {{user.name}},

Your order #{{order.id}} has shipped!

Track: {{tracking_url}}
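
A minimal sketch of how placeholders like the ones above could be filled in, assuming dotted names map to nested dictionaries. The render function is hypothetical and written only for this example; production systems typically use an established template library instead.

```python
import re

# Matches {{ name }} or {{ dotted.name }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each placeholder with the value found by walking the dotted path."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the template names a missing field
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


message = render(
    "Hi {{user.name}},\n\nYour order #{{order.id}} has shipped!\n\nTrack: {{tracking_url}}",
    {
        "user": {"name": "Ada"},
        "order": {"id": 1001},
        "tracking_url": "https://example.com/track/1001",
    },
)
```

Failing loudly on a missing field is a deliberate choice here: sending "Hi ," to a user is usually worse than not sending at all.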

User Preferences

{
  "user_id": "123",
  "email": true,
  "push": true,
  "sms": false,
  "quiet_hours": "22:00-08:00"
}
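
A sketch of how a preferences record like the one above might be checked before sending. It assumes quiet_hours is a local-time "HH:MM-HH:MM" window that may wrap past midnight, and that critical notifications ignore quiet hours; both are assumptions, as are the function names.

```python
from datetime import time


def in_quiet_hours(window: str, now: time) -> bool:
    """Return True if `now` falls inside a window such as "22:00-08:00"."""
    start_text, end_text = window.split("-")
    start = time.fromisoformat(start_text)
    end = time.fromisoformat(end_text)
    if start <= end:
        return start <= now < end
    # Window wraps past midnight (e.g. 22:00 to 08:00).
    return now >= start or now < end


def allowed_channels(prefs: dict, now: time, critical: bool = False) -> list:
    """List the channels the user opted into, or none during quiet hours."""
    channels = [c for c in ("push", "email", "sms") if prefs.get(c)]
    window = prefs.get("quiet_hours")
    # Assumption: only non-critical notifications are held during quiet hours.
    if window and not critical and in_quiet_hours(window, now):
        return []
    return channels


prefs = {"user_id": "123", "email": True, "push": True, "sms": False,
         "quiet_hours": "22:00-08:00"}
```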

4. Push Notifications

iOS: APNS (Apple Push Notification Service)
Android: FCM (Firebase Cloud Messaging)
Web: Web Push API
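
A sketch of routing one notification to the right provider by platform. The payload shapes follow the APNs JSON payload and the FCM HTTP v1 message body as commonly documented, but should be verified against the providers' current documentation. The build_push_request function and its wrapper dictionary are hypothetical, and Web Push is omitted because it targets a per-browser subscription endpoint rather than a device token.

```python
def build_push_request(platform: str, device_token: str, title: str, body: str) -> dict:
    """Pick the provider for a platform and build the JSON body it expects."""
    if platform == "ios":
        return {
            "provider": "APNS",
            "device_token": device_token,  # APNs takes the token in the request path
            "payload": {"aps": {"alert": {"title": title, "body": body}}},
        }
    if platform == "android":
        return {
            "provider": "FCM",
            "payload": {
                "message": {
                    "token": device_token,  # FCM takes the token inside the message
                    "notification": {"title": title, "body": body},
                }
            },
        }
    raise ValueError(f"unsupported platform: {platform}")


request = build_push_request("android", "device-token-abc", "Driver nearby", "Arriving in 2 min")
```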

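Flow:

  1. The app registers with APNS or FCM and receives a device token (a push subscription for Web Push)
  2. The app sends the token to your backend, which stores it per user
  3. To notify, the backend sends the payload and token to the provider
  4. The provider delivers the notification to the device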

5. Rate Limiting & Throttling

Problem: Too many notifications drive users away (100 notifications/day leads to uninstalls)

Solution:

  • Frequency cap: Max 5 push notifications/day per user
  • Priority: Critical alerts bypass limit
  • Batching: Group notifications ("5 new messages" vs 5 separate)
  • Quiet hours: Don't send 3 AM notifications
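
The frequency cap, priority bypass, and batching rules above can be sketched as follows. This is an in-memory illustration under assumed names (FrequencyCap, batch_summary); a real deployment would keep counters in a shared store so that every worker sees the same counts.

```python
from collections import defaultdict
from datetime import date


class FrequencyCap:
    """Per-user daily cap; critical notifications bypass it."""

    def __init__(self, max_per_day: int = 5):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (user_id, day) -> notifications sent

    def allow(self, user_id: str, today: date, critical: bool = False) -> bool:
        if critical:
            return True  # priority: critical alerts are never throttled
        key = (user_id, today)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


def batch_summary(messages: list) -> str:
    """Collapse several pending messages into one notification body."""
    if len(messages) == 1:
        return messages[0]
    return f"{len(messages)} new messages"
```

Keying the counter on (user_id, day) resets the cap each day without a cleanup job, at the cost of old keys accumulating; a store with key expiry would avoid that.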

Common Interview Questions

Q1: "Design a notification system for Uber."

Answer:

  1. Events: Driver nearby, trip started, trip completed
  2. Queue: Kafka for async processing
  3. Service: Reads queue, checks user preferences, sends via FCM/APNS
  4. Rate limiting: Max 10 notifications/day per user
  5. Fallback: If push fails, try SMS (critical events only)
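
A sketch of how the service in this answer might process one event from the queue. Which events count as critical, the handle_event name, and the push and sms callables are all assumptions for illustration; the SMS fallback here still respects the user's SMS opt-out.

```python
CRITICAL_EVENTS = {"driver_nearby"}  # assumption: which events justify an SMS fallback
DAILY_CAP = 10


def handle_event(event: dict, prefs: dict, sent_today: int, push, sms) -> str:
    """Decide how to deliver one event; returns the outcome as a string."""
    critical = event["type"] in CRITICAL_EVENTS
    if not critical and sent_today >= DAILY_CAP:
        return "rate_limited"
    if prefs.get("push"):
        try:
            push(event)  # e.g. a call to FCM or APNS
            return "push"
        except Exception:
            pass  # fall through to the SMS fallback below
    if critical and prefs.get("sms"):
        sms(event)
        return "sms"
    return "dropped"
```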

Q2: "How do you handle millions of notifications/second?"

Answer:

  1. Message queue: Kafka buffers messages
  2. Horizontal scaling: 100+ notification workers
  3. Batch sending: Send many notifications per API call (e.g. 1000), up to the provider's batch limit
  4. Partitioning: Partition by user_id (parallelism)
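
Partitioning and batch sending can be sketched in a few lines. The CRC32 hash here is a stand-in for whatever partitioner the queue client actually uses, and both function names are assumptions for this example.

```python
import zlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to a stable partition so one user's messages stay ordered."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def chunked(items: list, size: int):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


pending = [f"notification-{i}" for i in range(2500)]
batches = list(chunked(pending, 1000))  # three batches: 1000, 1000, 500
```

Because the partition depends only on user_id, adding workers increases parallelism across users without reordering any single user's notifications.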

Q3: "What if the push notification provider (FCM) is down?"

Answer:

  • Retry with exponential backoff: Retry 3 times (1s, 2s, 4s)
  • Fallback: Send SMS for critical notifications
  • Queue: Messages stay in queue until delivered (persistence)
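
The retry and fallback steps above can be sketched as follows, assuming push and sms are callables that raise an exception on failure. The function names and the injectable sleep parameter (used so the delays can be observed without waiting) are assumptions for this example.

```python
import time


def send_with_retry(send, message, retries=3, base_delay=1.0, sleep=time.sleep) -> bool:
    """Try once, then retry up to `retries` times, waiting 1s, 2s, 4s between tries."""
    for attempt in range(retries + 1):
        try:
            send(message)
            return True
        except Exception:
            if attempt == retries:
                return False  # retry budget exhausted
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    return False


def deliver(message, push, sms, critical=False, sleep=time.sleep) -> str:
    """Push first; fall back to SMS only for critical messages."""
    if send_with_retry(push, message, sleep=sleep):
        return "push"
    if critical and send_with_retry(sms, message, sleep=sleep):
        return "sms"
    return "queued"  # leave the message in the queue for a later attempt
```

In practice a small random jitter is often added to each delay so that many workers do not retry against a recovering provider at the same instant.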

Quick Reference

Channels:

  • Push: Real-time, low cost (FCM, APNS)
  • Email: Receipts, newsletters (SendGrid, SES)
  • SMS: OTP, critical (Twilio, expensive)

Architecture:

  1. Message queue (async, buffering)
  2. Notification service (routing, preferences)
  3. External providers (FCM, SendGrid, Twilio)

Best practices:

  • Rate limiting (don't spam)
  • User preferences (respect opt-out)
  • Retry logic (handle failures)
  • Templates (consistency, i18n)

Part 2 Complete! You now understand the building blocks of system design.

Next: Part 3: Advanced Patterns - Distributed systems, consistency, reliability.