Design a Scalable Notification System
Architect a multi-channel notification system supporting push, email, SMS, and in-app alerts. Learn priority queues, template engines, and user preference handling.
This post applies concepts from the System Design from Zero to Hero series.
TL;DR
A scalable notification system decouples event producers from delivery channels using priority queues, supports user preferences and rate limiting, and tracks delivery across push, email, and SMS. The architecture follows a fan-out pattern where a central notification service routes messages to channel-specific workers via separate queues. Each worker handles its own template rendering, provider API integration, and retry logic. Rate limiting prevents notification fatigue, and a preference service gives users granular control over what they receive and how.
Requirements
Functional Requirements
- Multi-channel delivery: Support push notifications (iOS/Android), email, SMS, and in-app notifications.
- User preferences: Users can opt in or out of specific notification types and channels.
- Templating: Notifications use templates with dynamic variable substitution.
- Priority levels: Support critical (OTP, security alerts), high (order updates), medium (social activity), and low (marketing) priorities.
- Delivery tracking: Track whether each notification was sent, delivered, opened, or failed.
- Scheduling: Support immediate and scheduled (future) delivery.
Non-Functional Requirements
- At-least-once delivery: No notification should be silently lost.
- Low latency for critical notifications: OTP codes and security alerts must arrive within seconds.
- Scalability: Handle 1 billion notifications per day across all channels.
- Fault tolerance: A failure in one channel (e.g., SMS provider outage) must not affect other channels.
- Rate limiting: Prevent any single user from receiving more than N notifications per hour.
Back-of-Envelope Estimation
Assume 1 billion notifications per day across all channels:
- Throughput: 1B / 86,400 ≈ 11,600 notifications/second on average; assume peaks of ~50,000/second
- Channel breakdown (typical): 60% push, 25% email, 10% in-app, 5% SMS
  - Push: ~6,900/second average → ~7 workers at 1,000 ops/sec each
  - Email: ~2,900/second average → ~3 workers
  - In-app: ~1,160/second average → ~2 workers
  - SMS: ~580/second average → SMS providers are the bottleneck (provider rate limits apply)
- Worker counts above are sized for average load; the assumed peak is roughly 4× higher, so provision or autoscale accordingly
- Storage: Notification metadata ≈ 500 bytes per notification → 1B × 500 bytes ≈ 500 GB/day
- Retention: 90-day retention for tracking data → ~45 TB
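The arithmetic above is easy to recompute when an input changes. The following is a minimal sketch that reproduces the estimates; the constant and function names are illustrative, and the per-worker throughput of 1,000 ops/sec is the same assumption used in the list.

```python
import math

SECONDS_PER_DAY = 86_400
DAILY_NOTIFICATIONS = 1_000_000_000
CHANNEL_SHARE = {"push": 0.60, "email": 0.25, "in_app": 0.10, "sms": 0.05}
WORKER_OPS_PER_SEC = 1_000      # assumed per-worker throughput
BYTES_PER_NOTIFICATION = 500    # assumed metadata size
RETENTION_DAYS = 90


def estimate() -> dict:
    """Recompute average throughput, worker counts, and storage."""
    avg_per_sec = DAILY_NOTIFICATIONS / SECONDS_PER_DAY
    per_channel = {ch: avg_per_sec * share for ch, share in CHANNEL_SHARE.items()}
    # Round up: a fractional worker still needs a whole process.
    workers = {ch: math.ceil(rate / WORKER_OPS_PER_SEC) for ch, rate in per_channel.items()}
    daily_gb = DAILY_NOTIFICATIONS * BYTES_PER_NOTIFICATION / 1e9
    return {
        "avg_per_sec": avg_per_sec,
        "per_channel": per_channel,
        "workers": workers,
        "daily_gb": daily_gb,
        "retention_tb": daily_gb * RETENTION_DAYS / 1000,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(name, value)
```

These are average-load figures only; sizing for the assumed peak means multiplying the worker counts accordingly.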
High-Level Design
```
Event Producers (Order Service, Auth Service, Social Service, Marketing Service)
                        ↓
Notification Service (validation, preferences, rate limiting, template rendering)
                        ↓
              Priority Queue Router
        ↙         ↓           ↓         ↘
    Push        Email       SMS         In-App
    Queue       Queue       Queue       Queue
      ↓           ↓           ↓           ↓
    Push        Email       SMS         In-App
    Worker      Worker      Worker      Worker
      ↓           ↓           ↓           ↓
    FCM/        SendGrid    Twilio      WebSocket
    APNs                                /SSE
                        ↓
              Delivery Tracking Store
```
Flow: An event producer (e.g., the Order Service) sends a notification request to the Notification Service. The service checks user preferences, applies rate limiting, renders the template, and routes the notification to the appropriate channel queue(s). Channel-specific workers consume from their queue, call the third-party provider API, and record the delivery result.
Detailed Design
Notification Service Architecture
The Notification Service is the brain of the system. It receives notification requests from internal services and orchestrates delivery. Its responsibilities, in order:
1. Validate the request: Check that required fields (user_id, notification_type, template_id) are present.
2. Check user preferences: Query the Preference Service to determine which channels the user has enabled for this notification type.
3. Apply rate limiting: Check if the user has exceeded their notification quota for the current time window. For rate limiting patterns, see Part 8: API Design and Rate Limiting.
4. Render templates: Fetch the template and substitute variables (user name, order ID, etc.).
5. Route to channel queues: Publish the rendered notification to one or more channel-specific queues based on user preferences.
```python
def process_notification(request: NotificationRequest):
    # 1. Validate
    validate(request)

    # 2. Check preferences
    user_prefs = preference_service.get(request.user_id, request.type)
    enabled_channels = user_prefs.enabled_channels  # e.g., ["push", "email"]
    if not enabled_channels:
        return  # User has opted out of this notification type

    # 3. Rate limit check
    if rate_limiter.is_limited(request.user_id, request.priority):
        if request.priority != "critical":
            metrics.increment("notification.rate_limited")
            return
        # Critical notifications bypass rate limiting

    # 4. Render one template variant per enabled channel
    for channel in enabled_channels:
        rendered = template_engine.render(
            template_id=request.template_id,
            channel=channel,
            variables=request.variables,
        )

        # 5. Route to channel queue
        queue = get_queue(channel, request.priority)
        queue.publish({
            "notification_id": generate_id(),
            "user_id": request.user_id,
            "channel": channel,
            "content": rendered,
            "priority": request.priority,
            "metadata": request.metadata,
        })
```

Priority Queues
Not all notifications are equal. An OTP code that arrives 5 minutes late is useless, while a marketing email can tolerate minutes or even hours of delay. Use separate queues per priority level within each channel. For queue architecture patterns, see Part 6: Message Queues and Event-Driven Architecture.
Queue structure:
```
push-critical → processed first, dedicated workers
push-high     → processed second
push-medium   → processed third
push-low      → processed last, throttled during peak
```
Workers poll the critical queue first. Only when the critical queue is empty do they process the high queue, and so on. Alternatively, assign dedicated worker pools per priority level so critical notifications are never blocked by a backlog of marketing notifications.
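The strict-priority polling described above can be sketched with in-memory queues. This is an illustration only: `PriorityQueues` is a hypothetical stand-in for real broker queues, and the four priority names are taken from the requirements.

```python
from collections import deque
from typing import Optional

PRIORITIES = ["critical", "high", "medium", "low"]  # highest first


class PriorityQueues:
    """In-memory stand-in for the per-priority queues of one channel."""

    def __init__(self) -> None:
        self._queues = {priority: deque() for priority in PRIORITIES}

    def publish(self, priority: str, message: dict) -> None:
        self._queues[priority].append(message)

    def poll(self) -> Optional[dict]:
        # Scan from highest to lowest priority and take the first message found,
        # so a lower queue is only touched when every higher queue is empty.
        for priority in PRIORITIES:
            queue = self._queues[priority]
            if queue:
                return queue.popleft()
        return None  # every queue is empty
```

One consequence of strict ordering is that a sustained stream of high-priority traffic can starve the low queue indefinitely, which is part of the argument for dedicated worker pools.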
User Preference Management
The Preference Service stores per-user, per-notification-type channel preferences:
```json
{
  "user_id": "u_12345",
  "preferences": {
    "order_updates": {
      "push": true,
      "email": true,
      "sms": false,
      "in_app": true
    },
    "marketing": {
      "push": false,
      "email": true,
      "sms": false,
      "in_app": false
    },
    "security_alerts": {
      "push": true,
      "email": true,
      "sms": true,
      "in_app": true
    }
  },
  "quiet_hours": {
    "enabled": true,
    "start": "22:00",
    "end": "08:00",
    "timezone": "America/New_York"
  }
}
```

Quiet hours: During quiet hours, non-critical notifications are held and delivered when the quiet period ends. Critical notifications (security alerts, OTPs) always bypass quiet hours.
Global unsubscribe: Regulations such as CAN-SPAM (US) and GDPR (EU) require that recipients can easily opt out of marketing email, and a one-click unsubscribe link is the common way to meet that requirement. The preference service must support a global opt-out that overrides all marketing notification types.
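The quiet-hours and global opt-out rules can be combined into a single decision function. The sketch below is a minimal illustration, not a definitive implementation: `delivery_decision` and the `marketing_opt_out` field are hypothetical names, and the `quiet_hours` dictionary follows the shape of the preference document shown earlier. The main subtlety is a window such as 22:00 to 08:00 that wraps past midnight.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def in_quiet_window(local: time, start: time, end: time) -> bool:
    """True if `local` is in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= local < end
    return local >= start or local < end


def delivery_decision(now_utc: datetime, priority: str, notif_type: str, prefs: dict) -> str:
    """Return "send", "hold", or "drop".

    `now_utc` must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    if priority == "critical":
        return "send"  # critical notifications bypass quiet hours
    # Hypothetical flag representing the global marketing opt-out.
    if notif_type == "marketing" and prefs.get("marketing_opt_out", False):
        return "drop"
    quiet = prefs.get("quiet_hours", {})
    if quiet.get("enabled"):
        # Convert to the user's own timezone before comparing wall-clock times.
        local = now_utc.astimezone(ZoneInfo(quiet["timezone"])).time()
        start = time.fromisoformat(quiet["start"])
        end = time.fromisoformat(quiet["end"])
        if in_quiet_window(local, start, end):
            return "hold"  # deliver when the quiet period ends
    return "send"
```

A "hold" result would typically translate into a scheduled delivery at the end of the quiet window rather than a busy wait.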
Rate Limiting Per User
Rate limiting prevents notification fatigue and protects users from buggy upstream services that might fire thousands of events:
- Per-user limit: Maximum 30 notifications per hour per user (across all channels).
- Per-channel limit: Maximum 5 SMS per day per user (SMS is expensive).
- Per-type limit: Maximum 3 marketing notifications per day per user.
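As a rough illustration of how the three limits above could be enforced together, here is a minimal in-memory sketch using a sliding-window log. The class and method names are hypothetical, and a production system would keep this state in a shared store rather than in process memory. Critical notifications are assumed to skip this check entirely.

```python
import time
from collections import defaultdict, deque
from typing import Optional

HOUR, DAY = 3600, 86_400


class SlidingWindowLimiter:
    """Sliding-window log: remembers the timestamps of recent sends per key."""

    def __init__(self) -> None:
        self._events = defaultdict(deque)

    def _count(self, key: tuple, window: int, now: float) -> int:
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= now - window:
            events.popleft()
        return len(events)

    def allow(self, user_id: str, channel: str, notif_type: str,
              now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # (key, window in seconds, limit) for each rule that applies.
        checks = [((user_id, "all"), HOUR, 30)]
        if channel == "sms":
            checks.append(((user_id, "sms"), DAY, 5))
        if notif_type == "marketing":
            checks.append(((user_id, "marketing"), DAY, 3))
        if any(self._count(key, window, now) >= limit for key, window, limit in checks):
            return False
        # Record the send only when every applicable limit has room.
        for key, _, _ in checks:
            self._events[key].append(now)
        return True
```

The log costs one timestamp per send, which is acceptable at limits this small; counter-based approximations trade some accuracy for less memory.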
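A fixed-window counter in Redis, keyed by user and clock hour, is the simplest implementation. It can admit up to twice the limit in a burst that straddles an hour boundary; a true sliding window closes that gap at the cost of extra state.

```python
def is_rate_limited(user_id: str, priority: str) -> bool:
    if priority == "critical":
        return False  # Never rate-limit critical notifications

    # One counter per user per clock hour (fixed window)
    key = f"notif_rate:{user_id}:{current_hour()}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 3600)  # 1-hour TTL so old windows expire on their own
    return count > MAX_NOTIFICATIONS_PER_HOUR
```

Template Engine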
Templates separate notification content from delivery logic. A template might look like:
```
Subject: Your order {{order_id}} has shipped!

Hi {{user_name}},

Your order #{{order_id}} has been shipped via {{carrier}}.
Track your package: {{tracking_url}}
Estimated delivery: {{delivery_date}}
```
Each channel has its own template variant:
- Push: Short text, 100-character limit, with a deep link.
- Email: Full HTML with branding, images, and footer.
- SMS: Plain text, 160-character limit, no HTML.
- In-app: Structured JSON with icon, title, body, and action URL.
Store templates in a database with versioning. When a template is updated, the new version takes effect for future notifications without affecting in-flight ones.
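Variable substitution for the `{{name}}` placeholders shown above can be sketched with a regular expression. This is a minimal illustration under stated assumptions: `render` and `CHANNEL_LIMITS` are hypothetical names, the limits are the character counts quoted in the channel list, and a real engine would also escape values for HTML email.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
CHANNEL_LIMITS = {"push": 100, "sms": 160}  # character limits from the channel list


def render(template: str, variables: dict, channel: str) -> str:
    """Substitute {{name}} placeholders and enforce the channel length limit."""
    # Fail loudly on a missing variable instead of sending "Hi {{user_name}}".
    missing = set(PLACEHOLDER.findall(template)) - set(variables)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    text = PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), template)
    limit = CHANNEL_LIMITS.get(channel)
    if limit is not None and len(text) > limit:
        raise ValueError(f"{channel} content is {len(text)} chars, limit is {limit}")
    return text


if __name__ == "__main__":
    print(render("Your order {{order_id}} has shipped via {{carrier}}.",
                 {"order_id": "A1", "carrier": "UPS"}, "sms"))
```

Rejecting over-length content at render time keeps the failure close to the template author rather than surfacing it as a provider error later.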
Delivery Tracking and Retry
Track the lifecycle of every notification:
```
PENDING → SENT → DELIVERED → OPENED → CLICKED
        ↘ FAILED → RETRYING → SENT (retry)
                            ↘ DEAD_LETTERED
```
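One way to keep tracking data consistent is to reject status updates that do not follow the lifecycle. The sketch below encodes the diagram as a transition table; the exact branch points (FAILED branching from PENDING, DEAD_LETTERED branching from RETRYING) are an interpretation of the diagram, and `transition` is a hypothetical helper name.

```python
# Allowed next states for each status, read off the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    "PENDING": {"SENT", "FAILED"},
    "SENT": {"DELIVERED"},
    "DELIVERED": {"OPENED"},
    "OPENED": {"CLICKED"},
    "FAILED": {"RETRYING"},
    "RETRYING": {"SENT", "DEAD_LETTERED"},
    "CLICKED": set(),        # terminal
    "DEAD_LETTERED": set(),  # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status transition: {current} -> {new}")
    return new
```

Provider webhooks can arrive out of order, so in practice an update such as OPENED arriving before DELIVERED may need to be tolerated rather than rejected.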
Retry with exponential backoff:
When a provider returns a transient error (5xx, HTTP 429, timeout), retry with exponential backoff: 1s, 2s, 4s, 8s, 16s, for a maximum of 5 retries after the initial attempt. After all retries are exhausted, move the notification to a dead-letter queue (DLQ) for manual inspection or alternative channel fallback.

```python
import time

def deliver_with_retry(notification, provider, max_retries=5):
    # One initial attempt plus up to max_retries retries
    for attempt in range(max_retries + 1):
        try:
            result = provider.send(notification)
            tracking.update(notification.id, status="SENT",
                            provider_id=result.message_id)
            return result
        except TransientError:
            if attempt == max_retries:
                break  # Retries exhausted; fall through to the DLQ
            tracking.update(notification.id, status="RETRYING")
            wait_time = min(2 ** attempt, 60)  # 1s, 2s, 4s, 8s, 16s; capped at 60 seconds
            # Blocking sleep keeps the example short; a real worker
            # would usually re-enqueue the message with a delay instead.
            time.sleep(wait_time)
        except PermanentError as e:
            tracking.update(notification.id, status="FAILED",
                            error=str(e))
            return None

    # All retries exhausted
    dead_letter_queue.publish(notification)
    tracking.update(notification.id, status="DEAD_LETTERED")
    return None
```

Third-Party Provider Integration
Each channel integrates with external providers:
- Push (iOS): Apple Push Notification Service (APNs). Requires device tokens and supports silent and alert notifications.
- Push (Android): Firebase Cloud Messaging (FCM). Supports topics and device groups for multicast.
- Email: SendGrid, Amazon SES, or Mailgun. These support SMTP or REST APIs and provide webhooks for delivery/open/bounce tracking.
- SMS: Twilio, AWS SNS, or Vonage. These charge per message and have strict rate limits and regulatory requirements.
Provider abstraction: Wrap each provider behind a common interface so you can swap providers without changing worker logic. This also enables multi-provider failover: if SendGrid is down, automatically route emails through Amazon SES.
```python
from abc import ABC, abstractmethod

class NotificationProvider(ABC):
    """Common interface implemented by every channel provider."""

    @abstractmethod
    def send(self, recipient: str, content: str, metadata: dict) -> "DeliveryResult":
        pass

class FCMProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call FCM API with device token
        ...

class APNsProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call APNs API with device token
        ...
```

Data Model
Notifications Table
| Column | Type | Description |
|---|---|---|
| notification_id | UUID | Primary key |
| user_id | UUID | Recipient |
| type | VARCHAR | order_update / security / marketing |
| channel | VARCHAR | push / email / sms / in_app |
| priority | VARCHAR | critical / high / medium / low |
| template_id | VARCHAR | Template used |
| content | TEXT | Rendered content |
| status | VARCHAR | pending / sent / delivered / opened / clicked / failed / retrying / dead_lettered |
| provider_msg_id | VARCHAR | ID from the external provider |
| created_at | TIMESTAMP | When the notification was created |
| sent_at | TIMESTAMP | When it was sent to the provider |
| delivered_at | TIMESTAMP | When delivery was confirmed |
| retry_count | INT | Number of retry attempts |
User Preferences Table
| Column | Type | Description |
|---|---|---|
| user_id | UUID | Partition key; part of the composite primary key |
| notification_type | VARCHAR | Type of notification; part of the composite primary key |
| channel | VARCHAR | push / email / sms / in_app; part of the composite primary key |
| enabled | BOOLEAN | Whether this channel is enabled |
| updated_at | TIMESTAMP | Last preference change |
Templates Table
| Column | Type | Description |
|---|---|---|
| template_id | VARCHAR | Template identifier; primary key together with channel and version |
| channel | VARCHAR | Channel variant |
| version | INT | Template version |
| subject | VARCHAR | Subject line (email) or title |
| body | TEXT | Template body with {{variable}} placeholders |
| active | BOOLEAN | Whether this version is active |
| created_at | TIMESTAMP | Creation time |
Device Tokens Table (for push notifications)
| Column | Type | Description |
|---|---|---|
| user_id | UUID | Foreign key |
| device_id | VARCHAR | Unique device identifier |
| platform | VARCHAR | ios / android / web |
| token | VARCHAR | FCM or APNs device token |
| active | BOOLEAN | Whether the token is still valid |
| updated_at | TIMESTAMP | Last token refresh |
Scaling Considerations
Horizontal scaling of workers: Each channel's worker pool scales independently. During a marketing blast (millions of emails), scale up email workers without affecting push notification latency.
Queue backpressure: Monitor queue depth per priority level. If the low-priority queue grows beyond a threshold, throttle marketing notification intake rather than letting the queue grow unbounded.
Database partitioning: Partition the notifications table by created_at (time-based partitioning). Delete partitions older than the retention period (e.g., 90 days) without expensive row-by-row deletes.
Provider rate limits: SMS providers like Twilio impose rate limits that depend on the account and sender type (100 messages/second per account is used here as an illustrative figure). Implement a per-provider token bucket rate limiter in the SMS worker to stay within limits. Queue messages that exceed the rate and process them as capacity becomes available.
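A token bucket of the kind mentioned above can be sketched in a few lines. This is a single-process illustration with hypothetical names; workers spread across machines would need the bucket state in a shared store. The injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False  # caller should leave the message queued and try again later
```

A worker would call `try_acquire()` before each provider request and leave the message on the queue when it returns `False`.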
Multi-region: Deploy the notification service in multiple regions. Route notifications to the region closest to the user's provider (e.g., APNs connections from a US region for US users). This reduces latency to the provider API.
Trade-offs and Alternatives
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Queue per channel | Separate queues | Single queue with routing | Separate queues for isolation |
| Priority handling | Priority queues | Dedicated worker pools | Dedicated pools for critical, shared for others |
| Template storage | Database | File system (Git) | Database for dynamic updates, Git for version control |
| Delivery tracking | Synchronous write | Async event stream | Async for throughput, sync for critical notifications |
| Provider strategy | Single provider | Multi-provider with failover | Multi-provider for resilience |
Why separate queues per channel? If the SMS provider has an outage, the SMS queue backs up. With a shared queue, this backlog would block push and email notifications from being processed. Separate queues provide fault isolation — a problem in one channel cannot cascade to others.
Why not just use a third-party notification service (OneSignal, Firebase)? Third-party services work well for simple use cases, but they limit control over priority routing, rate limiting logic, template management, and multi-provider failover. A custom notification service gives you full control at the cost of operational complexity. Many companies start with a third-party service and build their own as they scale.
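The multi-provider failover recommended in the table can be sketched as a wrapper that tries providers in order. Everything here is hypothetical: the `Provider` protocol, the `TransientError` class, and the string message ID returned by `send` are simplifications chosen to keep the example self-contained.

```python
from typing import Optional, Protocol, Sequence


class TransientError(Exception):
    """Raised by a provider for retryable failures (timeouts, 5xx responses)."""


class Provider(Protocol):
    def send(self, recipient: str, content: str, metadata: dict) -> str:
        """Send one message and return the provider's message ID."""
        ...


class FailoverProvider:
    """Tries each provider in order and returns the first successful result."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)

    def send(self, recipient: str, content: str, metadata: dict) -> str:
        last_error: Optional[Exception] = None
        for provider in self._providers:
            try:
                return provider.send(recipient, content, metadata)
            except TransientError as exc:
                last_error = exc  # remember the failure and try the next provider
        # Only transient failures trigger failover; other exceptions propagate as-is.
        raise TransientError("all providers failed") from last_error
```

Because the wrapper exposes the same `send` signature as the providers it wraps, worker code does not need to know whether failover is in place.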
FAQ
How do you prevent notification fatigue for users?
Implement per-user rate limiting, notification grouping and batching, priority levels that control urgency, user preference controls for each channel, and smart delivery timing based on user activity patterns. Rate limiting is the first line of defense: even if upstream services fire excessive events, the notification system caps delivery per user. Grouping collapses multiple similar notifications (e.g., "5 people liked your post" instead of 5 separate notifications). Smart timing delays non-urgent notifications to when the user is typically active, based on their historical engagement patterns.
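The grouping idea can be illustrated with a small function that collapses events sharing a type and target. The event fields (`type`, `target`, `actor`) and the message wording are assumptions made for this sketch.

```python
from collections import defaultdict


def group_notifications(events: list[dict]) -> list[str]:
    """Collapse events with the same (type, target) into one summary line."""
    actors = defaultdict(list)
    for event in events:
        key = (event["type"], event["target"])
        # Count each actor once, even if they triggered the event repeatedly.
        if event["actor"] not in actors[key]:
            actors[key].append(event["actor"])
    messages = []
    for (event_type, target), names in actors.items():
        if len(names) == 1:
            messages.append(f"{names[0]} {event_type} your {target}")
        else:
            messages.append(f"{len(names)} people {event_type} your {target}")
    return messages
```

In a running system the events would be buffered for a short window per user before grouping, so the batch size depends on how long a delay the notification type can tolerate.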
How should a notification system handle delivery failures?
Use exponential backoff retry with dead-letter queues for permanently failed messages. Track delivery status per channel and fall back to alternative channels if the primary delivery method fails repeatedly. The retry strategy should distinguish between transient errors (provider timeout, rate limit exceeded) and permanent errors (invalid device token, user unsubscribed). Transient errors get retried; permanent errors are recorded and the relevant token or subscription is marked inactive. Dead-letter queues capture notifications that fail all retries for manual review and operational alerting.
What is the best architecture for supporting multiple notification channels?
Use a fan-out pattern where a central notification service routes messages to channel-specific workers (push, email, SMS) via separate queues. Each worker handles channel-specific logic, templates, and provider APIs. This architecture provides fault isolation (an SMS outage does not affect push delivery), independent scaling (email workers can scale up during a marketing blast), and clean separation of concerns (each worker only needs to understand one provider API). The central notification service handles cross-cutting concerns like preference checking, rate limiting, and deduplication.