October 05, 2025 · Last updated: October 05, 2025

Design a Scalable Notification System

Architect a multi-channel notification system supporting push, email, SMS, and in-app alerts. Learn priority queues, template engines, and user preference handling.

Tags

System Design · Notifications · Message Queue · Push

Design a Scalable Notification System

This post applies concepts from the System Design from Zero to Hero series.

TL;DR

A scalable notification system decouples event producers from delivery channels using priority queues, supports user preferences and rate limiting, and tracks delivery across push, email, and SMS. The architecture follows a fan-out pattern where a central notification service routes messages to channel-specific workers via separate queues. Each worker handles its own template rendering, provider API integration, and retry logic. Rate limiting prevents notification fatigue, and a preference service gives users granular control over what they receive and how.

Requirements

Functional Requirements

  1. Multi-channel delivery — Support push notifications (iOS/Android), email, SMS, and in-app notifications.
  2. User preferences — Users can opt in/out of specific notification types and channels.
  3. Templating — Notifications use templates with dynamic variable substitution.
  4. Priority levels — Support critical (OTP, security alerts), high (order updates), medium (social activity), and low (marketing) priorities.
  5. Delivery tracking — Track whether each notification was sent, delivered, opened, or failed.
  6. Scheduling — Support immediate and scheduled (future) delivery.

Non-Functional Requirements

  1. At-least-once delivery — No notification should be silently lost.
  2. Low latency for critical notifications — OTP codes and security alerts must arrive within seconds.
  3. Scalability — Handle 1 billion notifications per day across all channels.
  4. Fault tolerance — A failure in one channel (e.g., SMS provider outage) must not affect other channels.
  5. Rate limiting — Prevent any single user from receiving more than N notifications per hour.

Back-of-Envelope Estimation

Assume 1 billion notifications per day across all channels:

  • Throughput: 1B / 86,400 ≈ 11,500 notifications/second on average, with peaks around 50,000/second
  • Channel breakdown (typical): 60% push, 25% email, 10% in-app, 5% SMS
  • Push: ~6,900/second average → need ~7 workers at 1,000 ops/sec each
  • Email: ~2,900/second average → need ~3 workers
  • SMS: ~575/second average → SMS providers are the bottleneck (provider rate limits apply)
  • Storage: Notification metadata ≈ 500 bytes per notification → 1B * 500 = ~500 GB/day
  • Retention: 90-day retention for tracking data → ~45 TB
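The arithmetic above can be reproduced in a few lines. The 1,000 notifications/second per-worker throughput is the assumption used in the estimates:

```python
import math

SECONDS_PER_DAY = 86_400
DAILY_VOLUME = 1_000_000_000  # 1B notifications/day
CHANNEL_MIX = {"push": 0.60, "email": 0.25, "in_app": 0.10, "sms": 0.05}
WORKER_THROUGHPUT = 1_000     # notifications/sec per worker (assumed)
BYTES_PER_NOTIFICATION = 500  # metadata size estimate

avg_rps = DAILY_VOLUME / SECONDS_PER_DAY  # ~11,574/sec average

for channel, share in CHANNEL_MIX.items():
    channel_rps = avg_rps * share
    workers = math.ceil(channel_rps / WORKER_THROUGHPUT)
    print(f"{channel:>6}: {channel_rps:>7,.0f}/sec -> {workers} workers")

daily_gb = DAILY_VOLUME * BYTES_PER_NOTIFICATION / 1e9  # 500 GB/day
retention_tb = daily_gb * 90 / 1e3                      # 45 TB over 90 days
print(f"storage: {daily_gb:,.0f} GB/day, {retention_tb:,.0f} TB retained")
```

Peak sizing (~50,000/second) would multiply the worker counts roughly 4-5x; autoscaling between the average and peak figures is the usual answer.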

High-Level Design

Event Producers (Order Service, Auth Service, Social Service, Marketing Service)
         ↓
   Notification Service (validation, preferences, rate limiting, template rendering)
         ↓
   Priority Queue Router
    ↙    ↓     ↓      ↘
Push    Email  SMS   In-App
Queue   Queue  Queue  Queue
  ↓       ↓     ↓      ↓
Push    Email  SMS   In-App
Worker  Worker Worker Worker
  ↓       ↓     ↓      ↓
FCM/    Send-  Twi-  WebSocket
APNs    Grid   lio   /SSE
         ↓
   Delivery Tracking Store

Flow: An event producer (e.g., the Order Service) sends a notification request to the Notification Service. The service checks user preferences, applies rate limiting, renders the template, and routes the notification to the appropriate channel queue(s). Channel-specific workers consume from their queue, call the third-party provider API, and record the delivery result.

Detailed Design

Notification Service Architecture

The Notification Service is the brain of the system. It receives notification requests from internal services and orchestrates delivery. Its responsibilities, in order:

  1. Validate the request — Check that required fields (user_id, notification_type, template_id) are present.
  2. Check user preferences — Query the Preference Service to determine which channels the user has enabled for this notification type.
  3. Apply rate limiting — Check if the user has exceeded their notification quota for the current time window. For rate limiting patterns, see Part 8: API Design and Rate Limiting.
  4. Render templates — Fetch the template and substitute variables (user name, order ID, etc.).
  5. Route to channel queues — Publish the rendered notification to one or more channel-specific queues based on user preferences.
```python
def process_notification(request: NotificationRequest):
    # 1. Validate
    validate(request)

    # 2. Check preferences
    user_prefs = preference_service.get(request.user_id, request.type)
    enabled_channels = user_prefs.enabled_channels  # e.g., ["push", "email"]

    if not enabled_channels:
        return  # User has opted out of this notification type

    # 3. Rate limit check
    if rate_limiter.is_limited(request.user_id, request.priority):
        if request.priority != "critical":
            metrics.increment("notification.rate_limited")
            return
        # Critical notifications bypass rate limiting

    # 4. Render template per enabled channel
    for channel in enabled_channels:
        rendered = template_engine.render(
            template_id=request.template_id,
            channel=channel,
            variables=request.variables
        )

        # 5. Route to channel queue
        queue = get_queue(channel, request.priority)
        queue.publish({
            "notification_id": generate_id(),
            "user_id": request.user_id,
            "channel": channel,
            "content": rendered,
            "priority": request.priority,
            "metadata": request.metadata
        })
```

Priority Queues

Not all notifications are equal. An OTP code that arrives 5 minutes late is useless, while a marketing email can tolerate hours of delay. Use separate queues per priority level within each channel. For queue architecture patterns, see Part 6: Message Queues and Event-Driven Architecture.

Queue structure:

push-critical    → processed first, dedicated workers
push-high        → processed second
push-medium      → processed third
push-low         → processed last, throttled during peak

Workers poll the critical queue first. Only when the critical queue is empty do they process the high queue, and so on. Alternatively, assign dedicated worker pools per priority level so critical notifications are never blocked by a backlog of marketing notifications.
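The strict-priority polling described above can be sketched with in-memory queues standing in for the real broker (class and queue names here are illustrative):

```python
from collections import deque

PRIORITY_ORDER = ["critical", "high", "medium", "low"]

class PriorityPoller:
    """Drain queues in strict priority order: a lower-priority queue is
    consumed only when every higher-priority queue is empty."""

    def __init__(self, queues):
        self.queues = queues  # priority name -> deque of notifications

    def next_notification(self):
        for priority in PRIORITY_ORDER:
            queue = self.queues.get(priority)
            if queue:  # a non-empty deque is truthy
                return queue.popleft()
        return None  # nothing pending at any priority

queues = {p: deque() for p in PRIORITY_ORDER}
queues["low"].append("weekly-digest")
queues["critical"].append("otp-4821")
poller = PriorityPoller(queues)
```

The starvation risk is the flip side: a sustained critical backlog delays everything below it, which is why the dedicated-pool variant is often preferred for the critical tier.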

User Preference Management

The Preference Service stores per-user, per-notification-type channel preferences:

```json
{
  "user_id": "u_12345",
  "preferences": {
    "order_updates": {
      "push": true,
      "email": true,
      "sms": false,
      "in_app": true
    },
    "marketing": {
      "push": false,
      "email": true,
      "sms": false,
      "in_app": false
    },
    "security_alerts": {
      "push": true,
      "email": true,
      "sms": true,
      "in_app": true
    }
  },
  "quiet_hours": {
    "enabled": true,
    "start": "22:00",
    "end": "08:00",
    "timezone": "America/New_York"
  }
}
```

Quiet hours: During quiet hours, non-critical notifications are held and delivered when the quiet period ends. Critical notifications (security alerts, OTPs) always bypass quiet hours.

Global unsubscribe: Some jurisdictions require a one-click unsubscribe for marketing emails (CAN-SPAM, GDPR). The preference service must support a global opt-out that overrides all marketing notification types.
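One way the preference and quiet-hours checks might combine, assuming the document shape shown above (the `resolve_channels` helper itself is illustrative, not part of the design):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def in_quiet_hours(quiet: dict, now_utc: datetime) -> bool:
    """True if the user's local time falls inside the quiet window."""
    local = now_utc.astimezone(ZoneInfo(quiet["timezone"])).time()
    start = time.fromisoformat(quiet["start"])
    end = time.fromisoformat(quiet["end"])
    if start > end:  # window crosses midnight, e.g. 22:00-08:00
        return local >= start or local < end
    return start <= local < end

def resolve_channels(prefs: dict, notification_type: str, priority: str,
                     now_utc: datetime) -> list:
    type_prefs = prefs["preferences"].get(notification_type, {})
    channels = [ch for ch, enabled in type_prefs.items() if enabled]
    quiet = prefs.get("quiet_hours", {})
    if channels and quiet.get("enabled") and priority != "critical":
        if in_quiet_hours(quiet, now_utc):
            return []  # caller should reschedule for when quiet hours end
    return channels
```

Returning an empty list for quiet hours is a simplification; a production system would hand the notification to the scheduler with a delivery time of quiet-hours end rather than drop it.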

Rate Limiting Per User

Rate limiting prevents notification fatigue and protects users from buggy upstream services that might fire thousands of events:

  • Per-user limit: Maximum 30 notifications per hour per user (across all channels).
  • Per-channel limit: Maximum 5 SMS per day per user (SMS is expensive).
  • Per-type limit: Maximum 3 marketing notifications per day per user.

Use a fixed-window counter in Redis (a sliding-window counter is a refinement that smooths bursts at window boundaries):

```python
def is_rate_limited(user_id: str, priority: str) -> bool:
    if priority == "critical":
        return False  # Never rate-limit critical notifications

    key = f"notif_rate:{user_id}:{current_hour()}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 3600)  # 1-hour TTL, set on first increment

    return count > MAX_NOTIFICATIONS_PER_HOUR
```

Template Engine

Templates separate notification content from delivery logic. A template might look like:

Subject: Your order {{order_id}} has shipped!

Hi {{user_name}},

Your order #{{order_id}} has been shipped via {{carrier}}.
Track your package: {{tracking_url}}

Estimated delivery: {{delivery_date}}

Each channel has its own template variant:

  • Push: Short text, 100-character limit, with a deep link.
  • Email: Full HTML with branding, images, and footer.
  • SMS: Plain text, 160-character limit, no HTML.
  • In-app: Structured JSON with icon, title, body, and action URL.

Store templates in a database with versioning. When a template is updated, the new version takes effect for future notifications without affecting in-flight ones.
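A minimal renderer for the `{{variable}}` syntax above might look like the following. This is a sketch; a real system would use an established engine such as Jinja2 and add versioned template lookup:

```python
import re

class TemplateEngine:
    """Minimal {{variable}} substitution that fails loudly on missing
    variables, so a bad render never reaches a user half-filled."""

    PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

    def __init__(self, templates):
        self.templates = templates  # (template_id, channel) -> body string

    def render(self, template_id: str, channel: str, variables: dict) -> str:
        body = self.templates[(template_id, channel)]

        def substitute(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"missing template variable: {name}")
            return str(variables[name])

        return self.PLACEHOLDER.sub(substitute, body)

templates = {("order_shipped", "sms"): "Order #{{order_id}} shipped via {{carrier}}."}
engine = TemplateEngine(templates)
print(engine.render("order_shipped", "sms", {"order_id": "A1", "carrier": "UPS"}))
# -> Order #A1 shipped via UPS.
```

Failing on a missing variable (rather than emitting an empty string) turns template bugs into visible errors at enqueue time instead of garbled messages at delivery time.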

Delivery Tracking and Retry

Track the lifecycle of every notification:

PENDING → SENT → DELIVERED → OPENED → CLICKED
                     ↘ FAILED → RETRYING → SENT (retry)
                                    ↘ DEAD_LETTERED

Retry with exponential backoff:

When a provider returns a transient error (5xx, timeout), retry with exponential backoff: 1s, 2s, 4s, 8s, 16s, for up to 5 attempts in total. After all attempts are exhausted, move the notification to a dead-letter queue (DLQ) for manual inspection or alternative-channel fallback.

```python
def deliver_with_retry(notification, provider, max_retries=5):
    for attempt in range(max_retries):
        try:
            result = provider.send(notification)
            tracking.update(notification.id, status="SENT",
                            provider_id=result.message_id)
            return result
        except TransientError:
            if attempt < max_retries - 1:  # no point sleeping after the final attempt
                time.sleep(min(2 ** attempt, 60))  # 1s, 2s, 4s, ... capped at 60s
        except PermanentError as e:
            tracking.update(notification.id, status="FAILED",
                            error=str(e))
            return None

    # All attempts exhausted
    dead_letter_queue.publish(notification)
    tracking.update(notification.id, status="DEAD_LETTERED")
```

Third-Party Provider Integration

Each channel integrates with external providers:

  • Push (iOS): Apple Push Notification Service (APNs) — requires device tokens, supports silent and alert notifications.
  • Push (Android): Firebase Cloud Messaging (FCM) — supports topics and device groups for multicast.
  • Email: SendGrid, Amazon SES, or Mailgun — support SMTP or REST APIs, provide webhooks for delivery/open/bounce tracking.
  • SMS: Twilio, AWS SNS, or Vonage — charge per message, have strict rate limits and regulatory requirements.

Provider abstraction: Wrap each provider behind a common interface so you can swap providers without changing worker logic. This also enables multi-provider failover: if SendGrid is down, automatically route emails through Amazon SES.

```python
from abc import ABC, abstractmethod

class NotificationProvider(ABC):
    @abstractmethod
    def send(self, recipient: str, content: str, metadata: dict) -> DeliveryResult:
        ...

class FCMProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call the FCM API with the device token
        ...

class APNsProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call the APNs API with the device token
        ...
```
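Failover can then be layered on top of the same interface. A sketch, assuming each provider wrapper raises a `ProviderUnavailable`-style error when its API is down (the class and exception names here are illustrative):

```python
class ProviderUnavailable(Exception):
    """Raised by a provider wrapper when its API is unreachable or erroring."""

class FailoverSender:
    """Try providers in order; fall through to the next on failure."""

    def __init__(self, providers):
        self.providers = providers  # e.g., [sendgrid_provider, ses_provider]

    def send(self, recipient, content, metadata=None):
        last_error = None
        for provider in self.providers:
            try:
                return provider.send(recipient, content, metadata or {})
            except ProviderUnavailable as exc:
                last_error = exc  # record and try the next provider
        raise last_error  # every provider failed; let the retry layer handle it
```

In practice the ordering would be driven by health checks or a circuit breaker rather than a static list, so a flapping primary does not add latency to every send.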

Data Model

Notifications Table

| Column | Type | Description |
|---|---|---|
| notification_id | UUID | Primary key |
| user_id | UUID | Recipient |
| type | VARCHAR | order_update / security / marketing |
| channel | VARCHAR | push / email / sms / in_app |
| priority | VARCHAR | critical / high / medium / low |
| template_id | VARCHAR | Template used |
| content | TEXT | Rendered content |
| status | VARCHAR | pending / sent / delivered / failed |
| provider_msg_id | VARCHAR | ID from the external provider |
| created_at | TIMESTAMP | When the notification was created |
| sent_at | TIMESTAMP | When it was sent to the provider |
| delivered_at | TIMESTAMP | When delivery was confirmed |
| retry_count | INT | Number of retry attempts |

User Preferences Table

| Column | Type | Description |
|---|---|---|
| user_id | UUID | Primary key (partition key) |
| notification_type | VARCHAR | Type of notification |
| channel | VARCHAR | push / email / sms / in_app |
| enabled | BOOLEAN | Whether this channel is enabled |
| updated_at | TIMESTAMP | Last preference change |

Templates Table

| Column | Type | Description |
|---|---|---|
| template_id | VARCHAR | Primary key |
| channel | VARCHAR | Channel variant |
| version | INT | Template version |
| subject | VARCHAR | Subject line (email) or title |
| body | TEXT | Template body with {{variable}} placeholders |
| active | BOOLEAN | Whether this version is active |
| created_at | TIMESTAMP | Creation time |

Device Tokens Table (for push notifications)

| Column | Type | Description |
|---|---|---|
| user_id | UUID | Foreign key |
| device_id | VARCHAR | Unique device identifier |
| platform | VARCHAR | ios / android / web |
| token | VARCHAR | FCM or APNs device token |
| active | BOOLEAN | Whether the token is still valid |
| updated_at | TIMESTAMP | Last token refresh |

Scaling Considerations

Horizontal scaling of workers: Each channel's worker pool scales independently. During a marketing blast (millions of emails), scale up email workers without affecting push notification latency.

Queue backpressure: Monitor queue depth per priority level. If the low-priority queue grows beyond a threshold, throttle marketing notification intake rather than letting the queue grow unbounded.

Database partitioning: Partition the notifications table by created_at (time-based partitioning). Delete partitions older than the retention period (e.g., 90 days) without expensive row-by-row deletes.

Provider rate limits: SMS providers like Twilio impose rate limits (e.g., 100 messages/second per account). Implement a per-provider token bucket rate limiter in the SMS worker to stay within limits. Queue messages that exceed the rate and process them as capacity becomes available.
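A single-process token bucket for that worker might be sketched like this. The 100 msg/sec figure mirrors the example above; a limit shared across a worker fleet would need a central store such as Redis:

```python
import time

class TokenBucket:
    """Token bucket sized to a provider limit. acquire() blocks the
    worker until a token is available, so sends never exceed the rate."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for the next token

# Worker loop sketch: stay under a 100 msg/sec provider limit
bucket = TokenBucket(rate=100, capacity=100)
```

Each call to the provider is preceded by `bucket.acquire()`; messages that arrive faster than the rate simply wait in the queue, which is exactly the backpressure behavior described above.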

Multi-region: Deploy the notification service in multiple regions. Route notifications to the region closest to the user's provider (e.g., APNs connections from a US region for US users). This reduces latency to the provider API.

Trade-offs and Alternatives

| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Queue per channel | Separate queues | Single queue with routing | Separate queues for isolation |
| Priority handling | Priority queues | Dedicated worker pools | Dedicated pools for critical, shared for others |
| Template storage | Database | File system (Git) | Database for dynamic updates, Git for version control |
| Delivery tracking | Synchronous write | Async event stream | Async for throughput, sync for critical notifications |
| Provider strategy | Single provider | Multi-provider with failover | Multi-provider for resilience |

Why separate queues per channel? If the SMS provider has an outage, the SMS queue backs up. With a shared queue, this backlog would block push and email notifications from being processed. Separate queues provide fault isolation — a problem in one channel cannot cascade to others.

Why not just use a third-party notification service (OneSignal, Firebase)? Third-party services work well for simple use cases, but they limit control over priority routing, rate limiting logic, template management, and multi-provider failover. A custom notification service gives you full control at the cost of operational complexity. Many companies start with a third-party service and build their own as they scale.

FAQ

How do you prevent notification fatigue for users?

Implement per-user rate limiting, notification grouping and batching, priority levels that control urgency, user preference controls for each channel, and smart delivery timing based on user activity patterns. Rate limiting is the first line of defense: even if upstream services fire excessive events, the notification system caps delivery per user. Grouping collapses multiple similar notifications (e.g., "5 people liked your post" instead of 5 separate notifications). Smart timing delays non-urgent notifications to when the user is typically active, based on their historical engagement patterns.
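The grouping step can be sketched as a pre-delivery collapse over a batch window (the field names are assumed for illustration):

```python
from collections import defaultdict

def collapse(notifications):
    """Collapse similar notifications into one summary per (user, type)."""
    groups = defaultdict(list)
    for n in notifications:
        groups[(n["user_id"], n["type"])].append(n)

    collapsed = []
    for (user_id, ntype), items in groups.items():
        if len(items) == 1:
            collapsed.append(items[0])  # single notification passes through
        else:
            collapsed.append({
                "user_id": user_id,
                "type": ntype,
                "content": f"{len(items)} new {ntype} notifications",
            })
    return collapsed
```

The batch window is the tuning knob: a few minutes of buffering for social activity collapses bursts effectively, while transactional types (order updates, OTPs) bypass grouping entirely.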

How should a notification system handle delivery failures?

Use exponential backoff retry with dead-letter queues for permanently failed messages. Track delivery status per channel and fall back to alternative channels if the primary delivery method fails repeatedly. The retry strategy should distinguish between transient errors (provider timeout, rate limit exceeded) and permanent errors (invalid device token, user unsubscribed). Transient errors get retried; permanent errors are recorded and the relevant token or subscription is marked inactive. Dead-letter queues capture notifications that fail all retries for manual review and operational alerting.

What is the best architecture for supporting multiple notification channels?

Use a fan-out pattern where a central notification service routes messages to channel-specific workers (push, email, SMS) via separate queues. Each worker handles channel-specific logic, templates, and provider APIs. This architecture provides fault isolation (an SMS outage does not affect push delivery), independent scaling (email workers can scale up during a marketing blast), and clean separation of concerns (each worker only needs to understand one provider API). The central notification service handles cross-cutting concerns like preference checking, rate limiting, and deduplication.


Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
