October 05, 2025 · Last updated: October 05, 2025

Design a Scalable Notification System

Architect a multi-channel notification system supporting push, email, SMS, and in-app alerts. Learn priority queues, template engines, and user preference handling.

Tags

System Design · Notifications · Message Queue · Push

Design a Scalable Notification System

This post applies concepts from the System Design from Zero to Hero series.

TL;DR

A scalable notification system decouples event producers from delivery channels using priority queues, supports user preferences and rate limiting, and tracks delivery across push, email, and SMS. The architecture follows a fan-out pattern where a central notification service routes messages to channel-specific workers via separate queues. Each worker handles its own template rendering, provider API integration, and retry logic. Rate limiting prevents notification fatigue, and a preference service gives users granular control over what they receive and how.

Requirements

Functional Requirements

  1. Multi-channel delivery — Support push notifications (iOS/Android), email, SMS, and in-app notifications.
  2. User preferences — Users can opt in/out of specific notification types and channels.
  3. Templating — Notifications use templates with dynamic variable substitution.
  4. Priority levels — Support critical (OTP, security alerts), high (order updates), medium (social activity), and low (marketing) priorities.
  5. Delivery tracking — Track whether each notification was sent, delivered, opened, or failed.
  6. Scheduling — Support immediate and scheduled (future) delivery.

Non-Functional Requirements

  1. At-least-once delivery — No notification should be silently lost.
  2. Low latency for critical notifications — OTP codes and security alerts must arrive within seconds.
  3. Scalability — Handle 1 billion notifications per day across all channels.
  4. Fault tolerance — A failure in one channel (e.g., SMS provider outage) must not affect other channels.
  5. Rate limiting — Prevent any single user from receiving more than N notifications per hour.

Back-of-Envelope Estimation

Assume 1 billion notifications per day across all channels:

  • Throughput: 1B / 86,400 ≈ 11,500 notifications/second on average, with peaks around 50,000/second
  • Channel breakdown (typical): 60% push, 25% email, 10% in-app, 5% SMS
  • Push: ~6,900/second average → need ~7 workers at 1,000 ops/sec each
  • Email: ~2,900/second average → need ~3 workers
  • SMS: ~575/second average → SMS providers are the bottleneck (provider rate limits apply)
  • Storage: Notification metadata ≈ 500 bytes per notification → 1B * 500 = ~500 GB/day
  • Retention: 90-day retention for tracking data → ~45 TB
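The arithmetic above can be reproduced in a few lines. The 1,000 notifications/second per-worker throughput is the assumption used in the estimates:

```python
import math

SECONDS_PER_DAY = 86_400
DAILY_VOLUME = 1_000_000_000  # 1B notifications/day
CHANNEL_MIX = {"push": 0.60, "email": 0.25, "in_app": 0.10, "sms": 0.05}
WORKER_THROUGHPUT = 1_000     # notifications/sec per worker (assumed)
BYTES_PER_NOTIFICATION = 500  # metadata size estimate

avg_rps = DAILY_VOLUME / SECONDS_PER_DAY  # ~11,574/sec average

for channel, share in CHANNEL_MIX.items():
    channel_rps = avg_rps * share
    workers = math.ceil(channel_rps / WORKER_THROUGHPUT)
    print(f"{channel:>6}: {channel_rps:>7,.0f}/sec -> {workers} workers")

daily_gb = DAILY_VOLUME * BYTES_PER_NOTIFICATION / 1e9  # 500 GB/day
retention_tb = daily_gb * 90 / 1e3                      # 45 TB over 90 days
print(f"storage: {daily_gb:,.0f} GB/day, {retention_tb:,.0f} TB retained")
```

Peak sizing (~50,000/second) would multiply the worker counts roughly 4-5x; autoscaling between the average and peak figures is the usual answer.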

High-Level Design

Event Producers (Order Service, Auth Service, Social Service, Marketing Service)
         ↓
   Notification Service (validation, preferences, rate limiting, template rendering)
         ↓
   Priority Queue Router
    ↙    ↓     ↓      ↘
Push    Email  SMS   In-App
Queue   Queue  Queue  Queue
  ↓       ↓     ↓      ↓
Push    Email  SMS   In-App
Worker  Worker Worker Worker
  ↓       ↓     ↓      ↓
FCM/    Send-  Twi-  WebSocket
APNs    Grid   lio   /SSE
         ↓
   Delivery Tracking Store

Flow: An event producer (e.g., the Order Service) sends a notification request to the Notification Service. The service checks user preferences, applies rate limiting, renders the template, and routes the notification to the appropriate channel queue(s). Channel-specific workers consume from their queue, call the third-party provider API, and record the delivery result.

Detailed Design

Notification Service Architecture

The Notification Service is the brain of the system. It receives notification requests from internal services and orchestrates delivery. Its responsibilities, in order:

  1. Validate the request — Check that required fields (user_id, notification_type, template_id) are present.
  2. Check user preferences — Query the Preference Service to determine which channels the user has enabled for this notification type.
  3. Apply rate limiting — Check if the user has exceeded their notification quota for the current time window. For rate limiting patterns, see Part 8: API Design and Rate Limiting.
  4. Render templates — Fetch the template and substitute variables (user name, order ID, etc.).
  5. Route to channel queues — Publish the rendered notification to one or more channel-specific queues based on user preferences.
```python
def process_notification(request: NotificationRequest):
    # 1. Validate
    validate(request)

    # 2. Check preferences
    user_prefs = preference_service.get(request.user_id, request.type)
    enabled_channels = user_prefs.enabled_channels  # e.g., ["push", "email"]

    if not enabled_channels:
        return  # User has opted out of this notification type

    # 3. Rate limit check
    if rate_limiter.is_limited(request.user_id, request.priority):
        if request.priority != "critical":
            metrics.increment("notification.rate_limited")
            return
        # Critical notifications bypass rate limiting

    # 4. Render template per enabled channel
    for channel in enabled_channels:
        rendered = template_engine.render(
            template_id=request.template_id,
            channel=channel,
            variables=request.variables
        )

        # 5. Route to channel queue
        queue = get_queue(channel, request.priority)
        queue.publish({
            "notification_id": generate_id(),
            "user_id": request.user_id,
            "channel": channel,
            "content": rendered,
            "priority": request.priority,
            "metadata": request.metadata
        })
```

Priority Queues

Not all notifications are equal. An OTP code that arrives 5 minutes late is useless, while a marketing email can tolerate hours of delay. Use separate queues per priority level within each channel. For queue architecture patterns, see Part 6: Message Queues and Event-Driven Architecture.

Queue structure:

push-critical    → processed first, dedicated workers
push-high        → processed second
push-medium      → processed third
push-low         → processed last, throttled during peak

Workers poll the critical queue first. Only when the critical queue is empty do they process the high queue, and so on. Alternatively, assign dedicated worker pools per priority level so critical notifications are never blocked by a backlog of marketing notifications.
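The strict-priority polling described above can be sketched with in-memory queues standing in for the real broker (class and queue names here are illustrative):

```python
from collections import deque

PRIORITY_ORDER = ["critical", "high", "medium", "low"]

class PriorityPoller:
    """Drain queues in strict priority order: a lower-priority queue is
    consumed only when every higher-priority queue is empty."""

    def __init__(self, queues):
        self.queues = queues  # priority name -> deque of notifications

    def next_notification(self):
        for priority in PRIORITY_ORDER:
            queue = self.queues.get(priority)
            if queue:  # a non-empty deque is truthy
                return queue.popleft()
        return None  # nothing pending at any priority

queues = {p: deque() for p in PRIORITY_ORDER}
queues["low"].append("weekly-digest")
queues["critical"].append("otp-4821")
poller = PriorityPoller(queues)
```

The starvation risk is the flip side: a sustained critical backlog delays everything below it, which is why the dedicated-pool variant is often preferred for the critical tier.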

User Preference Management

The Preference Service stores per-user, per-notification-type channel preferences:

```json
{
  "user_id": "u_12345",
  "preferences": {
    "order_updates": {
      "push": true,
      "email": true,
      "sms": false,
      "in_app": true
    },
    "marketing": {
      "push": false,
      "email": true,
      "sms": false,
      "in_app": false
    },
    "security_alerts": {
      "push": true,
      "email": true,
      "sms": true,
      "in_app": true
    }
  },
  "quiet_hours": {
    "enabled": true,
    "start": "22:00",
    "end": "08:00",
    "timezone": "America/New_York"
  }
}
```

Quiet hours: During quiet hours, non-critical notifications are held and delivered when the quiet period ends. Critical notifications (security alerts, OTPs) always bypass quiet hours.

Global unsubscribe: Some jurisdictions require a one-click unsubscribe for marketing emails (CAN-SPAM, GDPR). The preference service must support a global opt-out that overrides all marketing notification types.
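One way the preference and quiet-hours checks might combine, assuming the document shape shown above (the `resolve_channels` helper itself is illustrative, not part of the design):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def in_quiet_hours(quiet: dict, now_utc: datetime) -> bool:
    """True if the user's local time falls inside the quiet window."""
    local = now_utc.astimezone(ZoneInfo(quiet["timezone"])).time()
    start = time.fromisoformat(quiet["start"])
    end = time.fromisoformat(quiet["end"])
    if start > end:  # window crosses midnight, e.g. 22:00-08:00
        return local >= start or local < end
    return start <= local < end

def resolve_channels(prefs: dict, notification_type: str, priority: str,
                     now_utc: datetime) -> list:
    type_prefs = prefs["preferences"].get(notification_type, {})
    channels = [ch for ch, enabled in type_prefs.items() if enabled]
    quiet = prefs.get("quiet_hours", {})
    if channels and quiet.get("enabled") and priority != "critical":
        if in_quiet_hours(quiet, now_utc):
            return []  # caller should reschedule for when quiet hours end
    return channels
```

Returning an empty list for quiet hours is a simplification; a production system would hand the notification to the scheduler with a delivery time of quiet-hours end rather than drop it.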

Rate Limiting Per User

Rate limiting prevents notification fatigue and protects users from buggy upstream services that might fire thousands of events:

  • Per-user limit: Maximum 30 notifications per hour per user (across all channels).
  • Per-channel limit: Maximum 5 SMS per day per user (SMS is expensive).
  • Per-type limit: Maximum 3 marketing notifications per day per user.

Use a fixed-window counter in Redis (a sliding-window counter is a refinement that smooths bursts at window boundaries):

```python
def is_rate_limited(user_id: str, priority: str) -> bool:
    if priority == "critical":
        return False  # Never rate-limit critical notifications

    key = f"notif_rate:{user_id}:{current_hour()}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 3600)  # 1-hour TTL, set on first increment

    return count > MAX_NOTIFICATIONS_PER_HOUR
```

Template Engine

Templates separate notification content from delivery logic. A template might look like:

Subject: Your order {{order_id}} has shipped!

Hi {{user_name}},

Your order #{{order_id}} has been shipped via {{carrier}}.
Track your package: {{tracking_url}}

Estimated delivery: {{delivery_date}}

Each channel has its own template variant:

  • Push: Short text, 100-character limit, with a deep link.
  • Email: Full HTML with branding, images, and footer.
  • SMS: Plain text, 160-character limit, no HTML.
  • In-app: Structured JSON with icon, title, body, and action URL.

Store templates in a database with versioning. When a template is updated, the new version takes effect for future notifications without affecting in-flight ones.
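A minimal renderer for the `{{variable}}` syntax above might look like the following. This is a sketch; a real system would use an established engine such as Jinja2 and add versioned template lookup:

```python
import re

class TemplateEngine:
    """Minimal {{variable}} substitution that fails loudly on missing
    variables, so a bad render never reaches a user half-filled."""

    PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

    def __init__(self, templates):
        self.templates = templates  # (template_id, channel) -> body string

    def render(self, template_id: str, channel: str, variables: dict) -> str:
        body = self.templates[(template_id, channel)]

        def substitute(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"missing template variable: {name}")
            return str(variables[name])

        return self.PLACEHOLDER.sub(substitute, body)

templates = {("order_shipped", "sms"): "Order #{{order_id}} shipped via {{carrier}}."}
engine = TemplateEngine(templates)
print(engine.render("order_shipped", "sms", {"order_id": "A1", "carrier": "UPS"}))
# -> Order #A1 shipped via UPS.
```

Failing on a missing variable (rather than emitting an empty string) turns template bugs into visible errors at enqueue time instead of garbled messages at delivery time.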

Delivery Tracking and Retry

Track the lifecycle of every notification:

PENDING → SENT → DELIVERED → OPENED → CLICKED
                     ↘ FAILED → RETRYING → SENT (retry)
                                    ↘ DEAD_LETTERED

Retry with exponential backoff:

When a provider returns a transient error (5xx, timeout), retry with exponential backoff: 1s, 2s, 4s, 8s, 16s, for up to 5 attempts in total. After all attempts are exhausted, move the notification to a dead-letter queue (DLQ) for manual inspection or alternative-channel fallback.

```python
def deliver_with_retry(notification, provider, max_retries=5):
    for attempt in range(max_retries):
        try:
            result = provider.send(notification)
            tracking.update(notification.id, status="SENT",
                            provider_id=result.message_id)
            return result
        except TransientError:
            if attempt < max_retries - 1:  # no point sleeping after the final attempt
                time.sleep(min(2 ** attempt, 60))  # 1s, 2s, 4s, ... capped at 60s
        except PermanentError as e:
            tracking.update(notification.id, status="FAILED",
                            error=str(e))
            return None

    # All attempts exhausted
    dead_letter_queue.publish(notification)
    tracking.update(notification.id, status="DEAD_LETTERED")
```

Third-Party Provider Integration

Each channel integrates with external providers:

  • Push (iOS): Apple Push Notification Service (APNs) — requires device tokens, supports silent and alert notifications.
  • Push (Android): Firebase Cloud Messaging (FCM) — supports topics and device groups for multicast.
  • Email: SendGrid, Amazon SES, or Mailgun — support SMTP or REST APIs, provide webhooks for delivery/open/bounce tracking.
  • SMS: Twilio, AWS SNS, or Vonage — charge per message, have strict rate limits and regulatory requirements.

Provider abstraction: Wrap each provider behind a common interface so you can swap providers without changing worker logic. This also enables multi-provider failover: if SendGrid is down, automatically route emails through Amazon SES.

```python
from abc import ABC, abstractmethod

class NotificationProvider(ABC):
    @abstractmethod
    def send(self, recipient: str, content: str, metadata: dict) -> DeliveryResult:
        ...

class FCMProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call the FCM API with the device token
        ...

class APNsProvider(NotificationProvider):
    def send(self, recipient, content, metadata):
        # Call the APNs API with the device token
        ...
```
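Failover can then be layered on top of the same interface. A sketch, assuming each provider wrapper raises a `ProviderUnavailable`-style error when its API is down (the class and exception names here are illustrative):

```python
class ProviderUnavailable(Exception):
    """Raised by a provider wrapper when its API is unreachable or erroring."""

class FailoverSender:
    """Try providers in order; fall through to the next on failure."""

    def __init__(self, providers):
        self.providers = providers  # e.g., [sendgrid_provider, ses_provider]

    def send(self, recipient, content, metadata=None):
        last_error = None
        for provider in self.providers:
            try:
                return provider.send(recipient, content, metadata or {})
            except ProviderUnavailable as exc:
                last_error = exc  # record and try the next provider
        raise last_error  # every provider failed; let the retry layer handle it
```

In practice the ordering would be driven by health checks or a circuit breaker rather than a static list, so a flapping primary does not add latency to every send.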

Data Model

Notifications Table

| Column | Type | Description |
|---|---|---|
| notification_id | UUID | Primary key |
| user_id | UUID | Recipient |
| type | VARCHAR | order_update / security / marketing |
| channel | VARCHAR | push / email / sms / in_app |
| priority | VARCHAR | critical / high / medium / low |
| template_id | VARCHAR | Template used |
| content | TEXT | Rendered content |
| status | VARCHAR | pending / sent / delivered / failed |
| provider_msg_id | VARCHAR | ID from the external provider |
| created_at | TIMESTAMP | When the notification was created |
| sent_at | TIMESTAMP | When it was sent to the provider |
| delivered_at | TIMESTAMP | When delivery was confirmed |
| retry_count | INT | Number of retry attempts |

User Preferences Table

| Column | Type | Description |
|---|---|---|
| user_id | UUID | Primary key (partition key) |
| notification_type | VARCHAR | Type of notification |
| channel | VARCHAR | push / email / sms / in_app |
| enabled | BOOLEAN | Whether this channel is enabled |
| updated_at | TIMESTAMP | Last preference change |

Templates Table

| Column | Type | Description |
|---|---|---|
| template_id | VARCHAR | Primary key |
| channel | VARCHAR | Channel variant |
| version | INT | Template version |
| subject | VARCHAR | Subject line (email) or title |
| body | TEXT | Template body with {{variable}} placeholders |
| active | BOOLEAN | Whether this version is active |
| created_at | TIMESTAMP | Creation time |

Device Tokens Table (for push notifications)

| Column | Type | Description |
|---|---|---|
| user_id | UUID | Foreign key |
| device_id | VARCHAR | Unique device identifier |
| platform | VARCHAR | ios / android / web |
| token | VARCHAR | FCM or APNs device token |
| active | BOOLEAN | Whether the token is still valid |
| updated_at | TIMESTAMP | Last token refresh |

Scaling Considerations

Horizontal scaling of workers: Each channel's worker pool scales independently. During a marketing blast (millions of emails), scale up email workers without affecting push notification latency.

Queue backpressure: Monitor queue depth per priority level. If the low-priority queue grows beyond a threshold, throttle marketing notification intake rather than letting the queue grow unbounded.

Database partitioning: Partition the notifications table by created_at (time-based partitioning). Delete partitions older than the retention period (e.g., 90 days) without expensive row-by-row deletes.

Provider rate limits: SMS providers like Twilio impose rate limits (e.g., 100 messages/second per account). Implement a per-provider token bucket rate limiter in the SMS worker to stay within limits. Queue messages that exceed the rate and process them as capacity becomes available.
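A single-process token bucket for that worker might be sketched like this. The 100 msg/sec figure mirrors the example above; a limit shared across a worker fleet would need a central store such as Redis:

```python
import time

class TokenBucket:
    """Token bucket sized to a provider limit. acquire() blocks the
    worker until a token is available, so sends never exceed the rate."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for the next token

# Worker loop sketch: stay under a 100 msg/sec provider limit
bucket = TokenBucket(rate=100, capacity=100)
```

Each call to the provider is preceded by `bucket.acquire()`; messages that arrive faster than the rate simply wait in the queue, which is exactly the backpressure behavior described above.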

Multi-region: Deploy the notification service in multiple regions. Route notifications to the region closest to the user's provider (e.g., APNs connections from a US region for US users). This reduces latency to the provider API.

Trade-offs and Alternatives

| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Queue per channel | Separate queues | Single queue with routing | Separate queues for isolation |
| Priority handling | Priority queues | Dedicated worker pools | Dedicated pools for critical, shared for others |
| Template storage | Database | File system (Git) | Database for dynamic updates, Git for version control |
| Delivery tracking | Synchronous write | Async event stream | Async for throughput, sync for critical notifications |
| Provider strategy | Single provider | Multi-provider with failover | Multi-provider for resilience |

Why separate queues per channel? If the SMS provider has an outage, the SMS queue backs up. With a shared queue, this backlog would block push and email notifications from being processed. Separate queues provide fault isolation — a problem in one channel cannot cascade to others.

Why not just use a third-party notification service (OneSignal, Firebase)? Third-party services work well for simple use cases, but they limit control over priority routing, rate limiting logic, template management, and multi-provider failover. A custom notification service gives you full control at the cost of operational complexity. Many companies start with a third-party service and build their own as they scale.

FAQ

How do you prevent notification fatigue for users?

Implement per-user rate limiting, notification grouping and batching, priority levels that control urgency, user preference controls for each channel, and smart delivery timing based on user activity patterns. Rate limiting is the first line of defense: even if upstream services fire excessive events, the notification system caps delivery per user. Grouping collapses multiple similar notifications (e.g., "5 people liked your post" instead of 5 separate notifications). Smart timing delays non-urgent notifications to when the user is typically active, based on their historical engagement patterns.
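The grouping step can be sketched as a pre-delivery collapse over a batch window (the field names are assumed for illustration):

```python
from collections import defaultdict

def collapse(notifications):
    """Collapse similar notifications into one summary per (user, type)."""
    groups = defaultdict(list)
    for n in notifications:
        groups[(n["user_id"], n["type"])].append(n)

    collapsed = []
    for (user_id, ntype), items in groups.items():
        if len(items) == 1:
            collapsed.append(items[0])  # single notification passes through
        else:
            collapsed.append({
                "user_id": user_id,
                "type": ntype,
                "content": f"{len(items)} new {ntype} notifications",
            })
    return collapsed
```

The batch window is the tuning knob: a few minutes of buffering for social activity collapses bursts effectively, while transactional types (order updates, OTPs) bypass grouping entirely.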

How should a notification system handle delivery failures?

Use exponential backoff retry with dead-letter queues for permanently failed messages. Track delivery status per channel and fall back to alternative channels if the primary delivery method fails repeatedly. The retry strategy should distinguish between transient errors (provider timeout, rate limit exceeded) and permanent errors (invalid device token, user unsubscribed). Transient errors get retried; permanent errors are recorded and the relevant token or subscription is marked inactive. Dead-letter queues capture notifications that fail all retries for manual review and operational alerting.

What is the best architecture for supporting multiple notification channels?

Use a fan-out pattern where a central notification service routes messages to channel-specific workers (push, email, SMS) via separate queues. Each worker handles channel-specific logic, templates, and provider APIs. This architecture provides fault isolation (an SMS outage does not affect push delivery), independent scaling (email workers can scale up during a marketing blast), and clean separation of concerns (each worker only needs to understand one provider API). The central notification service handles cross-cutting concerns like preference checking, rate limiting, and deduplication.


Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
