Tags: notification system architecture, push notifications, fan-out, delivery tracking, Firebase Cloud Messaging, SNS, Twilio, system design

Notification System Architecture: Channels, Fan-Out, and Delivery at Scale

March 28, 2026 · 8 min read · By Codelit Team

Every modern application needs to tell users about things: a new message, a completed deployment, a failed payment. What starts as a single email sender quickly becomes one of the most complex subsystems in your architecture. This guide covers how to design a notification system that scales across channels, respects user preferences, and tracks delivery end to end.

Push vs Pull

The first architectural decision is how notifications reach the consumer.

Push Model

The server sends notifications to clients the moment events occur. Examples: WebSocket messages, mobile push notifications, email.

Pros: Low latency, real-time user experience.
Cons: Requires persistent connections or third-party push infrastructure. Must handle offline users and retry delivery.

Pull Model

The client periodically polls the server for new notifications. Examples: badge count APIs, notification inbox endpoints.

Pros: Simple to implement, no persistent connection needed.
Cons: Higher latency (bounded by poll interval), wasted requests when there is nothing new, harder to scale with short intervals.

Hybrid (Most Common)

Push a lightweight signal (WebSocket ping or silent push) telling the client to pull the full notification payload. This gives real-time awareness without pushing large payloads over persistent channels.
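The hybrid flow can be sketched as follows. This is a minimal in-memory illustration, not a production client: `InboxServer`, `InboxClient`, `fetchSince`, and `onSignal` are hypothetical names, and in a real system the signal would arrive over a WebSocket or silent push while the pull would be an HTTP call.

```typescript
// Hypothetical in-memory stand-in for the notification inbox API.
interface InboxItem { id: number; body: string }
interface Signal { latestId: number }

class InboxServer {
  private items: InboxItem[] = [];
  private nextId = 1;

  // Stores the full payload and returns only a tiny signal for the push channel.
  publish(body: string): Signal {
    const id = this.nextId++;
    this.items.push({ id, body });
    return { latestId: id };
  }

  // Pull endpoint: everything newer than the client's cursor.
  fetchSince(cursor: number): InboxItem[] {
    return this.items.filter((item) => item.id > cursor);
  }
}

class InboxClient {
  cursor = 0;
  received: InboxItem[] = [];
  private server: InboxServer;

  constructor(server: InboxServer) {
    this.server = server;
  }

  // Called when the lightweight signal arrives; pulls only if something is new.
  onSignal(signal: Signal): void {
    if (signal.latestId <= this.cursor) return; // duplicate or stale signal
    const fresh = this.server.fetchSince(this.cursor);
    this.received.push(...fresh);
    const last = fresh[fresh.length - 1];
    if (last !== undefined) this.cursor = last.id;
  }
}
```

Because the client tracks a cursor, a lost or duplicated signal is harmless: the next signal pulls everything that was missed.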

Notification Channels

A mature system supports multiple delivery channels, each with different latency, cost, and reliability characteristics.

In-App

Delivered via WebSocket or pulled from an API.
Stored in a notification inbox (database table or document).
Cheapest channel — no third-party cost.
User must have the app open to see real-time updates.

Email

Broad reach: nearly every user has an email address.
High latency (seconds to minutes depending on provider).
Rich formatting (HTML templates, images, CTAs).
Deliverability is a discipline unto itself: SPF, DKIM, DMARC, warm-up, bounce handling.
Tools: Amazon SES, SendGrid, Postmark.

SMS

High open rates (figures around 98% are commonly cited by vendors) but expensive per message.
Regulatory requirements vary by country (opt-in, STOP handling, quiet hours).
Best for time-critical alerts: 2FA codes, fraud alerts, outage notifications.
Tools: Twilio, Vonage, MessageBird.

Mobile Push

Delivered through platform-specific services: Firebase Cloud Messaging (FCM) for Android, Apple Push Notification service (APNs) for iOS. FCM can also relay messages to iOS devices through APNs, which lets one API cover both platforms.
Silent push can wake the app to fetch data without showing a visible notification.
Payload size limits (~4 KB) — keep the push lean and let the app fetch details.
Token management: device tokens expire and rotate; maintain a registry.
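A minimal sketch of the device-token registry mentioned above, kept in memory for illustration. `TokenRegistry` and its methods are hypothetical names; a real registry would live in a database and would call `remove` when the push provider reports a token as invalid.

```typescript
// Hypothetical registry: userId -> (device token -> last time the app reported it).
class TokenRegistry {
  private byUser = new Map<string, Map<string, number>>();

  // Call whenever the app starts or the platform issues a new token.
  register(userId: string, token: string, nowMs: number): void {
    let tokens = this.byUser.get(userId);
    if (tokens === undefined) {
      tokens = new Map<string, number>();
      this.byUser.set(userId, tokens);
    }
    tokens.set(token, nowMs);
  }

  // Call when the push provider rejects the token as invalid or unregistered.
  remove(userId: string, token: string): void {
    this.byUser.get(userId)?.delete(token);
  }

  // Returns tokens seen within maxAgeMs and prunes the stale ones.
  activeTokens(userId: string, nowMs: number, maxAgeMs: number): string[] {
    const tokens = this.byUser.get(userId);
    if (tokens === undefined) return [];
    const active: string[] = [];
    tokens.forEach((lastSeenMs, token) => {
      if (nowMs - lastSeenMs > maxAgeMs) tokens.delete(token);
      else active.push(token);
    });
    return active;
  }
}
```

Pruning on read keeps the sketch short; a scheduled cleanup job would be an equally reasonable choice.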

WebSocket

Ideal for real-time in-app experiences: chat messages, live dashboards, collaborative editing.
Requires connection management, heartbeats, and reconnection logic.
Scale with a pub/sub backbone (Redis Pub/Sub, NATS) so any server can push to any connected client.

Priority and Throttling

Not all notifications are equal. A fraud alert must arrive instantly; a weekly digest can wait.

Priority Levels

Priority	Examples	Delivery Target
Critical	Security alerts, 2FA, payment failures	Immediate, multi-channel
High	Direct messages, mentions	Seconds, push + in-app
Medium	Comments, reactions	Minutes, batched
Low	Marketing, digests	Hours, email only
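One way to encode the table is a static policy map that the router consults. This is a sketch: the concrete delay values and the in-app channel for medium priority are assumptions chosen for illustration, since the table gives only rough targets.

```typescript
type Priority = "critical" | "high" | "medium" | "low";
type Channel = "email" | "sms" | "push" | "inApp";

interface DeliveryPolicy {
  channels: Channel[];
  maxDelayMs: number; // target upper bound before dispatch (assumed values)
  batchable: boolean; // whether the notification may be folded into a digest
}

const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;

const POLICIES: Record<Priority, DeliveryPolicy> = {
  critical: { channels: ["push", "sms", "email", "inApp"], maxDelayMs: 0, batchable: false },
  high: { channels: ["push", "inApp"], maxDelayMs: 10 * SECOND, batchable: false },
  medium: { channels: ["inApp"], maxDelayMs: 10 * MINUTE, batchable: true },
  low: { channels: ["email"], maxDelayMs: 6 * HOUR, batchable: true },
};

function policyFor(priority: Priority): DeliveryPolicy {
  return POLICIES[priority];
}
```

Keeping the policy in data rather than in branching code makes it easy to tune targets without touching the dispatch path.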

Throttling Strategies

Rate limiting per user: cap notifications per channel per time window (e.g., max 5 push notifications per hour).
Batching: aggregate low-priority notifications into a single digest. "You have 12 new comments" beats 12 separate pushes.
De-duplication: if the same event triggers multiple notifications (e.g., someone edits a comment you were already notified about), collapse them.
Quiet hours: respect user-configured or locale-based do-not-disturb windows. Queue notifications and deliver them once the window ends.
Back-pressure: if the notification pipeline is overloaded, shed low-priority traffic first.
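The per-user rate limit from the list above could be implemented as a sliding window. The sketch below is one possible approach, with a hypothetical `NotificationRateLimiter` class; timestamps are passed in explicitly so the logic stays deterministic, and a shared store such as Redis would replace the in-memory map in a multi-server deployment.

```typescript
// Sliding-window limiter keyed by user and channel.
class NotificationRateLimiter {
  private sent = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the send if the user is under the cap for this channel.
  tryAcquire(userId: string, channel: string, nowMs: number): boolean {
    const key = `${userId}:${channel}`;
    // Keep only the sends that still fall inside the window.
    const recent = (this.sent.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.sent.set(key, recent);
    return allowed;
  }
}
```

A caller that gets `false` can then decide whether to batch the notification into a digest or drop it, depending on priority.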

Fan-Out Patterns

When an event needs to notify many users — a post in a channel with 100K members — fan-out becomes the bottleneck.

Fan-Out on Write

When the event occurs, immediately write a notification record for every recipient.

Pros: Read path is fast — each user queries their own notification inbox.
Cons: Expensive for high-follower-count events. Writing 100K records synchronously blocks the producer.
Mitigation: Use a message queue (SQS, Kafka) to fan out asynchronously.
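A small sketch of the asynchronous mitigation: instead of writing one record per recipient inline, the producer splits the recipient list into fixed-size jobs and enqueues those. `FanOutJob` and `chunkFanOut` are hypothetical names, and the actual queue client (SQS, Kafka) is left out.

```typescript
interface FanOutJob {
  eventId: string;
  recipientIds: string[];
}

// Splits a large recipient list into fixed-size jobs so the producer enqueues
// a bounded number of messages and returns quickly.
function chunkFanOut(eventId: string, recipientIds: string[], batchSize: number): FanOutJob[] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const jobs: FanOutJob[] = [];
  for (let i = 0; i < recipientIds.length; i += batchSize) {
    jobs.push({ eventId, recipientIds: recipientIds.slice(i, i + batchSize) });
  }
  return jobs;
}
```

With a batch size of 500, the 100K-member example becomes 200 queue messages that workers can process in parallel.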

Fan-Out on Read

Store the event once. When a user opens their notification feed, query for events relevant to them.

Pros: Write path is O(1). No wasted writes for users who never check notifications.
Cons: Read path is expensive — must compute relevance at query time.

Hybrid Fan-Out

Use fan-out on write for accounts with low follower counts. For "celebrity" accounts (millions of followers), store the event once and merge it into feeds at read time. Large social platforms such as Twitter and Instagram are commonly reported to use this kind of hybrid for their feeds.
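The hybrid strategy can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: `HybridFeed` and `getOrCreate` are hypothetical names, the celebrity threshold is a single follower count, and event ids are assumed to increase over time so they can be used for ordering.

```typescript
interface FeedEvent { id: number; authorId: string; text: string }

function getOrCreate<K, V>(map: Map<K, V>, key: K, make: () => V): V {
  let value = map.get(key);
  if (value === undefined) {
    value = make();
    map.set(key, value);
  }
  return value;
}

class HybridFeed {
  private followersOf = new Map<string, Set<string>>();
  private followingOf = new Map<string, Set<string>>();
  private inboxes = new Map<string, FeedEvent[]>();    // fan-out-on-write copies
  private storedOnce = new Map<string, FeedEvent[]>(); // celebrity events, one copy each
  private threshold: number;

  constructor(celebrityThreshold: number) {
    this.threshold = celebrityThreshold;
  }

  follow(followerId: string, authorId: string): void {
    getOrCreate(this.followersOf, authorId, () => new Set<string>()).add(followerId);
    getOrCreate(this.followingOf, followerId, () => new Set<string>()).add(authorId);
  }

  publish(event: FeedEvent): void {
    const followers = this.followersOf.get(event.authorId);
    if (followers === undefined) return; // nobody to notify
    if (followers.size >= this.threshold) {
      // Celebrity path: a single write, merged into feeds at read time.
      getOrCreate(this.storedOnce, event.authorId, (): FeedEvent[] => []).push(event);
      return;
    }
    // Regular path: one inbox write per follower.
    followers.forEach((followerId) => {
      getOrCreate(this.inboxes, followerId, (): FeedEvent[] => []).push(event);
    });
  }

  read(userId: string): FeedEvent[] {
    const merged = (this.inboxes.get(userId) ?? []).slice();
    const following = this.followingOf.get(userId);
    if (following !== undefined) {
      following.forEach((authorId) => {
        merged.push(...(this.storedOnce.get(authorId) ?? []));
      });
    }
    return merged.sort((a, b) => b.id - a.id); // newest first
  }
}
```

Each event lives in exactly one place, either the per-user inboxes or the stored-once list, so the merge at read time never produces duplicates.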

Implementation Sketch

Event → Message Queue → Fan-Out Workers → Notification Store (per user)
                              ↓
                     Channel Dispatchers
                    (Email, SMS, Push, WS)

Fan-out workers read from the queue, determine recipients and channels based on preferences, write to the notification store, and enqueue channel-specific delivery jobs.
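The worker's job described above might look like the following sketch. All names here (`AppEvent`, `DeliveryJob`, `processEvent`, `ChannelLookup`) are hypothetical, and plain arrays stand in for the notification store and the per-channel delivery queues.

```typescript
type Channel = "email" | "sms" | "push" | "inApp";

interface AppEvent { id: string; recipientIds: string[]; message: string }
interface StoredNotification { userId: string; eventId: string; message: string }
interface DeliveryJob { userId: string; channel: Channel; eventId: string }

// Hypothetical preference lookup: returns the channels a user has enabled.
type ChannelLookup = (userId: string) => Channel[];

function processEvent(
  event: AppEvent,
  enabledChannels: ChannelLookup,
  store: StoredNotification[],
  jobQueue: DeliveryJob[]
): void {
  for (const userId of event.recipientIds) {
    const channels = enabledChannels(userId);
    if (channels.length === 0) continue; // user has opted out of every channel

    // One record per recipient in the notification store.
    store.push({ userId, eventId: event.id, message: event.message });

    // One delivery job per enabled channel, picked up by that channel's dispatcher.
    for (const channel of channels) {
      jobQueue.push({ userId, channel, eventId: event.id });
    }
  }
}
```

Keeping the worker free of any provider calls means a slow email or SMS provider only backs up its own dispatcher queue.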

Delivery Tracking

You cannot improve what you do not measure. Track every notification through its lifecycle.

Status States

CREATED → QUEUED → SENT → DELIVERED → READ
                     ↓
                   FAILED → RETRYING → SENT
                     ↓
                   DROPPED (max retries exceeded)
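The lifecycle above can be enforced with a small transition table, so that an out-of-order provider callback cannot move a notification into an impossible state. This is a sketch; `DeliveryStatus`, `TRANSITIONS`, and `transition` are hypothetical names, and the allowed moves are read directly from the diagram.

```typescript
type DeliveryStatus =
  | "CREATED" | "QUEUED" | "SENT" | "DELIVERED"
  | "READ" | "FAILED" | "RETRYING" | "DROPPED";

// Allowed next states for each status, following the lifecycle diagram.
const TRANSITIONS: Record<DeliveryStatus, DeliveryStatus[]> = {
  CREATED: ["QUEUED"],
  QUEUED: ["SENT"],
  SENT: ["DELIVERED", "FAILED"],
  DELIVERED: ["READ"],
  READ: [],
  FAILED: ["RETRYING", "DROPPED"],
  RETRYING: ["SENT"],
  DROPPED: [],
};

// Returns the new status, or throws if the move is not in the table.
function transition(current: DeliveryStatus, next: DeliveryStatus): DeliveryStatus {
  if (TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${current} -> ${next}`);
  }
  return next;
}
```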

Key Metrics

Metric	What It Tells You
Send rate	Throughput of the pipeline
Delivery rate	Percentage reaching the endpoint (email inbox, device)
Open/read rate	User engagement per channel
Failure rate	Infrastructure or provider issues
Latency (event → delivered)	End-to-end pipeline health
Unsubscribe rate	Content relevance and frequency tuning
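As a rough illustration of how a few of these metrics fall out of the tracked records, the sketch below computes delivery rate, failure rate, and average event-to-delivered latency. `DeliveryRecord` and `summarize` are hypothetical names; a production pipeline would more likely aggregate these in a metrics system and report percentiles rather than a mean.

```typescript
interface DeliveryRecord {
  status: "DELIVERED" | "FAILED";
  createdAtMs: number;
  deliveredAtMs?: number; // set only once the provider confirms delivery
}

interface PipelineSummary {
  deliveryRate: number; // fraction of records that were delivered
  failureRate: number;  // fraction of records that failed
  avgLatencyMs: number; // mean event-to-delivered time over delivered records
}

function summarize(records: DeliveryRecord[]): PipelineSummary {
  let delivered = 0;
  let failed = 0;
  let latencySumMs = 0;
  for (const record of records) {
    if (record.status === "DELIVERED" && record.deliveredAtMs !== undefined) {
      delivered++;
      latencySumMs += record.deliveredAtMs - record.createdAtMs;
    } else if (record.status === "FAILED") {
      failed++;
    }
  }
  const total = records.length;
  return {
    deliveryRate: total === 0 ? 0 : delivered / total,
    failureRate: total === 0 ? 0 : failed / total,
    avgLatencyMs: delivered === 0 ? 0 : latencySumMs / delivered,
  };
}
```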

Retry Strategy

Exponential backoff with jitter for transient failures.
Channel fallback — if push fails after N retries, escalate to SMS for critical notifications.
Dead letter queue — park permanently failed notifications for manual investigation.
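The three rules above can be combined into one decision function. The sketch below uses the "full jitter" variant of exponential backoff (a random delay between zero and the capped exponential ceiling); the 1-second base, 60-second cap, and the names `backoffDelayMs` and `nextStep` are assumptions for illustration. The random source is injectable so the behavior can be tested deterministically.

```typescript
// Full-jitter backoff: random delay in [0, min(capMs, baseMs * 2^failedAttempts)).
function backoffDelayMs(
  failedAttempts: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, failedAttempts));
  return Math.floor(random() * ceiling);
}

type NextStep =
  | { action: "retry"; delayMs: number }
  | { action: "fallback"; channel: "sms" }
  | { action: "deadLetter" };

// Decides what to do after a failed push attempt.
function nextStep(
  failedAttempts: number,
  maxRetries: number,
  critical: boolean,
  random: () => number = Math.random
): NextStep {
  if (failedAttempts < maxRetries) {
    return { action: "retry", delayMs: backoffDelayMs(failedAttempts, 1000, 60000, random) };
  }
  // Retries exhausted: critical notifications escalate, the rest are parked.
  return critical ? { action: "fallback", channel: "sms" } : { action: "deadLetter" };
}
```

The jitter matters when a provider outage ends: without it, every failed notification would retry at the same instants and hit the provider in synchronized waves.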

Preference Management

Respecting user preferences is both a UX requirement and a legal one (GDPR, CAN-SPAM, TCPA).

Preference Model

interface UserPreferences {
  userId: string;
  channels: {
    email: { enabled: boolean; frequency: "instant" | "daily" | "weekly" };
    // quietHours are "HH:MM" strings in the user's local time, e.g. "22:00" to "08:00"
    push: { enabled: boolean; quietHours: { start: string; end: string } };
    sms: { enabled: boolean; onlyCritical: boolean };
    inApp: { enabled: boolean };
  };
  categories: {
    marketing: { enabled: boolean };
    social: { enabled: boolean };
    transactional: { enabled: boolean }; // often legally required: cannot be disabled
    security: { enabled: boolean }; // also usually mandatory
  };
}

Rules Engine

Before dispatching a notification, the preference engine evaluates:

Is this category enabled for this user?
Is this channel enabled?
Is the user in quiet hours? If yes, queue for later.
Has the rate limit been hit? If yes, batch or drop.
Is this a mandatory notification? (2FA, legal) — override preferences.
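The checks above could be expressed as a single pure function. In this sketch the mandatory override is evaluated first so that it short-circuits every other rule; the `RuleInput` shape, the `Decision` values, and the helper names are hypothetical, and times are "HH:MM" strings in the user's local time.

```typescript
type Decision = "send" | "defer" | "batch" | "drop";

interface RuleInput {
  mandatory: boolean; // 2FA, legal notices
  categoryEnabled: boolean;
  channelEnabled: boolean;
  quietHours?: { start: string; end: string }; // "HH:MM", local time
  localTime: string; // "HH:MM", local time
  rateLimited: boolean;
  batchable: boolean;
}

function toMinutes(hhmm: string): number {
  const parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Handles windows that wrap past midnight, such as 22:00 to 08:00.
function inQuietHours(now: string, start: string, end: string): boolean {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(end);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

function evaluate(input: RuleInput): Decision {
  if (input.mandatory) return "send"; // overrides all preferences
  if (!input.categoryEnabled || !input.channelEnabled) return "drop";
  if (input.quietHours !== undefined &&
      inQuietHours(input.localTime, input.quietHours.start, input.quietHours.end)) {
    return "defer"; // queue until the quiet window ends
  }
  if (input.rateLimited) return input.batchable ? "batch" : "drop";
  return "send";
}
```

Returning a decision rather than performing the send keeps the engine easy to test and lets the caller own queues and digests.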

Tools and Services

Tool	Channel	Notes
Firebase Cloud Messaging (FCM)	Mobile push, web push	Free, unified API for Android/iOS/web
Amazon SNS	Push, SMS, email, HTTP	Fan-out via topics, integrates with SQS and Lambda
Twilio	SMS, voice, WhatsApp	Programmable messaging, global reach
Amazon SES	Email	Low cost, high deliverability with proper setup
SendGrid	Email	Template management, analytics
OneSignal	Push, in-app, email, SMS	Unified notification platform with segmentation
Novu	All channels	Open-source notification infrastructure, preference management built in

Build vs Buy

For startups, a managed service like Novu or OneSignal accelerates shipping. For large-scale systems with custom fan-out needs, a purpose-built pipeline on Kafka/SQS with channel-specific dispatchers gives full control.

Architecture Summary

A complete notification system has five layers:

Event ingestion — services emit events to a message bus (Kafka, SNS).
Fan-out and routing — workers determine recipients, channels, and priority using the preference engine.
Channel dispatchers — dedicated workers per channel (email sender, push sender, SMS sender) with retry logic.
Notification store — per-user inbox for in-app notifications and audit trail.
Tracking and analytics — lifecycle status, delivery metrics, and feedback loops (bounces, unsubscribes).

Design each layer to scale independently. The email sender should not block push delivery. The fan-out workers should not block event ingestion. Queues between every layer provide the buffering and back-pressure that keep the system healthy under load.

This is article #152 in our system design and software architecture series.
