Push Notification System Design: From Delivery Pipeline to Scale
Every modern application needs to reach users across channels — push notifications, SMS, and email. A well-designed notification system balances reliability, latency, and user respect (nobody wants to be spammed). This guide walks through the architecture of a production-grade push notification system from first principles.
Functional Requirements#
- Support multiple notification channels: mobile push (iOS/Android), web push, SMS, and email.
- Allow users to manage notification preferences per channel and per category.
- Provide a template engine for composable, localized messages.
- Track delivery status and engagement analytics (delivered, opened, clicked).
- Support scheduled and triggered notifications.
Non-Functional Requirements#
- Latency: Transactional notifications (OTP, order confirmation) delivered within 2 seconds.
- Throughput: Handle 100K+ notifications per minute during peak events.
- Reliability: At-least-once delivery with idempotency to prevent duplicates.
- Extensibility: Adding a new channel should not require rewriting the pipeline.
Notification Types#
| Type | Channel | Latency Target | Example |
|---|---|---|---|
| Transactional | Push, SMS, Email | < 2 s | Password reset, OTP |
| Engagement | Push, Email | < 30 s | "Your friend posted a photo" |
| Marketing | Email, Push | Minutes | Weekly digest, promotions |
| System | Push, In-App | < 5 s | Maintenance window, outage |
Separating types matters because each has different priority, throttling, and opt-out rules.
High-Level Architecture#
┌────────────┐     ┌──────────────┐     ┌──────────────┐
│  Services  │────▶│ Notification │────▶│   Priority   │
│ (triggers) │     │   Ingress    │     │    Queue     │
└────────────┘     └──────────────┘     └──────┬───────┘
                                               │
                          ┌────────────────────┤
                          ▼                    ▼
                   ┌─────────────┐      ┌──────────────┐
                   │ Preference  │      │   Template   │
                   │   Service   │      │    Engine    │
                   └──────┬──────┘      └──────┬───────┘
                          │                    │
                          ▼                    ▼
                   ┌───────────────────────────────────┐
                   │          Channel Router           │
                   └───┬───────┬───────┬───────┬───────┘
                       ▼       ▼       ▼       ▼
                     Push     SMS    Email   WebPush
Notification Ingress#
Every service publishes a notification event to a message broker (Kafka or SQS). The event includes: user_id, notification_type, template_id, payload, and optional scheduled_at. The ingress validates the schema, deduplicates by idempotency key, and enqueues.
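As a minimal sketch of the ingress step, here is schema validation plus idempotency-key deduplication. The field names come from the event description above; the in-memory `seen_keys` dict is a stand-in for what would normally be a Redis `SETNX` with a TTL, and `ingest` is a hypothetical name.

```python
import hashlib
import json
import time

# Stand-in for Redis SETNX + TTL; keys expire after the dedup window.
seen_keys: dict[str, float] = {}
DEDUP_TTL_SECONDS = 24 * 3600

REQUIRED_FIELDS = {"user_id", "notification_type", "template_id", "payload"}

def idempotency_key(event: dict) -> str:
    """Derive a stable key from the fields that identify one logical notification."""
    basis = f"{event['user_id']}:{event['template_id']}:{json.dumps(event['payload'], sort_keys=True)}"
    return hashlib.sha256(basis.encode()).hexdigest()

def ingest(event: dict) -> bool:
    """Validate the schema, drop duplicates, then enqueue. Returns False if dropped."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    key = event.get("idempotency_key") or idempotency_key(event)
    now = time.time()
    if key in seen_keys and now - seen_keys[key] < DEDUP_TTL_SECONDS:
        return False  # duplicate within the dedup window
    seen_keys[key] = now
    # enqueue(event) would publish to the priority queue here
    return True
```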
Priority Queue#
Not all notifications are equal. A Kafka topic with multiple partitions can be split by priority tier:
- P0 — Critical: OTP, security alerts. Dedicated consumer group, no throttling.
- P1 — High: Order updates, friend requests. Standard consumer group.
- P2 — Low: Marketing, digests. Batch-processed, rate-limited.
Consumers pull from higher-priority partitions first using weighted consumption.
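One way to sketch weighted consumption, assuming in-memory deques standing in for the Kafka partitions: each tier gets a weight, higher tiers are drained far more often, but low tiers still get picked so marketing traffic is never fully starved. The weights here are illustrative, not prescriptive.

```python
import random
from collections import deque

# Hypothetical in-memory stand-ins for the P0/P1/P2 partitions.
queues = {"P0": deque(), "P1": deque(), "P2": deque()}
# Higher weight => drained more often; P2 still drains, so it never starves.
weights = {"P0": 8, "P1": 3, "P2": 1}

def next_notification():
    """Weighted random pick among the non-empty tiers; None when all are empty."""
    candidates = {tier: w for tier, w in weights.items() if queues[tier]}
    if not candidates:
        return None
    tier = random.choices(list(candidates), weights=list(candidates.values()))[0]
    return queues[tier].popleft()
```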
User Preferences Service#
Users control what they receive and how. The preference model stores per-user, per-category, per-channel settings:
{
"user_id": "u_abc123",
"preferences": {
"order_updates": { "push": true, "email": true, "sms": false },
"marketing": { "push": false, "email": true, "sms": false },
"security": { "push": true, "email": true, "sms": true }
},
"quiet_hours": { "start": "22:00", "end": "07:00", "timezone": "America/New_York" }
}
The preference service is consulted before rendering or routing. If a user has opted out of push for marketing, the pipeline skips that channel entirely. Quiet hours defer non-critical notifications to a scheduled retry.
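The two checks above can be sketched directly against the preference document shown: filter channels by the opt-in flags, and test the current time against a quiet-hours window that may wrap past midnight. Function names are hypothetical.

```python
from datetime import datetime, time as dtime
from zoneinfo import ZoneInfo

def allowed_channels(prefs: dict, category: str) -> list[str]:
    """Channels the user has opted into for this category."""
    return [ch for ch, on in prefs["preferences"].get(category, {}).items() if on]

def in_quiet_hours(prefs: dict, now_utc: datetime) -> bool:
    """True if the user's local time falls inside the configured quiet window."""
    qh = prefs.get("quiet_hours")
    if not qh:
        return False
    local = now_utc.astimezone(ZoneInfo(qh["timezone"])).time()
    start = dtime.fromisoformat(qh["start"])
    end = dtime.fromisoformat(qh["end"])
    if start <= end:
        return start <= local < end
    return local >= start or local < end  # window wraps past midnight
```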
Device Token Management#
Mobile push requires device tokens registered with FCM (Firebase Cloud Messaging) for Android and APNs (Apple Push Notification service) for iOS.
Token lifecycle:
- Registration — The client obtains a token from the OS and sends it to the backend.
- Storage — Tokens are stored in a device registry keyed by (user_id, device_id). A user may have multiple devices.
- Refresh — Tokens expire or rotate. The client re-registers on app launch; the backend upserts.
- Invalidation — When FCM/APNs returns a "not registered" error, the token is soft-deleted to avoid wasting delivery attempts.
Store tokens in a low-latency database (DynamoDB or Redis-backed Postgres) since every push delivery requires a token lookup.
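The registry lifecycle above can be sketched with an in-memory dict standing in for the low-latency store; the upsert covers registration and refresh, and the soft-delete covers invalidation. All names here are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DeviceToken:
    token: str
    platform: str          # "fcm" or "apns"
    updated_at: datetime
    active: bool = True

# Registry keyed by (user_id, device_id); a real system would use DynamoDB/Redis.
registry: dict[tuple[str, str], DeviceToken] = {}

def upsert_token(user_id: str, device_id: str, token: str, platform: str) -> None:
    """Called on registration and on app-launch refresh; last write wins per device."""
    registry[(user_id, device_id)] = DeviceToken(token, platform, datetime.now(timezone.utc))

def invalidate_token(user_id: str, device_id: str) -> None:
    """Soft-delete after a 'not registered' provider error."""
    entry = registry.get((user_id, device_id))
    if entry:
        entry.active = False

def active_tokens(user_id: str) -> list[str]:
    """Every token a push delivery should target for this user."""
    return [t.token for (uid, _), t in registry.items() if uid == user_id and t.active]
```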
FCM and APNs Integration#
Both services expose HTTP/2 APIs. Key differences:
| Aspect | FCM | APNs |
|---|---|---|
| Auth | OAuth 2.0 service account | JWT or certificate |
| Payload limit | 4 KB | 4 KB |
| Topic/group send | Yes (topics, conditions) | Yes (topic-based) |
| Feedback | Immediate HTTP response | Immediate HTTP/2 response |
Wrap both behind a push provider abstraction so the channel router calls a unified interface. If you later add Huawei Push Kit or Amazon ADM, you add an adapter — no pipeline changes.
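The provider abstraction might look like the following sketch: one interface, one adapter per vendor, and a registry the router indexes by platform. The `send` bodies are placeholders; real adapters would call the FCM HTTP v1 API and the APNs HTTP/2 API respectively.

```python
from abc import ABC, abstractmethod

class PushProvider(ABC):
    """Unified interface the channel router calls; one adapter per vendor."""
    @abstractmethod
    def send(self, token: str, title: str, body: str) -> str:
        """Deliver one message; returns a provider-side message id."""

class FcmProvider(PushProvider):
    def send(self, token: str, title: str, body: str) -> str:
        # Placeholder: real code would POST to FCM with an OAuth 2.0 token.
        return f"fcm:{token[:8]}"

class ApnsProvider(PushProvider):
    def send(self, token: str, title: str, body: str) -> str:
        # Placeholder: real code would open an HTTP/2 stream to APNs with a JWT.
        return f"apns:{token[:8]}"

PROVIDERS: dict[str, PushProvider] = {"fcm": FcmProvider(), "apns": ApnsProvider()}
# Adding Huawei Push Kit later means one more entry here -- no pipeline changes.
```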
Retry and Backoff#
FCM and APNs can return transient errors (429, 503). Use exponential backoff with jitter. After N retries, move the notification to a dead-letter queue for manual inspection.
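A minimal sketch of that retry loop, using "full jitter" backoff (delay drawn uniformly from zero up to a capped exponential). `TransientError` is a hypothetical stand-in for a 429/503 response; the constants are illustrative.

```python
import random
import time

class TransientError(Exception):
    """Stands in for a 429/503 from FCM or APNs."""

BASE_DELAY_S = 0.5
MAX_DELAY_S = 60.0
MAX_RETRIES = 5

def backoff_delay(attempt: int) -> float:
    """Full-jitter backoff: uniform over [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt)))

def deliver_with_retry(send, message) -> bool:
    """Retry transient failures; True on success, False once the DLQ should take over."""
    for attempt in range(MAX_RETRIES):
        try:
            send(message)
            return True
        except TransientError:
            time.sleep(backoff_delay(attempt))
    return False  # caller moves the message to the dead-letter queue
```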
Template Engine#
Hardcoding message strings is a maintenance nightmare. A template engine decouples copy from code:
// Template: order_shipped
{
"push": {
"title": "Your order is on its way!",
"body": "{{item_name}} shipped via {{carrier}}. Track: {{tracking_url}}"
},
"email": {
"subject": "Order #{{order_id}} shipped",
"html_template": "order_shipped.html"
}
}
Templates support localization (keyed by locale), A/B variants (for engagement experiments), and rich media (images, action buttons). The engine resolves variables from the event payload and returns channel-specific rendered content.
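Variable resolution for the push channel can be sketched with a `{{name}}` substitution pass over the `order_shipped` template shown above. Raising on a missing variable lets the caller fall back to default copy, per the failure-mode table later in the article; the `render` function is a hypothetical name.

```python
import re

TEMPLATES = {
    "order_shipped": {
        "push": {
            "title": "Your order is on its way!",
            "body": "{{item_name}} shipped via {{carrier}}. Track: {{tracking_url}}",
        }
    }
}

VAR_PATTERN = re.compile(r"\{\{(\w+)\}\}")

def render(template_id: str, channel: str, payload: dict) -> dict:
    """Substitute {{variables}} from the event payload; raises KeyError on a
    missing variable so the caller can fall back to default copy."""
    rendered = {}
    for field, text in TEMPLATES[template_id][channel].items():
        def sub(match):
            name = match.group(1)
            if name not in payload:
                raise KeyError(f"template variable missing: {name}")
            return str(payload[name])
        rendered[field] = VAR_PATTERN.sub(sub, text)
    return rendered
```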
Channel Router#
After preference filtering and template rendering, the router fans out to channel-specific delivery services. Each channel service:
- Looks up the delivery address (device token, phone number, email).
- Calls the external provider (FCM, Twilio, SendGrid).
- Records the delivery attempt and provider response.
Channels run in parallel — a single notification event can produce a push, an email, and an SMS simultaneously.
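A thread-pool fan-out is one simple way to get that parallelism; the per-channel senders here are stubs standing in for the real FCM/Twilio/SendGrid calls, and `fan_out` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs standing in for the real channel delivery services.
def deliver_push(user_id: str, content: dict):  return ("push", "sent")
def deliver_email(user_id: str, content: dict): return ("email", "sent")
def deliver_sms(user_id: str, content: dict):   return ("sms", "sent")

CHANNEL_SENDERS = {"push": deliver_push, "email": deliver_email, "sms": deliver_sms}

def fan_out(user_id: str, channels: list[str], content: dict) -> list[tuple[str, str]]:
    """Dispatch one rendered notification to every opted-in channel in parallel."""
    with ThreadPoolExecutor(max_workers=len(channels) or 1) as pool:
        futures = [pool.submit(CHANNEL_SENDERS[ch], user_id, content) for ch in channels]
        return [f.result() for f in futures]
```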
Analytics Pipeline#
Track every notification through its lifecycle:
| Event | Timestamp | Source |
|---|---|---|
| created | Ingress receives event | Backend |
| filtered | Skipped due to preferences | Preference service |
| rendered | Template resolved | Template engine |
| sent | Handed to provider | Channel service |
| delivered | Provider confirms delivery | FCM/APNs callback |
| opened | User taps notification | Client SDK |
| clicked | User taps CTA in notification | Client SDK / deep link |
Store events in a time-series database (ClickHouse, TimescaleDB) and compute metrics: delivery rate, open rate, click-through rate, opt-out rate. These metrics feed back into the template engine for A/B test decisions and into the throttling system.
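The funnel metrics above reduce to simple ratios between lifecycle stages; a sketch over raw (notification_id, stage) event pairs, with hypothetical names:

```python
from collections import Counter

def funnel_metrics(events: list[tuple[str, str]]) -> dict:
    """events: (notification_id, stage) pairs from the lifecycle table."""
    stages = Counter(stage for _, stage in events)
    sent = stages["sent"] or 1  # guard div-by-zero in this sketch
    return {
        "delivery_rate": stages["delivered"] / sent,
        "open_rate": stages["opened"] / max(stages["delivered"], 1),
        "ctr": stages["clicked"] / max(stages["opened"], 1),
    }
```

In production this would be a query over the time-series store rather than an in-process reduction, but the ratios are the same.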
Throttling and Rate Limiting#
Throttling protects both users and downstream providers:
- Per-user throttle: No more than N push notifications per hour (configurable per category). Excess notifications are batched into a digest.
- Per-provider throttle: Respect FCM/APNs rate limits. Use a token-bucket rate limiter in front of each provider client.
- Global throttle: During mass sends (marketing campaigns), ramp up gradually to avoid provider penalties and monitor bounce rates.
Implement throttling as middleware in the channel router so it applies uniformly.
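The per-provider token bucket mentioned above fits in a few lines: tokens refill continuously at `rate` per second up to a burst `capacity`, and a send is allowed only if a token is available. A sketch, not tuned to any provider's real limits:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/second, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket sits in front of each provider client; the middleware drops or delays sends when `allow()` returns False.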
Scaling Considerations#
Horizontal scaling — Each stage of the pipeline (ingress, preference lookup, rendering, routing, delivery) is a stateless service behind an auto-scaling group. Kafka partitions provide natural parallelism.
Multi-region — For global users, deploy user-facing stages (ingress, preference lookup) in multiple regions close to users, and keep persistent HTTP/2 connections to FCM and APNs warm in each region so deliveries avoid cold connection setup.
Database sharding — The device token registry and preference store are sharded by user_id. Analytics data is partitioned by time.
Observability — Emit structured logs, distributed traces (OpenTelemetry), and metrics (Prometheus) at every stage. Alert on delivery rate drops, provider error spikes, and queue depth anomalies.
Failure Modes#
| Failure | Mitigation |
|---|---|
| Provider outage (FCM down) | Circuit breaker; queue retries; failover to alternate channel |
| Token expired | Soft-delete token; suppress future attempts until re-registration |
| Template missing variable | Render with fallback copy; alert content team |
| Kafka consumer lag | Auto-scale consumers; alert on lag threshold |
| Duplicate delivery | Idempotency key at ingress; dedup at channel service |
Summary#
A production push notification system is a multi-stage pipeline: ingest, prioritize, filter by preferences, render templates, route to channels, deliver, and measure. The keys to doing it well are separating concerns at each stage, abstracting providers behind clean interfaces, and respecting user preferences at every step.
Build and explore system designs like this interactively at codelit.io.
This is article #197 in the Codelit engineering blog series.