Push Notification System Design: From Delivery Pipeline to Scale
Every modern application needs to reach users across channels — push notifications, SMS, and email. A well-designed notification system balances reliability, latency, and user respect (nobody wants to be spammed). This guide walks through the architecture of a production-grade push notification system from first principles.
Functional Requirements#
- Support multiple notification channels: mobile push (iOS/Android), web push, SMS, and email.
- Allow users to manage notification preferences per channel and per category.
- Provide a template engine for composable, localized messages.
- Track delivery status and engagement analytics (delivered, opened, clicked).
- Support scheduled and triggered notifications.
Non-Functional Requirements#
- Latency: Transactional notifications (OTP, order confirmation) delivered within 2 seconds.
- Throughput: Handle 100K+ notifications per minute during peak events.
- Reliability: At-least-once delivery with idempotency to prevent duplicates.
- Extensibility: Adding a new channel should not require rewriting the pipeline.
Notification Types#
| Type | Channel | Latency Target | Example |
|---|---|---|---|
| Transactional | Push, SMS, Email | < 2 s | Password reset, OTP |
| Engagement | Push, Email | < 30 s | "Your friend posted a photo" |
| Marketing | Email, Push | Minutes | Weekly digest, promotions |
| System | Push, In-App | < 5 s | Maintenance window, outage |
Separating types matters because each has different priority, throttling, and opt-out rules.
High-Level Architecture#
┌────────────┐     ┌──────────────┐     ┌──────────────┐
│  Services  │────▶│ Notification │────▶│   Priority   │
│ (triggers) │     │   Ingress    │     │    Queue     │
└────────────┘     └──────────────┘     └──────┬───────┘
                                               │
                          ┌────────────────────┤
                          ▼                    ▼
                   ┌─────────────┐      ┌──────────────┐
                   │ Preference  │      │   Template   │
                   │   Service   │      │    Engine    │
                   └──────┬──────┘      └──────┬───────┘
                          │                    │
                          ▼                    ▼
                   ┌───────────────────────────────────┐
                   │          Channel Router           │
                   └───┬───────┬───────┬───────┬───────┘
                       ▼       ▼       ▼       ▼
                     Push     SMS    Email   WebPush
Notification Ingress#
Every service publishes a notification event to a message broker (Kafka or SQS). The event includes: user_id, notification_type, template_id, payload, and optional scheduled_at. The ingress validates the schema, deduplicates by idempotency key, and enqueues.
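As a minimal sketch of the ingress step, here is schema validation plus idempotency-key deduplication. The field names come from the event description above; the in-memory `seen_keys` dict is a stand-in for what would normally be a Redis `SETNX` with a TTL, and `ingest` is a hypothetical name.

```python
import hashlib
import json
import time

# Stand-in for Redis SETNX + TTL; keys expire after the dedup window.
seen_keys: dict[str, float] = {}
DEDUP_TTL_SECONDS = 24 * 3600

REQUIRED_FIELDS = {"user_id", "notification_type", "template_id", "payload"}

def idempotency_key(event: dict) -> str:
    """Derive a stable key from the fields that identify one logical notification."""
    basis = f"{event['user_id']}:{event['template_id']}:{json.dumps(event['payload'], sort_keys=True)}"
    return hashlib.sha256(basis.encode()).hexdigest()

def ingest(event: dict) -> bool:
    """Validate the schema, drop duplicates, then enqueue. Returns False if dropped."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    key = event.get("idempotency_key") or idempotency_key(event)
    now = time.time()
    if key in seen_keys and now - seen_keys[key] < DEDUP_TTL_SECONDS:
        return False  # duplicate within the dedup window
    seen_keys[key] = now
    # enqueue(event) would publish to the priority queue here
    return True
```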
Priority Queue#
Not all notifications are equal. A Kafka topic with multiple partitions can be split by priority tier:
- P0 — Critical: OTP, security alerts. Dedicated consumer group, no throttling.
- P1 — High: Order updates, friend requests. Standard consumer group.
- P2 — Low: Marketing, digests. Batch-processed, rate-limited.
Consumers pull from higher-priority partitions first using weighted consumption.
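One way to sketch weighted consumption, assuming in-memory deques standing in for the Kafka partitions: each tier gets a weight, higher tiers are drained far more often, but low tiers still get picked so marketing traffic is never fully starved. The weights here are illustrative, not prescriptive.

```python
import random
from collections import deque

# Hypothetical in-memory stand-ins for the P0/P1/P2 partitions.
queues = {"P0": deque(), "P1": deque(), "P2": deque()}
# Higher weight => drained more often; P2 still drains, so it never starves.
weights = {"P0": 8, "P1": 3, "P2": 1}

def next_notification():
    """Weighted random pick among the non-empty tiers; None when all are empty."""
    candidates = {tier: w for tier, w in weights.items() if queues[tier]}
    if not candidates:
        return None
    tier = random.choices(list(candidates), weights=list(candidates.values()))[0]
    return queues[tier].popleft()
```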
User Preferences Service#
Users control what they receive and how. The preference model stores per-user, per-category, per-channel settings:
{
"user_id": "u_abc123",
"preferences": {
"order_updates": { "push": true, "email": true, "sms": false },
"marketing": { "push": false, "email": true, "sms": false },
"security": { "push": true, "email": true, "sms": true }
},
"quiet_hours": { "start": "22:00", "end": "07:00", "timezone": "America/New_York" }
}
The preference service is consulted before rendering or routing. If a user has opted out of push for marketing, the pipeline skips that channel entirely. Quiet hours defer non-critical notifications to a scheduled retry.
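The two checks above can be sketched directly against the preference document shown: filter channels by the opt-in flags, and test the current time against a quiet-hours window that may wrap past midnight. Function names are hypothetical.

```python
from datetime import datetime, time as dtime
from zoneinfo import ZoneInfo

def allowed_channels(prefs: dict, category: str) -> list[str]:
    """Channels the user has opted into for this category."""
    return [ch for ch, on in prefs["preferences"].get(category, {}).items() if on]

def in_quiet_hours(prefs: dict, now_utc: datetime) -> bool:
    """True if the user's local time falls inside the configured quiet window."""
    qh = prefs.get("quiet_hours")
    if not qh:
        return False
    local = now_utc.astimezone(ZoneInfo(qh["timezone"])).time()
    start = dtime.fromisoformat(qh["start"])
    end = dtime.fromisoformat(qh["end"])
    if start <= end:
        return start <= local < end
    return local >= start or local < end  # window wraps past midnight
```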
Device Token Management#
Mobile push requires device tokens registered with FCM (Firebase Cloud Messaging) for Android and APNs (Apple Push Notification service) for iOS.
Token lifecycle:
- Registration — The client obtains a token from the OS and sends it to the backend.
- Storage — Tokens are stored in a device registry keyed by (user_id, device_id). A user may have multiple devices.
- Refresh — Tokens expire or rotate. The client re-registers on app launch; the backend upserts.
- Invalidation — When FCM/APNs returns a "not registered" error, the token is soft-deleted to avoid wasting delivery attempts.
Store tokens in a low-latency database (DynamoDB or Redis-backed Postgres) since every push delivery requires a token lookup.
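The registry lifecycle above can be sketched with an in-memory dict standing in for the low-latency store; the upsert covers registration and refresh, and the soft-delete covers invalidation. All names here are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DeviceToken:
    token: str
    platform: str          # "fcm" or "apns"
    updated_at: datetime
    active: bool = True

# Registry keyed by (user_id, device_id); a real system would use DynamoDB/Redis.
registry: dict[tuple[str, str], DeviceToken] = {}

def upsert_token(user_id: str, device_id: str, token: str, platform: str) -> None:
    """Called on registration and on app-launch refresh; last write wins per device."""
    registry[(user_id, device_id)] = DeviceToken(token, platform, datetime.now(timezone.utc))

def invalidate_token(user_id: str, device_id: str) -> None:
    """Soft-delete after a 'not registered' provider error."""
    entry = registry.get((user_id, device_id))
    if entry:
        entry.active = False

def active_tokens(user_id: str) -> list[str]:
    """Every token a push delivery should target for this user."""
    return [t.token for (uid, _), t in registry.items() if uid == user_id and t.active]
```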
FCM and APNs Integration#
Both services expose HTTP/2 APIs. Key differences:
| Aspect | FCM | APNs |
|---|---|---|
| Auth | OAuth 2.0 service account | JWT or certificate |
| Payload limit | 4 KB | 4 KB |
| Topic/group send | Yes (topics, conditions) | Yes (topic-based) |
| Feedback | Immediate HTTP response | Immediate HTTP/2 response |
Wrap both behind a push provider abstraction so the channel router calls a unified interface. If you later add Huawei Push Kit or Amazon ADM, you add an adapter — no pipeline changes.
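The provider abstraction might look like the following sketch: one interface, one adapter per vendor, and a registry the router indexes by platform. The `send` bodies are placeholders; real adapters would call the FCM HTTP v1 API and the APNs HTTP/2 API respectively.

```python
from abc import ABC, abstractmethod

class PushProvider(ABC):
    """Unified interface the channel router calls; one adapter per vendor."""
    @abstractmethod
    def send(self, token: str, title: str, body: str) -> str:
        """Deliver one message; returns a provider-side message id."""

class FcmProvider(PushProvider):
    def send(self, token: str, title: str, body: str) -> str:
        # Placeholder: real code would POST to FCM with an OAuth 2.0 token.
        return f"fcm:{token[:8]}"

class ApnsProvider(PushProvider):
    def send(self, token: str, title: str, body: str) -> str:
        # Placeholder: real code would open an HTTP/2 stream to APNs with a JWT.
        return f"apns:{token[:8]}"

PROVIDERS: dict[str, PushProvider] = {"fcm": FcmProvider(), "apns": ApnsProvider()}
# Adding Huawei Push Kit later means one more entry here -- no pipeline changes.
```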
Retry and Backoff#
FCM and APNs can return transient errors (429, 503). Use exponential backoff with jitter. After N retries, move the notification to a dead-letter queue for manual inspection.
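A minimal sketch of that retry loop, using "full jitter" backoff (delay drawn uniformly from zero up to a capped exponential). `TransientError` is a hypothetical stand-in for a 429/503 response; the constants are illustrative.

```python
import random
import time

class TransientError(Exception):
    """Stands in for a 429/503 from FCM or APNs."""

BASE_DELAY_S = 0.5
MAX_DELAY_S = 60.0
MAX_RETRIES = 5

def backoff_delay(attempt: int) -> float:
    """Full-jitter backoff: uniform over [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt)))

def deliver_with_retry(send, message) -> bool:
    """Retry transient failures; True on success, False once the DLQ should take over."""
    for attempt in range(MAX_RETRIES):
        try:
            send(message)
            return True
        except TransientError:
            time.sleep(backoff_delay(attempt))
    return False  # caller moves the message to the dead-letter queue
```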
Template Engine#
Hardcoding message strings is a maintenance nightmare. A template engine decouples copy from code:
// Template: order_shipped
{
"push": {
"title": "Your order is on its way!",
"body": "{{item_name}} shipped via {{carrier}}. Track: {{tracking_url}}"
},
"email": {
"subject": "Order #{{order_id}} shipped",
"html_template": "order_shipped.html"
}
}
Templates support localization (keyed by locale), A/B variants (for engagement experiments), and rich media (images, action buttons). The engine resolves variables from the event payload and returns channel-specific rendered content.
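Variable resolution for the push channel can be sketched with a `{{name}}` substitution pass over the `order_shipped` template shown above. Raising on a missing variable lets the caller fall back to default copy, per the failure-mode table later in the article; the `render` function is a hypothetical name.

```python
import re

TEMPLATES = {
    "order_shipped": {
        "push": {
            "title": "Your order is on its way!",
            "body": "{{item_name}} shipped via {{carrier}}. Track: {{tracking_url}}",
        }
    }
}

VAR_PATTERN = re.compile(r"\{\{(\w+)\}\}")

def render(template_id: str, channel: str, payload: dict) -> dict:
    """Substitute {{variables}} from the event payload; raises KeyError on a
    missing variable so the caller can fall back to default copy."""
    rendered = {}
    for field, text in TEMPLATES[template_id][channel].items():
        def sub(match):
            name = match.group(1)
            if name not in payload:
                raise KeyError(f"template variable missing: {name}")
            return str(payload[name])
        rendered[field] = VAR_PATTERN.sub(sub, text)
    return rendered
```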
Channel Router#
After preference filtering and template rendering, the router fans out to channel-specific delivery services. Each channel service:
- Looks up the delivery address (device token, phone number, email).
- Calls the external provider (FCM, Twilio, SendGrid).
- Records the delivery attempt and provider response.
Channels run in parallel — a single notification event can produce a push, an email, and an SMS simultaneously.
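A thread-pool fan-out is one simple way to get that parallelism; the per-channel senders here are stubs standing in for the real FCM/Twilio/SendGrid calls, and `fan_out` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs standing in for the real channel delivery services.
def deliver_push(user_id: str, content: dict):  return ("push", "sent")
def deliver_email(user_id: str, content: dict): return ("email", "sent")
def deliver_sms(user_id: str, content: dict):   return ("sms", "sent")

CHANNEL_SENDERS = {"push": deliver_push, "email": deliver_email, "sms": deliver_sms}

def fan_out(user_id: str, channels: list[str], content: dict) -> list[tuple[str, str]]:
    """Dispatch one rendered notification to every opted-in channel in parallel."""
    with ThreadPoolExecutor(max_workers=len(channels) or 1) as pool:
        futures = [pool.submit(CHANNEL_SENDERS[ch], user_id, content) for ch in channels]
        return [f.result() for f in futures]
```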
Analytics Pipeline#
Track every notification through its lifecycle:
| Event | Timestamp | Source |
|---|---|---|
| created | Ingress receives event | Backend |
| filtered | Skipped due to preferences | Preference service |
| rendered | Template resolved | Template engine |
| sent | Handed to provider | Channel service |
| delivered | Provider confirms delivery | FCM/APNs callback |
| opened | User taps notification | Client SDK |
| clicked | User taps CTA in notification | Client SDK / deep link |
Store events in a time-series database (ClickHouse, TimescaleDB) and compute metrics: delivery rate, open rate, click-through rate, opt-out rate. These metrics feed back into the template engine for A/B test decisions and into the throttling system.
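The funnel metrics above reduce to simple ratios between lifecycle stages; a sketch over raw (notification_id, stage) event pairs, with hypothetical names:

```python
from collections import Counter

def funnel_metrics(events: list[tuple[str, str]]) -> dict:
    """events: (notification_id, stage) pairs from the lifecycle table."""
    stages = Counter(stage for _, stage in events)
    sent = stages["sent"] or 1  # guard div-by-zero in this sketch
    return {
        "delivery_rate": stages["delivered"] / sent,
        "open_rate": stages["opened"] / max(stages["delivered"], 1),
        "ctr": stages["clicked"] / max(stages["opened"], 1),
    }
```

In production this would be a query over the time-series store rather than an in-process reduction, but the ratios are the same.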
Throttling and Rate Limiting#
Throttling protects both users and downstream providers:
- Per-user throttle: No more than N push notifications per hour (configurable per category). Excess notifications are batched into a digest.
- Per-provider throttle: Respect FCM/APNs rate limits. Use a token-bucket rate limiter in front of each provider client.
- Global throttle: During mass sends (marketing campaigns), ramp up gradually to avoid provider penalties and monitor bounce rates.
Implement throttling as middleware in the channel router so it applies uniformly.
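The per-provider token bucket mentioned above fits in a few lines: tokens refill continuously at `rate` per second up to a burst `capacity`, and a send is allowed only if a token is available. A sketch, not tuned to any provider's real limits:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/second, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket sits in front of each provider client; the middleware drops or delays sends when `allow()` returns False.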
Scaling Considerations#
Horizontal scaling — Each stage of the pipeline (ingress, preference lookup, rendering, routing, delivery) is a stateless service behind an auto-scaling group. Kafka partitions provide natural parallelism.
Multi-region — For global users, deploy user-facing stages (ingress, preference lookup) in multiple regions close to users, and keep persistent HTTP/2 connections to FCM and APNs warm in each region so deliveries avoid cold connection setup.
Database sharding — The device token registry and preference store are sharded by user_id. Analytics data is partitioned by time.
Observability — Emit structured logs, distributed traces (OpenTelemetry), and metrics (Prometheus) at every stage. Alert on delivery rate drops, provider error spikes, and queue depth anomalies.
Failure Modes#
| Failure | Mitigation |
|---|---|
| Provider outage (FCM down) | Circuit breaker; queue retries; failover to alternate channel |
| Token expired | Soft-delete token; suppress future attempts until re-registration |
| Template missing variable | Render with fallback copy; alert content team |
| Kafka consumer lag | Auto-scale consumers; alert on lag threshold |
| Duplicate delivery | Idempotency key at ingress; dedup at channel service |
Summary#
A production push notification system is a multi-stage pipeline: ingest, prioritize, filter by preferences, render templates, route to channels, deliver, and measure. The keys to doing it well are separating concerns at each stage, abstracting providers behind clean interfaces, and respecting user preferences at every step.
Build and explore system designs like this interactively at codelit.io.
This is article #197 in the Codelit engineering blog series.