Chat System Architecture: Building Real-Time Messaging at Scale
Chat systems sit at the intersection of the hardest distributed systems problems: real-time delivery, strict message ordering, persistent storage, presence tracking, and encryption — all while serving millions of concurrent connections.
This guide covers the full architecture from connection handling to production scaling.
Core Requirements#
Functional#
- 1:1 messaging: Send and receive text messages between two users
- Group chats: Multi-participant conversations (up to thousands of members)
- Read receipts: Notify senders when messages are read
- Typing indicators: Show when a user is typing
- Media attachments: Images, videos, files
- Push notifications: Notify offline users
- Message history: Persistent, scrollable chat history
- Presence: Online/offline/last-seen status
Non-Functional#
- Low latency: Messages delivered in under 200ms for online users
- Ordering: Messages appear in the correct order for all participants
- Reliability: No message loss, even during server failures
- Scalability: Support millions of concurrent connections
- Security: End-to-end encryption for private conversations
Real-Time Messaging: WebSocket#
Short polling wastes requests on mostly empty responses. Long polling reduces the waste but still pays a full request round trip per message. WebSocket is the standard transport for real-time chat.
Why WebSocket#
- Full-duplex: server pushes messages to clients without polling
- Persistent connection: no repeated TCP/TLS handshake overhead
- Low latency: message delivery in single-digit milliseconds over the wire
Connection Flow#
1. Client opens WebSocket: wss://chat.example.com/ws
2. Server authenticates via token in the handshake
3. Connection is registered in the connection registry
4. Bidirectional message flow begins
5. Heartbeat pings every 30s to detect dead connections
Connection Servers#
Dedicated stateful servers that hold WebSocket connections. These servers do minimal business logic — they route messages between clients and the backend.
Each connection server holds tens of thousands of connections. A fleet of connection servers sits behind a load balancer using sticky sessions (or connection-aware routing).
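The registry that maps users to connection servers can be sketched as follows. This is a minimal in-memory model; in production the mapping would live in Redis so every server can query it. The class and method names here are illustrative, not a real API.

```python
# In-memory sketch of a connection registry: which connection server
# holds a given user's WebSocket? (Production: a Redis hash/keyspace.)

class ConnectionRegistry:
    def __init__(self):
        self._by_user = {}  # user_id -> connection_server_id

    def register(self, user_id: str, server_id: str) -> None:
        """Called when a user's WebSocket lands on a connection server."""
        self._by_user[user_id] = server_id

    def unregister(self, user_id: str) -> None:
        """Called when the socket closes or the heartbeat times out."""
        self._by_user.pop(user_id, None)

    def lookup(self, user_id: str):
        """Which server holds this user's socket? None means offline."""
        return self._by_user.get(user_id)

registry = ConnectionRegistry()
registry.register("alice", "conn-server-2")
assert registry.lookup("alice") == "conn-server-2"
assert registry.lookup("bob") is None  # offline -> push notification path
```

The lookup result drives routing: a hit means "publish to that server's channel," a miss means "hand off to the push notification service."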
Message Flow#
Sending a Message (1:1)#
1. Sender's client sends message via WebSocket to Connection Server A
2. Connection Server A publishes message to Message Broker (Kafka/Redis Pub-Sub)
3. Message is written to Message Storage (async)
4. Message Broker routes to Connection Server B (where recipient is connected)
5. Connection Server B pushes message to recipient's WebSocket
6. If recipient is offline → Push Notification Service
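The delivery decision in steps 4–6 can be sketched as a single routing function. The broker and push service are stand-ins (plain callbacks), and the registry is modeled as a dict; only the branching logic is the point.

```python
# Sketch of the 1:1 delivery decision: deliver over WebSocket if the
# recipient has a registered connection, otherwise fall back to push.

def deliver(message: dict, registry: dict, send_ws, send_push) -> str:
    """registry maps user_id -> connection_server_id; send_ws/send_push
    stand in for the message broker and the push notification service."""
    recipient = message["recipient_id"]
    server = registry.get(recipient)
    if server is not None:
        send_ws(server, message)   # online: route to recipient's server
        return "websocket"
    send_push(recipient, message)  # offline: APNs/FCM path
    return "push"

sent = []
route = deliver(
    {"recipient_id": "bob", "content": "hi"},
    registry={"bob": "conn-server-1"},
    send_ws=lambda srv, m: sent.append((srv, m["content"])),
    send_push=lambda u, m: sent.append(("push", m["content"])),
)
assert route == "websocket" and sent == [("conn-server-1", "hi")]
```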
Sending a Message (Group Chat)#
Same flow, but the broker fans out the message to all connection servers holding group members. For large groups (1,000+ members), fan-out is batched to avoid thundering herd.
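The batching idea is simple to express: instead of emitting one broker event per member all at once, members are chunked so downstream systems see a smoothed write rate. The batch size is an illustrative tuning knob, not a recommended value.

```python
# Sketch of batched fan-out for large groups: chunk the member list and
# publish one broker event per chunk to avoid a thundering herd.

def fan_out_batched(member_ids, publish, batch_size=500):
    """Publish one event per chunk of members, in order."""
    for i in range(0, len(member_ids), batch_size):
        publish(member_ids[i:i + batch_size])

batches = []
fan_out_batched([f"user{i}" for i in range(1200)], batches.append, batch_size=500)
assert [len(b) for b in batches] == [500, 500, 200]
```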
Message Storage#
Database Choice#
Chat messages are write-heavy and read-sequential. The access pattern is almost always: "give me the last N messages in this conversation, ordered by time."
Wide-column stores (Cassandra, ScyllaDB) excel here:
- Partition key: conversation_id
- Clustering key: message_id (time-ordered)
- Sequential reads within a partition are extremely fast
- Horizontal scaling is built in
Alternative: PostgreSQL with partitioning by conversation works for smaller-scale systems (under 100M messages).
Message Schema#
conversation_id (partition key)
message_id (clustering key, time-based UUID / Snowflake ID)
sender_id
content_type (text, image, video, file)
content (text body or media URL)
created_at
status (sent, delivered, read)
Message ID and Ordering#
Use a time-based distributed ID generator (Twitter's Snowflake, ULID, or a similar scheme):
- Timestamp component: Ensures rough chronological ordering
- Machine ID component: Prevents collisions across servers
- Sequence component: Orders messages within the same millisecond
This gives you globally unique, roughly time-ordered IDs without coordination between servers.
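A Snowflake-style generator fits in a few lines. The bit widths below (41-bit timestamp, 10-bit machine ID, 12-bit sequence) follow the common layout, but the custom epoch is an assumption, and a real implementation must also wait out the millisecond when the sequence wraps — that edge case is only noted in a comment here.

```python
import threading
import time

# Illustrative Snowflake-style ID generator:
#   [41-bit ms timestamp | 10-bit machine id | 12-bit sequence]
EPOCH_MS = 1_600_000_000_000  # custom epoch (an assumed value)

class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # 12-bit wrap; a real impl would spin until the next ms
                self.sequence = (self.sequence + 1) & 0xFFF
            else:
                self.sequence = 0
                self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b                   # roughly time-ordered
assert (a >> 12) & 0x3FF == 7  # machine id embedded, no coordination needed
```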
Read Receipts#
When a user reads messages in a conversation:
- Client sends a read_receipt event: {conversation_id, last_read_message_id}
- Server updates the read cursor for that user in that conversation
- Server notifies the sender(s) via the message broker
Store read cursors per user per conversation:
user_id + conversation_id → last_read_message_id
This is a small, hot dataset — store it in Redis for fast access. Persist to the database asynchronously.
Optimization: Batch read receipt updates. If a user scrolls through 50 unread messages, send one receipt for the last message, not 50 individual receipts.
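The cursor update itself can be sketched as below. Redis is modeled as a plain dict; the key property is that the cursor is monotonic, so a batched receipt for the last message subsumes all earlier ones and stale receipts are dropped.

```python
# Sketch of a monotonic read cursor: only the highest message id per
# (user, conversation) is kept, so one receipt covers a whole scroll.

cursors = {}  # (user_id, conversation_id) -> last_read_message_id

def mark_read(user_id, conversation_id, message_id):
    key = (user_id, conversation_id)
    if message_id > cursors.get(key, -1):
        cursors[key] = message_id
        return True   # cursor advanced: worth notifying the sender
    return False      # stale or duplicate receipt: drop it

assert mark_read("alice", "c1", 104) is True
assert mark_read("alice", "c1", 101) is False  # older message, ignored
assert cursors[("alice", "c1")] == 104
```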
Typing Indicators#
Typing indicators are ephemeral — they should never touch the database.
1. Client sends "typing" event via WebSocket
2. Connection server publishes to presence channel for the conversation
3. Other participants' connection servers push "user X is typing" to clients
4. Client-side timeout hides the indicator after 3 seconds of inactivity
Use Redis Pub/Sub or a lightweight message bus for typing events. These are fire-and-forget — losing a typing indicator is not a problem.
Group Chat Design#
Small Groups (< 100 members)#
- Store member list in the database
- Fan-out on write: when a message arrives, push it to all online members immediately
- Offline members receive the message when they reconnect (pull from message storage)
Large Groups / Channels (100–100,000 members)#
- Fan-out on read: store the message once, each client pulls new messages on connection
- Use a timeline or inbox model: each user has a "feed" of conversations with unread counts
- Push notifications are batched and throttled to avoid overwhelming users
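Under fan-out on read, the unread count falls out of the cursor model: the channel stores each message once, and a user's unread count is just the number of messages past their read cursor. A minimal sketch, assuming time-ordered message IDs:

```python
# Sketch of the inbox model's unread count: one stored copy of the
# timeline per channel; per-user state is only the read cursor.

def unread_count(message_ids, last_read_id):
    """message_ids are time-ordered (e.g. Snowflake ids)."""
    return sum(1 for mid in message_ids if mid > last_read_id)

timeline = [101, 102, 103, 104, 105]  # one copy, shared by all members
assert unread_count(timeline, last_read_id=103) == 2
assert unread_count(timeline, last_read_id=0) == 5
```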
Media Attachments#
Never route media through your chat servers. Use a dedicated media pipeline.
1. Client uploads file to Object Storage (S3) via pre-signed URL
2. Client receives the media URL
3. Client sends a chat message with content_type: "image" and the media URL
4. Recipients render the media by fetching from CDN-backed object storage
For images, generate thumbnails asynchronously. For videos, trigger transcoding. Store metadata (dimensions, duration, file size) alongside the message.
Push Notifications#
When the recipient is offline (no active WebSocket connection):
1. Message broker detects no active connection for recipient
2. Routes message to Push Notification Service
3. Service formats notification (title, body, badge count)
4. Sends via APNs (iOS) / FCM (Android)
Key considerations:
- Deduplication: If a user has multiple devices, deliver to all but avoid duplicate notifications
- Muting: Respect per-conversation mute settings
- Batching: For group chats, batch notifications ("3 new messages in Design Team")
- Token management: Handle expired/invalid device tokens gracefully
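The batching consideration above ("3 new messages in Design Team") reduces to collapsing pending messages per conversation into one notification. The formatting and threshold below are illustrative, not a real push service API.

```python
# Sketch of group-notification batching: one notification summarizes
# all pending messages in a conversation instead of one push each.

def format_notification(conversation_name, pending):
    """pending is a list of (sender, text) tuples awaiting delivery."""
    if len(pending) == 1:
        sender, text = pending[0]
        return f"{sender}: {text}"
    return f"{len(pending)} new messages in {conversation_name}"

assert format_notification("Design Team", [("amy", "hi")]) == "amy: hi"
assert format_notification(
    "Design Team", [("amy", "hi"), ("bo", "yo"), ("cy", "ok")]
) == "3 new messages in Design Team"
```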
Presence (Online/Offline/Last Seen)#
Heartbeat-Based Presence#
1. Client sends heartbeat every 30 seconds
2. Server updates last_seen timestamp in Redis
3. If no heartbeat for 60 seconds → mark user offline
4. Publish presence change to subscribers (contacts/group members)
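The heartbeat logic above can be sketched with an injected clock so the timeout behavior is testable. In production, last_seen would live in Redis (often with a TTL doing the offline transition for you); the 60-second threshold matches the flow above.

```python
# Heartbeat presence sketch: a user is online if their last heartbeat
# arrived within the timeout window. Clock is injected for testability.

OFFLINE_AFTER_S = 60

class Presence:
    def __init__(self, clock):
        self.clock = clock
        self.last_seen = {}  # user_id -> last heartbeat timestamp

    def heartbeat(self, user_id):
        self.last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        ts = self.last_seen.get(user_id)
        return ts is not None and self.clock() - ts < OFFLINE_AFTER_S

now = [1000.0]
p = Presence(clock=lambda: now[0])
p.heartbeat("alice")
assert p.is_online("alice")
now[0] += 90              # two missed 30s heartbeats
assert not p.is_online("alice")
```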
Scaling Presence#
For millions of users, broadcasting every presence change to every contact is expensive. Optimizations:
- Lazy presence: Only fetch presence when a user opens a conversation
- Subscribe on demand: Only push presence updates for users currently visible on screen
- Batch updates: Aggregate presence changes and push every few seconds instead of instantly
End-to-End Encryption (E2EE)#
The server should never be able to read message content.
Signal Protocol (industry standard)#
- Key exchange: Each user generates a public/private key pair. Public keys are stored on the server.
- Session setup: When Alice messages Bob, she fetches Bob's public key and establishes a shared session using X3DH (Extended Triple Diffie-Hellman).
- Message encryption: Each message is encrypted with a unique message key derived via the Double Ratchet algorithm.
- Decryption: Only the recipient's device can decrypt using their private key.
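The symmetric half of the Double Ratchet can be illustrated with a KDF chain: each send advances a chain key and derives a one-time message key, so compromising one message key does not reveal past messages. This is a greatly simplified sketch — the real protocol also mixes in fresh Diffie-Hellman outputs (the "DH ratchet"), which is omitted here, and the X3DH handshake that produces the root key is assumed to have already happened.

```python
import hashlib
import hmac

# Simplified symmetric-key ratchet: HMAC-SHA256 as the KDF chain, with
# distinct labels for "advance the chain" vs "derive a message key".

def ratchet_step(chain_key: bytes):
    """Derive (next_chain_key, message_key) from the current chain key."""
    next_ck = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    msg_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_ck, msg_key

ck = b"\x00" * 32  # stand-in for the shared root from the X3DH handshake
ck, k1 = ratchet_step(ck)
ck, k2 = ratchet_step(ck)
assert k1 != k2    # every message is encrypted under a unique key
assert len(k1) == 32
```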
Group E2EE#
Use the Sender Keys protocol: each member generates a sender key, distributes it to every other member over pairwise-encrypted sessions, and encrypts group messages with it. When a member leaves, all sender keys are rotated.
Trade-off: E2EE makes server-side search, moderation, and link previews impossible. The client must handle all of these.
Scaling the Architecture#
Connection Server Tier#
- Stateful: each server holds active WebSocket connections
- Scale horizontally by adding more connection servers
- Use a connection registry (Redis) to map user_id → connection_server_id
- Load balancer routes new connections; the registry routes messages
Message Broker Tier#
- Kafka for durable, ordered message delivery
- Partition by conversation_id to maintain ordering within a conversation
- Consumer groups for each connection server to receive relevant messages
Storage Tier#
- Cassandra/ScyllaDB partitioned by conversation_id
- Read replicas for message history queries
- Cold storage (S3 + Parquet) for messages older than 1 year
High-Level Architecture#
            ┌──────────────────────────┐
            │      Load Balancer       │
            └────────────┬─────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
│ Connection  │   │ Connection  │   │ Connection  │
│  Server 1   │   │  Server 2   │   │  Server 3   │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
┌──────▼─────────────────▼─────────────────▼──────┐
│             Message Broker (Kafka)              │
└───────┬────────────────┬────────────────┬───────┘
        │                │                │
 ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
 │ Message DB  │  │    Redis    │  │    Push     │
 │ (Cassandra) │  │ (Presence,  │  │ Notification│
 │             │  │  Cursors,   │  │   Service   │
 │             │  │  Registry)  │  │             │
 └─────────────┘  └─────────────┘  └─────────────┘
Key Design Decisions Summary#
| Component | Choice | Reason |
|---|---|---|
| Real-time transport | WebSocket | Full-duplex, low latency, persistent |
| Message broker | Kafka | Durable, ordered, horizontally scalable |
| Message storage | Cassandra | Write-heavy, time-ordered, partition-friendly |
| Presence / cursors | Redis | Hot data, sub-millisecond access |
| Media storage | S3 + CDN | Offload from chat servers, global delivery |
| Push notifications | APNs + FCM | Platform-native, reliable delivery |
| Encryption | Signal Protocol | Industry-proven E2EE |
| Message IDs | Snowflake / ULID | Time-ordered, distributed, no coordination |
What Interviewers Are Testing#
The chat system question tests whether you can:
- Choose the right transport and explain why WebSocket beats polling
- Design for ordering in a distributed environment without global locks
- Separate hot and cold paths (presence in Redis, messages in Cassandra, media in S3)
- Handle fan-out differently for small groups vs large channels
- Think about security — E2EE is not optional in modern chat systems
The complexity is in the operational details: connection management, graceful failover, and keeping latency low as you scale.
This is article #180 in the Codelit engineering blog series.