Chat System Architecture: Building Real-Time Messaging at Scale
Chat systems sit at the intersection of the hardest distributed systems problems: real-time delivery, strict message ordering, persistent storage, presence tracking, and encryption — all while serving millions of concurrent connections.
This guide covers the full architecture from connection handling to production scaling.
Core Requirements#
Functional#
- 1:1 messaging: Send and receive text messages between two users
- Group chats: Multi-participant conversations (up to thousands of members)
- Read receipts: Notify senders when messages are read
- Typing indicators: Show when a user is typing
- Media attachments: Images, videos, files
- Push notifications: Notify offline users
- Message history: Persistent, scrollable chat history
- Presence: Online/offline/last-seen status
Non-Functional#
- Low latency: Messages delivered in under 200ms for online users
- Ordering: Messages appear in the correct order for all participants
- Reliability: No message loss, even during server failures
- Scalability: Support millions of concurrent connections
- Security: End-to-end encryption for private conversations
Real-Time Messaging: WebSocket#
Short polling wastes requests on mostly empty responses. Long polling reduces the waste but still pays a full request round trip per message. WebSocket is the standard transport for real-time chat.
Why WebSocket#
- Full-duplex: server pushes messages to clients without polling
- Persistent connection: no repeated TCP/TLS handshake overhead
- Low latency: message delivery in single-digit milliseconds over the wire
Connection Flow#
1. Client opens WebSocket: wss://chat.example.com/ws
2. Server authenticates via token in the handshake
3. Connection is registered in the connection registry
4. Bidirectional message flow begins
5. Heartbeat pings every 30s to detect dead connections
Connection Servers#
Dedicated stateful servers that hold WebSocket connections. These servers do minimal business logic — they route messages between clients and the backend.
Each connection server holds tens of thousands of connections. A fleet of connection servers sits behind a load balancer using sticky sessions (or connection-aware routing).
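The registry that maps users to connection servers can be sketched as follows. This is a minimal in-memory model; in production the mapping would live in Redis so every server can query it. The class and method names here are illustrative, not a real API.

```python
# In-memory sketch of a connection registry: which connection server
# holds a given user's WebSocket? (Production: a Redis hash/keyspace.)

class ConnectionRegistry:
    def __init__(self):
        self._by_user = {}  # user_id -> connection_server_id

    def register(self, user_id: str, server_id: str) -> None:
        """Called when a user's WebSocket lands on a connection server."""
        self._by_user[user_id] = server_id

    def unregister(self, user_id: str) -> None:
        """Called when the socket closes or the heartbeat times out."""
        self._by_user.pop(user_id, None)

    def lookup(self, user_id: str):
        """Which server holds this user's socket? None means offline."""
        return self._by_user.get(user_id)

registry = ConnectionRegistry()
registry.register("alice", "conn-server-2")
assert registry.lookup("alice") == "conn-server-2"
assert registry.lookup("bob") is None  # offline -> push notification path
```

The lookup result drives routing: a hit means "publish to that server's channel," a miss means "hand off to the push notification service."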
Message Flow#
Sending a Message (1:1)#
1. Sender's client sends message via WebSocket to Connection Server A
2. Connection Server A publishes message to Message Broker (Kafka/Redis Pub-Sub)
3. Message is written to Message Storage (async)
4. Message Broker routes to Connection Server B (where recipient is connected)
5. Connection Server B pushes message to recipient's WebSocket
6. If recipient is offline → Push Notification Service
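The delivery decision in steps 4–6 can be sketched as a single routing function. The broker and push service are stand-ins (plain callbacks), and the registry is modeled as a dict; only the branching logic is the point.

```python
# Sketch of the 1:1 delivery decision: deliver over WebSocket if the
# recipient has a registered connection, otherwise fall back to push.

def deliver(message: dict, registry: dict, send_ws, send_push) -> str:
    """registry maps user_id -> connection_server_id; send_ws/send_push
    stand in for the message broker and the push notification service."""
    recipient = message["recipient_id"]
    server = registry.get(recipient)
    if server is not None:
        send_ws(server, message)   # online: route to recipient's server
        return "websocket"
    send_push(recipient, message)  # offline: APNs/FCM path
    return "push"

sent = []
route = deliver(
    {"recipient_id": "bob", "content": "hi"},
    registry={"bob": "conn-server-1"},
    send_ws=lambda srv, m: sent.append((srv, m["content"])),
    send_push=lambda u, m: sent.append(("push", m["content"])),
)
assert route == "websocket" and sent == [("conn-server-1", "hi")]
```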
Sending a Message (Group Chat)#
Same flow, but the broker fans out the message to all connection servers holding group members. For large groups (1,000+ members), fan-out is batched to avoid thundering herd.
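The batching idea is simple to express: instead of emitting one broker event per member all at once, members are chunked so downstream systems see a smoothed write rate. The batch size is an illustrative tuning knob, not a recommended value.

```python
# Sketch of batched fan-out for large groups: chunk the member list and
# publish one broker event per chunk to avoid a thundering herd.

def fan_out_batched(member_ids, publish, batch_size=500):
    """Publish one event per chunk of members, in order."""
    for i in range(0, len(member_ids), batch_size):
        publish(member_ids[i:i + batch_size])

batches = []
fan_out_batched([f"user{i}" for i in range(1200)], batches.append, batch_size=500)
assert [len(b) for b in batches] == [500, 500, 200]
```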
Message Storage#
Database Choice#
Chat messages are write-heavy and read-sequential. The access pattern is almost always: "give me the last N messages in this conversation, ordered by time."
Wide-column stores (Cassandra, ScyllaDB) excel here:
- Partition key: conversation_id
- Clustering key: message_id (time-ordered)
- Sequential reads within a partition are extremely fast
- Horizontal scaling is built in
Alternative: PostgreSQL with partitioning by conversation works for smaller-scale systems (under 100M messages).
Message Schema#
conversation_id (partition key)
message_id (clustering key, time-based UUID / Snowflake ID)
sender_id
content_type (text, image, video, file)
content (text body or media URL)
created_at
status (sent, delivered, read)
Message ID and Ordering#
Use a time-based distributed ID generator (Twitter's Snowflake, ULID, or a similar scheme):
- Timestamp component: Ensures rough chronological ordering
- Machine ID component: Prevents collisions across servers
- Sequence component: Orders messages within the same millisecond
This gives you globally unique, roughly time-ordered IDs without coordination between servers.
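A Snowflake-style generator fits in a few lines. The bit widths below (41-bit timestamp, 10-bit machine ID, 12-bit sequence) follow the common layout, but the custom epoch is an assumption, and a real implementation must also wait out the millisecond when the sequence wraps — that edge case is only noted in a comment here.

```python
import threading
import time

# Illustrative Snowflake-style ID generator:
#   [41-bit ms timestamp | 10-bit machine id | 12-bit sequence]
EPOCH_MS = 1_600_000_000_000  # custom epoch (an assumed value)

class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # 12-bit wrap; a real impl would spin until the next ms
                self.sequence = (self.sequence + 1) & 0xFFF
            else:
                self.sequence = 0
                self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b                   # roughly time-ordered
assert (a >> 12) & 0x3FF == 7  # machine id embedded, no coordination needed
```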
Read Receipts#
When a user reads messages in a conversation:
- Client sends a read_receipt event: {conversation_id, last_read_message_id}
- Server updates the read cursor for that user in that conversation
- Server notifies the sender(s) via the message broker
Store read cursors per user per conversation:
user_id + conversation_id → last_read_message_id
This is a small, hot dataset — store it in Redis for fast access. Persist to the database asynchronously.
Optimization: Batch read receipt updates. If a user scrolls through 50 unread messages, send one receipt for the last message, not 50 individual receipts.
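The cursor update itself can be sketched as below. Redis is modeled as a plain dict; the key property is that the cursor is monotonic, so a batched receipt for the last message subsumes all earlier ones and stale receipts are dropped.

```python
# Sketch of a monotonic read cursor: only the highest message id per
# (user, conversation) is kept, so one receipt covers a whole scroll.

cursors = {}  # (user_id, conversation_id) -> last_read_message_id

def mark_read(user_id, conversation_id, message_id):
    key = (user_id, conversation_id)
    if message_id > cursors.get(key, -1):
        cursors[key] = message_id
        return True   # cursor advanced: worth notifying the sender
    return False      # stale or duplicate receipt: drop it

assert mark_read("alice", "c1", 104) is True
assert mark_read("alice", "c1", 101) is False  # older message, ignored
assert cursors[("alice", "c1")] == 104
```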
Typing Indicators#
Typing indicators are ephemeral — they should never touch the database.
1. Client sends "typing" event via WebSocket
2. Connection server publishes to presence channel for the conversation
3. Other participants' connection servers push "user X is typing" to clients
4. Client-side timeout hides the indicator after 3 seconds of inactivity
Use Redis Pub/Sub or a lightweight message bus for typing events. These are fire-and-forget — losing a typing indicator is not a problem.
Group Chat Design#
Small Groups (< 100 members)#
- Store member list in the database
- Fan-out on write: when a message arrives, push it to all online members immediately
- Offline members receive the message when they reconnect (pull from message storage)
Large Groups / Channels (100–100,000 members)#
- Fan-out on read: store the message once, each client pulls new messages on connection
- Use a timeline or inbox model: each user has a "feed" of conversations with unread counts
- Push notifications are batched and throttled to avoid overwhelming users
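Under fan-out on read, the unread count falls out of the cursor model: the channel stores each message once, and a user's unread count is just the number of messages past their read cursor. A minimal sketch, assuming time-ordered message IDs:

```python
# Sketch of the inbox model's unread count: one stored copy of the
# timeline per channel; per-user state is only the read cursor.

def unread_count(message_ids, last_read_id):
    """message_ids are time-ordered (e.g. Snowflake ids)."""
    return sum(1 for mid in message_ids if mid > last_read_id)

timeline = [101, 102, 103, 104, 105]  # one copy, shared by all members
assert unread_count(timeline, last_read_id=103) == 2
assert unread_count(timeline, last_read_id=0) == 5
```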
Media Attachments#
Never route media through your chat servers. Use a dedicated media pipeline.
1. Client uploads file to Object Storage (S3) via pre-signed URL
2. Client receives the media URL
3. Client sends a chat message with content_type: "image" and the media URL
4. Recipients render the media by fetching from CDN-backed object storage
For images, generate thumbnails asynchronously. For videos, trigger transcoding. Store metadata (dimensions, duration, file size) alongside the message.
Push Notifications#
When the recipient is offline (no active WebSocket connection):
1. Message broker detects no active connection for recipient
2. Routes message to Push Notification Service
3. Service formats notification (title, body, badge count)
4. Sends via APNs (iOS) / FCM (Android)
Key considerations:
- Deduplication: If a user has multiple devices, deliver to all but avoid duplicate notifications
- Muting: Respect per-conversation mute settings
- Batching: For group chats, batch notifications ("3 new messages in Design Team")
- Token management: Handle expired/invalid device tokens gracefully
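The batching consideration above ("3 new messages in Design Team") reduces to collapsing pending messages per conversation into one notification. The formatting and threshold below are illustrative, not a real push service API.

```python
# Sketch of group-notification batching: one notification summarizes
# all pending messages in a conversation instead of one push each.

def format_notification(conversation_name, pending):
    """pending is a list of (sender, text) tuples awaiting delivery."""
    if len(pending) == 1:
        sender, text = pending[0]
        return f"{sender}: {text}"
    return f"{len(pending)} new messages in {conversation_name}"

assert format_notification("Design Team", [("amy", "hi")]) == "amy: hi"
assert format_notification(
    "Design Team", [("amy", "hi"), ("bo", "yo"), ("cy", "ok")]
) == "3 new messages in Design Team"
```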
Presence (Online/Offline/Last Seen)#
Heartbeat-Based Presence#
1. Client sends heartbeat every 30 seconds
2. Server updates last_seen timestamp in Redis
3. If no heartbeat for 60 seconds → mark user offline
4. Publish presence change to subscribers (contacts/group members)
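The heartbeat logic above can be sketched with an injected clock so the timeout behavior is testable. In production, last_seen would live in Redis (often with a TTL doing the offline transition for you); the 60-second threshold matches the flow above.

```python
# Heartbeat presence sketch: a user is online if their last heartbeat
# arrived within the timeout window. Clock is injected for testability.

OFFLINE_AFTER_S = 60

class Presence:
    def __init__(self, clock):
        self.clock = clock
        self.last_seen = {}  # user_id -> last heartbeat timestamp

    def heartbeat(self, user_id):
        self.last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        ts = self.last_seen.get(user_id)
        return ts is not None and self.clock() - ts < OFFLINE_AFTER_S

now = [1000.0]
p = Presence(clock=lambda: now[0])
p.heartbeat("alice")
assert p.is_online("alice")
now[0] += 90              # two missed 30s heartbeats
assert not p.is_online("alice")
```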
Scaling Presence#
For millions of users, broadcasting every presence change to every contact is expensive. Optimizations:
- Lazy presence: Only fetch presence when a user opens a conversation
- Subscribe on demand: Only push presence updates for users currently visible on screen
- Batch updates: Aggregate presence changes and push every few seconds instead of instantly
End-to-End Encryption (E2EE)#
The server should never be able to read message content.
Signal Protocol (industry standard)#
- Key exchange: Each user generates a public/private key pair. Public keys are stored on the server.
- Session setup: When Alice messages Bob, she fetches Bob's public key and establishes a shared session using X3DH (Extended Triple Diffie-Hellman).
- Message encryption: Each message is encrypted with a unique message key derived via the Double Ratchet algorithm.
- Decryption: Only the recipient's device can decrypt using their private key.
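The symmetric half of the Double Ratchet can be illustrated with a KDF chain: each send advances a chain key and derives a one-time message key, so compromising one message key does not reveal past messages. This is a greatly simplified sketch — the real protocol also mixes in fresh Diffie-Hellman outputs (the "DH ratchet"), which is omitted here, and the X3DH handshake that produces the root key is assumed to have already happened.

```python
import hashlib
import hmac

# Simplified symmetric-key ratchet: HMAC-SHA256 as the KDF chain, with
# distinct labels for "advance the chain" vs "derive a message key".

def ratchet_step(chain_key: bytes):
    """Derive (next_chain_key, message_key) from the current chain key."""
    next_ck = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    msg_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_ck, msg_key

ck = b"\x00" * 32  # stand-in for the shared root from the X3DH handshake
ck, k1 = ratchet_step(ck)
ck, k2 = ratchet_step(ck)
assert k1 != k2    # every message is encrypted under a unique key
assert len(k1) == 32
```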
Group E2EE#
Use the Sender Keys protocol: each member generates a sender key, distributes it to every other member over pairwise-encrypted sessions, and encrypts group messages with it. When a member leaves, all sender keys are rotated.
Trade-off: E2EE makes server-side search, moderation, and link previews impossible. The client must handle all of these.
Scaling the Architecture#
Connection Server Tier#
- Stateful: each server holds active WebSocket connections
- Scale horizontally by adding more connection servers
- Use a connection registry (Redis) to map user_id → connection_server_id
- Load balancer routes new connections; the registry routes messages
Message Broker Tier#
- Kafka for durable, ordered message delivery
- Partition by conversation_id to maintain ordering within a conversation
- Consumer groups for each connection server to receive relevant messages
Storage Tier#
- Cassandra/ScyllaDB partitioned by conversation_id
- Read replicas for message history queries
- Cold storage (S3 + Parquet) for messages older than 1 year
High-Level Architecture#
            ┌──────────────────────────┐
            │      Load Balancer       │
            └────────────┬─────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
│ Connection  │   │ Connection  │   │ Connection  │
│  Server 1   │   │  Server 2   │   │  Server 3   │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
┌──────▼─────────────────▼─────────────────▼──────┐
│             Message Broker (Kafka)              │
└───────┬────────────────┬────────────────┬───────┘
        │                │                │
 ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
 │ Message DB  │  │    Redis    │  │    Push     │
 │ (Cassandra) │  │ (Presence,  │  │ Notification│
 │             │  │  Cursors,   │  │   Service   │
 │             │  │  Registry)  │  │             │
 └─────────────┘  └─────────────┘  └─────────────┘
Key Design Decisions Summary#
| Component | Choice | Reason |
|---|---|---|
| Real-time transport | WebSocket | Full-duplex, low latency, persistent |
| Message broker | Kafka | Durable, ordered, horizontally scalable |
| Message storage | Cassandra | Write-heavy, time-ordered, partition-friendly |
| Presence / cursors | Redis | Hot data, sub-millisecond access |
| Media storage | S3 + CDN | Offload from chat servers, global delivery |
| Push notifications | APNs + FCM | Platform-native, reliable delivery |
| Encryption | Signal Protocol | Industry-proven E2EE |
| Message IDs | Snowflake / ULID | Time-ordered, distributed, no coordination |
What Interviewers Are Testing#
The chat system question tests whether you can:
- Choose the right transport and explain why WebSocket beats polling
- Design for ordering in a distributed environment without global locks
- Separate hot and cold paths (presence in Redis, messages in Cassandra, media in S3)
- Handle fan-out differently for small groups vs large channels
- Think about security — E2EE is not optional in modern chat systems
The complexity is in the operational details: connection management, graceful failover, and keeping latency low as you scale.
This is article #180 in the Codelit engineering blog series.