Design a Chat System — The Complete System Design Walkthrough
Why this question keeps coming up
"Design a chat system" tests everything: real-time communication, data modeling, offline support, scale, and failure handling. If you can design chat, you can design most real-time systems.
Start with requirements
Functional:
- 1:1 messaging
- Group chats (up to 500 members)
- Online/offline status
- Read receipts
- Message history
- Media sharing (images, files)
Non-functional:
- Low latency (< 200ms message delivery)
- High availability (users expect 99.99%)
- Messages must never be lost
- Support offline message delivery
High-level architecture
Mobile/Web → WebSocket Gateway → Message Service → Database
                     ↕                  ↕
             Presence Service  Push Notifications
                     ↕                  ↕
                   Redis            FCM/APNs
The connection layer
Why WebSocket? With plain HTTP request/response, the client has to keep asking "any new messages?" (polling). WebSocket keeps a persistent, bidirectional connection, so the server can push each message as soon as it arrives.
Connection gateway: Each server handles ~100K concurrent WebSocket connections. Users connect to the nearest gateway (geo-routing). If a gateway crashes, clients reconnect to another.
Connection state: Store {userId → gatewayServer} mapping in Redis. When sending a message to User B, look up which gateway they're connected to.
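As a rough illustration of the connection mapping, here is a minimal sketch that uses an in-memory dictionary as a stand-in for Redis. The class name `ConnectionRegistry`, its methods, and the gateway names are hypothetical and exist only for this example.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    """In-memory stand-in for the Redis userId -> gatewayServer mapping."""

    def __init__(self) -> None:
        self._gateway_by_user: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        # Called when a client finishes its WebSocket handshake on a gateway.
        self._gateway_by_user[user_id] = gateway_id

    def unregister(self, user_id: str, gateway_id: str) -> None:
        # Remove the entry only if it still points at this gateway, so a late
        # disconnect from an old gateway cannot erase a newer connection.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user has no live connection (treat as offline).
        return self._gateway_by_user.get(user_id)


registry = ConnectionRegistry()
registry.register("user-b", "gateway-eu-1")
registry.register("user-b", "gateway-eu-2")    # reconnect after a gateway crash
registry.unregister("user-b", "gateway-eu-1")  # stale disconnect is ignored
```

The guarded `unregister` is one way to handle the reconnect case mentioned above: the old gateway may report the disconnect after the client has already registered elsewhere.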
Sending a message (1:1)
- User A sends message via WebSocket to their gateway
- Gateway forwards to Message Service
- Message Service stores in database
- Message Service looks up User B's gateway in Redis
- If online → push via WebSocket to B's gateway → deliver to B
- If offline → store in offline queue → send push notification
Key insight: Store first, deliver second. The message is safe in the database before we try to deliver it. If delivery fails, it's still there for later.
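The send path above could be sketched roughly as follows. Everything here is an in-memory stand-in: the `MessageService` class, the `push_to_gateway` callback, and the integer message IDs are assumptions made for the example, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    message_id: int
    sender_id: str
    recipient_id: str
    content: str


@dataclass
class MessageService:
    # Hypothetical collaborators standing in for Redis and the gateways.
    gateway_by_user: Dict[str, str]
    push_to_gateway: Callable[[str, Message], bool]  # returns False on failure
    database: List[Message] = field(default_factory=list)
    offline_queue: Dict[str, List[Message]] = field(default_factory=dict)
    push_notifications: List[str] = field(default_factory=list)

    def send(self, sender_id: str, recipient_id: str, content: str) -> Message:
        message = Message(len(self.database) + 1, sender_id, recipient_id, content)
        # 1. Store first: persist before any delivery attempt.
        self.database.append(message)
        # 2. Deliver second: push if the recipient has a live connection.
        gateway = self.gateway_by_user.get(recipient_id)
        if gateway is not None and self.push_to_gateway(gateway, message):
            return message
        # 3. Offline or delivery failed: queue it and send a push notification.
        self.offline_queue.setdefault(recipient_id, []).append(message)
        self.push_notifications.append(recipient_id)
        return message


delivered: List[Message] = []


def fake_push(gateway: str, message: Message) -> bool:
    # Pretend the gateway accepted the message.
    delivered.append(message)
    return True


service = MessageService(gateway_by_user={"bob": "gw-1"}, push_to_gateway=fake_push)
service.send("alice", "bob", "hi")       # bob is online: delivered via gw-1
service.send("alice", "carol", "hello")  # carol is offline: queued + push
```

Because the database write happens before the gateway lookup, a failed push still leaves the message available for later sync.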
Group messages
Small groups (< 100 members): Fan-out on write. When User A sends a message, create a copy for each member's inbox. Simple, fast reads.
Large groups (100-500 members): Fan-out on read. Store the message once. When members open the group, fetch recent messages. Saves storage, slower reads.
Why the split? Fan-out on write for 10 members = 10 writes. For 500 members = 500 writes per message. The cost becomes prohibitive.
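To make the write-cost difference concrete, here is a small sketch that switches strategy on group size and counts writes. The `GroupStore` class and its in-memory dictionaries are hypothetical; the threshold of 100 comes from the split described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FANOUT_ON_WRITE_LIMIT = 100  # groups smaller than this use fan-out on write


class GroupStore:
    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}
        # (user_id, group_id) -> that user's copy of the group's messages
        self.inboxes: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        # group_id -> single shared copy of the group's messages
        self.group_log: Dict[str, List[str]] = defaultdict(list)
        self.writes = 0

    def create_group(self, group_id: str, members: List[str]) -> None:
        self.members[group_id] = list(members)

    def _uses_write_fanout(self, group_id: str) -> bool:
        return len(self.members[group_id]) < FANOUT_ON_WRITE_LIMIT

    def send(self, group_id: str, text: str) -> None:
        if self._uses_write_fanout(group_id):
            # Fan-out on write: one inbox write per member.
            for member in self.members[group_id]:
                self.inboxes[(member, group_id)].append(text)
                self.writes += 1
        else:
            # Fan-out on read: store once, members fetch it when they open the group.
            self.group_log[group_id].append(text)
            self.writes += 1

    def read(self, user_id: str, group_id: str, recent: int = 50) -> List[str]:
        if self._uses_write_fanout(group_id):
            source = self.inboxes[(user_id, group_id)]
        else:
            source = self.group_log[group_id]
        return source[-recent:]


store = GroupStore()
store.create_group("small", [f"u{i}" for i in range(10)])
store.create_group("large", [f"u{i}" for i in range(500)])
store.send("small", "hi")     # 10 writes
store.send("large", "hello")  # 1 write instead of 500
```

In this sketch the small group costs 10 writes per message and the large group costs 1, at the price of every large-group reader hitting the shared log.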
Presence (online/offline)
Heartbeat approach: Client sends a heartbeat every 30 seconds. Server marks user as "online" with a 60-second TTL in Redis. If heartbeat stops → TTL expires → user is "offline."
Don't broadcast presence to everyone. Only notify users who are currently viewing a conversation with the user whose status changed. Otherwise, one person going offline triggers notifications to thousands.
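The heartbeat-plus-TTL idea can be sketched without Redis by storing an expiry time per user. The `PresenceTracker` class and the injectable `clock` parameter are assumptions for illustration; in the design above, Redis key expiry would play the role of the expiry check.

```python
import time
from typing import Callable, Dict

HEARTBEAT_INTERVAL_SECONDS = 30  # how often the client is expected to ping
PRESENCE_TTL_SECONDS = 60        # how long one heartbeat keeps a user "online"


class PresenceTracker:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each heartbeat pushes the expiry out by the full TTL.
        self._expires_at[user_id] = self._clock() + PRESENCE_TTL_SECONDS

    def is_online(self, user_id: str) -> bool:
        expires_at = self._expires_at.get(user_id)
        return expires_at is not None and self._clock() < expires_at


# A fake clock makes the timeline explicit.
now = [0.0]
presence = PresenceTracker(clock=lambda: now[0])
presence.heartbeat("alice")
now[0] = 45.0  # one heartbeat missed, still inside the 60-second TTL
online_at_45 = presence.is_online("alice")
now[0] = 61.0  # TTL elapsed with no further heartbeat
online_at_61 = presence.is_online("alice")
```

Setting the TTL to twice the heartbeat interval means a single dropped heartbeat does not flip the user to offline.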
Message storage
Schema:
messages:
  id: UUID
  conversation_id: UUID
  sender_id: UUID
  content: text
  type: text|image|file
  created_at: timestamp
conversations:
  id: UUID
  type: 1:1|group
  members: [user_ids]
  last_message_at: timestamp
Partitioning: Partition by conversation_id. All messages in a conversation live on the same shard → efficient range queries for history.
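A minimal sketch of routing by conversation_id, assuming a fixed number of shards and simple hash-modulo placement. The shard count and the `shard_for` helper are hypothetical; a store such as Cassandra handles placement itself based on the partition key.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count for the example


def shard_for(conversation_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would route the same conversation differently after a restart.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


conversation = "3f2b8c1e-conversation"
first_lookup = shard_for(conversation)
second_lookup = shard_for(conversation)  # same conversation, same shard
```

Plain modulo placement reshuffles most keys when the shard count changes, so this only illustrates the routing rule, not a resharding strategy.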
Database choice: Cassandra or ScyllaDB for write-heavy workloads. PostgreSQL works fine up to millions of messages.
Read receipts
- User B reads a message
- Client sends "read" event with last-read message ID
- Server updates read_position for User B in that conversation
- Server notifies User A (via WebSocket) that B has read up to message X
Optimization: Don't send individual read receipts. Batch: "User B has read up to message #47." One event covers multiple messages.
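One way to sketch the batched read position, assuming message IDs are integers that increase within a conversation (as the "#47" example suggests). The `ReadPositions` class and the event shape are hypothetical.

```python
from typing import Dict, Optional, Tuple


class ReadPositions:
    def __init__(self) -> None:
        # (conversation_id, user_id) -> highest message ID the user has read
        self._positions: Dict[Tuple[str, str], int] = {}

    def mark_read(self, conversation_id: str, user_id: str,
                  last_read_message_id: int) -> Optional[dict]:
        key = (conversation_id, user_id)
        current = self._positions.get(key, 0)
        if last_read_message_id <= current:
            # Stale or duplicate event: the position never moves backwards,
            # and there is nothing new to tell the sender.
            return None
        self._positions[key] = last_read_message_id
        # A single event covers every message up to and including this ID.
        return {
            "type": "read",
            "conversation_id": conversation_id,
            "reader_id": user_id,
            "up_to_message_id": last_read_message_id,
        }


positions = ReadPositions()
event = positions.mark_read("conv-1", "user-b", 47)  # notify User A once
stale = positions.mark_read("conv-1", "user-b", 40)  # ignored
```

Storing one position per user per conversation keeps the write cost constant regardless of how many messages were read in one go.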
Offline message delivery
When User B comes back online:
- Client connects via WebSocket
- Client sends last-known message ID per conversation
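- Server returns all messages after that ID (this assumes message IDs sort in creation order, for example a per-conversation sequence number or a time-ordered ID)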
- Messages stream in chronologically
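The reconnect steps above can be sketched as a cursor-based sync. This assumes each conversation's history is kept sorted by an integer message ID; the function names and the tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

StoredMessage = Tuple[int, str]  # (message_id, content), IDs ascending


def messages_after(history: List[StoredMessage], last_known_id: int) -> List[StoredMessage]:
    # Binary search for the first message with an ID greater than the cursor.
    ids = [message_id for message_id, _ in history]
    return history[bisect_right(ids, last_known_id):]


def sync(histories: Dict[str, List[StoredMessage]],
         cursors: Dict[str, int]) -> Dict[str, List[StoredMessage]]:
    # A conversation the client has never seen gets cursor 0, i.e. everything.
    return {
        conversation_id: messages_after(history, cursors.get(conversation_id, 0))
        for conversation_id, history in histories.items()
    }


histories = {
    "conv-1": [(1, "a"), (2, "b"), (3, "c")],
    "conv-2": [(1, "x")],
}
missed = sync(histories, cursors={"conv-1": 1})
```

Here the client already has message 1 of conv-1, so it receives messages 2 and 3, plus all of conv-2 in chronological order.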
Push notifications: When the user is offline, send a push notification via FCM (Android) or APNs (iOS). Rate-limit to avoid notification spam for active group chats.
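Rate limiting push notifications could look something like the cooldown below: at most one push per user per conversation within a window. The 60-second window, the `PushRateLimiter` class, and the injectable `clock` are assumptions for the example; the actual FCM or APNs call is left out.

```python
import time
from typing import Callable, Dict, Tuple


class PushRateLimiter:
    def __init__(self, min_interval_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval_seconds
        self._clock = clock
        # (user_id, conversation_id) -> time of the last push that was allowed
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def allow(self, user_id: str, conversation_id: str) -> bool:
        key = (user_id, conversation_id)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval:
            return False  # still inside the cooldown: skip this push
        self._last_sent[key] = now
        return True


now = [0.0]
limiter = PushRateLimiter(min_interval_seconds=60.0, clock=lambda: now[0])
first = limiter.allow("bob", "group-1")   # first push goes out
now[0] = 10.0
second = limiter.allow("bob", "group-1")  # suppressed, only 10s later
other = limiter.allow("bob", "group-2")   # different conversation, allowed
now[0] = 70.0
third = limiter.allow("bob", "group-1")   # cooldown has passed
```

Suppressed pushes lose nothing, since the messages themselves are already stored and arrive through the sync on reconnect.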
Scaling considerations
- WebSocket servers: Horizontally scale behind a load balancer with sticky sessions
- Message database: Partition by conversation, add read replicas
- Redis: Cluster mode for presence and connection mapping
- Media: Store in S3/GCS, serve via CDN with signed URLs
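To illustrate the idea behind signed media URLs, here is a generic HMAC-based sketch. It is not the S3 or GCS signing scheme; the secret key, the `expires` and `signature` query parameters, and both function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; a real key would come from a secret store


def _signature(path: str, expires_at: int, key: bytes) -> str:
    payload = f"{path}?expires={expires_at}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    # The signature binds the path to its expiry time.
    query = urlencode({"expires": expires_at, "signature": _signature(path, expires_at, key)})
    return f"{path}?{query}"


def verify_url(url: str, now: int, key: bytes = SECRET_KEY) -> bool:
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    expected = _signature(parsed.path, expires_at, key)
    # Constant-time comparison, then the expiry check.
    return hmac.compare_digest(signature, expected) and now < expires_at


url = sign_url("/media/abc.jpg", expires_at=1000)
valid_before_expiry = verify_url(url, now=999)
valid_after_expiry = verify_url(url, now=1000)
valid_if_tampered = verify_url(url.replace("abc", "xyz"), now=999)
```

The point of the sketch is that the CDN or media server can check access without a database lookup: the URL itself carries a proof that the backend authorized this path until this time.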
See the full architecture
On Codelit, search "WhatsApp" or "Slack" in ⌘K to see complete chat system architectures with all components — WebSocket gateway, message routing, presence, offline queues, and media handling.
Practice this interview question: search "WhatsApp" on Codelit.io and explore every component of the chat architecture.