WebSocket Scaling: Sticky Sessions, Redis Pub/Sub, and Horizontal Scale
WebSocket Scaling: From One Server to Thousands#
A single server handles WebSocket connections fine. Add a second server and everything breaks. Client A connects to Server 1, Client B connects to Server 2. Server 1 broadcasts a message — Client B never receives it.
Scaling WebSockets requires solving connection affinity, cross-server messaging, and graceful failure handling.
Why WebSockets are harder to scale than HTTP#
HTTP is stateless. Any server can handle any request. Load balancers round-robin freely.
WebSockets are stateful. A persistent TCP connection ties a client to a specific server. That connection holds in-memory state (subscriptions, authentication context, pending messages). You cannot just route the next message to a different server.
The core problems:
- Connection affinity — clients must reach the same server for the duration of the session
- Cross-server messaging — a message on Server 1 must reach clients on Server 2
- Connection limits — each server has a finite number of sockets it can hold
- Failure recovery — when a server dies, thousands of connections drop simultaneously
Sticky sessions#
Sticky sessions (session affinity) ensure a client always routes to the same backend server.
How it works#
The load balancer inspects the initial HTTP upgrade request and pins the client to a server. Subsequent frames on that connection naturally stay on the same TCP stream.
Common sticky strategies:
1. Cookie-based:
Load balancer sets a cookie (e.g., SERVERID=backend-2)
Client sends cookie on reconnect → routed to same server
2. IP hash:
hash(client_ip) % num_servers = target server
Simple but breaks with NAT / shared IPs
3. Connection ID:
Application generates a session token
Load balancer uses token to route
Most reliable for WebSocket reconnection
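The IP-hash strategy can be sketched as a pure routing function. This is an illustration only (`fnv1a` and `pickServer` are hypothetical names, and real load balancers typically use consistent hashing so that adding or removing a server reshuffles fewer clients):

```javascript
// Toy IP-hash router: the same client IP always maps to the same backend.
// FNV-1a is used only as a simple, deterministic string hash.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

function pickServer(clientIp, servers) {
  return servers[fnv1a(clientIp) % servers.length]
}

const servers = ["backend1:8080", "backend2:8080", "backend3:8080"]
pickServer("203.0.113.7", servers) // deterministic for a given IP
```

The determinism is the point, and also the weakness: many clients behind one NAT share an IP and all land on the same backend.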
Configuration with Nginx#
upstream websocket_servers {
ip_hash;
server backend1:8080;
server backend2:8080;
server backend3:8080;
}
server {
location /ws {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 86400s;
}
}
Limitation: Sticky sessions solve routing but not cross-server messaging. If User A (Server 1) sends a chat message to User B (Server 2), Server 1 has no way to push it to Server 2's clients.
Redis pub/sub for fan-out#
Redis pub/sub is the standard solution for cross-server WebSocket messaging.
Architecture#
Client A → Server 1 → publishes to Redis channel "chat:room-42"
↓
Redis fans out to all subscribers
↓
Server 1 (subscribed) → pushes to local clients
Server 2 (subscribed) → pushes to local clients
Server 3 (subscribed) → pushes to local clients
Every WebSocket server subscribes to the Redis channels it needs. When a message arrives on any server, it publishes to Redis. Redis delivers to all subscribers. Each server then pushes to its locally connected clients.
Implementation pattern#
// Server-side (Node.js with node-redis v4)
import { createClient } from "redis"

const redisSub = createClient() // subscriber connection
const redisPub = createClient() // publisher connection
await Promise.all([redisSub.connect(), redisPub.connect()])

// Subscribe to channels for rooms this server has clients in
await redisSub.subscribe("chat:room-42", (message) => {
// Fan out to all local WebSocket clients in room-42
const room = localRooms.get("room-42")
if (!room) return // no local clients in this room
room.forEach(ws => ws.send(message))
})

// When a client sends a message
ws.on("message", (data) => {
redisPub.publish("chat:room-42", data.toString())
})
Important: Use separate Redis connections for subscribe and publish. A connection in subscribe mode cannot run other commands.
Redis pub/sub limitations#
- No persistence — if a server is down when a message publishes, it misses that message
- No replay — new subscribers do not receive historical messages
- Fan-out cost — every message goes to every server, even if that server has zero clients in the channel
For persistence, use Redis Streams instead of pub/sub. Streams retain messages and support consumer groups.
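A minimal sketch of the Streams approach, with the Redis client injected so the logic stands alone. The `xAdd`/`xRange` method names follow node-redis v4 (verify against your client version), and `publishEvent`/`replaySince` are illustrative names, not a library API:

```javascript
// Append an event to a stream; Redis assigns a monotonically increasing ID.
async function publishEvent(client, stream, event) {
  return client.xAdd(stream, "*", event) // returns the new entry ID, e.g. "1711670400000-0"
}

// Replay everything strictly after lastId, e.g. for a reconnecting consumer.
async function replaySince(client, stream, lastId) {
  // "(" + id is Redis's exclusive-range syntax: entries after lastId only
  return client.xRange(stream, `(${lastId}`, "+")
}
```

Unlike pub/sub, a server that was down during a publish can call `replaySince` with the last ID it processed and catch up.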
Horizontal scaling strategies#
Scaling the WebSocket tier#
┌──────────────┐
│ Load Balancer│
│(sticky sess.)│
└──────┬───────┘
┌───────────┼───────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ WS Srv 1│ │ WS Srv 2│ │ WS Srv 3│
│ 50K conn│ │ 50K conn│ │ 50K conn│
└────┬────┘ └────┬────┘ └────┬────┘
└───────────┼───────────┘
┌─────▼─────┐
│Redis Pub/ │
│Sub Cluster│
└───────────┘
Per-server capacity planning:
- A single Linux server can hold ~100K-500K concurrent WebSocket connections (tuned)
- Each connection consumes ~10-50 KB of memory (depends on buffering)
- File descriptor limits: increase ulimit -n and fs.file-max
- Ephemeral port range: widen with net.ipv4.ip_local_port_range
Kernel tuning for high connection counts#
# /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
fs.file-max = 1000000
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Connection limits and backpressure#
Server-side limits#
Never allow unlimited connections. Set a per-server maximum and reject new connections gracefully when full.
const MAX_CONNECTIONS = 50000
let activeConnections = 0
wss.on("connection", (ws) => {
if (activeConnections >= MAX_CONNECTIONS) {
ws.close(1013, "Server full, try another")
return
}
activeConnections++
ws.on("close", () => activeConnections--)
})
Backpressure handling#
When a client reads slower than the server writes, the send buffer grows. Detect this and drop non-critical messages.
const HIGH_WATER_MARK = 1024 * 1024 // 1 MB of buffered, unsent data

function safeSend(ws, message, priority) {
if (ws.bufferedAmount > HIGH_WATER_MARK && priority !== "critical") {
return // drop non-critical messages
}
ws.send(message)
}
Heartbeat and ping-pong#
WebSocket connections can silently die — the client's network drops, a proxy closes the idle connection, or the process crashes without sending a close frame.
Server-side ping#
The server sends a WebSocket ping frame at regular intervals. If no pong returns within a timeout, the connection is dead.
const HEARTBEAT_INTERVAL = 30000 // 30 seconds
const PONG_TIMEOUT = 10000 // 10 seconds
function heartbeat(ws) {
ws.isAlive = true
}
wss.on("connection", (ws) => {
ws.isAlive = true
ws.on("pong", () => heartbeat(ws))
})
setInterval(() => {
wss.clients.forEach((ws) => {
if (!ws.isAlive) {
ws.terminate()
return
}
ws.isAlive = false
ws.ping()
})
}, HEARTBEAT_INTERVAL)
Client-side heartbeat#
Clients should also detect dead connections. If no message (or pong) arrives within the expected interval, trigger reconnection.
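The liveness check can be kept separate from the socket itself, which makes it testable. A sketch (the `HeartbeatMonitor` class is illustrative, not a standard API; the clock is injected so the logic is deterministic):

```javascript
// Track when the server was last heard from; anything received counts.
class HeartbeatMonitor {
  constructor(timeoutMs, now = () => Date.now()) {
    this.timeoutMs = timeoutMs
    this.now = now
    this.lastSeen = now()
  }
  touch() {
    // Call on every inbound message or pong from the server
    this.lastSeen = this.now()
  }
  isStale() {
    // true => treat the connection as dead and trigger reconnection
    return this.now() - this.lastSeen > this.timeoutMs
  }
}
```

Wire `touch()` into the socket's message handler and check `isStale()` on a timer; when it fires, close the socket and start the reconnection flow.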
Reconnection strategies#
Clients will disconnect. Networks are unreliable. Your reconnection strategy determines user experience.
Exponential backoff with jitter#
function reconnect(attempt) {
const baseDelay = 1000 // 1 second
const maxDelay = 30000 // 30 seconds
const delay = Math.min(
baseDelay * Math.pow(2, attempt),
maxDelay
)
// Add jitter to prevent thundering herd
const jitter = delay * 0.5 * Math.random()
setTimeout(() => connect(), delay + jitter)
}
Why jitter matters: If your server restarts and 50,000 clients all reconnect at exactly the same time with the same exponential delay, they hit the server in synchronized waves. Jitter randomizes the reconnection timing, spreading the load.
Reconnection with state recovery#
After reconnecting, the client needs to recover missed messages. Three patterns:
Pattern 1 — Last event ID:
Client sends: "last_event_id=12345"
Server replays events 12346+ from Redis Streams
Pattern 2 — Timestamp-based:
Client sends: "resume_after=1711670400000"
Server queries message store for messages after that timestamp
Pattern 3 — Full state sync:
Client reconnects and requests full current state
Simpler but more bandwidth
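Pattern 1 reduces to filtering the retained event log by ID. A sketch over an in-memory buffer (in production the buffer would be Redis Streams, whose "ms-seq" string IDs compare field by field rather than numerically; `eventsAfter` is an illustrative name):

```javascript
// Return only the events the client has not yet seen.
function eventsAfter(buffer, lastEventId) {
  return buffer.filter(e => e.id > lastEventId)
}

const buffer = [
  { id: 12344, text: "a" },
  { id: 12345, text: "b" },
  { id: 12346, text: "c" },
]
eventsAfter(buffer, 12345) // replays only { id: 12346, text: "c" }
```

The server's only obligation is to retain events long enough to cover realistic reconnection windows; beyond that, fall back to Pattern 3 (full state sync).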
Socket.IO cluster adapter#
Socket.IO provides a built-in solution for multi-server WebSocket scaling through adapters.
Redis adapter#
// Server setup with Socket.IO + Redis adapter
import { Server } from "socket.io"
import { createAdapter } from "@socket.io/redis-adapter"
import { createClient } from "redis"
const pubClient = createClient({ url: "redis://redis:6379" })
const subClient = pubClient.duplicate()
await Promise.all([pubClient.connect(), subClient.connect()])
const io = new Server(httpServer)
io.adapter(createAdapter(pubClient, subClient))
// Now io.emit() automatically fans out across all servers
io.to("room-42").emit("message", { text: "Hello everyone" })
The adapter handles:
- Cross-server room broadcasts
- Room membership synchronization
- Acknowledgment collection across servers
Alternative adapters#
┌──────────────────┬──────────────────────────────────────┐
│ Adapter │ Best for │
├──────────────────┼──────────────────────────────────────┤
│ Redis │ Most deployments (simple, fast) │
│ Redis Streams │ Need message persistence / replay │
│ Postgres │ Already running Postgres, lower load │
│ MongoDB │ Already running Mongo, change streams│
│ Cluster (Node) │ Single machine, multiple CPU cores │
│ NATS │ Existing NATS infrastructure │
└──────────────────┴──────────────────────────────────────┘
Architecture at scale#
Full WebSocket scaling architecture:
CDN / Edge
↓ (terminates TLS, routes upgrade)
Load Balancer (HAProxy / ALB)
↓ (sticky sessions via cookie)
┌─────────────────────────────────────────┐
│ WebSocket Server Tier (auto-scaled) │
│ Server 1 ── Server 2 ── Server 3 ──… │
│ ↕ Redis Pub/Sub ↕ │
└─────────────────────────────────────────┘
↓ (persist events)
Redis Streams / Kafka (message durability)
↓
Message Store (Postgres / Cassandra)
Summary#
- Sticky sessions route clients to the same server — use cookie or connection ID, not IP hash
- Redis pub/sub fans out messages across servers — every server subscribes, every server publishes
- Horizontal scaling — tune kernel params, set per-server connection limits, and plan capacity from memory and file-descriptor budgets (50K-100K connections per server is a conservative target; hundreds of thousands with tuning)
- Heartbeat ping-pong detects dead connections — 30-second interval, 10-second pong timeout
- Exponential backoff with jitter prevents thundering herd on reconnection
- Socket.IO Redis adapter handles cross-server rooms and broadcasts out of the box
Article #444 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.