WebSocket Scaling: Sticky Sessions, Redis Pub/Sub, and Horizontal Scale
WebSocket Scaling: From One Server to Thousands#
A single server handles WebSocket connections fine. Add a second server and everything breaks. Client A connects to Server 1, Client B connects to Server 2. Server 1 broadcasts a message — Client B never receives it.
Scaling WebSockets requires solving connection affinity, cross-server messaging, and graceful failure handling.
Why WebSockets are harder to scale than HTTP#
HTTP is stateless. Any server can handle any request. Load balancers round-robin freely.
WebSockets are stateful. A persistent TCP connection ties a client to a specific server. That connection holds in-memory state (subscriptions, authentication context, pending messages). You cannot just route the next message to a different server.
The core problems:
- Connection affinity — clients must reach the same server for the duration of the session
- Cross-server messaging — a message on Server 1 must reach clients on Server 2
- Connection limits — each server has a finite number of sockets it can hold
- Failure recovery — when a server dies, thousands of connections drop simultaneously
Sticky sessions#
Sticky sessions (session affinity) ensure a client always routes to the same backend server.
How it works#
The load balancer inspects the initial HTTP upgrade request and pins the client to a server. Subsequent frames on that connection naturally stay on the same TCP stream.
Common sticky strategies:
1. Cookie-based:
Load balancer sets a cookie (e.g., SERVERID=backend-2)
Client sends cookie on reconnect → routed to same server
2. IP hash:
hash(client_ip) % num_servers = target server
Simple but breaks with NAT / shared IPs
3. Connection ID:
Application generates a session token
Load balancer uses token to route
Most reliable for WebSocket reconnection
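The IP-hash strategy can be sketched as a pure routing function. This is an illustration only (`fnv1a` and `pickServer` are hypothetical names, and real load balancers typically use consistent hashing so that adding or removing a server reshuffles fewer clients):

```javascript
// Toy IP-hash router: the same client IP always maps to the same backend.
// FNV-1a is used only as a simple, deterministic string hash.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

function pickServer(clientIp, servers) {
  return servers[fnv1a(clientIp) % servers.length]
}

const servers = ["backend1:8080", "backend2:8080", "backend3:8080"]
pickServer("203.0.113.7", servers) // deterministic for a given IP
```

The determinism is the point, and also the weakness: many clients behind one NAT share an IP and all land on the same backend.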
Configuration with Nginx#
upstream websocket_servers {
ip_hash;
server backend1:8080;
server backend2:8080;
server backend3:8080;
}
server {
location /ws {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 86400s;
}
}
Limitation: Sticky sessions solve routing but not cross-server messaging. If User A (Server 1) sends a chat message to User B (Server 2), Server 1 has no way to push it to Server 2's clients.
Redis pub/sub for fan-out#
Redis pub/sub is the standard solution for cross-server WebSocket messaging.
Architecture#
Client A → Server 1 → publishes to Redis channel "chat:room-42"
↓
Redis fans out to all subscribers
↓
Server 1 (subscribed) → pushes to local clients
Server 2 (subscribed) → pushes to local clients
Server 3 (subscribed) → pushes to local clients
Every WebSocket server subscribes to the Redis channels it needs. When a message arrives on any server, it publishes to Redis. Redis delivers to all subscribers. Each server then pushes to its locally connected clients.
Implementation pattern#
// Server-side (Node.js with node-redis v4)
import { createClient } from "redis"

const redisSub = createClient() // subscriber connection
const redisPub = createClient() // publisher connection
await Promise.all([redisSub.connect(), redisPub.connect()])

// Subscribe to channels for rooms this server has clients in
await redisSub.subscribe("chat:room-42", (message) => {
// Fan out to all local WebSocket clients in room-42
const room = localRooms.get("room-42")
if (!room) return // no local clients in this room
room.forEach(ws => ws.send(message))
})

// When a client sends a message
ws.on("message", (data) => {
redisPub.publish("chat:room-42", data.toString())
})
Important: Use separate Redis connections for subscribe and publish. A connection in subscribe mode cannot run other commands.
Redis pub/sub limitations#
- No persistence — if a server is down when a message publishes, it misses that message
- No replay — new subscribers do not receive historical messages
- Fan-out cost — every message goes to every server, even if that server has zero clients in the channel
For persistence, use Redis Streams instead of pub/sub. Streams retain messages and support consumer groups.
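A minimal sketch of the Streams approach, with the Redis client injected so the logic stands alone. The `xAdd`/`xRange` method names follow node-redis v4 (verify against your client version), and `publishEvent`/`replaySince` are illustrative names, not a library API:

```javascript
// Append an event to a stream; Redis assigns a monotonically increasing ID.
async function publishEvent(client, stream, event) {
  return client.xAdd(stream, "*", event) // returns the new entry ID, e.g. "1711670400000-0"
}

// Replay everything strictly after lastId, e.g. for a reconnecting consumer.
async function replaySince(client, stream, lastId) {
  // "(" + id is Redis's exclusive-range syntax: entries after lastId only
  return client.xRange(stream, `(${lastId}`, "+")
}
```

Unlike pub/sub, a server that was down during a publish can call `replaySince` with the last ID it processed and catch up.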
Horizontal scaling strategies#
Scaling the WebSocket tier#
┌──────────────┐
│ Load Balancer│
│(sticky sess.)│
└──────┬───────┘
┌───────────┼───────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ WS Srv 1│ │ WS Srv 2│ │ WS Srv 3│
│ 50K conn│ │ 50K conn│ │ 50K conn│
└────┬────┘ └────┬────┘ └────┬────┘
└───────────┼───────────┘
┌─────▼─────┐
│Redis Pub/ │
│Sub Cluster│
└───────────┘
Per-server capacity planning:
- A single Linux server can hold ~100K-500K concurrent WebSocket connections (tuned)
- Each connection consumes ~10-50 KB of memory (depends on buffering)
- File descriptor limits: increase ulimit -n and fs.file-max
- Ephemeral port range: widen with net.ipv4.ip_local_port_range
Kernel tuning for high connection counts#
# /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
fs.file-max = 1000000
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Connection limits and backpressure#
Server-side limits#
Never allow unlimited connections. Set a per-server maximum and reject new connections gracefully when full.
const MAX_CONNECTIONS = 50000
let activeConnections = 0
wss.on("connection", (ws) => {
if (activeConnections >= MAX_CONNECTIONS) {
ws.close(1013, "Server full, try another")
return
}
activeConnections++
ws.on("close", () => activeConnections--)
})
Backpressure handling#
When a client reads slower than the server writes, the send buffer grows. Detect this and drop non-critical messages.
const HIGH_WATER_MARK = 1024 * 1024 // 1 MB of buffered, unsent data

function safeSend(ws, message, priority) {
if (ws.bufferedAmount > HIGH_WATER_MARK && priority !== "critical") {
return // drop non-critical messages
}
ws.send(message)
}
Heartbeat and ping-pong#
WebSocket connections can silently die — the client's network drops, a proxy closes the idle connection, or the process crashes without sending a close frame.
Server-side ping#
The server sends a WebSocket ping frame at regular intervals. If no pong returns within a timeout, the connection is dead.
const HEARTBEAT_INTERVAL = 30000 // 30 seconds
const PONG_TIMEOUT = 10000 // 10 seconds
function heartbeat(ws) {
ws.isAlive = true
}
wss.on("connection", (ws) => {
ws.isAlive = true
ws.on("pong", () => heartbeat(ws))
})
setInterval(() => {
wss.clients.forEach((ws) => {
if (!ws.isAlive) {
ws.terminate()
return
}
ws.isAlive = false
ws.ping()
})
}, HEARTBEAT_INTERVAL)
Client-side heartbeat#
Clients should also detect dead connections. If no message (or pong) arrives within the expected interval, trigger reconnection.
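The liveness check can be kept separate from the socket itself, which makes it testable. A sketch (the `HeartbeatMonitor` class is illustrative, not a standard API; the clock is injected so the logic is deterministic):

```javascript
// Track when the server was last heard from; anything received counts.
class HeartbeatMonitor {
  constructor(timeoutMs, now = () => Date.now()) {
    this.timeoutMs = timeoutMs
    this.now = now
    this.lastSeen = now()
  }
  touch() {
    // Call on every inbound message or pong from the server
    this.lastSeen = this.now()
  }
  isStale() {
    // true => treat the connection as dead and trigger reconnection
    return this.now() - this.lastSeen > this.timeoutMs
  }
}
```

Wire `touch()` into the socket's message handler and check `isStale()` on a timer; when it fires, close the socket and start the reconnection flow.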
Reconnection strategies#
Clients will disconnect. Networks are unreliable. Your reconnection strategy determines user experience.
Exponential backoff with jitter#
function reconnect(attempt) {
const baseDelay = 1000 // 1 second
const maxDelay = 30000 // 30 seconds
const delay = Math.min(
baseDelay * Math.pow(2, attempt),
maxDelay
)
// Add jitter to prevent thundering herd
const jitter = delay * 0.5 * Math.random()
setTimeout(() => connect(), delay + jitter)
}
Why jitter matters: If your server restarts and 50,000 clients all reconnect at exactly the same time with the same exponential delay, they hit the server in synchronized waves. Jitter randomizes the reconnection timing, spreading the load.
Reconnection with state recovery#
After reconnecting, the client needs to recover missed messages. Three patterns:
Pattern 1 — Last event ID:
Client sends: "last_event_id=12345"
Server replays events 12346+ from Redis Streams
Pattern 2 — Timestamp-based:
Client sends: "resume_after=1711670400000"
Server queries message store for messages after that timestamp
Pattern 3 — Full state sync:
Client reconnects and requests full current state
Simpler but more bandwidth
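Pattern 1 reduces to filtering the retained event log by ID. A sketch over an in-memory buffer (in production the buffer would be Redis Streams, whose "ms-seq" string IDs compare field by field rather than numerically; `eventsAfter` is an illustrative name):

```javascript
// Return only the events the client has not yet seen.
function eventsAfter(buffer, lastEventId) {
  return buffer.filter(e => e.id > lastEventId)
}

const buffer = [
  { id: 12344, text: "a" },
  { id: 12345, text: "b" },
  { id: 12346, text: "c" },
]
eventsAfter(buffer, 12345) // replays only { id: 12346, text: "c" }
```

The server's only obligation is to retain events long enough to cover realistic reconnection windows; beyond that, fall back to Pattern 3 (full state sync).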
Socket.IO cluster adapter#
Socket.IO provides a built-in solution for multi-server WebSocket scaling through adapters.
Redis adapter#
// Server setup with Socket.IO + Redis adapter
import { Server } from "socket.io"
import { createAdapter } from "@socket.io/redis-adapter"
import { createClient } from "redis"
const pubClient = createClient({ url: "redis://redis:6379" })
const subClient = pubClient.duplicate()
await Promise.all([pubClient.connect(), subClient.connect()])
const io = new Server(httpServer)
io.adapter(createAdapter(pubClient, subClient))
// Now io.emit() automatically fans out across all servers
io.to("room-42").emit("message", { text: "Hello everyone" })
The adapter handles:
- Cross-server room broadcasts
- Room membership synchronization
- Acknowledgment collection across servers
Alternative adapters#
┌──────────────────┬──────────────────────────────────────┐
│ Adapter │ Best for │
├──────────────────┼──────────────────────────────────────┤
│ Redis │ Most deployments (simple, fast) │
│ Redis Streams │ Need message persistence / replay │
│ Postgres │ Already running Postgres, lower load │
│ MongoDB │ Already running Mongo, change streams│
│ Cluster (Node) │ Single machine, multiple CPU cores │
│ NATS │ Existing NATS infrastructure │
└──────────────────┴──────────────────────────────────────┘
Architecture at scale#
Full WebSocket scaling architecture:
CDN / Edge
↓ (terminates TLS, routes upgrade)
Load Balancer (HAProxy / ALB)
↓ (sticky sessions via cookie)
┌─────────────────────────────────────────┐
│ WebSocket Server Tier (auto-scaled) │
│ Server 1 ── Server 2 ── Server 3 ──… │
│ ↕ Redis Pub/Sub ↕ │
└─────────────────────────────────────────┘
↓ (persist events)
Redis Streams / Kafka (message durability)
↓
Message Store (Postgres / Cassandra)
Summary#
- Sticky sessions route clients to the same server — use cookie or connection ID, not IP hash
- Redis pub/sub fans out messages across servers — every server subscribes, every server publishes
- Horizontal scaling — tune kernel params, set per-server connection limits, and plan capacity from memory and file-descriptor budgets (50K-100K connections per server is a conservative target; hundreds of thousands with tuning)
- Heartbeat ping-pong detects dead connections — 30-second interval, 10-second pong timeout
- Exponential backoff with jitter prevents thundering herd on reconnection
- Socket.IO Redis adapter handles cross-server rooms and broadcasts out of the box
Article #444 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.