# Membership Protocols — SWIM, Gossip, and How Distributed Systems Track Who's Alive

## The problem: who's in the cluster?
You have 50 nodes in a distributed system. A new node joins. Two nodes crash. One node's network cable gets unplugged for 30 seconds and then reconnects.
Every node needs to answer one question: who is currently alive and reachable? That's the membership problem.
## Why membership matters

Without accurate membership information, a distributed system can't implement:
- Load balancing — you'll route traffic to dead nodes
- Replication — you'll try to replicate data to machines that don't exist
- Consensus — you need to know who's in the quorum
- Failure recovery — you can't replace a failed node if you don't know it failed
## The naive approach: heartbeats to a central server
The simplest design: every node sends a heartbeat to a central coordinator every few seconds. If the coordinator stops receiving heartbeats, it marks the node as dead.
Problems:
- Single point of failure — if the coordinator dies, nobody knows anything
- Scalability bottleneck — with 1,000 nodes sending heartbeats every 2 seconds, the coordinator handles 500 messages per second
- False positives — network partition between coordinator and a healthy node looks like node failure
## All-to-all heartbeats
Every node heartbeats to every other node. No central coordinator.
Problems:
- O(n^2) message overhead — 100 nodes means 9,900 heartbeat messages per interval
- Bandwidth explosion — scales terribly past a few dozen nodes
## SWIM: the scalable solution
SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol) was introduced in 2002 and solves both problems elegantly.
### How SWIM failure detection works
Each node runs this protocol every T seconds:
- Pick a random target — node A randomly selects node B
- Direct ping — A sends a ping to B and waits for an ack
- If B responds — B is alive. Done.
- If B doesn't respond — A picks k random nodes (called "indirect probers") and asks them to ping B on its behalf
- If any indirect probe gets an ack — B is alive (A just couldn't reach B directly)
- If no indirect probes succeed — B is declared suspect or failed
This gives you O(n) total message load per protocol period, regardless of cluster size.
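The probe round above can be sketched in a few dozen lines. This is a simulation, not a network implementation: `reachable` stands in for UDP pings and acks, and the node names are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
)

// probeRound runs one SWIM protocol period from node a's point of view.
// reachable simulates the network: reachable(x, y) is true when x can
// ping y directly. k is the number of indirect probers.
func probeRound(a, target string, peers []string, k int,
	reachable func(from, to string) bool) bool {
	// Direct ping first.
	if reachable(a, target) {
		return true
	}
	// Ask up to k random peers to ping the target on our behalf.
	probed := 0
	for _, i := range rand.Perm(len(peers)) {
		p := peers[i]
		if p == a || p == target {
			continue
		}
		if probed == k {
			break
		}
		probed++
		// The request must reach the prober, and the prober must reach
		// the target, for the indirect ack to come back.
		if reachable(a, p) && reachable(p, target) {
			return true // some indirect probe succeeded
		}
	}
	return false // declare the target suspect
}

func main() {
	peers := []string{"A", "B", "C", "D"}
	// Simulate an asymmetric partition: A can't reach B, everyone else can.
	reach := func(from, to string) bool { return !(from == "A" && to == "B") }
	fmt.Println(probeRound("A", "B", peers, 3, reach)) // true: C or D relays the ping
}
```

Because each node probes one target and recruits a constant `k` helpers per period, total load grows linearly with cluster size.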
### Why indirect probing matters
Direct ping failure doesn't always mean the target is dead. Common causes:
- Asymmetric network partition — A can't reach B, but C can
- Temporary congestion — B's network buffer is full but it's otherwise healthy
- Firewall rules — specific port blocked between A and B only
Indirect probing dramatically reduces false positives.
### The suspicion mechanism

The basic SWIM protocol declares nodes either alive or failed. A refinement described in the same paper adds a suspect state:
- Alive — node is responding normally
- Suspect — node failed a ping round but hasn't been confirmed dead
- Dead — node has been suspect for longer than the suspicion timeout
Why suspect matters:
A node under heavy load might miss a ping deadline. Without the suspect state, it gets immediately marked dead, triggering unnecessary rebalancing. The suspect state gives it a grace period to recover.
```
Alive ──(missed ping round)──▶ Suspect ──(timeout expires)──▶ Dead
                                  │
                          (responds to ping)
                                  ▼
                                Alive
```
Suspicion timeout is typically 3-5x the protocol period. Too short and you get false positives. Too long and genuinely failed nodes stay in the membership list.
## Gossip-based dissemination

SWIM detects failures, but how does every node learn about membership changes? Through a gossip protocol (also called an epidemic protocol).
### How gossip works
When a node learns something new (join, leave, failure), it piggybacks that information on the protocol messages it's already sending:
- Node A detects that node B failed
- A attaches "B is dead" to its next ping messages
- Each node that receives this information attaches it to its own pings
- The information spreads exponentially through the cluster
Key property: Information reaches all N nodes in O(log N) protocol rounds. In a 1,000-node cluster, every node knows about a failure within about 10 rounds.
### Crux: no extra messages
Gossip information piggybacks on existing SWIM protocol messages. No additional bandwidth cost. This is what makes the combination so efficient.
## Join, leave, and fail events

### Join
- New node contacts any existing member (a "seed node")
- Seed node adds the new node to its local membership list
- Seed gossips "new node joined" to the cluster
- All nodes learn about the new member within O(log N) rounds
### Graceful leave
- Departing node broadcasts a "leave" message
- Other nodes remove it from their membership list
- No suspicion period needed — the node explicitly announced departure
### Failure (ungraceful)
- SWIM protocol detects the failure through ping/indirect-ping
- Node enters suspect state
- If suspicion timeout expires without recovery, node is marked dead
- Failure event is gossiped to all nodes
- Application layer handles reassignment of the dead node's responsibilities
## Incarnation numbers
What happens when a node is falsely suspected but is actually alive? It needs a way to refute the suspicion.
Each node maintains an incarnation number (starting at 0). When a node hears that it's been suspected, it:
- Increments its incarnation number
- Broadcasts an "alive" message with the new incarnation number
- Higher incarnation numbers override lower ones
This prevents stale "suspect" messages from overriding fresh "alive" messages.
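The override rule can be written as a single comparison function. A sketch of the ordering, not any particular library's API; the type names are mine. One subtlety worth encoding: at the *same* incarnation, suspect beats alive, which is precisely why a suspected node must bump its incarnation to refute:

```go
package main

import "fmt"

type Status int

const (
	StatusAlive Status = iota
	StatusSuspect
	StatusDead
)

// Record is one node's view of a member: its status plus the
// incarnation number the message carried.
type Record struct {
	Status      Status
	Incarnation int
}

// supersedes reports whether an incoming record should replace the
// current one: higher incarnations win, Suspect beats Alive at the
// same incarnation, and Dead is terminal.
func supersedes(incoming, current Record) bool {
	if current.Status == StatusDead {
		return false // dead is terminal
	}
	if incoming.Status == StatusDead {
		return true
	}
	if incoming.Incarnation != current.Incarnation {
		return incoming.Incarnation > current.Incarnation
	}
	// Same incarnation: Suspect overrides Alive, not vice versa.
	return incoming.Status == StatusSuspect && current.Status == StatusAlive
}

func main() {
	// A fresh "alive" refutation (incarnation 1) beats the suspicion it answers...
	fmt.Println(supersedes(Record{StatusAlive, 1}, Record{StatusSuspect, 0})) // true
	// ...and the stale suspicion can't undo the refutation afterwards.
	fmt.Println(supersedes(Record{StatusSuspect, 0}, Record{StatusAlive, 1})) // false
}
```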
## Memberlist: HashiCorp's implementation
Memberlist is HashiCorp's Go library that implements SWIM with several enhancements:
- Full push/pull sync — periodically, two nodes exchange their complete membership list to catch any missed gossip
- Configurable protocol parameters — tune ping interval, suspicion multiplier, indirect probe count
- Encryption — supports symmetric-key encryption for all protocol messages
- Delegate interface — hook into join/leave/update events for application logic
```go
// Basic Memberlist configuration (package and import lines added for
// completeness; requires github.com/hashicorp/memberlist)
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/memberlist"
)

func main() {
	config := memberlist.DefaultLANConfig()
	config.Name = "node-1"
	config.BindPort = 7946

	list, err := memberlist.Create(config)
	if err != nil {
		log.Fatal(err)
	}

	// Join an existing cluster via any known member
	if _, err := list.Join([]string{"10.0.0.1:7946"}); err != nil {
		log.Fatal(err)
	}

	// Get current members
	for _, member := range list.Members() {
		fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
	}
}
```
### LAN vs WAN configuration
Memberlist ships with two presets:
| Parameter | LAN | WAN |
|---|---|---|
| Probe interval | 1 second | 5 seconds |
| Suspicion multiplier | 4 | 6 |
| Retransmit multiplier | 4 | 4 |
| Push/pull interval | 30 seconds | 60 seconds |
WAN configuration accounts for higher latency and more frequent transient failures.
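You can also start from a preset and tune individual knobs. A config fragment, assuming the field names from hashicorp/memberlist's `Config` struct (`ProbeInterval`, `SuspicionMult`, `PushPullInterval`; check your version's godoc) and illustrative values for a mixed-latency deployment:

```go
// Start from the WAN preset, then tighten the probe interval for a
// deployment that sits between the LAN (1s) and WAN (5s) presets.
config := memberlist.DefaultWANConfig()
config.ProbeInterval = 3 * time.Second
config.SuspicionMult = 6 // suspicion timeout = multiplier x log(n) x probe interval
config.PushPullInterval = 60 * time.Second
```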
## Serf: cluster orchestration on top of Memberlist
Serf is HashiCorp's tool that builds on Memberlist to provide:
- Cluster membership — nodes discover and track each other automatically
- Custom events — broadcast application-specific events to all nodes
- Queries — send a question to all nodes and collect responses
- Tags — attach key-value metadata to nodes (role, datacenter, version)
Serf is what Consul uses under the hood for service discovery and health checking.
```shell
# Start a Serf agent
serf agent -node=web-1 -bind=10.0.0.1:7946 -tag role=web

# Join an existing cluster
serf join 10.0.0.2:7946

# Query cluster members
serf members
#   web-1  10.0.0.1:7946  alive  role=web
#   web-2  10.0.0.2:7946  alive  role=web
#   db-1   10.0.0.3:7946  alive  role=db

# Send a custom event
serf event deploy v2.1.0
```
## Comparison: membership protocol approaches
| Approach | Messages/round | Failure detection | Scalability |
|---|---|---|---|
| Central heartbeat | O(n) | Seconds | Poor (SPOF) |
| All-to-all | O(n^2) | Fast | Poor |
| SWIM | O(n) | Tunable | Excellent |
| SWIM + gossip | O(n) | Tunable | Excellent |
## When to use what
Use Memberlist directly when you're building a custom distributed system in Go and need low-level control over membership events.
Use Serf when you need cluster membership plus event broadcasting without writing the protocol layer yourself.
Use Consul when you need full service discovery, health checking, and KV store — it uses Serf/Memberlist internally.
## The practical takeaway
Membership protocols are the foundation that everything else in a distributed system builds on. Before you can elect a leader, replicate data, or balance load, you need to know who's in the cluster.
SWIM solved the scalability problem by replacing O(n^2) heartbeats with randomized probing. Gossip solved the dissemination problem by piggybacking on existing messages. Together, they give you accurate, efficient membership tracking that scales to thousands of nodes.
Article #449 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.