# Membership Protocols — SWIM, Gossip, and How Distributed Systems Track Who's Alive

## The problem: who's in the cluster?
You have 50 nodes in a distributed system. A new node joins. Two nodes crash. One node's network cable gets unplugged for 30 seconds and then reconnects.
Every node needs to answer one question: who is currently alive and reachable? That's the membership problem.
## Why membership matters

Without accurate membership information, a distributed system can't implement:
- Load balancing — you'll route traffic to dead nodes
- Replication — you'll try to replicate data to machines that don't exist
- Consensus — you need to know who's in the quorum
- Failure recovery — you can't replace a failed node if you don't know it failed
## The naive approach: heartbeats to a central server
The simplest design: every node sends a heartbeat to a central coordinator every few seconds. If the coordinator stops receiving heartbeats, it marks the node as dead.
Problems:
- Single point of failure — if the coordinator dies, nobody knows anything
- Scalability bottleneck — with 1,000 nodes sending heartbeats every 2 seconds, the coordinator handles 500 messages per second
- False positives — network partition between coordinator and a healthy node looks like node failure
## All-to-all heartbeats
Every node heartbeats to every other node. No central coordinator.
Problems:
- O(n^2) message overhead — 100 nodes means 9,900 heartbeat messages per interval
- Bandwidth explosion — scales terribly past a few dozen nodes
## SWIM: the scalable solution
SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol) was introduced in 2002 and solves both problems elegantly.
### How SWIM failure detection works
Each node runs this protocol every T seconds:
- Pick a random target — node A randomly selects node B
- Direct ping — A sends a ping to B and waits for an ack
- If B responds — B is alive. Done.
- If B doesn't respond — A picks k random nodes (called "indirect probers") and asks them to ping B on its behalf
- If any indirect probe gets an ack — B is alive (A just couldn't reach B directly)
- If no indirect probes succeed — B is declared suspect or failed
This gives you O(n) total message load per protocol period, regardless of cluster size.
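The probe round above can be sketched in a few dozen lines. This is a simulation, not a network implementation: `reachable` stands in for UDP pings and acks, and the node names are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
)

// probeRound runs one SWIM protocol period from node a's point of view.
// reachable simulates the network: reachable(x, y) is true when x can
// ping y directly. k is the number of indirect probers.
func probeRound(a, target string, peers []string, k int,
	reachable func(from, to string) bool) bool {
	// Direct ping first.
	if reachable(a, target) {
		return true
	}
	// Ask up to k random peers to ping the target on our behalf.
	probed := 0
	for _, i := range rand.Perm(len(peers)) {
		p := peers[i]
		if p == a || p == target {
			continue
		}
		if probed == k {
			break
		}
		probed++
		// The request must reach the prober, and the prober must reach
		// the target, for the indirect ack to come back.
		if reachable(a, p) && reachable(p, target) {
			return true // some indirect probe succeeded
		}
	}
	return false // declare the target suspect
}

func main() {
	peers := []string{"A", "B", "C", "D"}
	// Simulate an asymmetric partition: A can't reach B, everyone else can.
	reach := func(from, to string) bool { return !(from == "A" && to == "B") }
	fmt.Println(probeRound("A", "B", peers, 3, reach)) // true: C or D relays the ping
}
```

Because each node probes one target and recruits a constant `k` helpers per period, total load grows linearly with cluster size.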
### Why indirect probing matters
Direct ping failure doesn't always mean the target is dead. Common causes:
- Asymmetric network partition — A can't reach B, but C can
- Temporary congestion — B's network buffer is full but it's otherwise healthy
- Firewall rules — specific port blocked between A and B only
Indirect probing dramatically reduces false positives.
### The suspicion mechanism

The basic SWIM protocol declares nodes either alive or failed. A refinement described in the same paper adds a suspect state:
- Alive — node is responding normally
- Suspect — node failed a ping round but hasn't been confirmed dead
- Dead — node has been suspect for longer than the suspicion timeout
Why suspect matters:
A node under heavy load might miss a ping deadline. Without the suspect state, it gets immediately marked dead, triggering unnecessary rebalancing. The suspect state gives it a grace period to recover.
```
Alive ──(missed ping round)──▶ Suspect ──(timeout expires)──▶ Dead
                                  │
                          (responds to ping)
                                  ▼
                                Alive
```
Suspicion timeout is typically 3-5x the protocol period. Too short and you get false positives. Too long and genuinely failed nodes stay in the membership list.
## Gossip-based dissemination

SWIM detects failures, but how does every node learn about membership changes? Through a gossip protocol (also called an epidemic protocol).
### How gossip works
When a node learns something new (join, leave, failure), it piggybacks that information on the protocol messages it's already sending:
- Node A detects that node B failed
- A attaches "B is dead" to its next ping messages
- Each node that receives this information attaches it to its own pings
- The information spreads exponentially through the cluster
Key property: Information reaches all N nodes in O(log N) protocol rounds. In a 1,000-node cluster, every node knows about a failure within about 10 rounds.
### Crux: no extra messages
Gossip information piggybacks on existing SWIM protocol messages. No additional bandwidth cost. This is what makes the combination so efficient.
## Join, leave, and fail events

### Join
- New node contacts any existing member (a "seed node")
- Seed node adds the new node to its local membership list
- Seed gossips "new node joined" to the cluster
- All nodes learn about the new member within O(log N) rounds
### Graceful leave
- Departing node broadcasts a "leave" message
- Other nodes remove it from their membership list
- No suspicion period needed — the node explicitly announced departure
### Failure (ungraceful)
- SWIM protocol detects the failure through ping/indirect-ping
- Node enters suspect state
- If suspicion timeout expires without recovery, node is marked dead
- Failure event is gossiped to all nodes
- Application layer handles reassignment of the dead node's responsibilities
## Incarnation numbers
What happens when a node is falsely suspected but is actually alive? It needs a way to refute the suspicion.
Each node maintains an incarnation number (starting at 0). When a node hears that it's been suspected, it:
- Increments its incarnation number
- Broadcasts an "alive" message with the new incarnation number
- Higher incarnation numbers override lower ones
This prevents stale "suspect" messages from overriding fresh "alive" messages.
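The override rule can be written as a single comparison function. A sketch of the ordering, not any particular library's API; the type names are mine. One subtlety worth encoding: at the *same* incarnation, suspect beats alive, which is precisely why a suspected node must bump its incarnation to refute:

```go
package main

import "fmt"

type Status int

const (
	StatusAlive Status = iota
	StatusSuspect
	StatusDead
)

// Record is one node's view of a member: its status plus the
// incarnation number the message carried.
type Record struct {
	Status      Status
	Incarnation int
}

// supersedes reports whether an incoming record should replace the
// current one: higher incarnations win, Suspect beats Alive at the
// same incarnation, and Dead is terminal.
func supersedes(incoming, current Record) bool {
	if current.Status == StatusDead {
		return false // dead is terminal
	}
	if incoming.Status == StatusDead {
		return true
	}
	if incoming.Incarnation != current.Incarnation {
		return incoming.Incarnation > current.Incarnation
	}
	// Same incarnation: Suspect overrides Alive, not vice versa.
	return incoming.Status == StatusSuspect && current.Status == StatusAlive
}

func main() {
	// A fresh "alive" refutation (incarnation 1) beats the suspicion it answers...
	fmt.Println(supersedes(Record{StatusAlive, 1}, Record{StatusSuspect, 0})) // true
	// ...and the stale suspicion can't undo the refutation afterwards.
	fmt.Println(supersedes(Record{StatusSuspect, 0}, Record{StatusAlive, 1})) // false
}
```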
## Memberlist: HashiCorp's implementation
Memberlist is HashiCorp's Go library that implements SWIM with several enhancements:
- Full push/pull sync — periodically, two nodes exchange their complete membership list to catch any missed gossip
- Configurable protocol parameters — tune ping interval, suspicion multiplier, indirect probe count
- Encryption — supports symmetric-key encryption for all protocol messages
- Delegate interface — hook into join/leave/update events for application logic
```go
// Basic Memberlist configuration (package and import lines added for
// completeness; requires github.com/hashicorp/memberlist)
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/memberlist"
)

func main() {
	config := memberlist.DefaultLANConfig()
	config.Name = "node-1"
	config.BindPort = 7946

	list, err := memberlist.Create(config)
	if err != nil {
		log.Fatal(err)
	}

	// Join an existing cluster via any known member
	if _, err := list.Join([]string{"10.0.0.1:7946"}); err != nil {
		log.Fatal(err)
	}

	// Get current members
	for _, member := range list.Members() {
		fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
	}
}
```
### LAN vs WAN configuration
Memberlist ships with two presets:
| Parameter | LAN | WAN |
|---|---|---|
| Probe interval | 1 second | 5 seconds |
| Suspicion multiplier | 4 | 6 |
| Retransmit multiplier | 4 | 4 |
| Push/pull interval | 30 seconds | 60 seconds |
WAN configuration accounts for higher latency and more frequent transient failures.
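You can also start from a preset and tune individual knobs. A config fragment, assuming the field names from hashicorp/memberlist's `Config` struct (`ProbeInterval`, `SuspicionMult`, `PushPullInterval`; check your version's godoc) and illustrative values for a mixed-latency deployment:

```go
// Start from the WAN preset, then tighten the probe interval for a
// deployment that sits between the LAN (1s) and WAN (5s) presets.
config := memberlist.DefaultWANConfig()
config.ProbeInterval = 3 * time.Second
config.SuspicionMult = 6 // suspicion timeout = multiplier x log(n) x probe interval
config.PushPullInterval = 60 * time.Second
```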
## Serf: cluster orchestration on top of Memberlist
Serf is HashiCorp's tool that builds on Memberlist to provide:
- Cluster membership — nodes discover and track each other automatically
- Custom events — broadcast application-specific events to all nodes
- Queries — send a question to all nodes and collect responses
- Tags — attach key-value metadata to nodes (role, datacenter, version)
Serf is what Consul uses under the hood for service discovery and health checking.
```shell
# Start a Serf agent
serf agent -node=web-1 -bind=10.0.0.1:7946 -tag role=web

# Join an existing cluster
serf join 10.0.0.2:7946

# Query cluster members
serf members
#   web-1  10.0.0.1:7946  alive  role=web
#   web-2  10.0.0.2:7946  alive  role=web
#   db-1   10.0.0.3:7946  alive  role=db

# Send a custom event
serf event deploy v2.1.0
```
## Comparison: membership protocol approaches
| Approach | Messages/round | Failure detection | Scalability |
|---|---|---|---|
| Central heartbeat | O(n) | Seconds | Poor (SPOF) |
| All-to-all | O(n^2) | Fast | Poor |
| SWIM | O(n) | Tunable | Excellent |
| SWIM + gossip | O(n) | Tunable | Excellent |
## When to use what
Use Memberlist directly when you're building a custom distributed system in Go and need low-level control over membership events.
Use Serf when you need cluster membership plus event broadcasting without writing the protocol layer yourself.
Use Consul when you need full service discovery, health checking, and KV store — it uses Serf/Memberlist internally.
## The practical takeaway
Membership protocols are the foundation that everything else in a distributed system builds on. Before you can elect a leader, replicate data, or balance load, you need to know who's in the cluster.
SWIM solved the scalability problem by replacing O(n^2) heartbeats with randomized probing. Gossip solved the dissemination problem by piggybacking on existing messages. Together, they give you accurate, efficient membership tracking that scales to thousands of nodes.
Article #449 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.