# Load Balancing Algorithms Explained: Round Robin, Least Connections, Consistent Hashing & More
Every production system eventually outgrows a single server. When that happens, you need a load balancer — a component that distributes incoming traffic across multiple backend servers to improve reliability, throughput, and latency.
This guide covers the algorithms behind load balancing, the difference between L4 and L7 balancing, health checks, session persistence, graceful draining, and hands-on configuration for the most popular tools.
## Why Load Balance?
- Availability — if one server crashes, others continue serving traffic.
- Scalability — add servers horizontally instead of buying bigger hardware.
- Performance — spread work so no single server becomes a bottleneck.
- Maintainability — roll out deployments one server at a time with zero downtime.
## Load Balancing Algorithms

### 1. Round Robin
The simplest approach: distribute requests to servers in sequential order.
Server A → Server B → Server C → Server A → ...
Good when all servers have identical specs and requests cost roughly the same. This is the default in Nginx.
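The rotation is trivial to express in code. Here is a minimal Python sketch (the server addresses are placeholders, and a real balancer would guard the iterator with a lock for concurrent requests):

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)  # endless A -> B -> C -> A -> ...

def pick() -> str:
    """Return the next backend in strict rotation."""
    return next(rotation)

print([pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```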
### 2. Weighted Round Robin
Assign a weight to each server proportional to its capacity. A server with weight 3 receives three times the traffic of a server with weight 1.
```nginx
upstream backend {
    server 10.0.0.1 weight=3;
    server 10.0.0.2 weight=1;
    server 10.0.0.3 weight=1;
}
```
Use this when your fleet is heterogeneous — mixing instance sizes or generations.
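Naive weighting would send the weight-3 server three requests in a row; Nginx instead uses a "smooth" variant that interleaves the heavier server's turns. A small Python sketch of that smoothing idea (server names are hypothetical):

```python
def smooth_wrr(servers: dict[str, int], n: int) -> list[str]:
    """Smooth weighted round robin: each step, every server gains its
    weight; the highest current score wins and pays back the total.
    This spreads the heavy server's turns instead of bursting them."""
    current = {s: 0 for s in servers}
    total = sum(servers.values())
    picks = []
    for _ in range(n):
        for s, w in servers.items():
            current[s] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks

print(smooth_wrr({"A": 3, "B": 1, "C": 1}, 5))
# ['A', 'B', 'A', 'C', 'A'] -- A gets 3 of 5 turns, evenly spaced
```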
### 3. Least Connections
Route each new request to the server with the fewest active connections. This naturally adapts to varying request durations: a server processing a slow query won't receive more traffic until it catches up.
```nginx
upstream backend {
    least_conn;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Least connections is often the best default for APIs with unpredictable response times.
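The selection rule itself is just "pick the minimum". A toy Python sketch with placeholder addresses, tracking open connections in a plain dict:

```python
backends = {"10.0.0.1": 0, "10.0.0.2": 0, "10.0.0.3": 0}  # active connections

def pick() -> str:
    """Route to the backend with the fewest active connections."""
    server = min(backends, key=backends.get)
    backends[server] += 1  # connection opened
    return server

def release(server: str) -> None:
    backends[server] -= 1  # connection closed

# A slow request occupies 10.0.0.1...
first = pick()
# ...so the next two picks naturally avoid it:
print(pick(), pick())  # 10.0.0.2 10.0.0.3
```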
### 4. IP Hash
Hash the client's IP address to deterministically assign it to a server. The same client always reaches the same backend, providing a simple form of session affinity without cookies.
```nginx
upstream backend {
    ip_hash;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Downside: if a server is removed, a large portion of clients get redistributed.
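That redistribution is easy to demonstrate with plain modulo hashing (a simplification of what real balancers do; the client IPs are synthetic):

```python
import hashlib

def bucket(ip: str, servers: list[str]) -> str:
    """Plain modulo hashing: stable only while the server list is fixed."""
    h = int(hashlib.md5(ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

clients = [f"192.168.1.{i}" for i in range(100)]
before = {ip: bucket(ip, ["A", "B", "C"]) for ip in clients}
after = {ip: bucket(ip, ["A", "B"]) for ip in clients}  # server C removed
moved = sum(before[ip] != after[ip] for ip in clients)
print(f"{moved}/100 clients changed servers")
# Roughly two-thirds move, far more than the one-third that had to.
```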
### 5. Consistent Hashing
An improvement over simple IP hashing. Servers are placed on a virtual ring, and each request is hashed to a point on that ring. When a server is added or removed, only a small fraction of keys are remapped.
```nginx
upstream backend {
    hash $request_uri consistent;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Consistent hashing is essential for caching layers where you want cache locality — the same URL should hit the same cache server.
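A minimal ring implementation makes the "only a small fraction remapped" property concrete. This is a toy sketch (cache names and replica count are illustrative), using virtual nodes so load spreads evenly even with few physical servers:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (replicas)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's point.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache1", "cache2", "cache3"])
before = {f"/page/{i}": ring.get(f"/page/{i}") for i in range(100)}
ring.remove("cache3")
moved = sum(before[k] != ring.get(k) for k in before)
print(f"{moved}/100 keys remapped")
# Only the keys that pointed at cache3 move; the rest keep their server.
```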
### 6. Random (with Two Choices)
Pick two servers at random, then send the request to whichever has fewer connections. This "power of two choices" approach provides near-optimal distribution with minimal coordination — useful in large distributed systems.
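A quick Python sketch of the idea (server names and request counts are arbitrary; a real system would also decrement counts as connections close):

```python
import random

conns = {f"server{i}": 0 for i in range(10)}  # active connections per server

def pick_p2c() -> str:
    """Power of two choices: sample two servers, take the less loaded one."""
    a, b = random.sample(list(conns), 2)
    winner = a if conns[a] <= conns[b] else b
    conns[winner] += 1
    return winner

for _ in range(10_000):
    pick_p2c()
print(max(conns.values()) - min(conns.values()))
# The spread stays tiny compared to purely random assignment.
```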
## L4 vs L7 Load Balancing
| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP packets | HTTP requests |
| Speed | Faster — no payload inspection | Slightly slower |
| Routing granularity | IP + port | URL path, headers, cookies |
| SSL termination | Pass-through or terminate | Typically terminates |
| Use case | Database connections, gRPC streams | Web APIs, microservices |
L4 load balancers (AWS NLB, HAProxy in TCP mode) are ideal for raw throughput and non-HTTP protocols. L7 load balancers (AWS ALB, Nginx, Envoy) let you route /api to one cluster and /static to another.
## Health Checks
A load balancer must detect unhealthy servers and stop sending them traffic.
Passive health checks — the balancer monitors responses from normal traffic. If a server returns repeated 5xx errors or times out, it is marked down.
Active health checks — the balancer periodically sends a probe (HTTP GET, TCP connect, or a custom script) to each server.
```haproxy
backend web_servers
    option httpchk GET /healthz
    http-check expect status 200
    server web1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server web2 10.0.0.2:8080 check inter 5s fall 3 rise 2
```
In this HAProxy config, each server is probed every 5 seconds. After 3 consecutive failures (fall 3) it is marked down; after 2 successes (rise 2) it is restored.
## Session Persistence (Sticky Sessions)
Some applications store session state in memory. For those, you need the same client to reach the same server across multiple requests.
Common approaches:
- Cookie-based — the load balancer injects a cookie identifying the backend server.
- IP hash — deterministic routing by client IP (see above).
- Application-level — store sessions in Redis or a database so any server can serve any client (the best long-term solution).
```nginx
upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    sticky cookie srv_id expires=1h path=/;
}
```

Note that the `sticky` directive is part of the commercial Nginx Plus build; with open-source Nginx, use `ip_hash` or a third-party module instead.
## Graceful Draining
When you remove a server for maintenance or deployment, you don't want to drop in-flight requests. Graceful draining means:
1. Mark the server as "draining" — no new connections are sent to it.
2. Existing connections are allowed to complete (up to a timeout).
3. Once all connections close, the server is fully removed.
In HAProxy, set a server to drain state via the runtime API:
```shell
echo "set server web_servers/web1 state drain" | socat stdio /var/run/haproxy.sock
```
In Kubernetes, this is handled automatically when a pod enters the Terminating state — the endpoints controller removes it from the Service while the pod's preStop hook and terminationGracePeriodSeconds allow in-flight requests to finish.
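As a sketch, a pod spec might pair a short `preStop` sleep with a generous grace period so endpoint removal propagates before the container receives SIGTERM (container name, image, and durations here are illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: web
      image: my-app:latest
      lifecycle:
        preStop:
          exec:
            # Keep serving briefly while load balancers stop routing to us.
            command: ["sleep", "10"]
```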
## Tools Compared: HAProxy vs Nginx vs Cloud vs Service Mesh

### Nginx
The most popular reverse proxy and load balancer for HTTP workloads.
```nginx
http {
    upstream api {
        least_conn;
        server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://api;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
Strengths: simple config, huge ecosystem, doubles as a web server and reverse proxy.
### HAProxy
Purpose-built for high-performance load balancing. Supports both L4 and L7.
```haproxy
frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance leastconn
    option httpchk GET /healthz
    server web1 10.0.0.1:8080 check weight 3
    server web2 10.0.0.2:8080 check weight 1
```
Strengths: advanced health checks, runtime API, detailed metrics, battle-tested at extreme scale.
### HAProxy vs Nginx
| Feature | HAProxy | Nginx |
|---|---|---|
| Primary role | Load balancer | Web server + reverse proxy |
| L4 support | Native | Stream module |
| Active health checks | Built-in (free) | Nginx Plus (paid) or third-party |
| Runtime API | Yes | Limited in open-source |
| Config reload | Hitless | Brief worker drain |
For pure load balancing, HAProxy has the edge. For serving static files alongside proxying, Nginx is more convenient.
### AWS ALB and NLB
- ALB (Application Load Balancer) — L7, supports path-based routing, WebSockets, gRPC, and native WAF integration.
- NLB (Network Load Balancer) — L4, millions of requests per second with ultra-low latency, static IP support.
Use ALB for typical web applications. Use NLB when you need raw TCP/UDP performance or a static IP.
### Envoy
A modern L4/L7 proxy designed for service meshes (Istio, Consul Connect). Supports advanced features like circuit breaking, retries with budgets, outlier detection, and distributed tracing out of the box.
### Traefik
A cloud-native reverse proxy that auto-discovers services from Docker, Kubernetes, and Consul. Automatic TLS via Let's Encrypt. Great for smaller deployments or teams that want minimal configuration.
```yaml
# traefik dynamic config
http:
  services:
    my-service:
      loadBalancer:
        servers:
          - url: "http://10.0.0.1:8080"
          - url: "http://10.0.0.2:8080"
        healthCheck:
          path: /healthz
          interval: "10s"
```
## Choosing the Right Algorithm
| Scenario | Recommended Algorithm |
|---|---|
| Homogeneous servers, uniform requests | Round robin |
| Mixed instance sizes | Weighted round robin |
| Variable request durations | Least connections |
| Caching tier | Consistent hashing |
| Simple session affinity | IP hash |
| Large-scale distributed system | Random (two choices) |
## Key Takeaways
- Start with least connections — it adapts to real-world variance better than round robin.
- Use consistent hashing for caches — minimize cache misses when scaling.
- Always configure health checks — a load balancer without health checks is a traffic router, not a reliability tool.
- Plan for graceful draining — deployments should never drop requests.
- Choose L4 vs L7 based on routing needs — don't pay the L7 overhead if you only need IP + port routing.
Load balancing is one of the foundational building blocks of scalable architecture. Get it right and your system handles traffic spikes, rolling deployments, and server failures without your users ever noticing.
Explore more system design fundamentals, scaling patterns, and production-ready guides at codelit.io.