# Load Balancing Algorithms Explained: Round Robin, Least Connections, Consistent Hashing & More
Every production system eventually outgrows a single server. When that happens, you need a load balancer — a component that distributes incoming traffic across multiple backend servers to improve reliability, throughput, and latency.
This guide covers the algorithms behind load balancing, the difference between L4 and L7 balancing, health checks, session persistence, graceful draining, and hands-on configuration for the most popular tools.
## Why Load Balance?
- Availability — if one server crashes, others continue serving traffic.
- Scalability — add servers horizontally instead of buying bigger hardware.
- Performance — spread work so no single server becomes a bottleneck.
- Maintainability — roll out deployments one server at a time with zero downtime.
## Load Balancing Algorithms

### 1. Round Robin
The simplest approach: distribute requests to servers in sequential order.
Server A → Server B → Server C → Server A → ...
Good when all servers have identical specs and requests cost roughly the same. This is the default in Nginx.
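The rotation is trivial to express in code. Here is a minimal Python sketch (the server addresses are placeholders, and a real balancer would guard the iterator with a lock for concurrent requests):

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)  # endless A -> B -> C -> A -> ...

def pick() -> str:
    """Return the next backend in strict rotation."""
    return next(rotation)

print([pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```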
### 2. Weighted Round Robin
Assign a weight to each server proportional to its capacity. A server with weight 3 receives three times the traffic of a server with weight 1.
```nginx
upstream backend {
    server 10.0.0.1 weight=3;
    server 10.0.0.2 weight=1;
    server 10.0.0.3 weight=1;
}
```
Use this when your fleet is heterogeneous — mixing instance sizes or generations.
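Naive weighting would send the weight-3 server three requests in a row; Nginx instead uses a "smooth" variant that interleaves the heavier server's turns. A small Python sketch of that smoothing idea (server names are hypothetical):

```python
def smooth_wrr(servers: dict[str, int], n: int) -> list[str]:
    """Smooth weighted round robin: each step, every server gains its
    weight; the highest current score wins and pays back the total.
    This spreads the heavy server's turns instead of bursting them."""
    current = {s: 0 for s in servers}
    total = sum(servers.values())
    picks = []
    for _ in range(n):
        for s, w in servers.items():
            current[s] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks

print(smooth_wrr({"A": 3, "B": 1, "C": 1}, 5))
# ['A', 'B', 'A', 'C', 'A'] -- A gets 3 of 5 turns, evenly spaced
```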
### 3. Least Connections
Route each new request to the server with the fewest active connections. This naturally adapts to varying request durations: a server processing a slow query won't receive more traffic until it catches up.
```nginx
upstream backend {
    least_conn;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Least connections is often the best default for APIs with unpredictable response times.
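The selection rule itself is just "pick the minimum". A toy Python sketch with placeholder addresses, tracking open connections in a plain dict:

```python
backends = {"10.0.0.1": 0, "10.0.0.2": 0, "10.0.0.3": 0}  # active connections

def pick() -> str:
    """Route to the backend with the fewest active connections."""
    server = min(backends, key=backends.get)
    backends[server] += 1  # connection opened
    return server

def release(server: str) -> None:
    backends[server] -= 1  # connection closed

# A slow request occupies 10.0.0.1...
first = pick()
# ...so the next two picks naturally avoid it:
print(pick(), pick())  # 10.0.0.2 10.0.0.3
```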
### 4. IP Hash
Hash the client's IP address to deterministically assign it to a server. The same client always reaches the same backend, providing a simple form of session affinity without cookies.
```nginx
upstream backend {
    ip_hash;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Downside: if a server is removed, a large portion of clients get redistributed.
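That redistribution is easy to demonstrate with plain modulo hashing (a simplification of what real balancers do; the client IPs are synthetic):

```python
import hashlib

def bucket(ip: str, servers: list[str]) -> str:
    """Plain modulo hashing: stable only while the server list is fixed."""
    h = int(hashlib.md5(ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

clients = [f"192.168.1.{i}" for i in range(100)]
before = {ip: bucket(ip, ["A", "B", "C"]) for ip in clients}
after = {ip: bucket(ip, ["A", "B"]) for ip in clients}  # server C removed
moved = sum(before[ip] != after[ip] for ip in clients)
print(f"{moved}/100 clients changed servers")
# Roughly two-thirds move, far more than the one-third that had to.
```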
### 5. Consistent Hashing
An improvement over simple IP hashing. Servers are placed on a virtual ring, and each request is hashed to a point on that ring. When a server is added or removed, only a small fraction of keys are remapped.
```nginx
upstream backend {
    hash $request_uri consistent;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
```
Consistent hashing is essential for caching layers where you want cache locality — the same URL should hit the same cache server.
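A minimal ring implementation makes the "only a small fraction remapped" property concrete. This is a toy sketch (cache names and replica count are illustrative), using virtual nodes so load spreads evenly even with few physical servers:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (replicas)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's point.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache1", "cache2", "cache3"])
before = {f"/page/{i}": ring.get(f"/page/{i}") for i in range(100)}
ring.remove("cache3")
moved = sum(before[k] != ring.get(k) for k in before)
print(f"{moved}/100 keys remapped")
# Only the keys that pointed at cache3 move; the rest keep their server.
```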
### 6. Random (with Two Choices)
Pick two servers at random, then send the request to whichever has fewer connections. This "power of two choices" approach provides near-optimal distribution with minimal coordination — useful in large distributed systems.
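A quick Python sketch of the idea (server names and request counts are arbitrary; a real system would also decrement counts as connections close):

```python
import random

conns = {f"server{i}": 0 for i in range(10)}  # active connections per server

def pick_p2c() -> str:
    """Power of two choices: sample two servers, take the less loaded one."""
    a, b = random.sample(list(conns), 2)
    winner = a if conns[a] <= conns[b] else b
    conns[winner] += 1
    return winner

for _ in range(10_000):
    pick_p2c()
print(max(conns.values()) - min(conns.values()))
# The spread stays tiny compared to purely random assignment.
```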
## L4 vs L7 Load Balancing
| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP packets | HTTP requests |
| Speed | Faster — no payload inspection | Slightly slower |
| Routing granularity | IP + port | URL path, headers, cookies |
| SSL termination | Pass-through or terminate | Typically terminates |
| Use case | Database connections, gRPC streams | Web APIs, microservices |
L4 load balancers (AWS NLB, HAProxy in TCP mode) are ideal for raw throughput and non-HTTP protocols. L7 load balancers (AWS ALB, Nginx, Envoy) let you route /api to one cluster and /static to another.
## Health Checks
A load balancer must detect unhealthy servers and stop sending them traffic.
Passive health checks — the balancer monitors responses from normal traffic. If a server returns repeated 5xx errors or times out, it is marked down.
Active health checks — the balancer periodically sends a probe (HTTP GET, TCP connect, or a custom script) to each server.
```haproxy
backend web_servers
    option httpchk GET /healthz
    http-check expect status 200
    server web1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server web2 10.0.0.2:8080 check inter 5s fall 3 rise 2
```
In this HAProxy config, each server is probed every 5 seconds. After 3 consecutive failures (fall 3) it is marked down; after 2 successes (rise 2) it is restored.
## Session Persistence (Sticky Sessions)
Some applications store session state in memory. For those, you need the same client to reach the same server across multiple requests.
Common approaches:
- Cookie-based — the load balancer injects a cookie identifying the backend server.
- IP hash — deterministic routing by client IP (see above).
- Application-level — store sessions in Redis or a database so any server can serve any client (the best long-term solution).
```nginx
upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    sticky cookie srv_id expires=1h path=/;
}
```

Note that the `sticky` directive is part of the commercial Nginx Plus build; with open-source Nginx, use `ip_hash` or a third-party module instead.
## Graceful Draining
When you remove a server for maintenance or deployment, you don't want to drop in-flight requests. Graceful draining means:
1. Mark the server as "draining" — no new connections are sent to it.
2. Existing connections are allowed to complete (up to a timeout).
3. Once all connections close, the server is fully removed.
In HAProxy, set a server to drain state via the runtime API:
```shell
echo "set server web_servers/web1 state drain" | socat stdio /var/run/haproxy.sock
```
In Kubernetes, this is handled automatically when a pod enters the Terminating state — the endpoints controller removes it from the Service while the pod's preStop hook and terminationGracePeriodSeconds allow in-flight requests to finish.
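As a sketch, a pod spec might pair a short `preStop` sleep with a generous grace period so endpoint removal propagates before the container receives SIGTERM (container name, image, and durations here are illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: web
      image: my-app:latest
      lifecycle:
        preStop:
          exec:
            # Keep serving briefly while load balancers stop routing to us.
            command: ["sleep", "10"]
```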
## Tools Compared: HAProxy vs Nginx vs Cloud vs Service Mesh

### Nginx
The most popular reverse proxy and load balancer for HTTP workloads.
```nginx
http {
    upstream api {
        least_conn;
        server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://api;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
Strengths: simple config, huge ecosystem, doubles as a web server and reverse proxy.
### HAProxy
Purpose-built for high-performance load balancing. Supports both L4 and L7.
```haproxy
frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance leastconn
    option httpchk GET /healthz
    server web1 10.0.0.1:8080 check weight 3
    server web2 10.0.0.2:8080 check weight 1
```
Strengths: advanced health checks, runtime API, detailed metrics, battle-tested at extreme scale.
### HAProxy vs Nginx
| Feature | HAProxy | Nginx |
|---|---|---|
| Primary role | Load balancer | Web server + reverse proxy |
| L4 support | Native | Stream module |
| Active health checks | Built-in (free) | Nginx Plus (paid) or third-party |
| Runtime API | Yes | Limited in open-source |
| Config reload | Hitless | Brief worker drain |
For pure load balancing, HAProxy has the edge. For serving static files alongside proxying, Nginx is more convenient.
### AWS ALB and NLB
- ALB (Application Load Balancer) — L7, supports path-based routing, WebSockets, gRPC, and native WAF integration.
- NLB (Network Load Balancer) — L4, millions of requests per second with ultra-low latency, static IP support.
Use ALB for typical web applications. Use NLB when you need raw TCP/UDP performance or a static IP.
### Envoy
A modern L4/L7 proxy designed for service meshes (Istio, Consul Connect). Supports advanced features like circuit breaking, retries with budgets, outlier detection, and distributed tracing out of the box.
### Traefik
A cloud-native reverse proxy that auto-discovers services from Docker, Kubernetes, and Consul. Automatic TLS via Let's Encrypt. Great for smaller deployments or teams that want minimal configuration.
```yaml
# traefik dynamic config
http:
  services:
    my-service:
      loadBalancer:
        servers:
          - url: "http://10.0.0.1:8080"
          - url: "http://10.0.0.2:8080"
        healthCheck:
          path: /healthz
          interval: "10s"
```
## Choosing the Right Algorithm
| Scenario | Recommended Algorithm |
|---|---|
| Homogeneous servers, uniform requests | Round robin |
| Mixed instance sizes | Weighted round robin |
| Variable request durations | Least connections |
| Caching tier | Consistent hashing |
| Simple session affinity | IP hash |
| Large-scale distributed system | Random (two choices) |
## Key Takeaways
- Start with least connections — it adapts to real-world variance better than round robin.
- Use consistent hashing for caches — minimize cache misses when scaling.
- Always configure health checks — a load balancer without health checks is a traffic router, not a reliability tool.
- Plan for graceful draining — deployments should never drop requests.
- Choose L4 vs L7 based on routing needs — don't pay the L7 overhead if you only need IP + port routing.
Load balancing is one of the foundational building blocks of scalable architecture. Get it right and your system handles traffic spikes, rolling deployments, and server failures without your users ever noticing.
Explore more system design fundamentals, scaling patterns, and production-ready guides at codelit.io.