# Load Balancing Strategies — Round Robin, Consistent Hashing, and Beyond
## Every scalable system needs a load balancer
The moment you have more than one server, you need something to decide which server handles each request. That's a load balancer.
Simple in concept. Surprisingly tricky in practice.
## The strategies
### Round Robin
The simplest approach: rotate through servers in order. Request 1 goes to Server A, request 2 to Server B, request 3 to Server C, then back to A.
When it works: All servers are identical and all requests take roughly the same time.
When it fails: If Server B is slower than the others (maybe it's processing a heavy query), round robin doesn't know. It keeps sending traffic to B at the same rate, and B falls further behind.
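Round robin fits in a few lines of Python. The server names here are hypothetical:

```python
import itertools

servers = ["A", "B", "C"]      # hypothetical pool of identical servers
rr = itertools.cycle(servers)  # endless rotation through the pool

# Each request takes the next server in order, wrapping back to A
assignments = [next(rr) for _ in range(6)]
print(assignments)  # ['A', 'B', 'C', 'A', 'B', 'C']
```

Note that the rotation never consults server state, which is exactly why a slow server keeps receiving its full share.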
### Weighted Round Robin
Same as round robin, but some servers get more traffic. A powerful server might get weight 3 (3x the traffic) while a smaller one gets weight 1.
Use when: Servers have different capacities (common during rolling upgrades or mixed hardware).
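One naive way to implement the weighting is to repeat each server in the rotation according to its weight. Production balancers use a smoother interleaving, but the traffic share comes out the same. The pool below is made up:

```python
import itertools

weights = {"big": 3, "small": 1}  # hypothetical pool: server -> weight

# Repeat each server in the rotation proportionally to its weight
pool = [server for server, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(pool)

one_cycle = [next(wrr) for _ in range(sum(weights.values()))]
print(one_cycle)  # ['big', 'big', 'big', 'small']: big gets 3x the traffic
```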
### Least Connections
Send each request to the server with the fewest active connections. Naturally adapts to slow servers — they accumulate connections and get fewer new ones.
When it works: Requests have variable processing times. Long-running API calls, WebSocket connections, file uploads.
When it fails: Doesn't account for server capacity. A tiny server with 2 connections isn't necessarily less busy than a large server with 10.
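The selection itself is just a minimum over the connection table. A sketch, with made-up counts:

```python
# Hypothetical in-flight connection counts per server
active = {"A": 4, "B": 1, "C": 7}

def pick_least_connections(active):
    # Choose the server currently holding the fewest connections
    return min(active, key=active.get)

server = pick_least_connections(active)
active[server] += 1  # the chosen server takes on the new request
print(server)  # 'B'
```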
### Least Response Time
Send to the server with the fastest recent response time. Combines connection count with actual performance measurement.
Best for: When you want the best possible latency and servers have varying performance.
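One common way to track "recent response time" is an exponentially weighted moving average per server, so a slow request shifts traffic away without erasing history. A sketch with assumed latencies:

```python
# Assumed recent average latency per server, in seconds
ewma = {"A": 0.120, "B": 0.045, "C": 0.300}
ALPHA = 0.2  # smoothing factor: higher reacts faster to change

def record(server, latency):
    # Fold the newest measurement into the running average
    ewma[server] = ALPHA * latency + (1 - ALPHA) * ewma[server]

def pick_fastest():
    return min(ewma, key=ewma.get)

print(pick_fastest())  # 'B' is currently fastest
record("B", 0.9)       # B just served a very slow request
print(pick_fastest())  # traffic shifts to 'A'
```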
### IP Hash
Hash the client's IP address to consistently route them to the same server. Provides session affinity without cookies.
When it works: Applications with server-side sessions that haven't moved to a shared session store yet.
When it fails: Clients that share an IP (corporate NATs, mobile carriers). One IP can represent thousands of users, creating hotspots.
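A sketch of the routing function. MD5 is used here only as a stable, evenly distributed hash; Python's built-in `hash()` is randomized per process, so it would break affinity across balancer restarts:

```python
import hashlib

servers = ["A", "B", "C"]  # hypothetical pool

def pick_by_ip(client_ip: str) -> str:
    # Stable hash of the IP so the same client always lands on the same server
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

# Same IP, same server, on every call and after every restart
assert pick_by_ip("203.0.113.9") == pick_by_ip("203.0.113.9")
```

The modulo at the end means changing the server count remaps nearly every client, which is the problem consistent hashing addresses.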
### Consistent Hashing
Distribute requests based on a hash of the request key (user ID, session ID, or URL). When servers are added or removed, only a fraction of keys are redistributed.
When it works: Caching layers (Redis, Memcached), CDNs, and any system where you want the same key to hit the same server for cache locality.
Why it matters: Regular hashing redistributes everything when the server count changes. Consistent hashing only moves keys belonging to the affected segment — critical for cache hit rates.
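A minimal hash ring with virtual nodes (the standard trick for evening out the distribution); the server names and vnode count here are illustrative:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable 64-bit hash used for both ring positions and keys
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to smooth distribution
        self.ring = sorted((_hash(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def get(self, key):
        # Walk clockwise to the first ring point at or after the key's hash
        idx = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.ring[idx][1]

ring = HashRing(["A", "B", "C"])
bigger = HashRing(["A", "B", "C", "D"])  # same ring plus one server

# Only keys that now fall in D's segments move; the rest stay put
moved = sum(ring.get(f"user:{i}") != bigger.get(f"user:{i}")
            for i in range(1000))
print(f"{moved / 1000:.0%} of keys moved")  # roughly a quarter, not 100%
```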
## Layer 4 vs Layer 7
Layer 4 (Transport): Routes based on IP and port. Fast, low overhead, but can't inspect the request content.
Layer 7 (Application): Routes based on HTTP headers, URL path, cookies. Can do content-based routing, SSL termination, and header manipulation.
| | Layer 4 | Layer 7 |
|---|---|---|
| Speed | Faster | Slightly slower |
| Routing | IP/port only | URL, headers, cookies |
| SSL | Pass-through | Can terminate |
| Use case | TCP/UDP, high throughput | HTTP APIs, microservices |
Most web applications use Layer 7. Use Layer 4 for non-HTTP protocols or when you need maximum throughput.
## Health checks
A load balancer is only as good as its health checks. If it routes traffic to a dead server, users get errors.
Active health checks: The load balancer pings each server periodically. If a server fails N checks, it's removed from the pool.
Passive health checks: The load balancer monitors real traffic. If a server returns too many 5xx errors, it's temporarily removed.
Best practice: Use both. Active checks catch servers that are completely down. Passive checks catch servers that are up but malfunctioning.
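An active checker can be sketched as a loop over the pool: probe a health endpoint, reset the counter on success, and evict after N consecutive failures. The `/healthz` path is an assumption; use whatever endpoint your app exposes:

```python
import urllib.request

FAIL_THRESHOLD = 3  # consecutive failures before a server is evicted

failures = {}    # server URL -> consecutive failure count
healthy = set()  # servers currently eligible for traffic

def probe(url: str) -> bool:
    # One active check: any connection error or non-200 counts as a failure
    try:
        with urllib.request.urlopen(url + "/healthz", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def check_pool(servers):
    for url in servers:
        if probe(url):
            failures[url] = 0
            healthy.add(url)
        else:
            failures[url] = failures.get(url, 0) + 1
            if failures[url] >= FAIL_THRESHOLD:
                healthy.discard(url)
```

A real checker would run `check_pool` from a background thread every few seconds and also re-admit servers after a streak of successes.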
## Common mistakes
Single load balancer. Your load balancer is now a single point of failure. Use redundant load balancers with failover.
No graceful drain. When removing a server for maintenance, stop sending new connections but let existing ones finish. Don't just kill it.
Health check too aggressive. Checking every 100ms and removing after 1 failure? A brief network hiccup will remove healthy servers. Check every 5-10 seconds, remove after 3 consecutive failures.
Ignoring connection limits. Each server can only handle so many concurrent connections. If your load balancer doesn't enforce limits, a traffic spike can overwhelm individual servers.
## See it in context
Load balancers don't exist in isolation — they connect to API servers, caches, databases, and CDNs. On Codelit, generate any architecture and you'll see where load balancing fits in the data flow.
Explore load balancing in real architectures: describe your system on Codelit.io and see how traffic flows through every component.