# Load Balancing Concepts
A modern web service cannot rely on a single server to absorb all of its traffic. Load balancing is a core technology that distributes traffic across multiple servers to achieve both high availability and high throughput. This chapter covers load balancing systematically, from the fundamental concepts to choosing the right algorithm for production use.
## What is Load Balancing?
Load balancing is the practice of distributing client requests evenly across multiple backend servers (upstreams) so that no single server becomes overloaded. The device or software responsible for this is called a Load Balancer.
Without a load balancer, a single server failure brings down the entire service. With a load balancer, if one server dies, the remaining servers take over the traffic and keep the service running.
## Benefits of Load Balancing
| Benefit | Description |
|---|---|
| High Availability (HA) | Service continues even when some servers fail |
| Horizontal Scaling (Scale-out) | Add servers to increase processing capacity |
| Performance Improvement | Distribute requests to reduce response time |
| Zero-downtime Deployment | Replace servers one at a time without downtime |
| Ease of Maintenance | Take down servers one at a time while keeping service alive |
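As a sketch of how zero-downtime deployment and maintenance work in practice (hostnames are hypothetical), Nginx's `down` parameter takes one server out of rotation while the others keep serving; after deploying to it and reloading the configuration, the next server is rotated out:

```nginx
upstream backend {
    server app1.example.com down;  # temporarily out of rotation for deployment
    server app2.example.com;       # keeps serving traffic
    server app3.example.com;       # keeps serving traffic
}
```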
## L4 vs L7 Load Balancing
Load balancers are classified as L4 or L7 depending on which OSI model layer they operate at.
### L4 Load Balancing (Transport Layer)
L4 load balancers distribute traffic using only IP address and port (TCP/UDP) information. They don't inspect packet contents, making them extremely fast.
```
[Client: 1.2.3.4:54321]
          ↓
[L4 LB: checks destination port 80/443]
          ↓
[App1: 10.0.0.1:8080] or [App2: 10.0.0.2:8080]
```
Characteristics:
- Cannot see HTTP headers, cookies, or URL paths
- Operates via NAT (Network Address Translation)
- Many hardware load balancers (F5, Citrix) originated as L4 devices
- Linux kernel IPVS, AWS NLB are representative examples
### L7 Load Balancing (Application Layer)
L7 load balancers analyze HTTP headers, URL paths, cookies, methods, and other application data to make routing decisions. More sophisticated rules are possible but at higher processing cost than L4.
```
[Client Request]
GET /api/users HTTP/1.1
Host: example.com
          ↓
[L7 LB: analyzes URL, headers, cookies]
  /api/*    → API server group
  /static/* → File server group
  /admin/*  → Admin server (IP restricted)
```
Characteristics:
- URL path and Host header-based routing
- SSL termination
- Cookie-based Sticky Session implementation
- Nginx, HAProxy, AWS ALB are representative examples
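As a sketch of two of these capabilities combined (hostnames and certificate paths are hypothetical), the following Nginx server block terminates TLS at the load balancer and proxies plain HTTP to an upstream group:

```nginx
upstream app_servers {
    server app1.example.com:8080;
    server app2.example.com:8080;
}

server {
    listen 443 ssl;
    server_name example.com;

    # SSL termination: TLS ends at the load balancer;
    # backends receive plain HTTP
    ssl_certificate     /etc/nginx/certs/example.com.crt;
    ssl_certificate_key /etc/nginx/certs/example.com.key;

    location / {
        proxy_pass http://app_servers;
        # Preserve the original client address for the backends
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```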
### L4 vs L7 Comparison
| Feature | L4 | L7 |
|---|---|---|
| Operating Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Criteria | IP, Port | URL, Headers, Cookies |
| Processing Speed | Very fast | Relatively slower |
| Content Inspection | Not possible | Possible |
| SSL Offloading | Not possible | Possible |
| Representative Products | AWS NLB, IPVS | Nginx, HAProxy, AWS ALB |
## Hardware vs Software Load Balancers
### Hardware Load Balancers
Load balancers built into dedicated hardware appliances. F5 BIG-IP and Citrix ADC are representative, capable of handling hundreds of thousands to millions of TPS.
Advantages:
- Extreme high performance (using ASIC chips)
- High stability and reliability
- Centrally managed by network teams
Disadvantages:
- High acquisition cost, often tens to hundreds of thousands of dollars
- Slow configuration changes, less flexibility
- Poor fit for cloud environments
### Software Load Balancers
Load balancers implemented in software on general-purpose servers. Nginx, HAProxy, and Envoy are representative.
Advantages:
- Low cost (most open source is free)
- Configuration managed as code (GitOps-friendly)
- Excellent fit for cloud/container environments
- Easy feature additions and upgrades
Disadvantages:
- Lower performance than hardware (though sufficient on modern servers)
- Requires operational expertise
Cloud-managed Load Balancers:
| Cloud | Product | Layer |
|---|---|---|
| AWS | ALB (Application LB) | L7 |
| AWS | NLB (Network LB) | L4 |
| GCP | Cloud Load Balancing | L4/L7 |
| Azure | Application Gateway | L7 |
## Load Balancing Algorithms
A load balancing algorithm determines which server each request is sent to. Choosing the algorithm that matches your service's characteristics is critical.
### Round Robin (Default)
The simplest method: requests are assigned to servers in sequence.
```
Request 1 → App1
Request 2 → App2
Request 3 → App3
Request 4 → App1   ← back to the start
```
Best for: All servers have identical specs and similar request processing times.
Drawback: Long-running requests can pile up on certain servers creating imbalance.
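Round robin is Nginx's default: an `upstream` block that lists servers with no algorithm directive distributes requests in sequence (hostnames here are hypothetical):

```nginx
# No algorithm directive: Nginx defaults to round robin
upstream backend {
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
```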
### Weighted Round Robin
Assigns a weight to each server so higher-performance servers receive more requests.
```
App1 (weight=3): Requests 1, 2, 3
App2 (weight=1): Request 4
App1 (weight=3): Requests 5, 6, 7
App2 (weight=1): Request 8
```
Nginx example:
```nginx
upstream backend {
    server app1.example.com weight=3;  # receives 3 of every 4 requests
    server app2.example.com weight=1;  # receives 1 of every 4 requests
}
```
Best for: When server specs differ (set higher weight for higher-spec servers).
### Least Connections
Sends requests to the server with the fewest active connections. Effective when request processing times vary significantly.
```
App1: 100 connections
App2:  30 connections ← new request goes to App2
App3:  75 connections
```
Nginx example:
```nginx
upstream backend {
    least_conn;  # pick the server with the fewest active connections
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}
```
Best for: Variable request processing times (e.g., API servers, services with DB queries).
### IP Hash
Hashes the client IP address to always route to the same server. Enables routing specific users to the same server without session sharing.
```
1.2.3.4 → hash → App2 (always App2 thereafter)
5.6.7.8 → hash → App1 (always App1 thereafter)
```
Nginx example:
```nginx
upstream backend {
    ip_hash;  # hash the client IP; same client always hits the same server
    server app1.example.com;
    server app2.example.com;
}
```
Best for: Implementing Sticky Session as a workaround when session sharing infrastructure (Redis, etc.) is unavailable.
Drawback: Adding or removing a server changes the hash mapping, so existing users can be routed to different servers and lose their sessions.
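One way to soften this drawback (a sketch, not part of the sections above) is Nginx's generic `hash` directive with the `consistent` parameter, which applies ketama consistent hashing so that adding or removing a server remaps only a fraction of clients instead of all of them:

```nginx
upstream backend {
    # Consistent hashing on the client address: when a server is
    # added or removed, only a small share of clients are remapped
    hash $remote_addr consistent;
    server app1.example.com;
    server app2.example.com;
}
```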
### Algorithm Selection Guide
| Situation | Recommended Algorithm |
|---|---|
| Identical specs, short requests | Round Robin |
| Different server specs | Weighted Round Robin |
| Variable processing times | Least Connections |
| No session sharing environment | IP Hash |
| Fastest average response time (Nginx Plus only) | Least Time |
## Real-world Load Balancing Architecture Patterns
### Pattern 1: Simple Active-Active
```
        Internet
           ↓
    [Load Balancer]
     ├── App Server 1
     ├── App Server 2
     └── App Server 3
           ↓
       [Database]
```
Most common pattern. Sessions shared via external storage like Redis.
### Pattern 2: Two-tier Load Balancing
```
        Internet
           ↓
      [L4 LB: VIP]
     ├── [L7 LB (Nginx) 1]
     └── [L7 LB (Nginx) 2]
           ↓
    [App Server Group]
```
The load balancer itself is also redundant. Used for large-scale traffic.
### Pattern 3: Per-service Separation
```
        Internet
           ↓
        [L7 LB]
     ├── /api/*    → API Server Group
     ├── /static/* → CDN or File Server
     └── /admin/*  → Admin Server (IP restricted)
```
Routes to different server groups based on URL path.
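A minimal Nginx sketch of this pattern (upstream hostnames and the allowed admin network are hypothetical):

```nginx
upstream api_servers   { server api1.example.com:8080; server api2.example.com:8080; }
upstream file_servers  { server files1.example.com:8080; }
upstream admin_servers { server admin1.example.com:8080; }

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;
    }

    location /static/ {
        proxy_pass http://file_servers;
    }

    location /admin/ {
        # IP restriction: only this (hypothetical) network may reach admin
        allow 203.0.113.0/24;
        deny  all;
        proxy_pass http://admin_servers;
    }
}
```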
## The Importance of Health Checks
Load balancers must periodically check backend server health to automatically exclude failed servers.
### Passive Health Check
Monitors responses to actual requests. If errors exceed a threshold, the server is excluded.
```nginx
upstream backend {
    # After 3 failed requests, mark the server down for 30 seconds
    server app1.example.com max_fails=3 fail_timeout=30s;
    server app2.example.com max_fails=3 fail_timeout=30s;
}
```
### Active Health Check
Periodically sends dedicated health check requests to proactively verify server status. Detects failures faster. (Supported by Nginx Plus, HAProxy)
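In Nginx Plus this is configured with the `health_check` directive, which requires a shared-memory `zone` in the upstream; a sketch (the `/healthz` endpoint and hostnames are hypothetical, and this directive is not available in open source Nginx):

```nginx
upstream backend {
    zone backend 64k;  # shared memory required for active health checks
    server app1.example.com;
    server app2.example.com;
}

server {
    location / {
        proxy_pass http://backend;
        # Probe each server every 5s; 3 failures mark it down,
        # 2 consecutive successes bring it back
        health_check interval=5s fails=3 passes=2 uri=/healthz;
    }
}
```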
## Summary
Load balancing is a core technology in modern web infrastructure. The learning order for this chapter:
- Nginx Load Balancing β the most widely used open source approach
- Apache mod_proxy_balancer β load balancing in Apache environments
- mod_jk Load Balancing β AJP protocol-based Tomcat clustering
- Health Checks β automatic detection and removal of failed nodes
The next page covers Nginx upstream configuration to set up actual load balancing.