# Load Balancing Concepts
A modern web service cannot rely on a single server to absorb all of its traffic. Load balancing is a core technology that distributes traffic across multiple servers to achieve both high availability and high throughput. This chapter covers load balancing systematically, from the fundamental concepts to choosing the right algorithm for production use.
## What is Load Balancing?
Load balancing is the practice of distributing client requests evenly across multiple backend servers (upstreams) so that no single server becomes overloaded. The device or software responsible for this is called a Load Balancer.
Without a load balancer, a single server failure brings down the entire service. With a load balancer, if one server dies, the remaining servers take over the traffic and keep the service running.
## Benefits of Load Balancing
| Benefit | Description |
|---|---|
| High Availability (HA) | Service continues even when some servers fail |
| Horizontal Scaling (Scale-out) | Add servers to increase processing capacity |
| Performance Improvement | Distribute requests to reduce response time |
| Zero-downtime Deployment | Replace servers one at a time without downtime |
| Ease of Maintenance | Take down servers one at a time while keeping service alive |
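As a sketch of how zero-downtime deployment and maintenance work in practice (hostnames are hypothetical), Nginx's `down` parameter takes one server out of rotation while the others keep serving; after deploying to it and reloading the configuration, the next server is rotated out:

```nginx
upstream backend {
    server app1.example.com down;  # temporarily out of rotation for deployment
    server app2.example.com;       # keeps serving traffic
    server app3.example.com;       # keeps serving traffic
}
```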
## L4 vs L7 Load Balancing
Load balancers are classified as L4 or L7 depending on which OSI model layer they operate at.
### L4 Load Balancing (Transport Layer)
L4 load balancers distribute traffic using only IP address and port (TCP/UDP) information. They don't inspect packet contents, making them extremely fast.
```
[Client: 1.2.3.4:54321]
          ↓
[L4 LB: checks destination port 80/443]
          ↓
[App1: 10.0.0.1:8080] or [App2: 10.0.0.2:8080]
```
Characteristics:
- Cannot see HTTP headers, cookies, or URL paths
- Operates via NAT (Network Address Translation)
- Many hardware load balancers (F5, Citrix) originated as L4 devices
- Linux kernel IPVS, AWS NLB are representative examples
### L7 Load Balancing (Application Layer)
L7 load balancers analyze HTTP headers, URL paths, cookies, methods, and other application data to make routing decisions. More sophisticated rules are possible but at higher processing cost than L4.
```
[Client Request]
GET /api/users HTTP/1.1
Host: example.com
          ↓
[L7 LB: analyzes URL, headers, cookies]
  /api/*    → API server group
  /static/* → File server group
  /admin/*  → Admin server (IP restricted)
```
Characteristics:
- URL path and Host header-based routing
- SSL termination
- Cookie-based Sticky Session implementation
- Nginx, HAProxy, AWS ALB are representative examples
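As a sketch of two of these capabilities combined (hostnames and certificate paths are hypothetical), the following Nginx server block terminates TLS at the load balancer and proxies plain HTTP to an upstream group:

```nginx
upstream app_servers {
    server app1.example.com:8080;
    server app2.example.com:8080;
}

server {
    listen 443 ssl;
    server_name example.com;

    # SSL termination: TLS ends at the load balancer;
    # backends receive plain HTTP
    ssl_certificate     /etc/nginx/certs/example.com.crt;
    ssl_certificate_key /etc/nginx/certs/example.com.key;

    location / {
        proxy_pass http://app_servers;
        # Preserve the original client address for the backends
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```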
### L4 vs L7 Comparison
| Feature | L4 | L7 |
|---|---|---|
| Operating Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Criteria | IP, Port | URL, Headers, Cookies |
| Processing Speed | Very fast | Relatively slower |
| Content Inspection | Not possible | Possible |
| SSL Offloading | Not possible | Possible |
| Representative Products | AWS NLB, IPVS | Nginx, HAProxy, AWS ALB |
## Hardware vs Software Load Balancers
### Hardware Load Balancers
Load balancers built into dedicated hardware appliances. F5 BIG-IP and Citrix ADC are representative, capable of handling hundreds of thousands to millions of TPS.
Advantages:
- Extreme high performance (using ASIC chips)
- High stability and reliability
- Centrally managed by network teams
Disadvantages:
- High acquisition cost, often tens to hundreds of thousands of dollars
- Slow configuration changes, less flexibility
- Poor fit for cloud environments
### Software Load Balancers
Load balancers implemented in software on general-purpose servers. Nginx, HAProxy, and Envoy are representative.
Advantages:
- Low cost (most open source is free)
- Configuration managed as code (GitOps-friendly)
- Excellent fit for cloud/container environments
- Easy feature additions and upgrades
Disadvantages:
- Lower performance than hardware (though sufficient on modern servers)
- Requires operational expertise
Cloud-managed Load Balancers:
| Cloud | Product | Layer |
|---|---|---|
| AWS | ALB (Application LB) | L7 |
| AWS | NLB (Network LB) | L4 |
| GCP | Cloud Load Balancing | L4/L7 |
| Azure | Application Gateway | L7 |
## Load Balancing Algorithms
A load balancing algorithm determines which server each request is sent to. Choosing the algorithm that matches your service's characteristics is critical.
### Round Robin (Default)
The simplest method: requests are assigned to servers in sequence.
```
Request 1 → App1
Request 2 → App2
Request 3 → App3
Request 4 → App1   ← back to the start
```
Best for: All servers have identical specs and similar request processing times.
Drawback: Long-running requests can pile up on certain servers creating imbalance.
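Round robin is Nginx's default: an `upstream` block that lists servers with no algorithm directive distributes requests in sequence (hostnames here are hypothetical):

```nginx
# No algorithm directive: Nginx defaults to round robin
upstream backend {
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
```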
### Weighted Round Robin
Assigns a weight to each server so higher-performance servers receive more requests.
```
App1 (weight=3): Requests 1, 2, 3
App2 (weight=1): Request 4
App1 (weight=3): Requests 5, 6, 7
App2 (weight=1): Request 8
```
Nginx example:
```nginx
upstream backend {
    server app1.example.com weight=3;  # receives 3 of every 4 requests
    server app2.example.com weight=1;  # receives 1 of every 4 requests
}
```
Best for: When server specs differ (set higher weight for higher-spec servers).
### Least Connections
Sends requests to the server with the fewest active connections. Effective when request processing times vary significantly.
```
App1: 100 connections
App2:  30 connections ← new request goes to App2
App3:  75 connections
```
Nginx example:
```nginx
upstream backend {
    least_conn;  # pick the server with the fewest active connections
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}
```
Best for: Variable request processing times (e.g., API servers, services with DB queries).
### IP Hash
Hashes the client IP address to always route to the same server. Enables routing specific users to the same server without session sharing.
```
1.2.3.4 → hash → App2 (always App2 thereafter)
5.6.7.8 → hash → App1 (always App1 thereafter)
```
Nginx example:
```nginx
upstream backend {
    ip_hash;  # hash the client IP; same client always hits the same server
    server app1.example.com;
    server app2.example.com;
}
```
Best for: Implementing Sticky Session as a workaround when session sharing infrastructure (Redis, etc.) is unavailable.
Drawback: Adding or removing a server changes the hash mapping, so existing users can be routed to different servers and lose their sessions.
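One way to soften this drawback (a sketch, not part of the sections above) is Nginx's generic `hash` directive with the `consistent` parameter, which applies ketama consistent hashing so that adding or removing a server remaps only a fraction of clients instead of all of them:

```nginx
upstream backend {
    # Consistent hashing on the client address: when a server is
    # added or removed, only a small share of clients are remapped
    hash $remote_addr consistent;
    server app1.example.com;
    server app2.example.com;
}
```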
### Algorithm Selection Guide
| Situation | Recommended Algorithm |
|---|---|
| Identical specs, short requests | Round Robin |
| Different server specs | Weighted Round Robin |
| Variable processing times | Least Connections |
| No session sharing environment | IP Hash |
| Fastest average response time (Nginx Plus only) | Least Time |
## Real-world Load Balancing Architecture Patterns
### Pattern 1: Simple Active-Active
```
        Internet
           ↓
    [Load Balancer]
     ├── App Server 1
     ├── App Server 2
     └── App Server 3
           ↓
       [Database]
```
Most common pattern. Sessions shared via external storage like Redis.
### Pattern 2: Two-tier Load Balancing
```
        Internet
           ↓
      [L4 LB: VIP]
     ├── [L7 LB (Nginx) 1]
     └── [L7 LB (Nginx) 2]
           ↓
    [App Server Group]
```
The load balancer itself is also redundant. Used for large-scale traffic.
### Pattern 3: Per-service Separation
```
        Internet
           ↓
        [L7 LB]
     ├── /api/*    → API Server Group
     ├── /static/* → CDN or File Server
     └── /admin/*  → Admin Server (IP restricted)
```
Routes to different server groups based on URL path.
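A minimal Nginx sketch of this pattern (upstream hostnames and the allowed admin network are hypothetical):

```nginx
upstream api_servers   { server api1.example.com:8080; server api2.example.com:8080; }
upstream file_servers  { server files1.example.com:8080; }
upstream admin_servers { server admin1.example.com:8080; }

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;
    }

    location /static/ {
        proxy_pass http://file_servers;
    }

    location /admin/ {
        # IP restriction: only this (hypothetical) network may reach admin
        allow 203.0.113.0/24;
        deny  all;
        proxy_pass http://admin_servers;
    }
}
```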
## The Importance of Health Checks
Load balancers must periodically check backend server health to automatically exclude failed servers.
### Passive Health Check
Monitors responses to actual requests. If errors exceed a threshold, the server is excluded.
```nginx
upstream backend {
    # After 3 failed requests, mark the server down for 30 seconds
    server app1.example.com max_fails=3 fail_timeout=30s;
    server app2.example.com max_fails=3 fail_timeout=30s;
}
```
### Active Health Check
Periodically sends dedicated health check requests to proactively verify server status. Detects failures faster. (Supported by Nginx Plus, HAProxy)
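In Nginx Plus this is configured with the `health_check` directive, which requires a shared-memory `zone` in the upstream; a sketch (the `/healthz` endpoint and hostnames are hypothetical, and this directive is not available in open source Nginx):

```nginx
upstream backend {
    zone backend 64k;  # shared memory required for active health checks
    server app1.example.com;
    server app2.example.com;
}

server {
    location / {
        proxy_pass http://backend;
        # Probe each server every 5s; 3 failures mark it down,
        # 2 consecutive successes bring it back
        health_check interval=5s fails=3 passes=2 uri=/healthz;
    }
}
```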
## Summary
Load balancing is a core technology in modern web infrastructure. The learning order for this chapter:
- Nginx Load Balancing β the most widely used open source approach
- Apache mod_proxy_balancer β load balancing in Apache environments
- mod_jk Load Balancing β AJP protocol-based Tomcat clustering
- Health Checks β automatic detection and removal of failed nodes
The next page covers Nginx upstream configuration to set up actual load balancing.