
Load Balancing Concepts

A modern web service receives far more simultaneous connections than a single server can reliably handle. Load balancing is the core technology that distributes this traffic across multiple servers to achieve both high availability and high throughput. This chapter covers load balancing systematically, from fundamental concepts to choosing the right algorithm for production use.


What is Load Balancing?

Load balancing is the practice of distributing client requests evenly across multiple backend servers (upstreams) so that no single server becomes overloaded. The device or software responsible for this is called a Load Balancer.

Load Balancing Overview

Without a load balancer, a single server failure brings down the entire service. With a load balancer, if one server dies, the remaining servers take over the traffic and keep the service running.

Benefits of Load Balancing

| Benefit | Description |
|---|---|
| High Availability (HA) | Service continues even when some servers fail |
| Horizontal Scaling (Scale-out) | Add servers to increase processing capacity |
| Performance Improvement | Distribute requests to reduce response time |
| Zero-downtime Deployment | Replace servers one at a time without downtime |
| Ease of Maintenance | Take down servers one at a time while keeping service alive |

L4 vs L7 Load Balancing

Load balancers are classified as L4 or L7 depending on which OSI model layer they operate at.

L4 vs L7 Load Balancing

L4 Load Balancing (Transport Layer)

L4 load balancers distribute traffic using only IP address and port (TCP/UDP) information. They don't inspect packet contents, making them extremely fast.

[Client: 1.2.3.4:54321]
↓
[L4 LB: checks destination port 80/443]
↓
[App1: 10.0.0.1:8080] or [App2: 10.0.0.2:8080]

Characteristics:

  • Cannot see HTTP headers, cookies, or URL paths
  • Operates via NAT (Network Address Translation)
  • Hardware load balancers (F5, Citrix) have traditionally centered on L4 processing
  • Linux kernel IPVS and AWS NLB are representative examples
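
On the software side, Nginx's stream module can serve as a small L4 illustration: it forwards raw TCP without parsing HTTP. A minimal sketch (the backend addresses and port are placeholders):

```nginx
# L4 (TCP) load balancing via Nginx's stream module.
# Backend addresses are placeholders for illustration.
stream {
    upstream tcp_backend {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        listen 80;               # accept raw TCP; no HTTP parsing happens here
        proxy_pass tcp_backend;  # forward the connection as-is
    }
}
```

Because the payload is never inspected, none of the L7 features below (path routing, cookie handling) are available in this mode.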

L7 Load Balancing (Application Layer)

L7 load balancers analyze HTTP headers, URL paths, cookies, methods, and other application-layer data to make routing decisions. This enables more sophisticated rules than L4, at a higher processing cost.

[Client Request]
GET /api/users HTTP/1.1
Host: example.com
↓
[L7 LB: analyzes URL, headers, cookies]
/api/* → API server group
/static/* → File server group
/admin/* → Admin server (IP restricted)

Characteristics:

  • URL path and Host header-based routing
  • SSL termination
  • Cookie-based Sticky Session implementation
  • Nginx, HAProxy, and AWS ALB are representative examples
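
As one example of these capabilities, SSL termination in Nginx looks roughly like this (a sketch: the certificate paths are placeholders, and `backend` is assumed to be an upstream block defined elsewhere):

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # TLS terminates here; traffic to the upstream is plain HTTP
    ssl_certificate     /etc/nginx/certs/example.com.pem;
    ssl_certificate_key /etc/nginx/certs/example.com.key;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;   # preserve the original Host header
    }
}
```

Offloading TLS at the load balancer frees backend servers from encryption work and centralizes certificate management.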

L4 vs L7 Comparison

| Feature | L4 | L7 |
|---|---|---|
| Operating Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Criteria | IP, Port | URL, Headers, Cookies |
| Processing Speed | Very fast | Relatively slower |
| Content Inspection | Not possible | Possible |
| SSL Offloading | Not possible | Possible |
| Representative Products | AWS NLB, IPVS | Nginx, HAProxy, AWS ALB |

Hardware vs Software Load Balancers

Hardware Load Balancers

Load balancers built into dedicated hardware appliances. F5 BIG-IP and Citrix ADC are representative products, capable of handling hundreds of thousands to millions of TPS.

Advantages:

  • Extreme high performance (using ASIC chips)
  • High stability and reliability
  • Centrally managed by network teams

Disadvantages:

  • Very high acquisition cost (often tens to hundreds of thousands of dollars per appliance)
  • Slow configuration changes, less flexibility
  • Poor fit for cloud environments

Software Load Balancers

Load balancers implemented in software on general-purpose servers. Nginx, HAProxy, and Envoy are representative.

Advantages:

  • Low cost (most open source is free)
  • Configuration managed as code (GitOps-friendly)
  • Excellent fit for cloud/container environments
  • Easy feature additions and upgrades

Disadvantages:

  • Lower performance than hardware (though sufficient on modern servers)
  • Requires operational expertise

Cloud-managed Load Balancers:

| Cloud | Product | Layer |
|---|---|---|
| AWS | ALB (Application LB) | L7 |
| AWS | NLB (Network LB) | L4 |
| GCP | Cloud Load Balancing | L4/L7 |
| Azure | Application Gateway | L7 |

Load Balancing Algorithms

An algorithm determines which server the load balancer sends each request to. Choosing one that matches your service's characteristics is critical.

Load Balancing Algorithms

Round Robin (Default)

The simplest method: requests are assigned to servers in sequence.

Request 1 → App1
Request 2 → App2
Request 3 → App3
Request 4 → App1  ← back to the start

Best for: All servers have identical specs and similar request processing times.

Drawback: Long-running requests can pile up on certain servers, creating imbalance.
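
Round robin is Nginx's default strategy, so a bare upstream block needs no algorithm directive. A minimal sketch (server names are placeholders, matching the examples used elsewhere in this chapter):

```nginx
upstream backend {
    # no algorithm directive: Nginx defaults to round robin
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
```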

Weighted Round Robin

Assigns a weight to each server so higher-performance servers receive more requests.

App1 (weight=3): Requests 1, 2, 3
App2 (weight=1): Request 4
App1 (weight=3): Requests 5, 6, 7
App2 (weight=1): Request 8

Nginx example:

upstream backend {
    server app1.example.com weight=3;
    server app2.example.com weight=1;
}

Best for: When server specs differ (set higher weight for higher-spec servers).

Least Connections

Sends requests to the server with the fewest active connections. Effective when request processing times vary significantly.

App1: 100 connections
App2: 30 connections ← new request goes to App2
App3: 75 connections

Nginx example:

upstream backend {
    least_conn;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

Best for: Variable request processing times (e.g., API servers, services with DB queries).

IP Hash

Hashes the client IP address so the same client is always routed to the same server. This pins specific users to a server without requiring session sharing.

1.2.3.4 → hash → App2  (always App2 thereafter)
5.6.7.8 → hash → App1  (always App1 thereafter)

Nginx example:

upstream backend {
    ip_hash;
    server app1.example.com;
    server app2.example.com;
}

Best for: Implementing Sticky Session as a workaround when session sharing infrastructure (Redis, etc.) is unavailable.

Drawback: Adding or removing a server changes the hash mapping, so existing users land on different servers → session loss.
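
One way to soften this drawback (a sketch, not covered in the text above): Nginx's generic `hash` directive with the `consistent` parameter uses ketama consistent hashing, so adding or removing a server remaps only a small share of clients rather than all of them:

```nginx
upstream backend {
    # Ketama consistent hashing on the client IP:
    # removing one server remaps only that server's share of clients
    hash $remote_addr consistent;
    server app1.example.com;
    server app2.example.com;
}
```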

Algorithm Selection Guide

| Situation | Recommended Algorithm |
|---|---|
| Identical specs, short requests | Round Robin |
| Different server specs | Weighted Round Robin |
| Variable processing times | Least Connections |
| No session-sharing environment | IP Hash |
| Maximum performance (Nginx Plus) | Least Time |
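
Least Time is a commercial Nginx Plus feature and is not available in the open source build. A minimal sketch of the directive:

```nginx
upstream backend {
    # Nginx Plus only: prefer the server with the lowest average
    # time-to-response-header and fewest active connections
    least_time header;
    server app1.example.com;
    server app2.example.com;
}
```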

Real-world Load Balancing Architecture Patterns

Pattern 1: Simple Active-Active

Internet
│
[Load Balancer]
├── App Server 1
├── App Server 2
└── App Server 3
│
[Database]

Most common pattern. Sessions shared via external storage like Redis.

Pattern 2: Two-tier Load Balancing

Internet
│
[L4 LB: VIP]
├── [L7 LB (Nginx) 1]
└── [L7 LB (Nginx) 2]
│
[App Server Group]

The load balancer itself is also redundant. Used for large-scale traffic.

Pattern 3: Per-service Separation

Internet
│
[L7 LB]
├── /api/* → API Server Group
├── /static/* → CDN or File Server
└── /admin/* → Admin Server (IP restricted)

Routes to different server groups based on URL path.
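
In Nginx, this pattern maps onto `location` blocks. A sketch (upstream names, backend addresses, and the allowed admin network are all placeholders):

```nginx
upstream api_servers    { server 10.0.1.1:8080; server 10.0.1.2:8080; }
upstream static_servers { server 10.0.2.1:8080; }
upstream admin_server   { server 10.0.3.1:8080; }

server {
    listen 80;

    location /api/    { proxy_pass http://api_servers; }
    location /static/ { proxy_pass http://static_servers; }

    location /admin/ {
        allow 203.0.113.0/24;   # office network only (placeholder range)
        deny  all;
        proxy_pass http://admin_server;
    }
}
```

Each path prefix gets its own upstream group, so each group can scale and deploy independently.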


The Importance of Health Checks

Load balancers must periodically check backend server health to automatically exclude failed servers.

Passive Health Check​

Monitors responses to actual requests. If errors exceed a threshold, the server is excluded.

upstream backend {
    server app1.example.com max_fails=3 fail_timeout=30s;
    server app2.example.com max_fails=3 fail_timeout=30s;
}

Active Health Check

Periodically sends dedicated health check requests to proactively verify server status. Detects failures faster. (Supported by Nginx Plus, HAProxy)
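
In Nginx Plus this is the `health_check` directive; a sketch (the `/healthz` endpoint, zone size, and server names are placeholders):

```nginx
# Nginx Plus only: the open source build has no health_check directive
upstream backend {
    zone backend 64k;            # shared memory zone required for health checks
    server app1.example.com;
    server app2.example.com;
}

server {
    location / {
        proxy_pass http://backend;
        # probe /healthz every 5s; 3 failures eject a server, 2 passes restore it
        health_check uri=/healthz interval=5s fails=3 passes=2;
    }
}
```

In the open source build, HAProxy's `check` option or an external monitoring loop is the usual substitute.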


Summary

Load balancing is a core technology in modern web infrastructure. The learning order for this chapter:

  1. Nginx Load Balancing – the most widely used open source approach
  2. Apache mod_proxy_balancer – load balancing in Apache environments
  3. mod_jk Load Balancing – AJP protocol-based Tomcat clustering
  4. Health Checks – automatic detection and removal of failed nodes

The next page covers Nginx upstream configuration to set up actual load balancing.