Keepalived + VRRP — VIP Failover and Nginx HA Redundancy

Keepalived is a lightweight tool for implementing high availability on Linux. It is based on VRRP (Virtual Router Redundancy Protocol): multiple servers share a single Virtual IP (VIP), and when the Master server fails, a Backup server automatically takes over the VIP. Combined with Nginx, it removes the single point of failure (SPOF) at the web server layer.

Keepalived and VRRP Protocol

VRRP (Virtual Router Redundancy Protocol) is a network protocol originally designed for router redundancy (VRRPv2 is specified in RFC 3768). Keepalived applies this protocol to server redundancy.

How It Works

┌───────────────────────────────────────────────────────────────────┐
│                       VRRP Group (VRID: 51)                       │
│                                                                   │
│    ┌──────────────────────┐           ┌──────────────────────┐    │
│    │  MASTER Server       │           │  BACKUP Server       │    │
│    │  192.168.1.10        │           │  192.168.1.11        │    │
│    │  Priority: 100       │           │  Priority: 90        │    │
│    │  (Holds VIP)         │           │  (On standby)        │    │
│    └──────────┬───────────┘           └──────────────────────┘    │
│               │ VRRP Advertisement (multicast, 1-second interval) │
│               └──────────────────────────────────────────         │
│                                                                   │
│  VIP: 192.168.1.100 ← Single IP clients connect to                │
└───────────────────────────────────────────────────────────────────┘

MASTER Server: The server with the highest priority becomes the Master. It binds the VIP to its own NIC and sends VRRP Advertisement packets via multicast every second.

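BACKUP Server: Receives Advertisement packets and monitors the Master. If no packet arrives within the master-down interval (3 × advert_int plus a small priority-based skew, a little over 3 seconds with the default 1-second interval), it treats the Master as failed and takes over the VIP.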

Failover time: With default settings, failover completes within approximately 3-4 seconds.
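
The 3-4 second figure follows from the VRRP timers. As a rough sketch (the function name is illustrative, and scaling the skew by advert_int follows the VRRPv3 definition; with advert_int 1 the VRRPv2 formula gives the same number), the time a Backup waits before declaring the Master dead can be estimated like this:

```shell
# Estimate how long a Backup waits before declaring the Master dead.
#   master_down_interval = 3 * advert_int + skew_time
#   skew_time            = (256 - priority) / 256 * advert_int
master_down_interval() {
    advert_int=$1   # seconds between advertisements
    priority=$2     # priority of the Backup that is waiting
    awk -v a="$advert_int" -v p="$priority" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 * a }'
}

master_down_interval 1 90
# prints 3.648
```

A lower-priority Backup waits slightly longer, which keeps several Backups from taking over at the same moment.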

Installation

Ubuntu/Debian

sudo apt update
sudo apt install -y keepalived

# Enable service
sudo systemctl enable keepalived
sudo systemctl start keepalived

CentOS/RHEL

sudo yum install -y keepalived
# Or RHEL 8+
sudo dnf install -y keepalived

sudo systemctl enable keepalived
sudo systemctl start keepalived

Kernel Parameter Configuration

Allow non-local IP binding so that Nginx can listen on the VIP even on the node that does not currently hold it.

# Add to /etc/sysctl.conf
echo "net.ipv4.ip_nonlocal_bind = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
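
Running the tee -a command above a second time leaves a duplicate line in the file. A minimal sketch of an idempotent alternative follows; the helper name ensure_sysctl and the drop-in path in the usage note are assumptions for illustration, not part of the original setup.

```shell
# Append "key = value" to a sysctl file only if the key is not already set there.
# Note: dots in the key are treated as regex wildcards, which is harmless for this check.
ensure_sysctl() {
    key=$1; value=$2; file=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        echo "already present: $key"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
        echo "added: $key"
    fi
}
```

Run as root, this could be called as ensure_sysctl net.ipv4.ip_nonlocal_bind 1 /etc/sysctl.d/90-keepalived.conf, followed by sysctl --system to load the file.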

Nginx Health Check Script

Keepalived periodically executes scripts to check the status of local services. If Nginx is healthy, Priority is maintained; if unhealthy, Priority is reduced to cause the Backup server to become the Master.

#!/bin/bash
# /etc/keepalived/check_nginx.sh (the shebang must stay on the first line of the file)

# Check Nginx process
if ! pgrep -x "nginx" > /dev/null; then
    echo "Nginx process not found, attempting restart..."
    systemctl restart nginx
    sleep 2
    # Re-check after restart
    if ! pgrep -x "nginx" > /dev/null; then
        echo "Nginx restart failed"
        exit 1
    fi
fi

# Check HTTP response (actual request test)
# Requires a /health location in the Nginx configuration that returns 200
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 http://127.0.0.1/health)
if [ "$HTTP_CODE" != "200" ]; then
    echo "Nginx health check failed: HTTP $HTTP_CODE"
    exit 1
fi

echo "Nginx is healthy"
exit 0

# Grant execution permission
sudo chmod +x /etc/keepalived/check_nginx.sh

MASTER Server Configuration

# /etc/keepalived/keepalived.conf (MASTER server)
global_defs {
    # Unique ID for this server
    router_id NGINX_MASTER
    # User to run VRRP scripts
    script_user root
    enable_script_security
}

# Define health check script
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2    # Execute every 2 seconds
    weight -20    # Decrease Priority by 20 on failure
    fall 2        # Apply after 2 consecutive failures
    rise 2        # Recover after 2 consecutive successes
    timeout 5     # Script timeout (seconds)
}

vrrp_instance VI_1 {
    state MASTER            # Initial state: MASTER
    interface eth0          # Network interface for VRRP
    virtual_router_id 51    # VRRP group ID (1-255, same value for all in group)
    priority 100            # Priority (higher = becomes Master)
    advert_int 1            # Advertisement send interval (seconds)
    preempt                 # Reclaim Master role when recovered (this is the default behavior)

    # VRRP authentication (identical across all servers in the same group)
    authentication {
        auth_type PASS
        auth_pass secret12  # Only the first 8 characters are used
    }

    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }

    # Link to health check script
    track_script {
        check_nginx
    }
}
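
To make the fall and rise settings concrete, here is a small simulation of how a stream of check results is debounced. It is a simplified model for illustration only, not Keepalived's actual implementation; the function name and the assumption that the script starts in the "up" state are mine.

```shell
# Simulate fall/rise debouncing.
# Arguments: fall count, rise count, then one exit code per check (0 = success).
# Prints the script state after each check. Starting state "up" is an assumption.
simulate_checks() {
    fall=$1; rise=$2; shift 2
    state=up; fails=0; oks=0; out=""
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            oks=$((oks + 1)); fails=0
            if [ "$state" = down ] && [ "$oks" -ge "$rise" ]; then state=up; fi
        else
            fails=$((fails + 1)); oks=0
            if [ "$state" = up ] && [ "$fails" -ge "$fall" ]; then state=down; fi
        fi
        out="${out:+$out }$state"
    done
    echo "$out"
}

simulate_checks 2 2  0 1 0 1 1 0 1 0 0
# prints: up up up up down down down down up
```

A single failed check (the second result) does not change the state; only two failures in a row do, and recovery likewise needs two successes in a row.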

BACKUP Server Configuration

# /etc/keepalived/keepalived.conf (BACKUP server)
global_defs {
    router_id NGINX_BACKUP
    script_user root
    enable_script_security
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
    fall 2
    rise 2
    timeout 5
}

vrrp_instance VI_1 {
    state BACKUP            # Initial state: BACKUP
    interface eth0
    virtual_router_id 51    # Same VRID as MASTER
    priority 90             # Lower priority than MASTER
    advert_int 1
    # nopreempt is deliberately not set here: with it, this node would not take over
    # from a MASTER that is still advertising, even after its priority drops to 80

    authentication {
        auth_type PASS
        auth_pass secret12  # Must match MASTER (only the first 8 characters are used)
    }

    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }

    track_script {
        check_nginx
    }
}

Notify Script (Failover Alerts)

You can send alerts to Slack, email, etc. when a failover occurs.

#!/bin/bash
# /etc/keepalived/notify.sh (the shebang must stay on the first line of the file)

TYPE=$1   # GROUP or INSTANCE
NAME=$2   # Instance name
STATE=$3  # MASTER, BACKUP, FAULT

SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
HOSTNAME=$(hostname)
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

case "$STATE" in
    MASTER)
        MESSAGE="[HA] :crown: $HOSTNAME has been promoted to MASTER. ($TIMESTAMP)"
        ;;
    BACKUP)
        MESSAGE="[HA] :shield: $HOSTNAME has transitioned to BACKUP state. ($TIMESTAMP)"
        ;;
    FAULT)
        MESSAGE="[HA] :rotating_light: FAULT detected on $HOSTNAME! Immediate attention required. ($TIMESTAMP)"
        ;;
    *)
        # Any other state: still send a non-empty message
        MESSAGE="[HA] $HOSTNAME reported state $STATE. ($TIMESTAMP)"
        ;;
esac

# Send Slack alert (bounded so a slow webhook cannot block the script)
curl -s --max-time 5 -X POST "$SLACK_WEBHOOK" \
    -H "Content-type: application/json" \
    -d "{\"text\": \"$MESSAGE\"}"

# Log the event
echo "$TIMESTAMP [KEEPALIVED] $HOSTNAME -> $STATE" >> /var/log/keepalived-notify.log

# Grant execution permission
sudo chmod +x /etc/keepalived/notify.sh

# Add to vrrp_instance block in keepalived.conf
vrrp_instance VI_1 {
    # ... existing configuration ...
    notify /etc/keepalived/notify.sh
}
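
The -d argument in notify.sh interpolates the message straight into a JSON string, which yields invalid JSON if the message ever contains a double quote or a backslash. A minimal sketch of a safer payload builder follows; slack_payload is a hypothetical helper name, the host name web-1 is made up for the example, and only those two characters are escaped.

```shell
# Build the Slack JSON body with backslashes and double quotes escaped.
# Control characters (newlines, tabs) are not handled in this sketch.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text": "%s"}\n' "$escaped"
}

slack_payload 'node "web-1" is now MASTER'
# prints {"text": "node \"web-1\" is now MASTER"}
```

In the notify script, the result could be passed to curl with -d "$(slack_payload "$MESSAGE")".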

Failover Testing

Scenario 1: Stop Nginx Process

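# On MASTER server
# Note: check_nginx.sh restarts Nginx when the process is missing, so a plain
# "systemctl stop nginx" is undone by the next health check. Masking the unit
# makes that restart fail, so the check keeps failing and the priority drops.
sudo systemctl mask --now nginx

# Verify VIP has moved to BACKUP server
# On BACKUP server
ip addr show eth0 | grep "192.168.1.100"
# If a line such as "inet 192.168.1.100/24 ... eth0:1" is shown, failover succeeded

# Check Keepalived logs (-i because the log tag is capitalized, e.g. Keepalived_vrrp)
sudo tail -f /var/log/syslog | grep -i keepalived
# Or
sudo journalctl -u keepalived -f

# Afterwards, restore Nginx on the MASTER server
sudo systemctl unmask nginx
sudo systemctl start nginx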

Scenario 2: Full Server Shutdown

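# On MASTER server (caution: this takes the node out of service)
# A clean stop sends a priority-0 advertisement, so the BACKUP takes over almost
# immediately. To exercise the ~3-second timeout path, power the machine off instead.
sudo systemctl stop keepalived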

# Check VIP on BACKUP server
ip addr show | grep "192.168.1.100"

Scenario 3: Verify Failover During Continuous Requests

# Continuous requests from client
while true; do
    # --max-time keeps a request from hanging while the VIP is moving
    RESPONSE=$(curl -s --max-time 1 -w "\n%{http_code}" http://192.168.1.100/health)
    echo "$(date): $RESPONSE"
    sleep 0.5
done
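
To turn such a probe run into a number, the results can be logged and summarized afterwards. The sketch below assumes a hypothetical log format of one probe per line, "<epoch_seconds> <http_code>" (for example, produced with date +%s and curl's %{http_code}); the function name is illustrative.

```shell
# Report the longest stretch of non-200 responses in a probe log read from stdin.
# Assumed input format (hypothetical): "<epoch_seconds> <http_code>" per line.
longest_outage() {
    awk '
        # A failed probe: remember when the outage started
        $2 != "200" { if (!down) { down = 1; start = $1 }; next }
        # First successful probe after an outage: close it and keep the maximum
        down        { d = $1 - start; if (d > max) max = d; down = 0 }
        END         { if (down) print "outage still in progress"
                      else print "longest outage: " (max + 0) " s" }
    '
}

printf '100 200\n101 000\n102 000\n104 200\n105 200\n' | longest_outage
# prints: longest outage: 3 s
```

The measured value is bounded by the probe interval, so a 0.5-second loop cannot resolve outages shorter than that.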

Keepalived + HAProxy Combination Pattern

In large-scale environments, use the pattern where Keepalived provides redundancy for HAProxy, and HAProxy performs load balancing for backend servers.

                Client
                  │
                  ▼
   VIP: 192.168.1.100 (managed by Keepalived)
                ┌─┴─┐
                ▼   ▼      (Keepalived VRRP)
          HAProxy   HAProxy
          Master    Backup
             │
             ▼             (HAProxy load balancing)
          ┌──┴──┬──────┐
          ▼     ▼      ▼
         App1  App2   App3

#!/bin/bash
# HAProxy health check script
# /etc/keepalived/check_haproxy.sh (the shebang must stay on the first line of the file)
if ! pgrep -x "haproxy" > /dev/null; then
    systemctl restart haproxy
    sleep 2
    if ! pgrep -x "haproxy" > /dev/null; then
        exit 1
    fi
fi

# Check status via HAProxy stats socket
if ! echo "show info" | socat stdio /var/run/haproxy/admin.sock &>/dev/null; then
    exit 1
fi

exit 0

Troubleshooting

Preventing Split-Brain

To prevent Split-Brain — where both servers simultaneously become Master — configure the firewall to not block VRRP multicast.

# iptables: Allow VRRP multicast (IP protocol 112 only, not all traffic to the group address)
sudo iptables -A INPUT -p vrrp -d 224.0.0.18 -j ACCEPT
sudo iptables -A OUTPUT -p vrrp -d 224.0.0.18 -j ACCEPT

# Using firewalld
sudo firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
sudo firewall-cmd --reload

Checking Keepalived Logs

# Real-time log monitoring
sudo journalctl -u keepalived -f

# Check VIP status
ip addr show | grep "192.168.1.100"

# Check VRRP state: SIGUSR1 makes keepalived dump its current data to /tmp/keepalived.data
sudo kill -USR1 "$(cat /run/keepalived.pid)"
sudo cat /tmp/keepalived.data

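# Enable detailed logging: pass -D (--log-detail) on the keepalived command line.
# RHEL-family: /etc/sysconfig/keepalived   Debian/Ubuntu: /etc/default/keepalived
KEEPALIVED_OPTIONS="-D"   # on Debian/Ubuntu the variable is DAEMON_ARGS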

Common Issues and Solutions

VIP not transferring:

  • Verify virtual_router_id is identical on both servers
  • Verify auth_pass is identical on both servers
  • Confirm VRRP protocol (IP protocol number 112) is allowed in the firewall

Both servers becoming MASTER:

  • Check network connectivity between servers
  • Check if VRRP multicast (224.0.0.18) is blocked
  • Verify both servers are on the same network segment

Pro Tips

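  • nopreempt only takes effect on an instance whose initial state is BACKUP, and it stops that node from taking the VIP away from a Master that is still advertising, regardless of priority. To keep the VIP from moving back when the original MASTER recovers, it has to be set on that higher-priority node (configured with state BACKUP). It also disables the weight-based takeover used in this guide, so pair it with a track_script that has no weight, which puts the instance into FAULT when the check fails.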
  • Reducing advert_int enables faster detection but increases network load. 1-2 seconds is appropriate.
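  • In cloud environments (AWS, GCP), VRRP multicast is generally not available, and an address claimed only through gratuitous ARP is typically not routed by the cloud network. Use each cloud's native HA solution instead.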
  • Regularly perform failover tests to validate behavior in actual failure scenarios.
  • With weight -20, MASTER Priority(100) - 20 = 80 becomes lower than BACKUP Priority(90), causing the Backup to become Master. Calculate this value precisely.
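
The weight arithmetic in the last tip can be checked before deploying. The following is a minimal sketch for negative weights only; effective_priority is an illustrative helper, not a Keepalived command.

```shell
# Check that a failed health check on the Master drops it below the Backup.
# While the check is failing: effective priority = base priority + weight (weight < 0).
effective_priority() {
    base=$1; weight=$2; check_ok=$3   # check_ok: 1 = script passing, 0 = failing
    if [ "$check_ok" -eq 1 ]; then
        echo "$base"
    else
        echo $((base + weight))
    fi
}

master=$(effective_priority 100 -20 0)   # Master with Nginx down
backup=$(effective_priority 90 -20 1)    # Backup with Nginx up
if [ "$master" -lt "$backup" ]; then
    echo "failover: master=$master < backup=$backup"
else
    echo "no failover: master=$master >= backup=$backup"
fi
# prints: failover: master=80 < backup=90
```

With a weight of -5 the Master would only drop to 95, stay above the Backup's 90, and keep the VIP even though Nginx is down.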