Keepalived + VRRP — VIP Failover and Nginx HA Redundancy

Keepalived is a lightweight tool for implementing high availability on Linux. It is based on VRRP (Virtual Router Redundancy Protocol): multiple servers share a single Virtual IP (VIP), and when the Master server fails, a Backup server automatically takes over the VIP. Combined with Nginx, it removes the single point of failure (SPOF) at the web server layer.

Keepalived and VRRP Protocol

VRRP (Virtual Router Redundancy Protocol) is a network protocol originally designed for router redundancy (VRRPv2 is specified in RFC 3768, VRRPv3 in RFC 5798). Keepalived applies this protocol to server redundancy.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                    VRRP Group (VRID: 51)                         │
│                                                                  │
│  ┌──────────────────────┐      ┌──────────────────────┐         │
│  │   MASTER Server      │      │   BACKUP Server      │         │
│  │   192.168.1.10       │      │   192.168.1.11       │         │
│  │   Priority: 100      │      │   Priority: 90       │         │
│  │   (Holds VIP)        │      │   (On standby)       │         │
│  └──────────┬───────────┘      └──────────────────────┘         │
│             │ VRRP Advertisement (multicast, 1-second interval)  │
│             └──────────────────────────────────────────         │
│                                                                  │
│  VIP: 192.168.1.100 ← Single IP clients connect to              │
└─────────────────────────────────────────────────────────────────┘

MASTER Server: The server with the highest priority becomes the Master. It binds the VIP to its own NIC and sends VRRP Advertisement packets via multicast every second.

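BACKUP Server: Receives Advertisement packets and monitors the Master. If no advertisement arrives within the master-down interval (3 × advert_int plus a small priority-based skew time, so slightly over 3 seconds with the defaults), it treats the Master as failed and takes over the VIP.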

Failover time: With default settings, failover completes within approximately 3-4 seconds.
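The detection timer described above can be estimated with a little arithmetic. The following is a minimal sketch, assuming the timer formula from the VRRP specification (master-down interval = 3 × advertisement interval + skew time, where skew time = (256 − priority) / 256 of one advertisement interval). The function name `master_down_ms` is made up for this illustration and is not part of Keepalived.

```shell
#!/bin/bash
# Hypothetical helper: estimate the VRRP master-down interval in milliseconds.
# Usage: master_down_ms PRIORITY ADVERT_INT_SECONDS
master_down_ms() {
    local priority=$1 advert_int=$2
    local advert_ms=$(( advert_int * 1000 ))
    # Skew time: Backups with a higher priority time out slightly earlier,
    # so the best candidate is the first to claim the VIP.
    local skew_ms=$(( (256 - priority) * advert_ms / 256 ))
    echo $(( 3 * advert_ms + skew_ms ))
}

# Backup from the diagram: priority 90, advert_int 1
master_down_ms 90 1
```

For the Backup in the diagram this works out to roughly 3.6 seconds, which is consistent with the 3-4 second failover time quoted above once the VIP takeover itself is included.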

Installation

Ubuntu/Debian

sudo apt update
sudo apt install -y keepalived

# Enable service
sudo systemctl enable keepalived
sudo systemctl start keepalived

CentOS/RHEL

sudo yum install -y keepalived
# Or RHEL 8+
sudo dnf install -y keepalived

sudo systemctl enable keepalived
sudo systemctl start keepalived

Kernel Parameter Configuration

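Allow binding to non-local IP addresses so that services such as Nginx or HAProxy can listen on the VIP even while this node does not currently hold it. Keepalived itself does not need this setting to add the VIP to the interface.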

# Add to /etc/sysctl.conf
echo "net.ipv4.ip_nonlocal_bind = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
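To confirm that the setting is active, the value can be read back from procfs. This is a small sketch; the function name `nonlocal_bind_enabled` and its optional path argument are assumptions made for this example, not an existing tool.

```shell
#!/bin/bash
# Hypothetical helper: succeed if non-local binding is enabled.
# The optional argument overrides the procfs path (handy for dry runs).
nonlocal_bind_enabled() {
    local path=${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}
    [ -r "$path" ] && [ "$(cat "$path")" = "1" ]
}

if nonlocal_bind_enabled; then
    echo "ip_nonlocal_bind is enabled"
else
    echo "ip_nonlocal_bind is NOT enabled"
fi
```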

Nginx Health Check Script

Keepalived periodically executes a script to check the status of local services. While Nginx is healthy, the node keeps its Priority; when the check fails, the Priority is reduced so that the Backup server becomes the Master.

#!/bin/bash
# /etc/keepalived/check_nginx.sh

# Check Nginx process
if ! pgrep -x "nginx" > /dev/null; then
    echo "Nginx process not found, attempting restart..."
    systemctl restart nginx
    sleep 2
    # Re-check after restart
    if ! pgrep -x "nginx" > /dev/null; then
        echo "Nginx restart failed"
        exit 1
    fi
fi

# Check HTTP response (actual request test)
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 http://127.0.0.1/health)
if [ "$HTTP_CODE" != "200" ]; then
    echo "Nginx health check failed: HTTP $HTTP_CODE"
    exit 1
fi

echo "Nginx is healthy"
exit 0

# Grant execution permission
sudo chmod +x /etc/keepalived/check_nginx.sh

MASTER Server Configuration

# /etc/keepalived/keepalived.conf (MASTER server)
global_defs {
    # Unique ID for this server
    router_id NGINX_MASTER
    # User to run VRRP scripts
    script_user root
    enable_script_security
}

# Define health check script
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2          # Execute every 2 seconds
    weight -20          # Decrease Priority by 20 on failure
    fall 2              # Apply after 2 consecutive failures
    rise 2              # Recover after 2 consecutive successes
    timeout 5           # Script timeout
}

vrrp_instance VI_1 {
    state MASTER                    # Initial state: MASTER
    interface eth0                  # Network interface for VRRP
    virtual_router_id 51            # VRRP group ID (1-255, same value for all in group)
    priority 100                    # Priority (higher = becomes Master)
    advert_int 1                    # Advertisement send interval (seconds)
    preempt                         # Reclaim Master role when recovered

    # VRRP authentication (identical across all servers in the same group)
    authentication {
        auth_type PASS
        auth_pass secret123         # Only the first 8 characters are used
    }

    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }

    # Link to health check script
    track_script {
        check_nginx
    }
}

BACKUP Server Configuration

# /etc/keepalived/keepalived.conf (BACKUP server)
global_defs {
    router_id NGINX_BACKUP
    script_user root
    enable_script_security
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
    fall 2
    rise 2
    timeout 5
}

vrrp_instance VI_1 {
    state BACKUP                    # Initial state: BACKUP
    interface eth0
    virtual_router_id 51            # Same VRID as MASTER
    priority 90                     # Lower priority than MASTER
    advert_int 1
    # nopreempt is deliberately omitted: with it, this node would not take over
    # from a live MASTER whose priority has only been lowered (100 - 20 = 80)

    authentication {
        auth_type PASS
        auth_pass secret123
    }

    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }

    track_script {
        check_nginx
    }
}

Notify Script (Failover Alerts)

You can send alerts to Slack, email, etc. when a failover occurs.

#!/bin/bash
# /etc/keepalived/notify.sh

TYPE=$1       # GROUP or INSTANCE
NAME=$2       # Instance name
STATE=$3      # MASTER, BACKUP, FAULT

SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
HOSTNAME=$(hostname)
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

case "$STATE" in
    MASTER)
        MESSAGE="[HA] :crown: $HOSTNAME has been promoted to MASTER. ($TIMESTAMP)"
        ;;
    BACKUP)
        MESSAGE="[HA] :shield: $HOSTNAME has transitioned to BACKUP state. ($TIMESTAMP)"
        ;;
    FAULT)
        MESSAGE="[HA] :rotating_light: FAULT detected on $HOSTNAME! Immediate attention required. ($TIMESTAMP)"
        ;;
    *)
        # Any other state: still send a non-empty message
        MESSAGE="[HA] $HOSTNAME entered state $STATE. ($TIMESTAMP)"
        ;;
esac

# Send Slack alert
curl -s -X POST "$SLACK_WEBHOOK" \
    -H "Content-type: application/json" \
    -d "{\"text\": \"$MESSAGE\"}"

# Log the event
echo "$TIMESTAMP [KEEPALIVED] $HOSTNAME -> $STATE" >> /var/log/keepalived-notify.log

# Add to vrrp_instance block in keepalived.conf
vrrp_instance VI_1 {
    # ... existing configuration ...
    notify /etc/keepalived/notify.sh
}
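The notify script places the message directly inside a JSON string, so a double quote or backslash in the hostname or message would produce invalid JSON. One possible hardening, shown here as a sketch, is to escape those characters before sending. The function name `slack_payload` is hypothetical, and newlines and other control characters are not handled.

```shell
#!/bin/bash
# Hypothetical helper: wrap a message in a {"text": "..."} payload,
# escaping backslashes and double quotes so the JSON stays valid.
slack_payload() {
    local msg=$1
    msg=${msg//\\/\\\\}   # \ becomes \\
    msg=${msg//\"/\\\"}   # " becomes \"
    printf '{"text": "%s"}\n' "$msg"
}

slack_payload 'web-01 says "MASTER"'
```

In the notify script, the result could then be passed to curl with `-d "$(slack_payload "$MESSAGE")"`.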

Failover Testing

Scenario 1: Stop Nginx Process

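# On MASTER server
# Note: check_nginx.sh restarts Nginx when the process is missing, so a plain
# "systemctl stop nginx" is undone by the next health check. Masking the unit
# first makes that restart fail, so the check fails and failover is triggered.
sudo systemctl mask nginx
sudo systemctl stop nginx
# After the test: sudo systemctl unmask nginx && sudo systemctl start nginx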

# Verify VIP has moved to BACKUP server
# On BACKUP server
ip addr show eth0 | grep "192.168.1.100"
# If a line such as "inet 192.168.1.100/24 ... eth0:1" is shown, failover succeeded

# Check Keepalived logs
sudo tail -f /var/log/syslog | grep -i keepalived
# Or
sudo journalctl -u keepalived -f

Scenario 2: Full Server Shutdown

# On MASTER server (caution: actual failure simulation)
sudo systemctl stop keepalived

# Check VIP on BACKUP server
ip addr show | grep "192.168.1.100"

Scenario 3: Verify Failover During Continuous Requests

# Continuous requests from client
while true; do
    RESPONSE=$(curl -s -w "\n%{http_code}" http://192.168.1.100/health)
    echo "$(date): $RESPONSE"
    sleep 0.5
done
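To turn the loop output into a number, the probes can be logged in a machine-readable form and summarized afterwards. The sketch below assumes each log line has the form "<epoch_seconds> <http_code>". That format and the function name `summarize_probes` are choices made for this example. An outage is measured from the first failed probe to the next successful one.

```shell
#!/bin/bash
# Hypothetical helper: summarize a probe log read from stdin.
# Each input line is assumed to be "<epoch_seconds> <http_code>".
summarize_probes() {
    awk '
        # Failed probe: count it and remember when the outage started
        $2 != "200" { failed++; if (!down) { down = 1; start = $1 } last = $1; next }
        # First success after an outage: close the outage window
        down { span = $1 - start; if (span > longest) longest = span; down = 0 }
        END {
            # Log ended while still down: use the last failed probe
            if (down) { span = last - start; if (span > longest) longest = span }
            printf "total=%d failed=%d longest_outage=%ds\n", NR, failed, longest
        }
    '
}

# Collecting a log (run on the client during the failover test):
#   while true; do
#       echo "$(date +%s) $(curl -s -o /dev/null -w '%{http_code}' --max-time 2 http://192.168.1.100/health)"
#       sleep 0.5
#   done > probe.log
#   summarize_probes < probe.log

# Demo with sample data: two failed probes between successful ones
printf '100 200\n101 000\n102 000\n104 200\n105 200\n' | summarize_probes
```

curl reports `000` as the HTTP code when the connection fails, so those lines count as failures.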

Keepalived + HAProxy Combination Pattern

In large-scale environments, use the pattern where Keepalived provides redundancy for HAProxy, and HAProxy performs load balancing for backend servers.

Client
    │
    ▼
VIP: 192.168.1.100 (managed by Keepalived)
   ┌─┴─┐
   ▼   ▼ (Keepalived VRRP)
HAProxy   HAProxy
Master    Backup
   │
   ▼ (HAProxy load balancing)
┌──┴──┬──────┐
▼     ▼      ▼
App1  App2   App3

#!/bin/bash
# HAProxy health check script
# /etc/keepalived/check_haproxy.sh
if ! pgrep -x "haproxy" > /dev/null; then
    systemctl restart haproxy
    sleep 2
    if ! pgrep -x "haproxy" > /dev/null; then
        exit 1
    fi
fi

# Check status via HAProxy stats socket
if ! echo "show info" | socat stdio /var/run/haproxy/admin.sock &>/dev/null; then
    exit 1
fi

exit 0

Troubleshooting

Preventing Split-Brain

Split-brain, where both servers simultaneously become Master, typically occurs when VRRP advertisements cannot pass between the nodes. To prevent it, make sure the firewall does not block VRRP multicast.

# iptables: Allow VRRP multicast
sudo iptables -A INPUT -d 224.0.0.18 -j ACCEPT
sudo iptables -A OUTPUT -d 224.0.0.18 -j ACCEPT

# Using firewalld
sudo firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
sudo firewall-cmd --reload

Checking Keepalived Logs

# Real-time log monitoring
sudo journalctl -u keepalived -f

# Check VIP status
ip addr show | grep "192.168.1.100"

# Check VRRP state: SIGUSR1 makes keepalived dump its state to /tmp/keepalived.data
sudo kill -USR1 "$(cat /run/keepalived.pid)"
sudo cat /tmp/keepalived.data

# Enable detailed logging: start keepalived with -D (--log-detail), for example via
# KEEPALIVED_OPTIONS="-D" in /etc/sysconfig/keepalived on RHEL-family systems

Common Issues and Solutions

VIP not transferring:

Verify virtual_router_id is identical on both servers
Verify auth_pass is identical on both servers
Confirm VRRP protocol (IP protocol number 112) is allowed in the firewall
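The first two checks can be automated once both configuration files have been copied to one machine. The following is a rough sketch, assuming one vrrp_instance per file. The function names `conf_value` and `check_vrrp_pair` are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the first value of a keyword in a keepalived.conf.
conf_value() {
    local file=$1 key=$2
    awk -v key="$key" '$1 == key { print $2; exit }' "$file"
}

# Hypothetical helper: compare the keywords that must match within a VRRP group.
# Usage: check_vrrp_pair MASTER_CONF BACKUP_CONF
check_vrrp_pair() {
    local a=$1 b=$2 key rc=0
    for key in virtual_router_id auth_pass; do
        if [ "$(conf_value "$a" "$key")" != "$(conf_value "$b" "$key")" ]; then
            echo "MISMATCH: $key"
            rc=1
        fi
    done
    return $rc
}

# Example (file names are placeholders):
#   check_vrrp_pair master-keepalived.conf backup-keepalived.conf && echo "configs agree"
```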

Both servers becoming MASTER:

Check network connectivity between servers
Check if VRRP multicast (224.0.0.18) is blocked
Verify both servers are on the same network segment

Pro Tips

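The nopreempt option keeps a recovered higher-priority node from immediately reclaiming the VIP, which avoids a second switchover. It has to be set on that higher-priority node, and the node's initial state must be BACKUP. Note that a node with nopreempt also will not take over from a live Master whose priority was merely lowered by a script weight, so pair it with a track script that has no weight (a failure then puts the instance into FAULT state).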
Reducing advert_int enables faster detection but increases network load. 1-2 seconds is appropriate.
In cloud environments (AWS, GCP), VRRP multicast is generally not supported on standard virtual networks, so use each cloud's native HA solution instead.
Regularly perform failover tests to validate behavior in actual failure scenarios.
With weight -20, MASTER Priority(100) - 20 = 80 becomes lower than BACKUP Priority(90), causing the Backup to become Master. Calculate this value precisely.
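The weight arithmetic from the last tip can be checked before deploying. This is a minimal sketch of how a tracked script's weight affects the advertised priority, assuming the documented Keepalived behavior that a negative weight is subtracted while the check fails and a positive weight is added while it succeeds. The function name `effective_priority` is hypothetical, and clamping to the valid priority range is not modelled.

```shell
#!/bin/bash
# Hypothetical helper: priority a node advertises given its tracked script result.
# Usage: effective_priority BASE WEIGHT CHECK_OK   (CHECK_OK: 1 = passing, 0 = failing)
effective_priority() {
    local base=$1 weight=$2 check_ok=$3
    if [ "$weight" -lt 0 ] && [ "$check_ok" -eq 0 ]; then
        echo $(( base + weight ))   # negative weight: lowered while the check fails
    elif [ "$weight" -gt 0 ] && [ "$check_ok" -eq 1 ]; then
        echo $(( base + weight ))   # positive weight: raised while the check passes
    else
        echo "$base"
    fi
}

master=$(effective_priority 100 -20 0)   # Nginx check failing on the MASTER
backup=$(effective_priority 90 -20 1)    # Nginx healthy on the BACKUP
echo "master=$master backup=$backup"
if [ "$backup" -gt "$master" ]; then
    echo "BACKUP outranks MASTER"
fi
```

If the weight were only -5, the Master would drop to 95, still above the Backup's 90, and the VIP would stay on a node with a failing Nginx.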
