
I have set up two HAProxy Load Balancers in an Active->Passive pair.

KeepAliveD will be used for failover between MASTER and BACKUP servers.

As with most clouds, multicast is not supported here, so I cannot use a Virtual IP in the usual way. Instead I'm attempting to use unicast, which I've seen suggested around the web as a solution.

My problem is that the BACKUP KeepAliveD instance enters MASTER state straight away. It can ping the MASTER server, but it seems unable to realise that the MASTER is indeed up.

I would class myself as a sysadmin n00b, so please forgive me. For this reason, I'm hoping there are some glaringly obvious mistakes I'm making that can be rectified easily...

      __[HAProxy Active, KeepAliveD MASTER, 10.179.66.95]
     /
----|
    |
     \__[HAProxy Passive, KeepAliveD BACKUP, 10.179.74.172]

Configs as follows...

KeepAliveD version on both

1.2.9 (Unicast support was added in 1.2.8 and patched in 1.2.9).

http://www.keepalived.org/changelog.html

On both servers in /etc/sysctl.conf

# Nonlocal bind for use with KeepAliveD. Allows this instance to take on a non-local IP for failover.
net.ipv4.ip_nonlocal_bind=1

KeepAliveD MASTER in /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {
    notification_email {
        me@me.com
    }
    notification_email_from me@me.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LB_MASTER_ACTIVE
}

# Define the script used to check if haproxy is still working
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # verify the pid existence
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}

# Virtual interface.
vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 101
    smtp_alert                  # Activate e-mail notifications.
    #advert_int 1

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    # IP of myself and my peer for unicast based failover.
    vrrp_unicast_bind 10.179.66.95      # My IP.
    vrrp_unicast_peer 10.179.74.172     # The other's IP.

    # Check if HAProxy is running or not.
    track_script {
        chk_haproxy
    }
}

KeepAliveD BACKUP in /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {
    notification_email {
        me@me.com
    }
    notification_email_from me@me.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LB_BACKUP_PASSIVE
}

# Define the script used to check if haproxy is still working
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

# Virtual interface.
vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 100                # MASTER is priority 101.
    smtp_alert                  # Activate e-mail notifications.
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    # IP of myself and my peer for unicast based failover.
    vrrp_unicast_bind 10.179.74.172     # My IP.
    vrrp_unicast_peer 10.179.66.95      # The other's IP.

    # Check if HAProxy is running or not.
    track_script {
        chk_haproxy
    }
}
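
To make the priority arithmetic in the two configs explicit, here is a small shell sketch. It assumes the usual behaviour for a positive `weight`: the value is added to the instance priority while the tracked script succeeds, and the node with the higher resulting priority wins the election. The function name `effective_priority` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT SCRIPT_OK
# Hypothetical helper: prints BASE + WEIGHT if the check script succeeds
# (SCRIPT_OK=1), otherwise BASE alone. Assumes a positive weight, as in
# the configs above.
effective_priority() {
    if [ "$3" -eq 1 ]; then
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}

# HAProxy alive on both nodes: MASTER 103 vs BACKUP 102, MASTER keeps the role.
echo "master=$(effective_priority 101 2 1) backup=$(effective_priority 100 2 1)"

# HAProxy dead on MASTER only: MASTER 101 vs BACKUP 102, BACKUP should take over.
echo "master=$(effective_priority 101 2 0) backup=$(effective_priority 100 2 1)"
```

This comparison only happens when the nodes actually receive each other's advertisements. If no VRRP traffic arrives, each node promotes itself regardless of priority.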

Messages log on KeepAliveD MASTER when KeepAliveD started, tail -f /var/log/messages

Nov 28 10:54:02 mysql-read-lb-1 Keepalived[30158]: Starting Keepalived v1.2.7 (02/21,2013)
Nov 28 10:54:02 mysql-read-lb-1 Keepalived[30159]: Starting Healthcheck child process, pid=30161
Nov 28 10:54:02 mysql-read-lb-1 Keepalived[30159]: Starting VRRP child process, pid=30162
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Interface queue is empty
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Interface queue is empty
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: No such interface, eth1
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: No such interface, eth2
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Netlink reflector reports IP 10.179.66.95 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Netlink reflector reports IP 192.168.3.1 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Netlink reflector reports IP fe80::be76:4eff:fe08:9227 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Netlink reflector reports IP fe80::be76:4eff:fe08:8b4d added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Registering Kernel netlink reflector
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Registering Kernel netlink command channel
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: No such interface, eth1
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: No such interface, eth2
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Netlink reflector reports IP 10.179.66.95 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Netlink reflector reports IP 192.168.3.1 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Netlink reflector reports IP fe80::be76:4eff:fe08:9227 added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Netlink reflector reports IP fe80::be76:4eff:fe08:8b4d added
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Registering Kernel netlink reflector
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Registering Kernel netlink command channel
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Registering gratuitous ARP shared channel
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Configuration is using : 7559 Bytes
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Truncating auth_pass to 8 characters
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Configuration is using : 64400 Bytes
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: Using LinkWatch kernel netlink reflector...
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_vrrp[30162]: VRRP sockpool: [ifindex(3), proto(112), fd(10,11)]
Nov 28 10:54:02 mysql-read-lb-1 Keepalived_healthcheckers[30161]: Using LinkWatch kernel netlink reflector...
Nov 28 10:54:03 mysql-read-lb-1 Keepalived_vrrp[30162]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 28 10:54:04 mysql-read-lb-1 Keepalived_vrrp[30162]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 28 10:54:04 mysql-read-lb-1 Keepalived_vrrp[30162]: Remote SMTP server [127.0.0.1]:25 connected.
Nov 28 10:54:04 mysql-read-lb-1 Keepalived_vrrp[30162]: SMTP alert successfully sent.

Messages log on KeepAliveD BACKUP when KeepAliveD started, tail -f /var/log/messages

You'll notice it enters MASTER state straight off the bat but should stay in BACKUP...

Nov 28 10:57:35 load-balancer-1-passive Keepalived[25048]: Starting Keepalived v1.2.7 (02/21,2013)
Nov 28 10:57:35 load-balancer-1-passive Keepalived[25049]: Starting Healthcheck child process, pid=25050
Nov 28 10:57:35 load-balancer-1-passive Keepalived[25049]: Starting VRRP child process, pid=25052
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Interface queue is empty
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: No such interface, eth1
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: No such interface, eth2
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Netlink reflector reports IP 10.179.74.172 added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Netlink reflector reports IP 192.168.3.2 added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Netlink reflector reports IP fe80::be76:4eff:fe08:93fc added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Netlink reflector reports IP fe80::be76:4eff:fe08:940c added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Registering Kernel netlink reflector
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Registering Kernel netlink command channel
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Interface queue is empty
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: No such interface, eth1
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: No such interface, eth2
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Netlink reflector reports IP 10.179.74.172 added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Configuration is using : 7595 Bytes
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Netlink reflector reports IP 192.168.3.2 added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Netlink reflector reports IP fe80::be76:4eff:fe08:93fc added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Netlink reflector reports IP fe80::be76:4eff:fe08:940c added
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Registering Kernel netlink reflector
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Registering Kernel netlink command channel
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Registering gratuitous ARP shared channel
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Truncating auth_pass to 8 characters
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Configuration is using : 64436 Bytes
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Using LinkWatch kernel netlink reflector...
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: VRRP sockpool: [ifindex(3), proto(112), fd(10,11)]
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: Remote SMTP server [127.0.0.1]:25 connected.
Nov 28 10:57:35 load-balancer-1-passive Keepalived_healthcheckers[25050]: Using LinkWatch kernel netlink reflector...
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: VRRP_Script(chk_haproxy) succeeded
Nov 28 10:57:35 load-balancer-1-passive Keepalived_vrrp[25052]: SMTP alert successfully sent.
Nov 28 10:57:38 load-balancer-1-passive Keepalived_vrrp[25052]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 28 10:57:39 load-balancer-1-passive Keepalived_vrrp[25052]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 28 10:57:39 load-balancer-1-passive Keepalived_vrrp[25052]: Remote SMTP server [127.0.0.1]:25 connected.
Nov 28 10:57:39 load-balancer-1-passive Keepalived_vrrp[25052]: SMTP alert successfully sent.
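
Both logs above start with a "Starting Keepalived v1.2.7" banner, which predates the unicast support mentioned earlier. A minimal sketch for pulling the running version out of the startup lines follows; the function name `keepalived_versions` is hypothetical, and the log path in the comment is the one used in this question.

```shell
#!/bin/sh
# Hypothetical helper: print the Keepalived version found in each
# "Starting Keepalived vX.Y.Z" line read from stdin.
keepalived_versions() {
    sed -n 's/.*Starting Keepalived v\([0-9][0-9.]*\).*/\1/p'
}

# Sample line copied from the BACKUP log above. On a live host, feed the
# real log instead:
#   keepalived_versions < /var/log/messages | tail -n 1
sample='Nov 28 10:57:35 load-balancer-1-passive Keepalived[25048]: Starting Keepalived v1.2.7 (02/21,2013)'
printf '%s\n' "$sample" | keepalived_versions
```

If the printed version differs from the one you believe is installed, the old binary is probably still the one being started.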

KeepAliveD MASTER server's interfaces, "ip a":

[root@load-balancer-1-active keepalived]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether bc:76:4e:08:92:38 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether bc:76:4e:08:92:27 brd ff:ff:ff:ff:ff:ff
    inet 10.179.66.95/18 brd 10.179.127.255 scope global eth1
    inet6 fe80::be76:4eff:fe08:9227/64 scope link 
       valid_lft forever preferred_lft forever

KeepAliveD BACKUP server's interfaces, "ip a":

[root@load-balancer-1-passive ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether bc:76:4e:08:4f:b4 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether bc:76:4e:08:93:fc brd ff:ff:ff:ff:ff:ff
    inet 10.179.74.172/18 brd 10.179.127.255 scope global eth1
    inet6 fe80::be76:4eff:fe08:93fc/64 scope link 
       valid_lft forever preferred_lft forever

Sniff on MASTER

As suggested by "erny" below, I sniffed packets to see whether the VRRP advertisements are getting through, following this guide: http://www.cyberciti.biz/faq/linux-unix-verify-keepalived-working-or-not/

[root@mysql-read-lb-1 ~]# tcpdump -vvv -n -i eth1 host 10.179.74.172
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

Nothing :(

Sniff on BACKUP

[root@load-balancer-1-passive ~]# tcpdump -vvv -n -i eth1 host 10.179.66.95
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

Nothing :(
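
The captures above filter on the peer address only. VRRP is IP protocol 112 rather than a TCP or UDP port, so a narrower filter can make the result less ambiguous. The sketch below only prints the commands, which would then be run as root on each node; the helper name `vrrp_capture_cmd` is hypothetical.

```shell
#!/bin/sh
# vrrp_capture_cmd IFACE PEER
# Hypothetical helper: prints a tcpdump command that shows only VRRP
# advertisements (IP protocol 112) exchanged with PEER on IFACE.
vrrp_capture_cmd() {
    printf "tcpdump -n -i %s 'ip proto 112 and host %s'\n" "$1" "$2"
}

# On the BACKUP node, watch for advertisements from the MASTER:
vrrp_capture_cmd eth1 10.179.66.95

# On the MASTER node, watch for anything coming back from the BACKUP:
vrrp_capture_cmd eth1 10.179.74.172
```

With unicast working, the BACKUP would be expected to see roughly one advertisement from the MASTER per advert_int (1 second here).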

Chris Rosillo

1 Answer


Some guesses:

Are you sure that the VRRP traffic gets through? Could you sniff (e.g. with ngrep or tcpdump) for IP protocol 112 to see whether packets are received? (You should see one each second.) See this link.

If not, it could be a firewall issue.
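
If the host firewall is the culprit, a rule along these lines might help. This is a sketch under the assumption that iptables is in use on both nodes. It only prints the rules so they can be reviewed before being run as root, and the helper name `vrrp_allow_rule` is hypothetical. It cannot help if the filtering happens upstream in the provider's network.

```shell
#!/bin/sh
# vrrp_allow_rule PEER
# Hypothetical helper: prints an iptables rule that accepts VRRP
# (IP protocol 112) packets arriving from PEER.
vrrp_allow_rule() {
    printf 'iptables -I INPUT -p 112 -s %s -j ACCEPT\n' "$1"
}

vrrp_allow_rule 10.179.74.172   # rule to apply on the MASTER
vrrp_allow_rule 10.179.66.95    # rule to apply on the BACKUP
```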

erny
  • Thanks, erny. I've updated my question with the sniff results. I am seeing results each second, but I'm guessing I should be seeing results from the other KeepAliveD instance intermingled as well? – Chris Rosillo Nov 28 '13 at 13:57
  • More questions: 1. You're scanning for multicast (224.0.0.18), but you should scan for the other machine's IP (from BACKUP, you should run tcpdump for 10.179.66.95). You wrote that you'll use unicast, as multicast won't work in most cloud infrastructures. 2. Does your version of keepalived really support unicast? What version are you using? (Unicast support was added very recently, in version 1.2.8.) 3. IP protocol 112 (VRRP) may be filtered by external firewalls. You could use something like netcat (nc) to check that you can really send data to the other host. (Use keepalived -l to log to console.) – erny Nov 28 '13 at 18:41
  • Version, 1.2.7 - I'm going to install 1.2.8 or 1.2.9 and try again! Will report back. – Chris Rosillo Nov 28 '13 at 20:11
  • Answers: 1. Tried suggestion, appears nothing, which explains split brain. Updated original question with results. 2. Installed 1.2.9 ([guide here](http://www.cyberciti.biz/faq/rhel-centos-fedora-keepalived-lvs-cluster-configuration/)), no joy. 3. I'll try this tomorrow. (Thanks for help so far erny). – Chris Rosillo Nov 28 '13 at 20:56
  • VRRP is not supported by Rackspace on either cloud or dedicated... Cloud because they basically want you to pay for their Cloud Load Balancers as discussed [here](http://www.rackspace.com/knowledge_center/article/ip-failover-with-heartbeat) on their knowledge base. Dedicated, again because they want you to pay for a dedicated load balanced device, "Multicast and therefore VRRP is not be possible as we don't allow the traffic over our switches and we do not support it on our firewalls." ... "The only way for this solution to work would be to add an F5 loadbalancer to your solution ". Damn! – Chris Rosillo Dec 02 '13 at 10:38
  • Sorry about that. If you still insist, you could try heartbeat, as I read [here](http://linux-ha.org/wiki/Ha.cf#ucast_-_configures_unicast_Heartbeat_communication) and [here](http://wiki.debuntu.org/wiki/Linux_HA_Heartbeat). Another alternative could be using "iptunnel" to tunnel the VRRP protocol, sending the packets to a virtual IP. But this may not be robust or practical. – erny Dec 04 '13 at 10:45