
I have keepalived set up (floating VIP) in front of haproxy on each node of my three-node Galera cluster. When I restart keepalived on any given node, I sometimes end up with two nodes in the MASTER state (as evidenced by the /etc/keepalived/log_status.sh notify script):

# cat /etc/keepalived/log_status.sh 
#!/bin/bash
echo $1 $2 is in $3 state > /var/run/keepalive.$1.$2.state

From what I've read, the 'multiple masters' symptom is usually caused by multicast being filtered on the switch, but I can run tcpdump on any of my Galera nodes and see the multicast traffic hitting the NIC (these are KVM virtual machines). I could try switching to unicast, but I'd like to know whether this is due to a bug, a feature, or my config.

# cat /etc/keepalived/keepalived.conf 
log "setting up keepalived"
global_defs {
  router_id    host1  # short hostname of each KA node (10.20.18.201-203)
}

vrrp_script check_haproxy {
   script      "pidof haproxy"
   interval    2
   weight      2
}

vrrp_instance 250 {
  virtual_router_id 250
  advert_int   1
  nopreempt
  priority     100
  state        BACKUP
  interface    eth0
  notify       /etc/keepalived/log_status.sh

  virtual_ipaddress {
    10.20.18.250 dev   eth0
  }

  track_script {
    check_haproxy
  }
}

tcpdump output:

09:44:00.934942 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:01.936054 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:02.937315 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:03.938444 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:04.942302 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:05.373224 IP 10.20.18.201 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:05.943936 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:06.029216 IP 10.20.18.201 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:06.385127 IP 10.20.18.201 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:06.945303 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:07.333210 IP 10.20.18.201 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:07.946098 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:08.947228 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:09.948507 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:10.548023 IP 10.20.18.202 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:10.663961 IP 10.20.18.202 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:10.949633 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:11.559970 IP 10.20.18.202 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:11.587980 IP 10.20.18.202 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:11.950795 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:12.952124 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:13.953075 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:14.953543 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:15.954703 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:15.987641 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 0, authtype none, intvl 1s, length 20
09:44:15.992698 IP 10.20.18.203 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:16.008817 IP 10.20.18.203 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:17.008829 IP 10.20.18.203 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:17.036879 IP 10.20.18.203 > 224.0.0.22: igmp v3 report, 1 group record(s)
09:44:20.613407 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:21.615616 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:22.616909 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:23.618155 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
09:44:24.619607 IP 10.20.18.203 > 224.0.0.18: VRRPv2, Advertisement, vrid 250, prio 102, authtype none, intvl 1s, length 20
    You've placed the config of just one of your keepalived servers. Could you please post the master as well? – Jaroslav Kucera Feb 01 '18 at 16:32
  • Ah perhaps that is the issue? The config is the same on all nodes since I intend to run BACKUP/BACKUP (Choice 2) as per this example: https://blog.cloudandheat.com/index.php/en/2016/09/12/tutorial-part-2-highly-available-mariadb-galera-cluster-with-floating-ip/ – Server Fault Feb 01 '18 at 16:46
  • Actually the `state` isn't so important. What is really important is the `priority`. It must be sufficiently different so that keepalived can select a single master for the vrrp_instance. The `weight` of the check_script is also factored into the effective priority. So one of the instances must always win. – Jaroslav Kucera Feb 05 '18 at 07:07
  • I'm running BACKUP/BACKUP, so there isn't (intentionally) a "real" MASTER, and all `priority` and `weight` configurations are set the same. Are the multiple masters more of a side effect, then? I guess the question then becomes: is it "bad" to have two masters? I read an article saying that, by design, `keepalived` guarantees **at least** one master - seemingly implying that more than one MASTER could be running. From what I've seen in the debug logs, there is a second MASTER from time to time, but only one of them actually has the VIP. – Server Fault Feb 08 '18 at 14:32
  • Keepalived is about the VIP. The VIP is an IP like any other, and an IP can exist only once in the network. With identical configurations, I'm afraid the keepalived instances are not able to select a single MASTER to hold the VIP. Configure different priorities and you'll have a single VIP in the network. If the instance with the higher priority goes down, the second instance becomes MASTER. When the instance with the higher priority comes back to life, it becomes MASTER again. Simple, working... – Jaroslav Kucera Feb 11 '18 at 10:24
  • Thank you for the information. I changed only the priority on each node, as you suggested, and it is working as it should. The link above does not mention that the priorities should be different, so I was confused. – Server Fault Jul 12 '18 at 13:42
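For reference, the fix described in the comments can be sketched as a per-node config where everything is identical except `priority` (and `router_id`). Hostnames and priority values below are illustrative, not taken from the question:

```
# /etc/keepalived/keepalived.conf on host1 (10.20.18.201)
global_defs {
  router_id host1            # unique per node
}

vrrp_script check_haproxy {
  script   "pidof haproxy"
  interval 2
  weight   2
}

vrrp_instance 250 {
  virtual_router_id 250
  advert_int   1
  nopreempt
  priority     103           # host2: 102, host3: 101 -- must differ per node
  state        BACKUP
  interface    eth0
  notify       /etc/keepalived/log_status.sh
  virtual_ipaddress {
    10.20.18.250 dev eth0
  }
  track_script {
    check_haproxy
  }
}
```

Since all nodes run the same check script with the same `weight`, a success adds the same +2 everywhere, so the static priority ordering decides the single MASTER.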

2 Answers


Single word answer: iptables.

I was running two instances of keepalived: one to allow access from internal networks and the other to support external access.

I copied the internal configuration to create the external keepalived instance. While keepalived was working properly on the first interface (the internal one, eth0), my copied config was producing a VIP on both hosts.

My review of tcpdump showed that the multicast VRRP traffic was allowed on the network and visible to both keepalived instances. I reviewed the traffic on both the internal and external interfaces (eth0 internal / eth1 external).

VRRP traffic must be allowed. I could sniff the traffic successfully and saw VRRP advertisements from both of my keepalived instances with the correct (and different) priorities. However, my iptables configuration was only allowing the traffic on eth0.

The relevant lines in /etc/sysconfig/iptables:

Before (keepalived broken on eth1, but eth0 OK):

###Allow multicast for KeepAlived
-A INPUT -i eth0  -d 224.0.0.18/32 -p vrrp -j ACCEPT
-I OUTPUT -o eth0 -d 224.0.0.18/32 -p vrrp -j ACCEPT

After (all good):

###Allow multicast for KeepAlived
-A INPUT -i eth0  -d 224.0.0.18/32 -p vrrp -j ACCEPT
-I OUTPUT -o eth0 -d 224.0.0.18/32 -p vrrp -j ACCEPT
-A INPUT -i eth1  -d 224.0.0.18/32 -p vrrp -j ACCEPT
-I OUTPUT -o eth1 -d 224.0.0.18/32 -p vrrp -j ACCEPT
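The same mistake can be caught offline. A minimal sketch, using a hypothetical iptables-save-style dump (the rule lines below are illustrative, not the live ruleset): check that every interface keepalived uses has a VRRP ACCEPT rule.

```shell
#!/bin/bash
# Given an iptables-save style rule dump, confirm that VRRP
# (multicast 224.0.0.18) is accepted on each keepalived interface.
rules='-A INPUT -i eth0 -d 224.0.0.18/32 -p vrrp -j ACCEPT
-A INPUT -i eth1 -d 224.0.0.18/32 -p vrrp -j ACCEPT'

status=""
for ifc in eth0 eth1; do
  # Look for an ACCEPT rule for protocol vrrp on this interface.
  if printf '%s\n' "$rules" | grep -q -- "-i $ifc .*-p vrrp -j ACCEPT"; then
    status="$status $ifc:allowed"
  else
    status="$status $ifc:blocked"
  fi
done
echo "$status"
```

On a real system you would replace the here-string with the output of `iptables-save` (run as root); with the "Before" ruleset above, eth1 would show as blocked.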
Mark
  • 131
  • 4

I had a similar configuration working just fine in a local Vagrant environment, but when configuring it on a cloud provider's servers, the BACKUP was always promoting itself and I would have two MASTERs at the same time.

I tried changing firewall rules, but what did it for me was setting the private network interface in the vrrp_instance interface field, along with adding unicast_src_ip and unicast_peer blocks: unicast_src_ip holds the server's own IP address, and unicast_peer the addresses of the remaining Keepalived nodes.
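As a sketch (interface name and addresses are illustrative, reusing the question's subnet), the unicast additions on the 10.20.18.201 node would look like:

```
vrrp_instance 250 {
  interface      eth1              # the private network interface
  unicast_src_ip 10.20.18.201      # this node's own private address
  unicast_peer {
    10.20.18.202                   # the other keepalived nodes
    10.20.18.203
  }
  # remaining vrrp_instance settings unchanged
}
```

With unicast, keepalived sends advertisements directly to each peer, so cloud networks that drop multicast no longer cause every node to believe it is alone and promote itself.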

rtrigo
  • 101
  • 2