keepalived not promoting BACKUP to MASTER in multi-instance configuration

Question

I'm trying to set up a multi-instance keepalived configuration to manage VIPs for a master-replica pair of tarantool servers. The server state itself is managed outside keepalived. Keepalived should only manage the VIPs: it should set VIP1 on the dummy1 interface when the server is in the master state and VIP2 on the same dummy1 interface when the server is in the replica state, and it should ensure that the same VIP is never set on both servers in case of misconfiguration. The servers are in different data centers, so multicast is not an option. There are no dedicated server roles: either server could be master or replica initially, so I use a BACKUP-BACKUP configuration with equal priorities. Here is my config:

global_defs {
  enable_script_security # 1)
  dynamic_interfaces
}

vrrp_script chk_trntl_db1_pri {
  script "/bin/sh -c '/usr/bin/echo lua box.info.status | /usr/bin/tarantool -p 11011 | /usr/bin/grep -q primary'"
  interval 1
  fall 3
  rise 2
}

vrrp_script chk_trntl_db1_rpl {
  script "/bin/sh -c '/usr/bin/echo lua box.info.status | /usr/bin/tarantool -p 11011 | /usr/bin/grep -q connected'"
  interval 1
  fall 3
  rise 2
}

vrrp_instance TRNTL_DB1_PRI {
  interface eth0
  state BACKUP # 3)
  nopreempt # 4)
  virtual_router_id 03
  priority 100
  advert_int 1
  authentication {
    auth_type PASS # TODO: test AH method
    auth_pass XXXXXX
  }
  unicast_src_ip 10.161.133.20
  unicast_peer {
    10.161.133.19
  }
  virtual_ipaddress {
    10.161.133.21/32 dev dummy1
  }
  track_script {
    chk_trntl_db1_pri
  }
}

vrrp_instance TRNTL_DB1_RPL {
  interface eth0
  state BACKUP # 3)
  nopreempt # 4)
  virtual_router_id 04
  priority 100
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass XXXXXX
  }
  unicast_src_ip 10.161.133.20
  unicast_peer {
    10.161.133.19
  }
  virtual_ipaddress {
    10.161.133.22/32 dev dummy1
  }
  track_script {
    chk_trntl_db1_rpl
  }
}

The two servers' configs differ only in unicast_src_ip and unicast_peer, which are swapped. Note that virtual_router_id is unique to each instance.
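
For illustration, the part that differs on the other node would look roughly like this. This is a sketch, assuming the two addresses from the config above are simply exchanged and every other option stays identical; the same change applies to the TRNTL_DB1_RPL block.

```
# Other node (sketch): only the unicast settings differ
vrrp_instance TRNTL_DB1_PRI {
  # ... all other options identical to the config above ...
  unicast_src_ip 10.161.133.19   # this node's own address
  unicast_peer {
    10.161.133.20                # the node whose config is shown above
  }
}
```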

What I found is that keepalived never promotes a BACKUP instance to the MASTER state and never sets the VIP when more than one instance is present. Neither TRNTL_DB1_PRI nor TRNTL_DB1_RPL is promoted: even if the corresponding script succeeds all the time from the start, the instance remains in the BACKUP state.

Here is the test case. On SERVER1, tarantool is in the master state and the TRNTL_DB1_RPL instance is commented out (a single-instance configuration for the test). chk_trntl_db1_pri returns 0, TRNTL_DB1_PRI enters BACKUP, hits the advertisement timeout and is promoted to MASTER, and the VIP is set. At the same time, on SERVER2, tarantool is in the replica state, so chk_trntl_db1_pri returns 1 and TRNTL_DB1_PRI is therefore in the FAULT state, as expected. But since tarantool is in the replica state, chk_trntl_db1_rpl returns 0, and the TRNTL_DB1_RPL instance enters BACKUP and remains in that state forever, even though it does not receive any advertisements from the other node for its virtual_router_id. I even shut down keepalived on SERVER1 to prevent it from sending any advertisements at all; no luck, TRNTL_DB1_RPL still remains in the BACKUP state. Here are the logs:

Starting LVS and VRRP High Availability Monitor...
Starting Keepalived v2.0.7 (08/23,2018)
Running on Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 (built for Linux 3.10.0)
Command line: '/usr/sbin/keepalived' '-D'
Opening file '/etc/keepalived/keepalived.conf'.
Starting VRRP child process, pid=14851
Registering Kernel netlink reflector
Registering Kernel netlink command channel 
LVS and VRRP High Availability Monitor. 
Opening file '/etc/keepalived/keepalived.conf'.
Assigned address 10.255.160.193 for interface eth0
Assigned address fe80::21a:4aff:fe16:12a for interface eth0
Registering gratuitous ARP shared channel 
(TRNTL_DB1_PRI) removing VIPs.
(TRNTL_DB1_RPL) removing VIPs.
VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(8,9)]
Script `chk_trntl_db1_pri` now returning 1
VRRP_Script(chk_trntl_db1_pri) failed (exited with status 1)
(TRNTL_DB1_PRI) Entering FAULT STATE
VRRP_Script(chk_trntl_db1_rpl) succeeded
(TRNTL_DB1_RPL) Entering BACKUP STATE

If I add the TRNTL_DB1_RPL section to the SERVER1 config, keepalived does not promote TRNTL_DB1_PRI to the MASTER state; it stays in the BACKUP state, just as TRNTL_DB1_RPL does on SERVER2.

I also found that if I remove all unicast-related options from one of the sections, it is promoted to the MASTER state, but only when one instance is unicast and the other is multicast. If both are multicast or both are unicast, BACKUP is never promoted to MASTER, even though the check script succeeds and no master is present on the other side.
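
As a sketch of that mixed setup: the block below assumes TRNTL_DB1_RPL is the section that had its unicast options removed (the choice of section is only for illustration), while TRNTL_DB1_PRI keeps unicast_src_ip and unicast_peer as in the config above. With no unicast_peer defined, the instance sends its advertisements via multicast.

```
vrrp_instance TRNTL_DB1_RPL {
  interface eth0
  state BACKUP
  nopreempt
  virtual_router_id 04
  priority 100
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass XXXXXX
  }
  # unicast_src_ip and unicast_peer removed here:
  # this instance now advertises via multicast
  virtual_ipaddress {
    10.161.133.22/32 dev dummy1
  }
  track_script {
    chk_trntl_db1_rpl
  }
}
```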

As far as I can tell, even though the VIPs are set on the same interface and the check scripts probe the same daemon, these are two independent instances as long as their virtual_router_id values differ, and they should work side by side.

So, am I misunderstanding something, or is this a bug in keepalived?

UPDATE: fixed in 2.0.16. After building an RPM of the current version from source, VRRP works exactly as expected.
