
I'm using keepalived v2.0.19 on CentOS 7 with a single VRRP instance that tracks the presence of the haproxy process. Unfortunately, the VRRP instance never leaves the FAULT state after the haproxy process is restarted.

Here is my config:

vrrp_track_process chk_service {
    process haproxy
    weight 0
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.100 dev eth0 label eth0:shared
    }
    track_process {
        chk_service
    }
}

The syslog shows that quorum is lost when the haproxy process goes down, but it is never regained when haproxy comes back online a few seconds later.

systemd: Stopping HAProxy Load Balancer...
haproxy: [WARNING] 330/081104 (72258) : Exiting Master process...
haproxy: [ALERT] 330/081104 (72258) : Current program 'dataplane-api' (72260) exited with code 0 (Exit)
haproxy: [ALERT] 330/081104 (72258) : Current worker #1 (72261) exited with code 143 (Terminated)
haproxy: [WARNING] 330/081104 (72258) : All workers exited. Exiting... (0)
systemd: Stopped HAProxy Load Balancer.
Keepalived_vrrp[72335]: Quorum lost for tracked process chk_service
Keepalived_vrrp[72335]: (VI_1) Entering FAULT STATE
Keepalived_vrrp[72335]: (VI_1) sent 0 priority
Keepalived_vrrp[72335]: (VI_1) removing VIPs.
systemd: Starting HAProxy Load Balancer...
haproxy[113178]: Proxy stats started.
haproxy[113178]: Proxy main started.
haproxy[113178]: Proxy app started.
haproxy: [NOTICE] 330/081112 (113178) : New program 'dataplane-api' (113179) forked
haproxy: [NOTICE] 330/081112 (113178) : New worker #1 (113180) forked
systemd: Started HAProxy Load Balancer.

Note that the presence of the haproxy process is correctly detected when I start keepalived.

Here is the output of keepalived -v:

Keepalived v2.0.19 (unknown)

Copyright(C) 2001-2019 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019

configure options: --prefix=/opt/keepalived

Config options:  LIBIPTC LIBIPSET_DYNAMIC LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK FRA_OIFNAME IFA_FLAGS IP_MULTICAST_ALL LIBIPTC NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK VRRP_VMAC IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE SO_MARK SCHED_RT SCHED_RESET_ON_FORK

I tried setting the quorum min and max values, with no luck.

Has anyone experienced the same issue?

Fabien A.
  • I’m experiencing the same issue with Keepalived 2.0.19 on Alpine Linux. And it’s not the first issue with vrrp_track_process, so I’m moving to vrrp_script with `killall -0`. – Jakub Jirutka Feb 19 '20 at 20:28
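The `vrrp_script` workaround mentioned in this comment could look roughly like the fragment below. It is a sketch, not a tested configuration: the script name `chk_haproxy`, the `/usr/bin/killall` path, and the `interval`, `fall`, and `rise` values are assumptions chosen for illustration. `killall -0` sends signal 0, which only checks whether a process with that name exists.

```
# Hypothetical replacement for the vrrp_track_process block.
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"   # exit status 0 while haproxy is running (path is an assumption)
    interval 2                             # run the check every 2 seconds (illustrative value)
    fall 2                                 # 2 consecutive failures mark the script as failed
    rise 2                                 # 2 consecutive successes mark it as healthy again
}

vrrp_instance VI_1 {
    # interface, state, priority, authentication and virtual_ipaddress as in the question
    track_script {
        chk_haproxy
    }
}
```

With no `weight` set on the script, a failed check should put the instance into the FAULT state, and a later successful check should let it recover, which is the behavior the question expected from `track_process`.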

1 Answer


We had the same problem with keepalived 2.0.19.

In our case, the problem was that for processes with PIDs greater than 32767, keepalived tried to open the file /proc/xxxxx/comm with xxxxx as a negative number. So if the machine has been running for a long time and PIDs have grown large, you can run into this behavior.
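The 32767 threshold is what you would expect if the PID passes through a signed 16-bit value somewhere along the way. That mechanism is an assumption here, not something confirmed against the keepalived source. The Python sketch below only illustrates the arithmetic, using the PID of the restarted haproxy master from the log in the question (113178); the helper name `as_int16` is made up for this example.

```python
def as_int16(pid: int) -> int:
    """Reinterpret the low 16 bits of pid as a signed 16-bit integer.

    Assumption: this mimics storing a PID in a 16-bit signed field,
    which would explain negative numbers for PIDs above 32767.
    """
    return ((pid + 0x8000) & 0xFFFF) - 0x8000


def comm_path(pid: int) -> str:
    """Build the /proc path that would be opened for the given PID value."""
    return f"/proc/{pid}/comm"


if __name__ == "__main__":
    real_pid = 113178  # haproxy master PID after the restart, taken from the log
    wrapped = as_int16(real_pid)
    print("expected:", comm_path(real_pid))
    print("wrapped: ", comm_path(wrapped))  # a negative PID, so the open fails
```

Under that assumption, a lookup for the new haproxy process would target a path that cannot exist, so the tracked process would never be seen again after the restart.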

Luckily, keepalived 2.0.20 fixed this bug, as mentioned in the changelog:

  • Fix track_process with PIDs > 32767

https://www.keepalived.org/changelog.html (Release 2.0.20)