We have two Redis servers behind an HAProxy server. One server is the master, which HAProxy reports as up, and the other is the slave, which HAProxy reports as down. If the master goes down, the sentinels elect the other server to be the master. That part is working fine. What's tricky is making sure that HAProxy never allows traffic to go to both servers at the same time.
I initially fixed this by adding rise and fall arguments for the servers, as below:
    backend Backend:Redis
        bind-process 1
        timeout server 3h
        timeout tunnel 3h
        option tcp-check
        tcp-check connect
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string role:master
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis-01.vbox 10.10.0.10:6279 check inter 5s rise 5 fall 2 maxconn 600 weight 1
        server redis-02.vbox 10.10.0.11:6279 check inter 5s rise 5 fall 2 maxconn 600 weight 1
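For anyone who wants to reproduce what the check sees outside of HAProxy, here is a rough Python sketch of the same exchange (PING, INFO replication, QUIT) over a plain socket. The function names role_from_info and redis_is_master are made up for illustration, and the sketch compares the role: line exactly, whereas the tcp-check above only looks for the string role:master in the response. It also assumes the server does not require AUTH.

```python
import socket
from typing import Optional


def role_from_info(info_text: str) -> Optional[str]:
    """Return the value of the 'role:' line in INFO replication output."""
    for line in info_text.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return None


def redis_is_master(host: str, port: int, timeout: float = 2.0) -> bool:
    """Mimic the tcp-check sequence: PING, INFO replication, QUIT."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")

        sock.sendall(b"PING\r\n")
        if not reader.readline().startswith(b"+PONG"):
            return False

        sock.sendall(b"INFO replication\r\n")
        header = reader.readline()  # bulk string header, e.g. b"$123\r\n"
        if not header.startswith(b"$"):
            return False
        length = int(header[1:].strip())
        body = reader.read(length).decode("utf-8", "replace")
        reader.readline()  # CRLF that terminates the bulk payload

        sock.sendall(b"QUIT\r\n")
        reader.readline()  # expected to be b"+OK\r\n"

        return role_from_info(body) == "master"
```

Calling redis_is_master("10.10.0.10", 6279) in a loop right after restarting redis-01 should show the same two or three "master" answers that HAProxy sees before the node settles into the slave role.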
We are dealing with a strange case after the following sequence:
- Start with redis-01 (initial master) up and redis-02 (initial slave) down.
- Kill redis-01 (master).
- Sentinels elect redis-02 to be the new master.
- Restart Redis on redis-01 (now slave, original master).
- For two or three HAProxy health checks, redis-01 thinks it's master and passes the checks.
- Eventually, redis-01 realizes it's still the slave and starts failing the checks.
The problem is that HAProxy does not reset the health check counter. The status page shows that redis-01 has passed 2/5 (or 3/5) health checks. It's not up, which is good. What's not good is that if the other server goes down, redis-01 has fewer checks left to pass, eventually just 1, which could lead to the case where both servers are up from HAProxy's point of view.
I don't understand why HAProxy doesn't consider redis-01 to have failed to come up, since it stopped passing checks after 2. It doesn't seem like it should keep waiting. The documentation says:
The "rise" parameter states that a server will be considered as operational after <count> consecutive successful health checks.
It got 2 out of the 5, but it did not get 5, so it's not up and it shouldn't start counting again at 2 the next time it passes a health check. It needs to be at 0.
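To make the expectation concrete, here is a small Python sketch of how I read "consecutive": any failed check while the server is down puts the counter back to 0. This is only a model of the documented wording, not HAProxy's actual code, and simulate_rise is a made-up name. It also only covers the down-to-up transition, not the fall logic once the server is up.

```python
def simulate_rise(results, rise=5):
    """Replay health-check results for a server that starts DOWN.

    Returns the counter value after each check, under the reading that
    "consecutive" means any failure resets the counter to 0.
    """
    counter = 0
    history = []
    for passed in results:
        if passed:
            counter = min(counter + 1, rise)  # cap at rise: server would be UP
        else:
            counter = 0  # a failure breaks the streak
        history.append(counter)
    return history


# redis-01 after the restart: passes 2 checks, fails 3, then passes again
checks = [True, True, False, False, False, True]
print(simulate_rise(checks))  # [1, 2, 0, 0, 0, 1]
```

Under this reading, redis-01 would be back at 1/5 after its next successful check, whereas the status page I'm seeing suggests it would continue from 2/5 or 3/5.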
The question then is either of the following: 1) What do I need to do to tell HAProxy to reset the consecutive health check counter? 2) Is there a better way to keep HAProxy from considering both servers up at the same time?