
We have two Redis servers behind an HAProxy server. One server is the master, which HAProxy marks as up, and the other is the slave, which HAProxy marks as down because the health check only passes on a master. If the master goes down, the sentinels elect the other server to be the master. That part is working fine. The tricky part is making sure that HAProxy never allows traffic to go to both servers at the same time.

I initially fixed this by adding rise and fall arguments for the servers, as below:

backend Backend:Redis
    bind-process 1
    timeout server  3h
    timeout tunnel 3h
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-01.vbox 10.10.0.10:6279 check inter 5s rise 5 fall 2 maxconn 600 weight 1
    server redis-02.vbox 10.10.0.11:6279 check inter 5s rise 5 fall 2 maxconn 600 weight 1

We are dealing with a strange case after the following sequence:

  • Start with redis-01 (initial master) up and redis-02 (initial slave) down.
  • Kill redis-01 (master).
  • Sentinels elect redis-02 to be new master.
  • Restart redis on redis-01 (now slave, original master).
  • For two or three HAProxy health checks, redis-01 thinks it's master and passes the checks.
  • Eventually, redis-01 realizes it's still the slave and starts failing the checks.

The problem is that HAProxy does not reset the health check counter. The status page shows that redis-01 has passed 2/5 (or 3/5) health checks. It's not marked up, which is good. What's not good is that if the other server later goes down, redis-01 has fewer checks left to pass before it is marked up, eventually just 1. That could lead to the case where both servers are up at the same time, from the point of view of HAProxy.

I don't understand why HAProxy doesn't consider redis-01 to have failed to come up, since it stopped passing checks after 2. It doesn't seem like it should keep waiting. The documentation says:

The "rise" parameter states that a server will be considered as operational after <count> consecutive successful health checks.

It got 2 out of the 5, but it did not get 5, so it's not up and it shouldn't start counting again at 2 the next time it passes a health check. It needs to be at 0.
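
To make the arithmetic concrete, here is a small Python sketch that compares the two counting rules for a server that is currently down. It models only what the status page appears to show (failures leave the counter untouched) against what I expected (failures reset it to 0). It is not taken from the HAProxy source, and the names `count_checks` and `reset_on_failure` are made up for this illustration.

```python
def count_checks(results, rise=5, reset_on_failure=False):
    """Track the success counter of a DOWN server over a series of checks.

    results: iterable of booleans, True for a passed health check.
    Returns (history, up): the counter after each check, and whether the
    counter reached `rise`, i.e. the server would be marked up.
    """
    count = 0
    history = []
    for passed in results:
        if passed:
            count += 1
        elif reset_on_failure:
            count = 0  # expected rule: a failure breaks the streak
        # observed rule: a failure leaves the counter where it was
        history.append(count)
        if count >= rise:
            return history, True
    return history, False


# redis-01 passes 2 checks after restart, fails 3, then passes 3 more.
checks = [True, True, False, False, False, True, True, True]

print(count_checks(checks))
# ([1, 2, 2, 2, 2, 3, 4, 5], True)   observed: up after only 3 new passes

print(count_checks(checks, reset_on_failure=True))
# ([1, 2, 0, 0, 0, 1, 2, 3], False)  expected: still 2 passes short
```

With the observed rule, the two stale passes shorten the wait from 5 checks to 3; with 4 stale passes it would shrink to a single check.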

The question then is either of the following: 1) What do I need to do to tell HAProxy to reset the consecutive health check counter? 2) Is there a better way to keep HAProxy from considering both servers up at the same time?

siride
