We have been using keepalived in combination with a virtual IP address for two years now. In the rare case that a machine crashes this works very well.

But when there are issues on the box itself, we have seen a couple of cases where no failover took place. For example, we had an issue where the system was swapping all the time: the load was 25 instead of the normal 5, and there was no way to SSH into the machine. Ping was working. Keepalived kept running, and the virtual IP address was not taken over by the other slave.

Also, we had a situation in a MySQL HA setup where somebody locked the complete database by mistake by running a backup on the master instead of the slave. That was not picked up either.

Is the issue here that I am just using the wrong scripts to check on the machine itself if the master is working fine, or is this typical for a virtual IP setup?

It feels strange to me that you don't use a third system to determine if the master is available. Of course I understand why: keepalived should be switched on the master itself by the master.

I noticed lately that for Redis HA setups people are using ZooKeeper (e.g. https://github.com/ryanlecompte/redis_failover). Is that because of the limitations I ran into?

Marco

1 Answer

Is the issue here that I am just using the wrong scripts to check on the machine itself if the master is working fine,

Yes.

For example, we had an issue where the system was swapping all the time: the load was 25 instead of the normal 5, and there was no way to SSH into the machine. Ping was working. Keepalived kept running, and the virtual IP address was not taken over by the other slave.

Have you tried writing your own script to check the load average? Something like this:

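#!/bin/bash
# Exit non-zero when the 1-minute load average reaches the threshold.

THRESHOLD=25
# On Linux, /proc/loadavg starts with the 1-minute load, e.g. "25.31 ...".
# Keep only the integer part, because [ ] cannot compare floats.
read -r LOAD _ < /proc/loadavg
LOAD=${LOAD%%.*}

if [ "$LOAD" -ge "$THRESHOLD" ]; then
    exit 1
else
    exit 0
fi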

then use it as a track_script:

vrrp_script check_load {
    script "/path/to/check_load.sh"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    ...
    authentication {
        auth_type PASS
        auth_pass Neifeaw7
    }
    virtual_ipaddress {
        192.168.6.8
    }
    track_script {
        check_load
    }
}

But wait, what happens if the virtual IP is switched too frequently?
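
One possible way to dampen that is to fail the check only after several consecutive bad readings. Below is a minimal sketch, not a tested production script: the function name `load_check`, its arguments, and the state file are all made up for illustration. keepalived's `vrrp_script` block also has `rise` and `fall` options that serve a similar purpose, so this may not be needed at all.

```shell
#!/bin/bash
# Hypothetical helper (the name and arguments are assumptions, not part of
# keepalived): succeed until the load has been at or above THRESHOLD for
# MAX_STRIKES consecutive calls.
# Usage: load_check LOAD THRESHOLD MAX_STRIKES STATE_FILE
load_check() {
    # Drop the fractional part of the load, since [ ] compares integers only.
    local load=${1%%.*} threshold=$2 max_strikes=$3 state_file=$4
    local strikes=0

    # Read the strike count left behind by the previous run, if any.
    if [ -r "$state_file" ]; then
        read -r strikes < "$state_file"
    fi
    strikes=${strikes:-0}

    if [ "$load" -ge "$threshold" ]; then
        strikes=$((strikes + 1))   # one more bad reading in a row
    else
        strikes=0                  # a good reading resets the streak
    fi
    echo "$strikes" > "$state_file"

    # Return success while the streak is shorter than the limit.
    [ "$strikes" -lt "$max_strikes" ]
}
```

In the check script you would read the load with `read -r load _ < /proc/loadavg` and then call something like `load_check "$load" 25 3 /some/writable/state/file`, using the function's return status as the script's exit code. With an interval of 2 seconds and 3 strikes, a short load spike would no longer move the virtual IP.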

quanta