I have a two-node cluster with heartbeat and DRBD managing a MySQL resource. Failover works well if I halt the primary, reboot it, or disconnect its network connection.
However, if the primary suffers a kernel panic (simulated by running echo c > /proc/sysrq-trigger), the secondary does not take over the resources.
This is what the heartbeat log on the secondary looks like:
Jul 11 21:33:32 rad11 heartbeat: [7519]: WARN: node rad10: is dead
Jul 11 21:33:32 rad11 heartbeat: [7519]: info: Link rad10:eth0 dead.
Jul 11 21:33:32 rad11 heartbeat: [8442]: info: Resetting node rad10 with [Meatware STONITH device]
Jul 11 21:33:32 rad11 heartbeat: [8442]: ERROR: glib: OPERATOR INTERVENTION REQUIRED to reset rad10.
Jul 11 21:33:32 rad11 heartbeat: [8442]: ERROR: glib: Run "meatclient -c rad10" AFTER power-cycling the machine.
Does anybody have any idea why the secondary fails to take over in this situation? Failover otherwise works fine; only the simulated kernel panic on the primary is not handled.
EDIT: Here is my heartbeat config, ha.cf:
# /etc/ha.d/ha.cf
logfile /var/log/ha-log
keepalive 1
deadtime 10
udpport 695
ucast eth0 rad11
auto_failback on
stonith_host rad10 meatware rad11
stonith_host rad11 meatware rad10
node rad10 rad11
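
Possibly relevant: both stonith_host lines use the meatware plugin, and the log above shows it asking for an operator to run meatclient after power-cycling rad10. Below is a minimal sketch for listing which nodes heartbeat is still waiting on, assuming the log lives at the logfile path from ha.cf and the prompt keeps the wording seen in the log. The function name pending_meatware_nodes is my own and not part of heartbeat.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): print each node name that
# appears in a 'Run "meatclient -c <node>"' prompt in the heartbeat log.
pending_meatware_nodes() {
    # Default path taken from the "logfile" directive in ha.cf above.
    log="${1:-/var/log/ha-log}"
    # Extract the node name from the prompt and drop duplicates.
    sed -n 's/.*Run "meatclient -c \([^"]*\)".*/\1/p' "$log" | sort -u
}

# Usage sketch, on the surviving node:
#   pending_meatware_nodes /var/log/ha-log
#   meatclient -c rad10    # per the log: only AFTER power-cycling rad10
```

This only reads the log; the confirmation itself is the meatclient command quoted in the log message.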