Using DRBD version: 8.2.6 (api:88/proto:86-88)
Here are the contents of /etc/ha.d/haresources:
db1 192.168.100.200/24/eth0 drbddisk::mysql Filesystem::/dev/drbd0::/drbd::ext3::defaults mysql
and /etc/ha.d/ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 1
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0, eth4
auto_failback off
node db1
node db2
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
deadping 5
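For context, my reading of the haresources line above is that heartbeat starts the resources left to right when a node acquires the group and stops them right to left when it releases the group, roughly like this (a sketch of the heartbeat R1 agent calls; the exact agent paths, and whatever script the bare "mysql" resource resolves to, are assumptions):

# acquire, left to right
/etc/ha.d/resource.d/IPaddr 192.168.100.200/24/eth0 start
/etc/ha.d/resource.d/drbddisk mysql start                               # roughly: drbdadm primary mysql
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 defaults start   # mount /dev/drbd0 on /drbd
<mysql resource script> start    # resolved from /etc/ha.d/resource.d/ or /etc/init.d/

# release, right to left
<mysql resource script> stop
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 defaults stop    # umount /drbd
/etc/ha.d/resource.d/drbddisk mysql stop                                # roughly: drbdadm secondary mysql
/etc/ha.d/resource.d/IPaddr 192.168.100.200/24/eth0 stop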
When testing failover between the machines, I ran the following commands on db2:
service heartbeat stop    # stop cluster membership (releases any resources this node holds)
service mysqld stop
drbdadm down mysql        # disconnect and detach the DRBD resource "mysql"
service drbd stop
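If I repeat the test, checks along these lines between the steps should confirm each one actually completed before the next (a sketch; "mysql" is the DRBD resource name from haresources, and drbdadm state is the role query as I recall it on the 8.2 branch):

pgrep mysqld            # should print nothing once mysqld has stopped
mount | grep /drbd      # should print nothing once the filesystem is unmounted
drbdadm state mysql     # should report Secondary/... before taking the resource down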
/proc/drbd on db1 reported
0: cs:Connected st:Primary/Unknown ds:UpToDate/DUnknown C r---
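(The same three fields can be read per resource with drbdadm; these are the forms I believe the 8.2 branch accepts, using the resource name from haresources:)

drbdadm cstate mysql    # connection state: Connected
drbdadm state mysql     # roles:            Primary/Unknown
drbdadm dstate mysql    # disk states:      UpToDate/DUnknown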
What happened next, after:
- Bringing the services back online on db2
- Transferring primary to db2 using the hb_primary script
- Taking db1 down as above
- Bringing the services back online on db1
- Transferring primary back to db1 using the hb_primary script
was that db1 remounted the DRBD disk, assumed the correct IP address and started MySQL. There was massive MySQL table corruption; it was all fixable (using innodb_force_recovery=6, mysqlcheck and the occasional restore from backup), but how did it happen?
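If it helps, the order in which things actually happened should be reconstructable from the logs named in the configuration above; something like the following is where I would start (the grep patterns are only a sketch, and /var/log/messages assumes the default syslog routing):

grep -iE 'mysql|drbddisk|Filesystem|IPaddr' /var/log/ha-log    # resource start/stop order as heartbeat ran it
grep -iE 'drbd|ext3' /var/log/messages                         # kernel-side view: role changes, mount/umount, I/O errors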
I speculate:
- DRBD disconnected the disk from the filesystem while it was being used by MySQL, as a clean MySQL shutdown would not have resulted in corrupt data (there is a log check sketched after this list)
- heartbeat controlled DRBD, and stopping the heartbeat service "pulled the plug" on DRBD
- this may happen again in the case of an actual failover (due to heartbeat ping timeout)
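The first point should be checkable from the MySQL error log on each node: a clean stop logs a normal shutdown, while a start after the plug was pulled logs InnoDB crash recovery. A sketch (the error log path is an assumption; it depends on my.cnf):

grep -iE 'normal shutdown|shutdown complete|not shut down normally|crash recovery' /var/log/mysqld.log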
I will not have access to this setup again for some time, but would like to repeat the test when I do.
Are the configuration settings correct?
Was the corruption the result of my manual testing?
Is there a better way to test failover than to stop the heartbeat service and let it run the haresources commands?