A couple of weeks ago I set up a two-node CRM cluster in which one of the managed resources is MySQL over DRBD. Today I restarted both nodes for maintenance, and now they can't connect to each other anymore.

DRBD fell out of sync, and I followed this guide to get it reconnected, but it only runs successfully on one node.

But something strange happens. If I put both nodes in standby with crm node standby and then try:

  • crm node online node0 before crm node online node1, all the CRM resources start successfully, but the DRBD partitions are still in the StandAlone state.
  • crm node online node1 before crm node online node0, the DRBD resource fails to start, which in turn prevents MySQL from starting.
  • If I standby both resources and call crm node online node0, it times out and prints this error:
    Error setting standby=off (section=nodes, set=<null>): Remote node did not respond
    Error performing operation: Remote node did not respond

Is there anything I'm doing wrong here? An alternative would be to just use MySQL replication, but I'm not sure how to promote a slave to master when the master database is not available.

MrD

1 Answer

This looks very much like a network error. The two DRBD nodes can't connect to each other, so neither knows which one is the master, and to prevent corruption they fall back to StandAlone.
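To confirm which state each DRBD device is in, something like the following can help. It is only a sketch, assuming DRBD 8.x, where /proc/drbd has one line per device containing a `cs:` (connection state) field. The function names `drbd_cstates` and `drbd_has_standalone` are made up for this example.

```shell
# Print "<minor> <connection state>" for every device listed in
# /proc/drbd-style text. Assumes the DRBD 8.x layout, e.g.
#  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
# Pass a file as the first argument to read a saved copy instead.
drbd_cstates() {
  sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p' "${1:-/proc/drbd}"
}

# Succeed (exit status 0) if at least one device is StandAlone.
drbd_has_standalone() {
  drbd_cstates "$1" | grep -q ' StandAlone$'
}
```

Running `drbd_cstates` on both nodes shows whether either side is trying to connect at all. A device stuck in StandAlone has given up on its peer, while WFConnection means it is still waiting for the peer to show up.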

The most important thing is to fix the network problem first. If you can't do that, just promote one of the two DRBD nodes with ForcePrimary and shut down the secondary node until the network problem can be solved.
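As a rough sketch of the forced promotion, assuming DRBD 8.4, where it is spelled `drbdadm primary --force` (8.3 used `drbdadm -- --overwrite-data-of-peer primary` instead). The resource name `r0` and the helpers `run` and `force_primary` are hypothetical. The script is dry-run by default and only prints the command, since forcing the wrong node to primary can discard the newer copy of the data.

```shell
# Dry-run unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Force the given resource to primary on this node.
# DRBD 8.4 syntax (assumption); adjust for other versions.
force_primary() {
  run drbdadm primary --force "$1"
}

# Example (hypothetical resource name r0):
#   force_primary r0
```

Run it on the node whose data you trust, check the printed command, and only then repeat with `DRY_RUN=0`.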

If you're using DRBD only to replicate MySQL, then using MySQL's own replication engine is not out of the question. It mostly comes down to the size of your database; both methods have advantages and disadvantages.

lynxman