I've installed Red Hat's cluster software on CentOS 6.5 and use it to provide redundant routing from one network to another. This works fine: I have a pair of boxes providing the service, so that if one fails (for example, if I test by removing its network connections), the other takes over routing.
However, if I then have to do anything to the remaining box, I can't restart it because of problems with rgmanager: service rgmanager stop hangs, and the only way to stop the process is to kill -9 it. This obviously also affects any action that tries to stop the service, such as a reboot or poweroff.
When I do manage to start the server on its own, the cluster starts, but rgmanager is not shown as running in clustat and none of the redundant routing services are even visible, let alone started.
This could cause problems if, for instance, the boxes are deployed to a remote location, and need to be powered down before we've had a chance to replace the failed box.
Here's my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="2" name="router-ha">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="router-01" nodeid="1"/>
    <clusternode name="router-02" nodeid="2"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources>
      <ip address="10.0.0.1" monitor_link="1" sleeptime="0"/>
      <ip address="10.0.0.2" monitor_link="1" sleeptime="0"/>
      <ip address="10.2.0.1" monitor_link="1" sleeptime="0"/>
      <ip address="10.4.0.1" monitor_link="1" sleeptime="0"/>
    </resources>
    <service autostart="1" name="routing-a" recovery="restart">
      <ip ref="10.0.0.1"/>
      <ip ref="10.2.0.1"/>
    </service>
    <service autostart="1" name="routing-b" recovery="restart">
      <ip ref="10.0.0.2"/>
      <ip ref="10.4.0.1"/>
    </service>
  </rm>
</cluster>
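In case it helps anyone reading the configuration above, here is a minimal Python sketch that parses it and lists what it actually declares: the nodes, the two-node setting, how many fence devices are defined, and which IPs each service references. This is only an illustration and not part of my setup. The function name summarize and the embedded copy of the file are my own assumptions for the example; on a real node the file would normally be read from /etc/cluster/cluster.conf instead.

```python
import xml.etree.ElementTree as ET

# Embedded copy of the cluster.conf shown above (illustrative only).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="2" name="router-ha">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="router-01" nodeid="1"/>
    <clusternode name="router-02" nodeid="2"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources>
      <ip address="10.0.0.1" monitor_link="1" sleeptime="0"/>
      <ip address="10.0.0.2" monitor_link="1" sleeptime="0"/>
      <ip address="10.2.0.1" monitor_link="1" sleeptime="0"/>
      <ip address="10.4.0.1" monitor_link="1" sleeptime="0"/>
    </resources>
    <service autostart="1" name="routing-a" recovery="restart">
      <ip ref="10.0.0.1"/>
      <ip ref="10.2.0.1"/>
    </service>
    <service autostart="1" name="routing-b" recovery="restart">
      <ip ref="10.0.0.2"/>
      <ip ref="10.4.0.1"/>
    </service>
  </rm>
</cluster>
"""


def summarize(conf_xml):
    """Return a dict describing what a cluster.conf document declares."""
    root = ET.fromstring(conf_xml)

    cman = root.find("cman")
    fencedevices = root.find("fencedevices")

    # Map each service name to the list of IP addresses it references.
    services = {}
    for svc in root.findall("rm/service"):
        services[svc.get("name")] = [ip.get("ref") for ip in svc.findall("ip")]

    return {
        "cluster": root.get("name"),
        "nodes": [n.get("name") for n in root.iter("clusternode")],
        "two_node": cman is not None and cman.get("two_node") == "1",
        "expected_votes": cman.get("expected_votes") if cman is not None else None,
        # len() of an element is its number of child elements.
        "fence_devices": len(fencedevices) if fencedevices is not None else 0,
        "services": services,
    }


if __name__ == "__main__":
    for key, value in summarize(CLUSTER_CONF).items():
        print(f"{key}: {value}")
```

The summary only reflects what is written in the file; it says nothing about the runtime state that clustat reports.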
Why can't I start the service on a single box if it can't see the other? Surely it's a required part of being a redundant pair that you don't depend on the other machine to be able to start a cluster service?