I'm creating a 2+1 failover cluster under Red Hat 5.5 with 4 services that run in pairs: the two services of a pair have to run on the same node and share the same virtual IP address. One service of each pair (called disk1 and disk2 in the cluster.conf below) needs a SAN disk; the other (called nodisk1 and nodisk2) does not. So each active node should run one service that needs a disk (diskN) and its companion service that doesn't (nodiskN). I'm using HA-LVM.

When I shut down (via ifdown) the two interfaces connected to the SAN to simulate a SAN failure, the service needing the disk is disabled and the other keeps running, as expected. Surprisingly (and unfortunately), the virtual IP address shared by the two services on the same machine is also removed, rendering the still-running service useless. How can I configure the cluster to keep the IP address up? The only way I have found so far is to assign a different virtual IP address to each of the services not needing a disk (not implemented in the following cluster.conf).

cluster.conf looks like this:

<?xml version="1.0" ?>
<cluster config_version="1" name="cluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <cman shutdown_timeout="10000"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="device1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="device2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="device3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device1" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device2" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device3" passwd="password"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="node1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="" monitor_link="1"/>
      <ip address="" monitor_link="1"/>
    </resources>
    <service autostart="1" exclusive="0" name="disk1" recovery="restart" domain="domain1">
      <ip ref=""/>
      <script file="/etc/init.d/disk1" name="disk1"/>
      <fs device="/dev/VolGroup10/LogVol10" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun1" name="lun1" self_fence="1"/>
      <lvm lv_name="LogVol10" name="VolGroup10/LogVol10" vg_name="VolGroup10"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk1" recovery="restart" domain="domain1">
      <ip ref=""/>
      <script file="/etc/init.d/nodisk1" name="nodisk1"/>
    </service>
    <service autostart="1" exclusive="0" name="disk2" recovery="restart" domain="domain2">
      <ip ref=""/>
      <script file="/etc/init.d/disk2" name="disk2"/>
      <fs device="/dev/VolGroup20/LogVol20" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun2" name="lun2" self_fence="1"/>
      <lvm lv_name="LogVol20" name="VolGroup20/LogVol20" vg_name="VolGroup20"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk2" recovery="restart" domain="domain2">
      <ip ref=""/>
      <script file="/etc/init.d/nodisk2" name="nodisk2"/>
    </service>
  </rm>
</cluster>
  • Is the IP bound to one of the interfaces you're shutting down? – d34dh0r53 Nov 19 '11 at 03:34
  • No, it's not. I'm shutting down the two interfaces to the SAN, the other two interfaces (bonded together) with the IP and providing access to the server are still up. – js. Nov 20 '11 at 01:04

3 Answers


I think you'll need another service in order to maintain this IP. The problem is that when the SAN service fails, rgmanager issues an ip addr del <ip> on the node that is running the service. Since this IP is shared, it's yanked out from under the other service. So you'll need to add another service such as:

<service autostart="1" domain="your_failover_domain" name="floating_ip">
  <ip ref="your_ip"/>
</service>

The way you set up your failover domains is key: if you do it wrong, you'll wind up with the IP sitting on one node and the services on the other. Unfortunately I don't have a cluster to test with currently, but I'm thinking that you want all three of the services (the two that need the IP and the IP itself) in a single restricted failover domain with a priority of at least 1.

If you edit /etc/cluster/cluster.conf by hand, always remember to increment the version number (config_version) and then run ccs_tool update /etc/cluster/cluster.conf to push the configuration out to the other nodes. ccs_tool is being phased out, but in RHEL 5.4 it should still work. The other command to remember is rg_test, which lets you see exactly what the cluster does when you start or stop services. Turn your debug levels up and always watch the log files. Good luck!
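The version bump mentioned above is easy to forget when editing by hand. As a minimal sketch, assuming Python 3 is available on the machine where you edit the file (the stock interpreter on RHEL 5 is older), the hypothetical helper below increments config_version in the text of a cluster.conf without touching the rest of the formatting. The function name is made up for illustration; pushing the file out with ccs_tool remains a separate step.

```python
import re

# Matches the config_version attribute on the <cluster> root element only
# (the \b keeps it from matching <clusternode> or <clusternodes>).
_VERSION_RE = re.compile(r'(<cluster\b[^>]*\bconfig_version=")(\d+)(")')


def bump_config_version(xml_text):
    """Return xml_text with the <cluster> config_version incremented by one."""
    def _increment(match):
        return "{}{}{}".format(match.group(1), int(match.group(2)) + 1, match.group(3))

    new_text, count = _VERSION_RE.subn(_increment, xml_text, count=1)
    if count != 1:
        raise ValueError("no <cluster ... config_version=\"N\"> attribute found")
    return new_text


if __name__ == "__main__":
    sample = '<?xml version="1.0" ?>\n<cluster config_version="1" name="cluster">\n</cluster>\n'
    print(bump_config_version(sample))
```

A regular expression is used here instead of a full XML rewrite so that comments and indentation in the file stay exactly as they were.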

  • As you can see in the cluster.conf I just added to the question, I already have two services referencing the same IP address. And the address still gets removed when one of them dies. The only way I have found so far to be able to access one of the surviving processes (nodisk1 or nodisk2) is to give them different IP addresses (which would complicate the configuration of the part of the system accessing them). – js. Nov 21 '11 at 14:53
  • Right, which is why I think you need to move the IP address in question to its own service. – d34dh0r53 Nov 22 '11 at 20:39
  • It still gets removed when one of the other services using it stops (seems like a bug to me). – js. Nov 28 '11 at 11:47
  • It's because of the way RHCS removes the IP address. It's not smart (for lack of a better term) enough to know that the IP address is being used by another service. When the service fails or is deactivated, it runs `ip addr del <ip>` without checking whether that IP is in use by another service. I'm pretty sure that Red Hat would argue that by design an IP should only be used by one service, as the service is usually dependent on that IP being on the server it's running on. – d34dh0r53 Nov 28 '11 at 15:35
  • I guess you are right; the two services could end up running on two different machines, so there doesn't seem to be a simple solution. – js. Nov 29 '11 at 11:02
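The behaviour described in these comments can be illustrated with a toy model. This is only a sketch of the reasoning, not rgmanager or resource-agent code: the class and function names are invented, and the addresses come from the 192.0.2.0/24 documentation range. The point is that adding an address twice leaves one address on the node, so a single unconditional removal takes it away from both services.

```python
class ToyNode:
    """Toy stand-in for a node's list of configured addresses."""

    def __init__(self):
        self.addresses = set()

    def ip_start(self, addr):
        # Adding an address that is already present changes nothing.
        self.addresses.add(addr)

    def ip_stop(self, addr):
        # Removed unconditionally: no check for other services using it.
        self.addresses.discard(addr)


def address_left_for_nodisk(separate_ips):
    """Return True if the nodisk service still has its address after disk fails."""
    node = ToyNode()
    disk_ip = "192.0.2.10"
    nodisk_ip = "192.0.2.11" if separate_ips else disk_ip
    node.ip_start(disk_ip)    # disk1 starts
    node.ip_start(nodisk_ip)  # nodisk1 starts
    node.ip_stop(disk_ip)     # disk1 fails and its ip resource is stopped
    return nodisk_ip in node.addresses


if __name__ == "__main__":
    print(address_left_for_nodisk(separate_ips=False))  # prints False
    print(address_left_for_nodisk(separate_ips=True))   # prints True
```

In this model the only way to keep nodisk1 reachable is for it to hold an address that no other service will stop, which matches what was observed on the real cluster.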

Have you tried putting the two services that are dependent on the disk in their own resource group?

It sounds like the best course of action would be to drop the IP and the running service when the failure is detected, then move the IP and both services to another cluster member.

Matt Simmons
  • The two processes that are dependent on the disk should run on different nodes and are dependent on different LUNs. I clarified my question in this regard. – js. Nov 20 '11 at 14:10

The only way to make this work was to give the services not needing a disk their own virtual IP addresses.

cluster.conf now looks like this:

<?xml version="1.0" ?>
<cluster config_version="1" name="cluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <cman shutdown_timeout="10000"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="device1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="device2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="device3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device1" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device2" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="" login="admin" name="device3" passwd="password"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="node1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="" monitor_link="1"/>
      <ip address="" monitor_link="1"/>
      <ip address="" monitor_link="1"/>
      <ip address="" monitor_link="1"/>
    </resources>
    <service autostart="1" exclusive="0" name="disk1" recovery="restart" domain="domain1">
      <ip ref=""/>
      <script file="/etc/init.d/disk1" name="disk1"/>
      <fs device="/dev/VolGroup10/LogVol10" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun1" name="lun1" self_fence="1"/>
      <lvm lv_name="LogVol10" name="VolGroup10/LogVol10" vg_name="VolGroup10"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk1" recovery="restart" domain="domain1">
      <ip ref=""/>
      <script file="/etc/init.d/nodisk1" name="nodisk1"/>
    </service>
    <service autostart="1" exclusive="0" name="disk2" recovery="restart" domain="domain2">
      <ip ref=""/>
      <script file="/etc/init.d/disk2" name="disk2"/>
      <fs device="/dev/VolGroup20/LogVol20" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun2" name="lun2" self_fence="1"/>
      <lvm lv_name="LogVol20" name="VolGroup20/LogVol20" vg_name="VolGroup20"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk2" recovery="restart" domain="domain2">
      <ip ref=""/>
      <script file="/etc/init.d/nodisk2" name="nodisk2"/>
    </service>
  </rm>
</cluster>
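
Since the fix depends on no two services referencing the same address, a quick check of the file can catch a shared reference before it bites. The following is a rough sketch, assuming Python 3 and a well-formed cluster.conf; the function name and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and not part of the real configuration, whose addresses are not shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def shared_ip_refs(xml_text):
    """Map each ip address used by more than one service to those service names."""
    root = ET.fromstring(xml_text)
    users = defaultdict(list)
    for service in root.iter("service"):
        for ip in service.iter("ip"):
            # An ip inside a service is either a reference or an inline address.
            addr = ip.get("ref") or ip.get("address")
            if addr:
                users[addr].append(service.get("name"))
    return {addr: names for addr, names in users.items() if len(names) > 1}


# Hypothetical sample: disk1 and nodisk1 still share an address.
SAMPLE = """<cluster config_version="2" name="cluster">
  <rm>
    <service name="disk1"><ip ref="192.0.2.10"/></service>
    <service name="nodisk1"><ip ref="192.0.2.10"/></service>
    <service name="disk2"><ip ref="192.0.2.20"/></service>
    <service name="nodisk2"><ip ref="192.0.2.21"/></service>
  </rm>
</cluster>"""

if __name__ == "__main__":
    print(shared_ip_refs(SAMPLE))
```

An empty result means every address is referenced by at most one service, which is the state the configuration above is meant to reach.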