
I'm configuring a two-node A/A cluster with shared storage attached via iSCSI, which uses GFS2 on top of clustered LVM. So far I have prepared a simple configuration, but I am not sure which is the right way to configure the gfs resource.

Here is the rm section of /etc/cluster/cluster.conf:

<rm>
    <failoverdomains>
        <failoverdomain name="node1" nofailback="0" ordered="0" restricted="1">
            <failoverdomainnode name="rhc-n1"/>
        </failoverdomain>
        <failoverdomain name="node2" nofailback="0" ordered="0" restricted="1">
            <failoverdomainnode name="rhc-n2"/>
        </failoverdomain>
    </failoverdomains>
    <resources>
        <script file="/etc/init.d/clvm" name="clvmd"/>
        <clusterfs name="gfs" fstype="gfs2" mountpoint="/mnt/gfs"  device="/dev/vg-cs/lv-gfs"/>
    </resources>
    <service name="shared-storage-inst1" autostart="0" domain="node1" exclusive="0" recovery="restart">
        <script ref="clvmd">
            <clusterfs ref="gfs"/>
        </script>
    </service>
    <service name="shared-storage-inst2" autostart="0" domain="node2" exclusive="0" recovery="restart">
        <script ref="clvmd">
            <clusterfs ref="gfs"/>
        </script>
    </service>
</rm>

Here is my concern: when using the clusterfs resource agent to handle a GFS2 partition, it is not unmounted when the service stops (unless the force_unmount option is given). So when I issue

clusvcadm -s shared-storage-inst1

clvmd is stopped, but GFS2 is not unmounted, so the node can no longer alter the LVM structure on shared storage, but can still access data. And even though the node can do so quite safely (DLM is still running), this seems rather inappropriate to me, since clustat reports that the service on that node is stopped. Moreover, if I later try to stop cman on that node, it finds the DLM lockspace produced by GFS2 and fails to stop.
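For what it's worth, this is how I verify the state after stopping the service (command names from the RHEL 6 cluster stack; obviously not runnable outside a cluster node):

```shell
# rgmanager's view: the service should show as "stopped"
clustat

# ...yet the GFS2 mount is still there
mount | grep gfs2

# the DLM lockspace held by the mounted GFS2 filesystem is
# what later prevents cman from stopping cleanly
dlm_tool ls
```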

I could simply add force_unmount="1", but I would like to know the reason behind the default behavior. Why is it not unmounted? Most of the examples out there silently use force_unmount="0", some don't, but none of them give any clue as to how the decision was made.
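For clarity, setting the option would amount to a one-attribute change to the clusterfs resource in my configuration above (a sketch, not a tested recommendation):

```xml
<clusterfs name="gfs" fstype="gfs2" mountpoint="/mnt/gfs"
           device="/dev/vg-cs/lv-gfs" force_unmount="1"/>
```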

Apart from that, I have found sample configurations where people manage GFS partitions with the gfs2 init script (https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Defining_The_Resources), or even as simply as enabling services such as clvm and gfs2 to start automatically at boot (http://pbraun.nethence.com/doc/filesystems/gfs2.html), like:

chkconfig gfs2 on

If I understand the latter approach correctly, such a cluster only controls whether nodes are still alive and can fence errant ones, but it has no control over the status of its resources.
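Spelled out, that boot-time approach would look roughly like this (RHEL-style chkconfig names; the service names may differ on Ubuntu, which I have to use):

```shell
# start the cluster stack and the storage layers at boot,
# outside of rgmanager's control
chkconfig cman on
chkconfig clvmd on
chkconfig gfs2 on
```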

I have some experience with Pacemaker, and I'm used to all resources being controlled by the cluster, so that action can be taken not only when there are connectivity issues, but also when any resource misbehaves.

So, which is the right way for me to go:

  1. leave the GFS2 partition mounted (any reasons to do so?)
  2. set force_unmount="1". Won't this break anything? Why is this not the default?
  3. use a script resource <script file="/etc/init.d/gfs2" name="gfs"/> to manage the GFS2 partition.
  4. start it at boot and don't include it in cluster.conf (any reasons to do so?)

This may be the sort of question that cannot be answered unambiguously, so it would also be of much value to me if you shared your experience or thoughts on the issue. How does /etc/cluster/cluster.conf look, for example, when configuring gfs with Conga or ccs (they are not available to me, since for now I have to use Ubuntu for the cluster)?

Thank you very much!

Pavel A
    Some of your answers may be found here: https://access.redhat.com/knowledge/solutions/48988 but it is behind the RedHat support paywall. – sysadmin1138 Dec 22 '12 at 04:40

1 Answer


I have worked a little with clusters. These are my opinions on the subject.

I could have simply added force_unmount="1", but I would like to know what is
the reason behind the default behavior. Why is it not unmounted?

If you choose to configure gfs as a clustered resource, and add the clvmd and gfs disk as resources, then when you fail over with rgmanager it will try to unmount the disk. So the first thing I'd do in your case is check the logs (or lsof, fuser, etc.) for an indication of why the unmount might have failed. Likely there is a process holding a file open, or something like that, preventing a "clean" unmount.

Could it be because you don't use rgmanager to start your clustered application? I don't see one in your cluster.conf. If true, that would explain the behaviour.

If you set force_unmount, then what rgmanager does when failing over/recovering is forcefully kill any resource using the disk before unmounting it. Whether that is a good idea or not depends.

clvm is stopped, but GFS is not unmounted, so a node cannot alter LVM structure 
on shared storage anymore, but can still access data. And even though a node can 
do it quite safely (dlm is still running), [...]  
Moreover if I later try to stop cman on that node, it will find a dlm locking,
produced by GFS, and fail to stop.

If you want to change the LVM structure in this scenario, you can start the clvmd daemon again manually. If you unmount the GFS2 disk before stopping cman, that should work. On the other hand, in a production scenario I rarely find myself in a situation where I'd want to stop cman on a clustered node.

My preference is to go with option 4.

If I understand the latter approach correctly, such cluster only controls 
whether nodes are still alive and can fence errant ones, but such cluster
has no control over the status of its resources.

It is true that if you don't add gfs2 and clvmd as cluster resources, rgmanager won't be able to control them. What I usually do when setting up A/A clusters (depending on the case, of course) is add the start script for my service as the clustered resource. (rgmanager will then call the script with the status argument on a regular basis to determine whether it needs to take the configured action.) Since my script depends on the GFS2 filesystem, it will fail unless the filesystem is mounted.
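As a sketch of that idea: the init script's status action can refuse to report success unless the GFS2 mount is present. The helper below is hypothetical (the /mnt/gfs path comes from the question); it just checks a mounts table, which on a real node would be /proc/mounts:

```shell
# Hypothetical helper an init script's "status" action could use.
# $1: expected mount point, $2: mounts table (normally /proc/mounts)
gfs_mounted() {
    # match "<mountpoint> gfs2" as adjacent whitespace-separated fields
    grep -qs " $1 gfs2 " "$2"
}

# In the script's status) branch one would then do roughly:
#   gfs_mounted /mnt/gfs /proc/mounts || exit 1   # rgmanager sees a failure
```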

Approach 4 implies manually enabling clvmd, cman and gfs2 (and possibly other daemons too, depending on the situation).

Since the GFS2 filesystem sits on top of an iSCSI device, adding the _netdev option to the mount in /etc/fstab is a requirement for it to work.
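That is, the /etc/fstab entry would look something like this (device path and mount point taken from the question; a sketch, adjust to your layout):

```
/dev/vg-cs/lv-gfs  /mnt/gfs  gfs2  _netdev,noatime  0 0
```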

  • This way I don't get an over-complicated cluster configuration, and adding more services later will be less of a headache (say, for example, two services using the same disk, or whatever)
  • when something does go wrong, my experience is that manual intervention is a lot easier with resources not managed by rgmanager
  • in my experience, it's not the gfs2 or clvmd services that most often go wrong in a cluster, but the services on top of them, so restarting/remounting them repeatedly will only cost you extra time.

There are a few disadvantages I can think of too:

  • Like you said, rgmanager will not manage these resources, and will take no action if, for example, the GFS filesystem were to somehow fail or get unmounted
  • keeping a GFS filesystem mounted all the time can generate unnecessary load on the device from, for example, updatedb and other jobs that want to traverse the filesystem, thereby causing drive latency (locking traffic)

No matter what you decide

I would add the init script as a clustered resource, and if you choose to add gfs and clvmd to the cluster as resources, I'd consider adding the __independent_subtree attribute, so that if it fails, rgmanager won't remount the GFS filesystem. This depends, of course, on your particular situation. Note the nested configuration in the link, which marks a sort of dependency tree.
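As a sketch of what I mean, using the service from the question (untested; the nesting expresses the dependency tree, and __independent_subtree="1" keeps a failure in that subtree from restarting the whole service):

```xml
<service name="shared-storage-inst1" autostart="0" domain="node1"
         exclusive="0" recovery="restart">
    <script ref="clvmd">
        <clusterfs ref="gfs" __independent_subtree="1"/>
    </script>
</service>
```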

Petter H
  • +1 Thanks for sharing the experience. I'm not working on this task right now, but I'll try your suggestions when I get back to it. – Pavel A Jun 12 '13 at 06:59