
Running into a bit of a problem here. I set up two servers (CentOS 6) with GlusterFS and a shared directory between them; I have moved the NFS directory into the shared Gluster folder and created a symlink on both boxes. The machines can reach each other via hostnames, and Gluster replication runs over a separate Ethernet card between the servers.
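
For reference, the Gluster side was set up roughly like this. The volume name, brick paths, old export path and the -repl hostnames for the replication NIC below are placeholders, not my exact values:

# On GlusterFS01, after starting glusterd on both nodes:
gluster peer probe GlusterFS02-repl

# Two-way replicated volume, one brick per node
gluster volume create gv_files replica 2 \
        GlusterFS01-repl:/bricks/gv_files GlusterFS02-repl:/bricks/gv_files
gluster volume start gv_files

# On both nodes: mount the volume locally and symlink the old NFS path into it
mount -t glusterfs localhost:/gv_files /GlusterFS
ln -s /GlusterFS/Files /srv/nfs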

The problem I am having is that even though the resources fail over correctly (though they seem to come up and down a few times while failing over), I get stale NFS file handles on the client. Below is my crm config; what am I doing wrong?

The NFS mount on the client is as plain as possible.
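
It is essentially just the virtual IP and the exported path, something like the following (the mount point is only an illustration; note that with NFSv4 and fsid=0 the export would be mounted as the pseudo-root / rather than the full path):

mount -t nfs -o vers=3 10.10.10.167:/GlusterFS/Files /mnt/files

# or the equivalent /etc/fstab entry
10.10.10.167:/GlusterFS/Files  /mnt/files  nfs  vers=3  0 0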

node GlusterFS01
node GlusterFS02
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.167" cidr_netmask="24" clusterip_hash="sourceip" \
        op monitor interval="5s"
primitive exportfs ocf:heartbeat:exportfs \
        params fsid="0" directory="/GlusterFS/Files" \
        options="rw,sync,no_subtree_check,no_root_squash" \
        clientspec="10.10.10.0/24" \
        wait_for_leasetime_on_stop="false" \
        op monitor interval="5s" \
        op start interval="0s" timeout="240s" \
        op stop interval="0s" timeout="100s" \
        meta is-managed="true" target-role="Started"
primitive nfs lsb:nfs \
        meta target-role="Started" \
        op monitor interval="5s" timeout="5s"
colocation sitewithnfs inf: ClusterIP exportfs nfs
order nfsorder inf: exportfs ClusterIP nfs
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6_5.2-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1395246465" \
        default-resource-stickiness="100"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

Thank you for your time.

Update 1: I have decided that I was overcomplicating everything. After a call with Florian, he convinced me to simplify. I am now sharing NFS directly from Gluster, and only the IP resource is handled by Corosync/Pacemaker. It is a much simpler solution and it fits my needs.
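
For anyone curious, the simplified setup looks roughly like this: Gluster's built-in NFS server exports the volume on both nodes, and Pacemaker only moves the virtual IP. The volume name is again a placeholder; nfs.disable is the standard Gluster volume option:

# Enable Gluster's own NFS server on the volume, if it is not already on
gluster volume set gv_files nfs.disable off

# The whole Pacemaker resource configuration shrinks to the floating IP
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.167" cidr_netmask="24" \
        op monitor interval="5s"

Clients mount the VIP over NFSv3 (Gluster's NFS server only speaks v3), the kernel nfsd stays stopped so it does not conflict, and a failover is nothing more than the IP moving.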

I will say, however, that Dok was completely correct in his assessment and suggestions, even though I was not able to get it running 100% in the production environment (even though it worked in testing).

Roncioiu

1 Answer


colocation sitewithnfs inf: ClusterIP exportfs nfs

order nfsorder inf: exportfs ClusterIP nfs

Firstly, I believe you want to start the nfsd before the export.

Adding the unlock_on_stop="true" parameter to the exportfs resource agent may also help, but what has really made the difference in my testing was to stop the virtual IP first during the failovers. I am not entirely sure why, but I suspect it has to do with closing the connections before attempting to stop the exports.

Also, I recall there being issues with "resource sets" (i.e. ordering and colocation constraints with more than two resources) in older versions of pacemaker. I would instead suggest removing your ordering and colocation constraints and replacing them with a single resource group like so:

group g_nfs nfs exportfs ClusterIP
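
If it helps, that change can be made from the crm shell roughly like this, using the constraint IDs from your config:

crm configure delete sitewithnfs
crm configure delete nfsorder
crm configure group g_nfs nfs exportfs ClusterIP

A group gives you the colocation and ordering in one object: members start left to right and stop in reverse, so on failover ClusterIP is stopped first and nfsd is started before the export, which is exactly the ordering described above.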

P.S. The exportfs resource agent should handle all the exports. Your /etc/exports file should be empty.
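
A quick way to sanity-check that after a failover, with nothing more than the standard NFS tools:

# On whichever node currently holds the resources: the /GlusterFS/Files export
# should be listed here, added by the resource agent rather than /etc/exports
exportfs -v

# From a client: the export should be visible behind the virtual IP
showmount -e 10.10.10.167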

Dok
  • Hello Dok, I have added the group, emptied /etc/exports, and removed the colocation and order constraints. I was not able to add unlock_on_stop="true" as it complained that the attribute does not exist, which is odd as I am using resource-agents-3.9.5-3.6.x86_64. I have also tried performing a failover and am getting the same problem with the stale file handle. I will try later putting the colocation and order back, but killing ClusterIP first, to see if it makes a difference. – Roncioiu Mar 20 '14 at 22:06