
I have an NFS server running Ubuntu 10.04, serving an OCFS2 filesystem. The setup is somewhat complicated because the server runs Heartbeat and Pacemaker to form a cluster with another server.

Anyway, here's the strange thing:

# tail -12 /var/log/messages
Jul 17 17:15:45 ctserv01 exportfs[14870]: INFO: Directory /export/homes is exported to 172.16.54.0/24 (started).
Jul 17 17:15:45 ctserv01 exportfs[14869]: INFO: Directory /export/proyectos is exported to 172.16.54.0/24 (started).
Jul 17 17:15:45 ctserv01 exportfs[14871]: INFO: Directory /export is exported to 172.16.54.0/24 (started).
Jul 17 17:16:15 ctserv01 exportfs[15960]: INFO: Directory /export/proyectos is exported to 172.16.54.0/24 (started).
Jul 17 17:16:15 ctserv01 exportfs[15961]: INFO: Directory /export is exported to 172.16.54.0/24 (started).
Jul 17 17:16:15 ctserv01 exportfs[15962]: INFO: Directory /export/homes is exported to 172.16.54.0/24 (started).
Jul 17 17:16:45 ctserv01 exportfs[17054]: INFO: Directory /export/proyectos is exported to 172.16.54.0/24 (started).
Jul 17 17:16:45 ctserv01 exportfs[17055]: INFO: Directory /export/homes is exported to 172.16.54.0/24 (started).
Jul 17 17:16:45 ctserv01 exportfs[17056]: INFO: Directory /export is exported to 172.16.54.0/24 (started).
Jul 17 17:17:15 ctserv01 exportfs[18168]: INFO: Directory /export is exported to 172.16.54.0/24 (started).
Jul 17 17:17:15 ctserv01 exportfs[18169]: INFO: Directory /export/proyectos is exported to 172.16.54.0/24 (started).
Jul 17 17:17:15 ctserv01 exportfs[18170]: INFO: Directory /export/homes is exported to 172.16.54.0/24 (started).

The logs show exportfs being respawned every 30 seconds. The NFS server works fine most of the time, but after a few days it starts hanging for up to 20 seconds every 13-15 minutes, and people complain about it.

Maybe Pacemaker, or Heartbeat, or something else, is causing this. Or is this the expected behaviour, and the problem lies elsewhere?
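
For reference, the recurring operations can be listed from the cluster itself; if I remember correctly, crm_mon's -o flag shows them, so something like this should confirm that a monitor op fires every 30 seconds:

# crm_mon -1 -o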

Pacemaker configuration (notice /export/proyectos is Stopped now):

# crm configure show
node $id="06334af6-e766-457c-8c30-457080276507" ctserv01
node $id="bf53e028-9f27-4ef3-bb45-4fcef981e441" ctserv02
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="172.16.54.56" cidr_netmask="24" nic="eth0"
primitive exports_nfs_home ocf:heartbeat:exportfs \
    params rmtab_backup="none" directory="/export/homes" clientspec="172.16.54.0/24" options="rw,async,no_subtree_check,insecure,root_squash" fsid="1" \
    op monitor interval="30s" \
    op start interval="0" timeout="240s" \
    meta target-role="Started"
primitive exports_nfs_proys ocf:heartbeat:exportfs \
    params rmtab_backup="none" directory="/export/proyectos" clientspec="172.16.54.0/24" options="rw,async,no_subtree_check,insecure,root_squash" fsid="2" \
    op monitor interval="30s" \
    op start interval="0" timeout="240s" \
    meta target-role="Stopped"
primitive exports_nfs_root ocf:heartbeat:exportfs \
    params rmtab_backup="none" directory="/export" clientspec="172.16.54.0/24" options="rw,async,no_subtree_check,insecure" fsid="0" \
    op monitor interval="30s" \
    op start interval="0" timeout="240s"
group grupo_nfs ClusterIP exports_nfs_root exports_nfs_home exports_nfs_proys
location nodo_preferido grupo_nfs 100: 06334af6-e766-457c-8c30-457080276507
order orden_de_recursos inf: ClusterIP exports_nfs_root exports_nfs_home exports_nfs_proys
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    expected-quorum-votes="2" \
    cluster-recheck-interval="60min"

1 Answer


The spammy log messages you are seeing come from the ocf:heartbeat:exportfs resource agent. They appear every 30 seconds, which matches the monitor interval you specified in the exportfs primitive definitions. The resource agent is a bit too verbose, IMHO, but this should not be a problem. Just make sure you rotate your logs often enough that they don't fill up your disks, or edit the resource agent to be less verbose.
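
As a rough sketch (the log path, rotation schedule, and rsyslog reload command are assumptions for a stock Ubuntu 10.04 box, so adjust to taste), a logrotate stanza for the syslog file could look like this:

# /etc/logrotate.d/messages (hypothetical file name)
/var/log/messages {
    daily
    rotate 7
    compress
    missingok
    notifempty
    postrotate
        invoke-rc.d rsyslog reload > /dev/null 2>&1 || true
    endscript
}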

Your problem probably lies deeper in your cluster setup. Is the Pacemaker configuration you posted complete? It looks to me like the cluster is not managing all the resources it should, such as the NFS server itself, idmapd, or bind mounts.
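
For instance, you could put the NFS kernel server itself under cluster control as an LSB resource and add it to your group. This is only a sketch, assuming Ubuntu's init script name and that the daemon should come up before the exports; where it sits relative to the IP depends on your failover design:

primitive nfs_server lsb:nfs-kernel-server \
    op monitor interval="30s"
group grupo_nfs ClusterIP nfs_server exports_nfs_root exports_nfs_home exports_nfs_proys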

NFS (under Linux) is notoriously difficult to get right in an HA environment. I recommend you read this tech guide on HA NFS from Linbit, the company behind DRBD and much of the Linux HA stack. Free registration is required, but it is a very good and detailed guide to setting up a working and stable NFS HA cluster. We operate several clusters like this in production.

daff