I've built an NFS cluster with OCFS2 as a backing filesystem.

This works well, with one exception: when a node is restarted or shut down cleanly, the network is severed before the OCFS2 filesystems are unmounted. The node therefore never informs the other node that it is leaving the cluster, and a fence operation follows.

Networking is configured as an LACP bond with two physical adapters, and two VLAN interfaces providing IP connectivity. This has all been configured in nmtui with potentially a couple small tweaks to the config files residing in /etc/sysconfig/network-scripts.

No matter what dependencies I add (Before/After/Requires etc.), including network-online.target and various others, and even with my own systemd service and script to handle the unmounts, I cannot get systemd to unmount the _netdev OCFS2 filesystems before the networking is torn down.

I've added a debug.sh to /usr/lib/systemd/system-shutdown to record some details. The log below begins where the shutdown sequence starts; the 'ocfs2: Unmounting device' lines mark the point where OCFS2 is unmounted, after the bond interfaces have already been released:

[  309.286479] bonding: bond0: Warning: the permanent HWaddr of ens2f1 - 0c:c4:7a:bb:93:3f - is still in use by bond0. Set the HWaddr of ens2f1 to a different address to avoid conflicts.
[  309.286484] bonding: bond0: releasing active interface ens2f1
[  309.288270] ixgbe 0000:02:00.1: removed PHC on ens2f1
[  309.806098] pps pps0: new PPS source ptp0
[  309.806100] ixgbe 0000:02:00.1: registered PHC device on ens2f1
[  310.028112] IPv6: ADDRCONF(NETDEV_UP): ens2f1: link is not ready
[  310.028114] 8021q: adding VLAN 0 to HW filter on device ens2f1
[  310.028750] bonding: bond0: Removing an active aggregator
[  310.028754] bonding: bond0: releasing active interface ens2f0
[  310.028755] bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs.
[  310.028756] bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'.
[  310.028773] device bond0 entered promiscuous mode
[  310.028818] device ens2f0 entered promiscuous mode
[  310.030895] ixgbe 0000:02:00.0: removed PHC on ens2f0
[  310.328057] nfsd: last server has exited, flushing export cache
[  310.549842] pps pps1: new PPS source ptp1
[  310.549844] ixgbe 0000:02:00.0: registered PHC device on ens2f0
[  310.772136] IPv6: ADDRCONF(NETDEV_UP): ens2f0: link is not ready
[  310.772137] 8021q: adding VLAN 0 to HW filter on device ens2f0
[  310.773358] IPv6: ADDRCONF(NETDEV_UP): bond0.3xxx: link is not ready
[  310.774187] IPv6: ADDRCONF(NETDEV_UP): bond0.31xx: link is not ready
[  310.775060] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[  310.775560] IPv6: ADDRCONF(NETDEV_UP): bond0.3xxx: link is not ready
[  310.779437] IPv6: ADDRCONF(NETDEV_UP): bond0.31xx: link is not ready
[  310.832053] IPv6: ADDRCONF(NETDEV_UP): eno2: link is not ready
[  310.883931] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
[  312.995508] o2dlm: Leaving domain 78D2C50072B84C8899E6CA71D23E24CC
[  313.010803] ocfs2: Unmounting device (252,3) on (node 1)
[  313.022493] o2dlm: Leaving domain 4EBC5792914B4DC0B5C548A94924F48A
[  313.039052] ocfs2: Unmounting device (252,5) on (node 1)
[  314.108160] o2dlm: Leaving domain 0B217FA1ACA5452397F9DA8A8B792DA0
[  314.122868] ocfs2: Unmounting device (252,4) on (node 1)
[  314.231817] ixgbe 0000:02:00.0: removed PHC on ens2f0
[  314.893756] ixgbe 0000:02:00.1: removed PHC on ens2f1
[  315.709225] audit_printk_skb: 321 callbacks suppressed
[  315.709227] type=1305 audit(1476683800.733:156): audit_pid=0 old=1053 auid=4294967295 ses=4294967295 res=1
[  315.718254] type=1130 audit(1476683800.742:157): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.718271] type=1131 audit(1476683800.742:158): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.726152] type=1130 audit(1476683800.750:159): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.726161] type=1131 audit(1476683800.750:160): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.734233] type=1130 audit(1476683800.758:161): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rhel-import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.734251] type=1131 audit(1476683800.758:162): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rhel-import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.743145] type=1130 audit(1476683800.767:163): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rhel-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.743153] type=1131 audit(1476683800.767:164): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rhel-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.782154] type=1130 audit(1476683800.806:165): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-monitor comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  315.954555] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  315.958696] systemd-journald[720]: Received SIGTERM from PID 1 (systemd-shutdow).
[  315.965900] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  315.981786] systemd-shutdown[1]: Unmounting file systems.
[  316.002080] systemd-shutdown[1]: All filesystems unmounted.
[  316.002083] systemd-shutdown[1]: Deactivating swaps.
stuntkiwi

1 Answer

The cause was that remote-fs.target finished too quickly: because OCFS2 is a networked filesystem, a proper unmount took slightly longer to complete than the umount command took to return.

Since the next step in the shutdown sequence is to shut down networking, the other nodes always still considered the last filesystem to be online, which caused the fence.

My fix was to create my own mount/unmount script and systemd service, with appropriate delays in the script and a dependency on NetworkManager-wait-online.service.
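
For anyone wanting to try the same approach, here is a rough sketch of what such a script might look like. It is not my exact script: the mount points, the 10 second settle delay, the script path and the unit shown in the header comment are all placeholders, and it assumes each mount point has an /etc/fstab entry. The idea is that a oneshot service ordered After= the network units is stopped before them at shutdown, so the unmount and the delay both happen while the network is still up.

```shell
#!/bin/bash
# Sketch only: mount/unmount helper intended to be driven by a systemd unit.
#
# Hypothetical unit, e.g. /etc/systemd/system/ocfs2-mounts.service
# (unit name, script path and o2cb.service ordering are assumptions):
#   [Unit]
#   Wants=network-online.target
#   After=network-online.target NetworkManager-wait-online.service o2cb.service
#   [Service]
#   Type=oneshot
#   RemainAfterExit=yes
#   ExecStart=/usr/local/sbin/ocfs2-mounts.sh start
#   ExecStop=/usr/local/sbin/ocfs2-mounts.sh stop
#   [Install]
#   WantedBy=multi-user.target

# Placeholder mount points; each is assumed to have an /etc/fstab entry.
MOUNTPOINTS="${MOUNTPOINTS:-/export/vol1 /export/vol2}"
# Assumed delay that gives the cluster time to register the node leaving.
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

is_mounted() {
    # Succeeds if the mount point appears as field 2 of the mounts table.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

start_mounts() {
    local mp rc=0
    for mp in $MOUNTPOINTS; do
        # Mount only what is not already mounted; remember any failure.
        is_mounted "$mp" || mount "$mp" || rc=1
    done
    return "$rc"
}

stop_mounts() {
    local mp rc=0
    for mp in $MOUNTPOINTS; do
        if is_mounted "$mp"; then
            umount "$mp" || rc=1
        fi
    done
    # umount returns before the other nodes have seen this node leave,
    # so wait here before systemd moves on to stopping the network.
    sleep "$SETTLE_SECONDS"
    return "$rc"
}

case "${1:-}" in
    start) start_mounts ;;
    stop)  stop_mounts ;;
esac
```

The right delay will depend on the cluster; check the kernel log on the surviving node to confirm it sees the o2dlm domain being left before the bond interfaces go down.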

stuntkiwi