I have two Linode containers. Box A is our general-purpose web server. It occasionally needs to access Box B, which is set up as an NFS server.
When Box B reboots, Box A is unable to access any of the NFS shares, no matter what I do. After several hours of troubleshooting, I finally narrowed it down to a single-step fix.
After Box B reboots, I run this on Box B:
$ sudo service nfs restart
These are both CentOS 6.8 boxes, fully up to date. I believe all of the NFS-related packages were installed via yum. Getting the whole thing set up was not a smooth process, but after restarting the nfs service(s), everything works great.
If I run
$ sudo service --status-all
the output is the same before and after the restart. Maybe it's a timing issue? But I don't even know how to begin troubleshooting this. What can I do?
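One way to make that before/after comparison stricter is to save both snapshots to files and diff them with the PIDs stripped out, since PIDs change on every restart and can otherwise mask (or fake) a real state change. This is a minimal sketch: the helper names are made up, and it assumes the CentOS 6 status format "name (pid NNNN) is running...".

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved `service --status-all` snapshots.

# Strip "(pid ...)" groups so only running/stopped state remains,
# then sort so that line order does not matter.
normalize_status() {
  sed 's/(pid[ 0-9]*)//' "$1" | sort
}

# Print the differences between two snapshots.
# Returns 0 if they match, 1 if they differ, 2 on error (same as diff).
compare_status() {
  tmp_before=$(mktemp) || return 2
  tmp_after=$(mktemp) || { rm -f "$tmp_before"; return 2; }
  normalize_status "$1" > "$tmp_before"
  normalize_status "$2" > "$tmp_after"
  rc=0
  diff "$tmp_before" "$tmp_after" || rc=$?
  rm -f "$tmp_before" "$tmp_after"
  return "$rc"
}
```

Usage: run `sudo service --status-all > before.txt 2>&1`, then `sudo service nfs restart`, then `sudo service --status-all > after.txt 2>&1`, and finally `compare_status before.txt after.txt`.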
Other things of note:
I'm using autofs to mount the share on demand from Box A, but the share won't mount manually either.
I spend my days on Windows and Mac desktops and servers, but I've been running websites on Linux for many years. I'm proficient at what I need to do, but Linux isn't my comfort zone, and I spend a lot of time googling how to do new things.
I don't even know where to check. I didn't see anything obvious in the logs, but tell me what to look for and I'll post it.
Update
On Box B:
[shorowitz@BoxB ~]$ sudo chkconfig --list nfs
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[shorowitz@BoxB ~]$ sudo chkconfig --list nfslock
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
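Since these are SysV init boxes, the lines above can be checked against the runlevel the box actually boots into. Below is a small sketch (the function name is hypothetical) that extracts the on/off state for one runlevel from a single `chkconfig --list` line.

```shell
#!/bin/sh
# Print the on/off state of a service for one runlevel.
#   $1 = one line of `chkconfig --list` output,
#        e.g. "nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off"
#   $2 = runlevel number, e.g. 3
runlevel_state() {
  echo "$1" | awk -v level="$2" '{
    # Fields 2..NF look like "N:on" or "N:off".
    for (i = 2; i <= NF; i++) {
      split($i, kv, ":")
      if (kv[1] == level) print kv[2]
    }
  }'
}
```

Usage on Box B: `runlevel_state "$(chkconfig --list nfs)" "$(runlevel | awk '{print $2}')"`, where `runlevel` prints the previous and current runlevel.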
Update 2
After a fresh reboot of BoxB, running
$ sudo showmount -e BoxB
from BoxA shows the expected exports, but I'm unable to mount them. Simply restarting nfs on BoxB:
$ sudo service nfs restart
Shutting down NFS daemon: [ OK ]
Shutting down NFS mountd: [ OK ]
Shutting down NFS services: [ OK ]
Shutting down RPC idmapd: [ OK ]
FATAL: Module nfsd not found.
FATAL: Error running install command for nfsd
Starting NFS services: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
The mounts are then immediately available on BoxA. Those fatal errors also appear on subsequent restarts, even when NFS is already working, so I don't know how relevant they are (I thought I had posted them already).
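One possible explanation for the FATAL lines (an assumption, not confirmed here): nfsd clearly ends up running even though modprobe cannot find an nfsd module, and the boot log shows "Installing knfsd" during startup, which would fit a kernel with nfsd compiled in rather than built as a module. A quick check on the running kernel is to look for nfsd in /proc/filesystems. If it is listed there while `modprobe nfsd` fails, that supports the compiled-in theory. A minimal sketch with a hypothetical function name; the optional argument only exists so it can read a test file.

```shell
#!/bin/sh
# Succeed (exit 0) if the kernel lists nfsd as a filesystem type.
# Lines in /proc/filesystems look like "nodev<TAB>nfsd".
#   $1 = file to read (defaults to /proc/filesystems)
kernel_has_nfsd() {
  awk '$NF == "nfsd" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}
```

Usage: `kernel_has_nfsd && echo "nfsd support is present in the running kernel"`.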
Additional Log Info
I issued the reboot command at 9:29 on Nov 15.
grep -i "nfs" /var/log/message*
messages:Nov 15 09:29:08 BoxB kernel: nfsd: last server has exited, flushing export cache
messages:Nov 15 09:29:54 BoxB kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
messages:Nov 15 09:29:54 BoxB kernel: FS-Cache: Netfs 'nfs' registered for caching
messages:Nov 15 09:29:54 BoxB kernel: NFS: Registering the id_resolver key type
messages:Nov 15 09:29:54 BoxB kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
messages:Nov 15 09:29:54 BoxB kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
messages:Nov 15 09:29:54 BoxB kernel: xenfs: not registering filesystem on non-xen platform
messages:Nov 15 09:29:54 BoxB rpc.mountd[2740]: NFS v4 mounts will be disabled unless fsid=0
messages:Nov 15 09:29:54 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
messages:Nov 15 09:29:54 BoxB kernel: NFSD: starting 90-second grace period (net ****************)
messages:Nov 15 09:33:39 BoxB kernel: nfsd: last server has exited, flushing export cache
messages:Nov 15 09:33:40 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
messages:Nov 15 09:33:40 BoxB kernel: NFSD: starting 90-second grace period (net ****************)
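The timestamps above can be turned into elapsed times. nfsd shut down at 09:29:08 and came back at 09:29:54, 46 seconds later. The NFSv4 grace period that started at 09:29:54 would have ended 90 seconds later, at 09:31:24, and the one started by the manual restart at 09:33:40 would have ended at 09:35:10. A small sketch of that arithmetic follows; the function names are hypothetical, and it ignores dates (same-day timestamps only, wrapping at midnight).

```shell
#!/bin/sh
# Hypothetical helpers for HH:MM:SS arithmetic on syslog timestamps.

# Seconds elapsed between two same-day timestamps: $1 = start, $2 = end.
elapsed_seconds() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    print (y[1] * 3600 + y[2] * 60 + y[3]) - (x[1] * 3600 + x[2] * 60 + x[3])
  }'
}

# Timestamp $1 plus $2 seconds, wrapping past midnight.
add_seconds() {
  awk -v t="$1" -v n="$2" 'BEGIN {
    split(t, x, ":")
    s = (x[1] * 3600 + x[2] * 60 + x[3] + n) % 86400
    printf "%02d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
  }'
}
```

Usage: `elapsed_seconds 09:29:08 09:29:54` and `add_seconds 09:29:54 90`.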
Update 3
BoxB
[shorowitz@BoxB ~]$ sudo chkconfig --list | egrep "nfs|rpc"
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcgssd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
rpcsvcgssd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
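Given the timing question earlier, the relative start order of these services may also be worth posting. On CentOS 6 the SysV start order for a runlevel comes from the S<NN><name> links in /etc/rc<N>.d, and sorting those names by the two-digit prefix gives the order init runs them in. A sketch with a hypothetical function name; the list of names it filters for is an assumption about which services matter here.

```shell
#!/bin/sh
# List NFS-related SysV start links in a runlevel directory, in start order.
#   $1 = runlevel directory, e.g. /etc/rc3.d
# Links are named S<NN><service>; sorting by name sorts by NN.
start_order() {
  ls "$1" | grep -E '^S[0-9]{2}' | grep -E 'network|rpc|nfs|netfs' | sort
}
```

Usage on Box B: `start_order /etc/rc3.d`. If rpcbind started after nfs, or nfs started before the network or the exported filesystem was ready, that would be one concrete thing to look at.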
[shorowitz@BoxB ~]$ sudo iptables --list -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
0 0 REJECT all -- !lo * 127.0.0.0/8 0.0.0.0/0 reject-with icmp-port-unreachable
18 710 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW icmp type 8
471 26200 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW
204K 393M ACCEPT all -- * * {BoxA IP} 0.0.0.0/0
6721 754K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
2859 168K LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_INPUT_denied: '
9229 628K REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_FORWARD_denied: '
0 0 REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT 278K packets, 8386M bytes)
pkts bytes target prot opt in out source destination
[shorowitz@BoxB ~]$ sudo rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 38148 status
100024 1 tcp 45681 status
100005 1 udp 37846 mountd
100005 1 tcp 59259 mountd
100005 2 udp 59934 mountd
100005 2 tcp 42645 mountd
100005 3 udp 33867 mountd
100005 3 tcp 41823 mountd
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100021 1 udp 37287 nlockmgr
100021 3 udp 37287 nlockmgr
100021 4 udp 37287 nlockmgr
100021 1 tcp 37579 nlockmgr
100021 3 tcp 37579 nlockmgr
100021 4 tcp 37579 nlockmgr
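A quick sanity check on saved rpcinfo output is to confirm that every service an NFS server normally registers is present. The sketch below uses a hypothetical function name, and the required list (portmapper, status, mountd, nfs, nlockmgr) is an assumption based on the output above.

```shell
#!/bin/sh
# Print each expected RPC service that is missing from saved `rpcinfo -p`
# output (column 5 is the service name; line 1 is the header).
#   $1 = file containing the rpcinfo output
missing_rpc_services() {
  for svc in portmapper status mountd nfs nlockmgr; do
    awk -v s="$svc" 'NR > 1 && $5 == s { found = 1 } END { exit !found }' "$1" ||
      echo "$svc"
  done
}
```

Usage from Box A: `rpcinfo -p boxb > boxb.txt` queries Box B's portmapper over the network; then run `missing_rpc_services boxb.txt`. Running this right after Box B reboots (before the restart) and again after the restart would show whether any registration is missing in the broken state.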
This returns nothing:
grep -v "^#" /etc/sysconfig/nfs
BoxA
$ chkconfig --list | egrep "nfs|rpc"
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcgssd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
rpcsvcgssd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
$ iptables --list -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
390K 58M ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
0 0 REJECT all -- * * 0.0.0.0/0 127.0.0.0/8 reject-with icmp-port-unreachable
990K 7850M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
0 0 DROP all -- * * 43.255.188.145 0.0.0.0/0
8 388 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:587
11864 608K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
1 40 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:636
4545 238K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
9759 553K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:443
24 960 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
320 19152 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
85 5681 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0
3254 194K LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables denied: '
3634 227K DROP all -- * * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1360K 1907M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0
$ rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 55882 status
100024 1 tcp 58283 status
100011 1 udp 875 rquotad
100011 2 udp 875 rquotad
100011 1 tcp 875 rquotad
100011 2 tcp 875 rquotad
100005 1 udp 43136 mountd
100005 1 tcp 55047 mountd
100005 2 udp 51117 mountd
100005 2 tcp 42791 mountd
100005 3 udp 44511 mountd
100005 3 tcp 46535 mountd
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100021 1 udp 43509 nlockmgr
100021 3 udp 43509 nlockmgr
100021 4 udp 43509 nlockmgr
100021 1 tcp 38725 nlockmgr
100021 3 tcp 38725 nlockmgr
100021 4 tcp 38725 nlockmgr
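The two rpcinfo listings can be compared mechanically. In the output above, the only service name that differs is rquotad, which Box A registers and Box B does not. A sketch of that comparison with hypothetical function names, working from saved `rpcinfo -p` output:

```shell
#!/bin/sh
# Unique RPC service names (column 5, header skipped) from saved rpcinfo output.
rpc_service_names() {
  awk 'NR > 1 { print $5 }' "$1" | sort -u
}

# Show service names registered in only one of two rpcinfo outputs.
# Names only in $1 are printed flush left; names only in $2 are indented
# by a tab (the standard `comm -3` layout).
rpc_service_diff() {
  tmp1=$(mktemp) || return 2
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
  rpc_service_names "$1" > "$tmp1"
  rpc_service_names "$2" > "$tmp2"
  comm -3 "$tmp1" "$tmp2"
  rm -f "$tmp1" "$tmp2"
}
```

Usage: `rpc_service_diff boxb.txt boxa.txt`.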
$ mount | grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
Update 14 November
BoxA:
$ cat /etc/auto.master.d/nfs
xdata -rw boxb:/srv/nfs/xdata
xbackup -rw boxb:/srv/nfs/xbackup
zbackups -rw boxb:/srv/nfs/zbackups
$ mount | grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
boxb:/srv/nfs/xdata on /mnt/nfs/xdata type nfs (rw,sloppy,vers=4,addr={boxb ip},clientaddr={boxa ip})
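To see at a glance which keys in the autofs map are actually mounted, the map file can be cross-checked against the mount output. This sketch assumes map lines of the form "key -options server:/path" (as in /etc/auto.master.d/nfs above) and that the keys mount under /mnt/nfs (taken from the mount line above). The function name is hypothetical.

```shell
#!/bin/sh
# For each key in an autofs map, report whether it is currently mounted.
#   $1 = map file, $2 = base directory the keys mount under,
#   $3 = file holding saved `mount` output
autofs_key_status() {
  map=$1
  base=$2
  mounts=$3
  # Skip blank lines and comments; the key is the first field.
  awk 'NF && $1 !~ /^#/ { print $1 }' "$map" | while read -r key; do
    if grep -qF " on $base/$key type " "$mounts"; then
      echo "$key mounted"
    else
      echo "$key not mounted"
    fi
  done
}
```

Usage: `mount > /tmp/mounts.txt; autofs_key_status /etc/auto.master.d/nfs /mnt/nfs /tmp/mounts.txt`. Because autofs mounts on demand, "not mounted" may only mean the key hasn't been accessed yet, so touch it first (for example `ls /mnt/nfs/xbackup`).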