
I have a SunOS 5.10 (Solaris 10) server, which we'll call "Bill". I own it, but a vendor supports the additional software installed on it. They have to manually stop services and make sure data is flushed to its local database before rebooting; otherwise I would have rebooted it already.

"Bill" makes backups to an NFS server running Ubuntu 16 LTS. Other Unix (5.)9 servers on site also make backups to this server.

On "Bill", I'm getting five errors spaced a minute apart reading "NFS compound failed for server 10.0.2.18: error 5 (RPC: Timed out)" when I try to mount the share. After the fifth I get "nfs mount: mount: /nfsmnt: Connection timed out". I'm not relying on DNS to find the NFS Server, 10.0.2.18. showmount -e 10.0.2.18 reports the NFS export properly.

export list for 10.0.2.18:
/data 10.0.0.0/16

I can change the export properties, restart NFS services on the server, and "Bill" sees the changes.

"Bill" can mount other NFS shares, within and without the same subnet. Other clients within and without the same subnet can mount the /data share on the NFS Server without problem.

rpcinfo -p 10.0.2.18 reports NFS v4 with tcp/udp is supported. I don't want to use udp though.

program vers proto   port  service
100000    4   tcp    111  rpcbind
100000    3   tcp    111  rpcbind
100000    2   tcp    111  rpcbind
100000    4   udp    111  rpcbind
100000    3   udp    111  rpcbind
100000    2   udp    111  rpcbind
100005    1   udp  52533  mountd
100005    1   tcp  33303  mountd
100005    2   udp  52711  mountd
100005    2   tcp  60660  mountd
100005    3   udp  34912  mountd
100005    3   tcp  50746  mountd
100003    2   tcp   2049  nfs
100003    3   tcp   2049  nfs
100003    4   tcp   2049  nfs
100227    2   tcp   2049  nfs_acl
100227    3   tcp   2049  nfs_acl
100003    2   udp   2049  nfs
100003    3   udp   2049  nfs
100003    4   udp   2049  nfs
100227    2   udp   2049  nfs_acl
100227    3   udp   2049  nfs_acl
100021    1   udp  53804  nlockmgr
100021    3   udp  53804  nlockmgr
100021    4   udp  53804  nlockmgr
100021    1   tcp  44612  nlockmgr
100021    3   tcp  44612  nlockmgr
100021    4   tcp  44612  nlockmgr
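
`rpcinfo -p` only lists what rpcbind has registered; it does not show that the NFS service itself answers over TCP from this client. As a minimal sketch of a follow-up check, `rpcinfo -t` makes a null RPC call to a given program and version over TCP. The function name `probe_nfs_tcp` is hypothetical, 100003 is the NFS program number from the listing above, and the address is the one from this question.

```shell
#!/bin/sh
# Hypothetical helper: probe each NFS version over TCP with an RPC null call.
# A failure or timeout here, while `rpcinfo -p` still works, would point at
# the TCP path to the NFS service rather than at rpcbind.
probe_nfs_tcp() {
    server="$1"
    for vers in 2 3 4; do
        if rpcinfo -t "$server" 100003 "$vers" >/dev/null 2>&1; then
            echo "NFSv$vers over TCP: ok"
        else
            echo "NFSv$vers over TCP: FAILED"
        fi
    done
}

# Example (address from the question):
#   probe_nfs_tcp 10.0.2.18
```

Running the same probe from a client that can still mount /data would give a baseline to compare against.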

The NFS client appears to be running. Today's date shows because I stopped and restarted NFS client with svcadm.

# svcs -xv nfs/client
svc:/network/nfs/client:default (NFS client)
 State: online since Sun Dec 11 18:45:40 2016
   See: man -M /usr/share/man -s 1M mount_nfs
   See: /var/svc/log/network-nfs-client:default.log
Impact: None

The log file mentioned in the "See" line reports

# tail /var/svc/log/network-nfs-client:default.log
[ Nov 17 17:13:47 Stopping because service disabled. ]
[ Nov 17 17:13:47 Executing stop method ("/lib/svc/method/nfs-client stop") ]
[ Nov 17 17:13:48 Method "stop" exited with status 0 ]
[ Nov 17 17:15:29 Executing start method ("/lib/svc/method/nfs-client start") ]
[ Nov 17 17:15:29 Method "start" exited with status 0 ]
[ Dec 11 18:45:39 Stopping because service restarting. ]
[ Dec 11 18:45:39 Executing stop method ("/lib/svc/method/nfs-client stop") ]
[ Dec 11 18:45:39 Method "stop" exited with status 0 ]
[ Dec 11 18:45:39 Executing start method ("/lib/svc/method/nfs-client start") ]
[ Dec 11 18:45:40 Method "start" exited with status 0 ]

I shut down the NFS server on Monday 12/5 to install memory. The backup that night (12/6 at 2am) was the last successful one. The timeouts started with the next scheduled backup, on 12/7 at 2am.

df -kh and ls against the mountpoint timed out prior to my umount /nfsmnt.

This timeout issue happened a couple of months ago, and a reboot fixed it. The server has been up for 24 days, and the backups succeeded for the first 20 of them. There have been no firewall changes.

user38537
  • 1
    I feel it's a TCP issue; can you try mounting with NFSv3? `vers=3,proto=udp` – Tolsadus Dec 20 '16 at 21:48
  • 1
    UDP is a no-go. I need TCP reliability for these backups. I tried mounting with `-o vers=3` and it complained that the option argument is invalid. In the meantime, I had the vendor reboot the server and it's up and running again, so I can't do any more troubleshooting until this issue surfaces again. When I was troubleshooting for this post, I was able to mount other NFS shares on this server with the same command I used to mount the primary NFS share. I was also able to mount the primary NFS share from other machines. So it was an issue between this client and the primary NFS server. – user38537 Dec 20 '16 at 22:10
  • I just wanted to try UDP to see if it helped. Not as a final solution, more as troubleshooting. Any chance the line between you and the client had some issues? – Tolsadus Dec 20 '16 at 22:13
  • No this is all internal to the LAN. No issues there. – user38537 Dec 21 '16 at 04:08
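
For the `vers=3` attempt discussed in the comments, here is a minimal sketch of forcing the NFS version and transport on Solaris. It assumes the `vers=` and `proto=` options of `mount_nfs` (the man page the `svcs` output points to) and reuses the address, export, and mount point from the question. `try_mount` is a hypothetical helper name. Since "NFS compound" in the error refers to an NFSv4 operation, a successful NFSv3 mount over TCP would suggest the problem is specific to NFSv4 rather than to TCP as a whole.

```shell
#!/bin/sh
# Hypothetical helper: mount the export with an explicit NFS version, always
# over TCP. Server, export, and mount point are taken from the question.
try_mount() {
    vers="$1"
    mount -F nfs -o "vers=$vers,proto=tcp" 10.0.2.18:/data /nfsmnt
}

# Example: try NFSv3 first, then NFSv4, unmounting in between.
#   try_mount 3 && umount /nfsmnt
#   try_mount 4
```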

1 Answer


The issue seems to point to your NFS server, and it may be related to the outage when you installed memory on it.

Also, you may want to check that the clocks on the servers are in sync.

sleepyweasel