
Client machines were able to connect to our NFS server earlier this afternoon, and everything was running fine. The setup has been working fine for several years. No configuration changes were made on the server.

The NFS server hung with a "too many open files" error. Unable to SSH into it, we shut it down via ACPI. After the NFS server was restarted, all attempts by clients to connect to it hang forever.
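Since the hang started with "too many open files", one thing worth recording after the restart is how many file handles the kernel has allocated. The following is a minimal sketch, assuming a Linux server where `/proc/sys/fs/file-nr` holds three numbers (allocated, unused, maximum). The helper name `fd_usage` is made up for illustration. It only covers the system-wide table, so a per-process limit would need a separate check.

```shell
# Summarise system-wide file handle usage from a file-nr style line.
# Input on stdin: three whitespace-separated numbers
# (allocated handles, allocated-but-unused handles, system maximum).
fd_usage() {
    awk 'NF >= 3 { printf "allocated=%d unused=%d max=%d\n", $1, $2, $3 }'
}

# Typical use on the server (Linux only):
#   fd_usage < /proc/sys/fs/file-nr
```

If the allocated figure sits close to the maximum shortly after boot, that would suggest the original condition is recurring rather than having been cleared by the restart.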

Steps taken so far:

Verify the NFS daemon is running

service nfs-kernel-server status
nfsd running

Restart NFS daemon. This is where I ran into something bizarre.

When I run:

service nfs-kernel-server stop

It says:

 * Stopping NFS kernel daemon                                                                        [ OK ] 
 * Unexporting directories for NFS kernel daemon...                                                  [ OK ] 

Then I run:

service nfs-kernel-server status

and it says:

nfsd running

So I have no idea whether it is actually stopping the service, since it claims to stop but then says it's still running. Also, running stop multiple times does not produce an error; it just prints "Stopping NFS kernel daemon" each time I run the stop command.

When it is supposedly stopped, ps aux | grep nfsd shows:

root       761  0.0  0.0      0     0 ?        S<   Apr04   0:00 [nfsd4]
root       762  0.0  0.0      0     0 ?        S<   Apr04   0:00 [nfsd4_callbacks]
root       763  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       764  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       765  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       766  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       767  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       768  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]
root       769  0.0  0.0      0     0 ?        D    Apr04   0:00 [nfsd]

So it appears that the stop command isn't actually stopping the process.
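In the listing above, most of the nfsd threads are in state D, which is uninterruptible sleep: a process in that state is waiting inside the kernel and does not respond to signals, so that may be why stop has no visible effect. A minimal sketch for picking those processes out of `ps` output follows. The helper name `list_d_state` is hypothetical, and it assumes input with three columns (PID, state, command).

```shell
# Print only the processes whose state starts with "D"
# (uninterruptible sleep). Expects lines of: PID STAT COMMAND
list_d_state() {
    awk '$2 ~ /^D/ { print $1, $2, $3 }'
}

# Typical use:
#   ps -eo pid,stat,comm | list_d_state
```

The header line that `ps` prints is skipped automatically, because its second column is the word STAT and does not start with D.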

Reboot NFS Server again

Failing that, we rebooted the NFS server using reboot. We get the same problem after every reboot: mount attempts still time out, and NFS appears to keep running even when we try to stop it.

Verify portmap is running

root@nfs:~# service portmap status
portmap start/running, process 540
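A running portmap process does not by itself show that NFS has registered with it. A further check is to look at the registrations that `rpcinfo -p` reports. This is a minimal sketch, assuming the usual five-column output (program, version, protocol, port, service name). The helper name `rpc_has_service` is made up for illustration.

```shell
# Succeed (exit 0) if the named RPC service appears in the fifth
# column of "rpcinfo -p" style output read from stdin.
rpc_has_service() {
    awk -v svc="$1" '$5 == svc { found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use on the server:
#   rpcinfo -p localhost | rpc_has_service nfs && echo "nfs is registered"
```

Checking `mountd` the same way may also be useful, since clients need it to mount over NFSv3.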

Stop and restart portmap and NFS

I went through the motions of:

service nfs-kernel-server stop
service portmap stop
service portmap start
service nfs-kernel-server start

But since the nfs-kernel-server service doesn't actually stop when you tell it to (see above), it didn't do anything other than restart portmap.

Nick
