
We run a medium-sized NFSv4 storage server on CentOS 6.6. It exports /storageDat, which is the NFS root, with two RAID volumes bound in there: ./dat1 and ./dat2. Export options: rw,sync,no_wdelay,no_subtree_check,fsid=0

The share is mounted on more than 100 Fedora 20 workstations and desktops. Most of the time everything works well and is also quite fast (read transfers >400 MByte/s), using large MTUs, the mountpoint /storageDat, and these client-side mount options: rw,relatime,vers=4.0,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.216.10.1,local_lock=none,addr=10.216.14.200
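When comparing a misbehaving client against a healthy one, it can help to read the effective mount options programmatically rather than by eye. The following is a minimal sketch in Python, assuming the usual /proc/mounts field layout (device, mount point, filesystem type, options); the function names are made up for illustration, and the sample device string in the usage note is an assumption, not taken from the real setup.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (for example "soft") map to True.
    """
    options = {}
    for item in option_string.strip().split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def nfs_mount_options(mountpoint, mounts_text):
    """Return the parsed options of the NFS mount at `mountpoint`, or None.

    `mounts_text` is expected to be the content of /proc/mounts.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if (
            len(fields) >= 4
            and fields[1] == mountpoint
            and fields[2].startswith("nfs")
        ):
            return parse_mount_options(fields[3])
    return None
```

On a client, one would pass the content of /proc/mounts, e.g. `nfs_mount_options("/storageDat", open("/proc/mounts").read())`, and then compare keys such as `soft`, `timeo`, `retrans`, or any cache-related options between a failing and a working machine.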

HOWEVER: from time to time, on single machines, the following happens. A program has to access a certain deep folder on the NFS share:

/storageDat/dat2/projects/other/Tool_does_special/ProjectX/Sample/tooloutputR2

This results in a hard 'No such file or directory'. On the server, the directory exists and the access rights are correct (even tested with the user in question). Back on the client:

ls -al on the full path fails with 'No such file or directory'.

ls -al'ing a parent directory from the full path, i.e.

/storageDat/dat2/projects/other/Tool_does_special/

works and shows the subdirectories (ProjectX, ProjectY).

ls -al'ing /storageDat/dat2/projects/other/Tool_does_special/ProjectX returns the same, ever-present error message.

BUT going into the directory

cd /storageDat/dat2/projects/other/Tool_does_special/

and then executing a ls -alR shows all files in all subdirectories just fine. Directly following up with a

ls -al on

/storageDat/dat2/projects/other/Tool_does_special/ProjectX/Sample/tooloutputR2

however fails with a 'No such file[...]' message.
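The manual checks above (the full path fails, a parent works) can be automated to see exactly which path component stops resolving on an affected client. This is a minimal Python sketch; the function name is hypothetical, and it only reports what `os.stat` sees from the client, so it narrows down where the lookup breaks without explaining why.

```python
import errno
import os


def first_failing_prefix(path):
    """Stat each prefix of `path` from the top down.

    Returns (prefix, errno_name) for the first prefix that cannot be
    stat'ed, or None when every component resolves.
    """
    parts = [p for p in os.path.normpath(path).split(os.sep) if p]
    current = os.sep if os.path.isabs(path) else ""
    for part in parts:
        current = os.path.join(current, part)
        try:
            os.stat(current)
        except OSError as exc:
            # Translate the numeric errno into its symbolic name.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return current, name
    return None
```

Calling it with the failing path from the question, once from a fresh shell and once right after the `cd` plus `ls -alR` sequence, would show whether the same component (for example ProjectX) returns ENOENT both times.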

We think it is somehow related to NFS caching, but we cannot pinpoint the error, cannot predict when it appears, and cannot reliably make it go away, let alone fix it.

Any input would be greatly appreciated! (And yes, I did rename my actual folders to something I can post online)

Martin Schröder
Mone

1 Answer


The same problem happened to me on my servers. I had 12 servers that mounted a shared folder via autofs from a 13th server, which was supposed to run only NFS.

It turned out that autofs was also running on that 13th server, with the same config file as the other 12 servers, so the server was basically trying to mount its own export onto the same source/destination path.

I stopped the autofs service on that server and was then able to access the folder. This kind of thing happens when you work on several machines at once: at some point I set up autofs on the server when I should not have.
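A quick way to check for this situation on the NFS server is to look for autofs mount points at or below the exported directory. The sketch below is in Python and assumes the standard /proc/mounts layout, where automounter-managed mount points show up with the filesystem type `autofs`; the function name is hypothetical, and the export path used in the usage note is borrowed from the question purely for illustration.

```python
def autofs_mounts_under(mounts_text, export_root):
    """List autofs mount points at or below `export_root`.

    `mounts_text` is the content of /proc/mounts (fields: device,
    mount point, filesystem type, options, dump, pass).
    """
    root = export_root.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] != "autofs":
            continue
        mountpoint = fields[1]
        # Match the export itself or anything strictly inside it.
        if mountpoint == root or mountpoint.startswith(prefix):
            hits.append(mountpoint)
    return hits
```

On the server, `autofs_mounts_under(open("/proc/mounts").read(), "/storageDat")` returning a non-empty list would suggest that the automounter is covering the exported tree, which is the misconfiguration described in this answer.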

chan!

Eduardo