We have a medium-sized NFSv4 storage server running CentOS 6.6 that exports an NFS share, /storageDat (the NFS root, with two RAID volumes bound into it: ./dat1 and ./dat2).
Export options: rw,sync,no_wdelay,no_subtree_check,fsid=0
The share is mounted at /storageDat on quite a few Fedora 20 workstations and desktops (>100). Most of the time everything works well and is also quite fast (read transfers >400 MByte/s), using large MTUs and the following client-side mount options:
rw,relatime,vers=4.0,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.216.10.1,local_lock=none,addr=10.216.14.200
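To compare the options actually in effect on an affected client with those on a healthy one, something like the following could be used. It is only a sketch: mount_opts is a hypothetical helper name, and it assumes the usual /proc/mounts field layout (device, mountpoint, type, options).

```shell
# mount_opts FILE MOUNTPOINT
# Print the mount options of the NFS mount at MOUNTPOINT, one per line.
# FILE is expected to be in /proc/mounts format (assumption).
mount_opts() {
    awk -v mp="$2" '
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) print opts[i]
        }' "$1"
}

# Typical use on a client:
#   mount_opts /proc/mounts /storageDat | sort > "/tmp/opts.$(hostname)"
# and then diff the resulting files from two machines.
```

nfsstat -m prints similar information, if the nfs-utils package is installed on the client.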
HOWEVER: from time to time, on individual machines, the following happens. A program has to access a certain deeply nested folder on the NFS share:
/storageDat/dat2/projects/other/Tool_does_special/ProjectX/Sample/tooloutputR2
This results in a hard 'No such file or directory'. On the server, the directory exists and the access rights are correct (even tested with the user in question). Back on the client:
ls -al on the full path throws 'No such file or directory';
ls -al on a parent directory from the full path, i.e.
/storageDat/dat2/projects/other/Tool_does_special/
works and shows the subdirectories (ProjectX, ProjectY);
ls -al on /storageDat/dat2/projects/other/Tool_does_special/ProjectX
returns the same ever-present error message;
BUT going into the directory with
cd /storageDat/dat2/projects/other/Tool_does_special/
and then running ls -alR shows all files in all subdirectories just fine. Directly following up with ls -al on
/storageDat/dat2/projects/other/Tool_does_special/ProjectX/Sample/tooloutputR2
still fails with a 'No such file[...]' message.
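The manual checks above can be wrapped into a small helper that tests every prefix of the path from the top down and stops at the first component the client cannot resolve. This is only a sketch of the procedure described here, not a fix; walk_path is a hypothetical name, and it expects an absolute path.

```shell
# walk_path ABSOLUTE_PATH
# Run "ls -ld" on each prefix of the path, top down, and report the
# first component that cannot be resolved on this client.
walk_path() {
    target=$1
    case "$target" in
        /*) ;;
        *) echo "walk_path: need an absolute path" >&2; return 2 ;;
    esac
    prefix=""
    rest=${target#/}
    while [ -n "$rest" ]; do
        part=${rest%%/*}              # next path component
        case "$rest" in
            */*) rest=${rest#*/} ;;   # drop it from the remainder
            *)   rest="" ;;
        esac
        [ -z "$part" ] && continue    # skip empty parts from "//"
        prefix="$prefix/$part"
        if ls -ld -- "$prefix" >/dev/null 2>&1; then
            echo "OK      $prefix"
        else
            echo "FAILED  $prefix"
            return 1
        fi
    done
    return 0
}

# Example call on an affected client:
#   walk_path /storageDat/dat2/projects/other/Tool_does_special/ProjectX/Sample/tooloutputR2
```

Running it once from a fresh shell and once right after the cd plus ls -alR sequence should show whether the failing component is always the same one (ProjectX in the case above).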
We think it is somehow related to NFS caching, but we simply cannot pinpoint the error, predict when it will appear, or reliably make it go away, let alone fix it.
Any input would be greatly appreciated! (And yes, I did rename my actual folders to something I can post online)