
Using nfsstat -c, I'm seeing a high "authrefrsh" count (called "newcred" on some systems) on my NFS client for operations like ls and find on directories containing ~1000 files. This correlates with very poor performance (20+ minute directory listings). Cached NFS operations show neither the authrefrsh growth nor the slowdown.

The authrefrsh count equals the calls count every time I check nfsstat:

$ nfsstat -c

Client rpc stats:
calls      retrans    authrefrsh
280462     0          280462

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 126990   45% 0         0% 10062     3% 58592    20% 0         0%
read         write        create       mkdir        symlink      mknod
25030     8% 0         0% 65        0% 0         0% 2         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
0         0% 0         0% 0         0% 0         0% 0         0% 59654    21%
fsstat       fsinfo       pathconf     commit
0         0% 20        0% 10        0% 0         0%
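To track that ratio over time rather than eyeballing it, something like the following could work. It is only a sketch: authrefrsh_ratio is a hypothetical helper name, and it assumes the "Client rpc stats" layout shown above (a header line with calls, retrans, authrefrsh, followed directly by the counts).

```shell
# Hypothetical helper: reads `nfsstat -c` output on stdin and prints
# authrefrsh as a whole-number percentage of calls.
# Assumes the counts line directly follows the "calls retrans authrefrsh" header.
authrefrsh_ratio() {
    awk '
        # Header line naming the three RPC columns; the counts are on the next line.
        $1 == "calls" && $3 == "authrefrsh" { want = 1; next }
        want {
            if ($1 > 0) printf "%d\n", ($3 * 100) / $1
            exit
        }
    '
}

# Usage on an NFS client:
#   nfsstat -c | authrefrsh_ratio
```

With the numbers above (280462 calls, 280462 authrefrsh) this prints 100, i.e. every RPC call is accompanied by a credential refresh.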

Connection details:

$ mount.nfs -v nfshost:/share/dir /somedir
mount.nfs: timeout set for Tue Feb 21 18:12:18 2012
mount.nfs: trying text-based options 'vers=4,addr=192.168.xx.xx,clientaddr=192.168.xx.xx'
mount.nfs: mount(2): Operation not permitted
mount.nfs: trying text-based options 'addr=192.168.xx.xx'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.xx.xx prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.xx.xx prog 100005 vers 3 prot UDP port 1011
nfshost:/share/dir on /somedir type nfs

nfshost RPC environment:

$ rpcinfo -T udp nfshost nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
program 100003 version 4 ready and waiting

$ rpcinfo -T udp nfshost mountd
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting

$ rpcinfo -T udp nfshost nlockmgr
program 100021 version 1 ready and waiting
rpcinfo: RPC: Program/version mismatch; low version = 1, high version = 4
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

$ rpcinfo -T udp nfshost llockmgr
rpcinfo: RPC: Program not registered

$ rpcinfo nfshost
program version netid     address                service    owner
100000    2    tcp       0.0.0.0.0.111          portmapper unknown
100000    2    udp       0.0.0.0.0.111          portmapper unknown
100024    1    udp       0.0.0.0.2.212          status     unknown
100024    1    tcp       0.0.0.0.2.215          status     unknown
100021    1    udp       0.0.0.0.226.67         nlockmgr   unknown
100021    3    udp       0.0.0.0.226.67         nlockmgr   unknown
100021    4    udp       0.0.0.0.226.67         nlockmgr   unknown
100021    1    tcp       0.0.0.0.134.55         nlockmgr   unknown
100021    3    tcp       0.0.0.0.134.55         nlockmgr   unknown
100021    4    tcp       0.0.0.0.134.55         nlockmgr   unknown
100011    1    udp       0.0.0.0.3.230          rquotad    unknown
100011    2    udp       0.0.0.0.3.230          rquotad    unknown
100011    1    tcp       0.0.0.0.3.233          rquotad    unknown
100011    2    tcp       0.0.0.0.3.233          rquotad    unknown
100003    2    udp       0.0.0.0.8.1            nfs        unknown
100003    3    udp       0.0.0.0.8.1            nfs        unknown
100003    4    udp       0.0.0.0.8.1            nfs        unknown
100003    2    tcp       0.0.0.0.8.1            nfs        unknown
100003    3    tcp       0.0.0.0.8.1            nfs        unknown
100003    4    tcp       0.0.0.0.8.1            nfs        unknown
100005    1    udp       0.0.0.0.3.243          mountd     unknown
100005    1    tcp       0.0.0.0.3.246          mountd     unknown
100005    2    udp       0.0.0.0.3.243          mountd     unknown
100005    2    tcp       0.0.0.0.3.246          mountd     unknown
100005    3    udp       0.0.0.0.3.243          mountd     unknown
100005    3    tcp       0.0.0.0.3.246          mountd     unknown

Environment:

$ uname -a
Linux whiteheat 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

$ mount.nfs a b -V
mount.nfs: (linux nfs-utils 1.2.4)
peterh
Chris Betti
  • any updates? I've noticed poorer performance of nfs clients with newer kernels, e.g. SLES 11 SP2 and CentOS 6.4 vs SLES 9 SP4. The only difference I see in stats is that authrefrsh is very high. I'm assuming this is extra overhead that causes a degrade in performance. – Banjer May 06 '13 at 11:41
  • No updates, sorry. I've moved away from NFS for my application, because SSH + SCP was an option. The issue was crippling :) – Chris Betti May 06 '13 at 15:45
  • are you sure it's not nfs v3 vs. v4 issue? – kofemann May 07 '13 at 07:50
  • fyi this Unix SE question has more leads on this issue: http://unix.stackexchange.com/questions/13557/slow-nfs-nfsstat-c-what-is-authrefrsh-aka-newcreds-field-about-in-detail – Banjer May 11 '13 at 14:20
  • More on the history of the bug: https://bugzilla.redhat.com/show_bug.cgi?id=785931 – Deer Hunter Jun 11 '14 at 06:39
  • Are you hosting mailboxes over NFS? That behavior seems very similar to webmail users checking their inboxes all the time (and increasing getattr counts). We once had this problem, and increasing the webmail refresh time "fixed" the issue. There is only so much I/O your storage will take. – Giovanni Tirloni Aug 11 '14 at 18:18
  • This was a shared server hosting files primarily in the 1mb to 1gb range. – Chris Betti Aug 11 '14 at 22:10
  • the readdirplus issue has caused issues to many people, it is incredibly stupid design to turn off the feature when it's most needed. But as to the issue, I think this is caused by either "noac" or wrong "actimeo" settings. – Florian Heigl Sep 24 '14 at 19:05

1 Answer


I encountered this exact issue with NFS. In my case the cause was actimeo being set too low. Even if you are not using that exact option, there is a whole family of related settings that can cause havoc: acregmin, acregmax, acdirmin, and acdirmax. The client caches the file attributes it gets from the NFS server, and these settings control how long the cached attributes are kept before they are refreshed from the server. On a heavily used system with timeouts that are too low, these refreshes become painfully obvious.
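To see which of these settings a client is actually using, you can look at the mount options the kernel reports. This is a rough sketch, not a definitive tool: nfs_ac_options is a hypothetical helper name, it assumes the /proc/mounts line format (device, mount point, type, options), and as far as I know the kernel only lists the ac* values when they differ from the defaults.

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and, for each
# NFS mount, prints the mount point followed by its attribute-cache options.
nfs_ac_options() {
    awk '
        # Field 3 is the filesystem type; match nfs and nfs4 only.
        $3 ~ /^nfs4?$/ {
            n = split($4, opts, ",")
            found = ""
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^(actimeo|acregmin|acregmax|acdirmin|acdirmax)=/ || opts[i] == "noac")
                    found = found " " opts[i]
            # No explicit option listed means the defaults are in effect.
            print $2 ":" (found == "" ? " (defaults)" : found)
        }
    '
}

# Usage on an NFS client:
#   nfs_ac_options < /proc/mounts
```

If the output shows very small values, mounting with a larger timeout (for example -o actimeo=60, where 60 is only an illustrative value) is worth testing.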

Another problematic setting is noac. It disables attribute caching entirely and makes application writes synchronous, so changes become visible to other clients immediately, but every write waits for the server instead of using write-behind. This can bring a system to its knees if it frequently writes to NFS.

This is an interesting blog article that discusses the different options and their effect on NFS performance. The nfs(5) man page offers more guidance. Unfortunately, authrefrsh can be a bit of a red herring, and my issue may be totally unrelated to yours despite the similar symptoms.

Foosh
  • iirc noac also just concerns credential caching, so writes to permission metadata are immediate, no "writes". Didn't edit since i'm not all sure now. – Florian Heigl Jan 07 '15 at 13:39
  • i'm just seeing a noac-related performance issue so this is a thing in fact, if working with non-enterprise NFS servers. – Florian Heigl May 03 '16 at 17:05