
I am currently hunting down performance issues on some VMware virtual machines with the same amount of memory and CPUs, running Oracle Linux 6.6 (kernel 2.6) and 6.7 (kernel 3.8). The machines mount the same share over NFS4; it contains some shared libraries that are included in the build via LD_LIBRARY_PATH. Both systems use the same mount options (the defaults), which apparently means "hard" on 6.7 and "soft" on 6.6. Starting with 6.7, we observed that the performance of our compilation process dropped by a factor of 5 while the CPUs were around 80% idle, yet no high I/O wait could be observed either (top reports ~0.4%wa).
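
In case it helps reproduce this: the options that actually end up in effect can be checked directly on the client, for example like this (share path as a placeholder):

$ grep ' nfs' /proc/mounts   # effective options as the kernel sees them
$ nfsstat -m                 # per-mount options plus server address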

Trying to reproduce the issue, I soon found out that not only compilation but almost any command, for example "ls", was much slower on 6.7 as soon as the shared libraries from the NFS mount were included via LD_LIBRARY_PATH.
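
To see which directories the dynamic loader actually walks through for a single command, glibc's LD_DEBUG facility can be used instead of strace; a rough sketch (loader debug output goes to stderr, the ls output itself is discarded):

$ LD_DEBUG=libs ls 2>&1 >/dev/null | grep 'trying file'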

I started my investigation using a simple "time":

On 6.6: Without LD_LIBRARY_PATH and PATH being set:

$ time for i in $(seq 0 1000); do ls; done
... ls output 
real    0m2.917s
user    0m0.288s
sys     0m1.012s

with LD_LIBRARY_PATH and PATH being set to include directories on NFS

$ time for i in $(seq 0 1000); do ls; done
... ls output
real    0m2.766s
user    0m0.184s
sys     0m1.051s

On 6.7 without LD_LIBRARY_PATH

$ time for i in $(seq 0 1000); do ls; done
...
real    0m5.144s
user    0m0.280s
sys     0m1.172s

and with LD_LIBRARY_PATH

$ time for i in $(seq 0 1000); do ls; done
...
real    1m27.329s
user    0m0.537s
sys     0m1.792s

The huge overhead made me curious and with strace I found that the resolution of some shared libraries on the NFS share takes a very long time:

Again: without LD_LIBRARY_PATH set, the "open" calls in the strace output look like this:

$ strace -T ls 2>&1|vim - # keep only the "open" calls

open("/etc/ld.so.cache", O_RDONLY)      = 3 <0.000014>
open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000013>
open("/lib64/librt.so.1", O_RDONLY)     = 3 <0.000016>
open("/lib64/libcap.so.2", O_RDONLY)    = 3 <0.000014>
open("/lib64/libacl.so.1", O_RDONLY)    = 3 <0.000014>
open("/lib64/libc.so.6", O_RDONLY)      = 3 <0.000016> 
open("/lib64/libdl.so.2", O_RDONLY)     = 3 <0.000014>
open("/lib64/libpthread.so.0", O_RDONLY) = 3 <0.000014>
open("/lib64/libattr.so.1", O_RDONLY)   = 3 <0.000014>
open("/proc/filesystems", O_RDONLY)     = 3 <0.000032>
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 <0.000014>
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 <0.001255>

with LD_LIBRARY_PATH it looks like this:

open("/usr/local/lib/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
open("/lib64/librt.so.1", O_RDONLY)     = 3 <0.000018>
open("/oracle/current/lib/libcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.006196>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.002042>
open("/usr/local/lib/libcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000035>
open("/lib64/libcap.so.2", O_RDONLY)    = 3 <0.000039>
open("/oracle/current/lib/libacl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.009304>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libacl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.009107>
open("/usr/local/lib/libacl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000023>
open("/lib64/libacl.so.1", O_RDONLY)    = 3 <0.000027>
open("/oracle/current/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory) <0.009520>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory) <0.007850>
open("/usr/local/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000024>
open("/lib64/libc.so.6", O_RDONLY)      = 3 <0.000030>
open("/oracle/current/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.006916>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.013979>
open("/usr/local/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000023>
open("/lib64/libdl.so.2", O_RDONLY)     = 3 <0.000030>
open("/oracle/current/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory) <0.015317>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory) <0.014230>
open("/usr/local/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
open("/lib64/libpthread.so.0", O_RDONLY) = 3 <0.000019>
open("/oracle/current/lib/libattr.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.015212>
open("/opt/development/opt/gcc/gcc-5.3.0/lib64/libattr.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.011979>
open("/usr/local/lib/libattr.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
open("/lib64/libattr.so.1", O_RDONLY)   = 3 <0.000018>
open("/proc/filesystems", O_RDONLY)     = 3 <0.000025>
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 <0.000014>

Apart from making more calls, which happens on 6.6 as well, some of the (unsuccessful) opens on 6.7 take a very long time (up to ~0.01 s compared to ~0.000020 s).
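
To put a rough number on that overhead, the per-call timings that -T appends can be summed up, e.g. like this (a sketch; the awk field splitting relies on the trailing <time> format):

$ strace -T -e trace=open -o /tmp/ls.trace ls > /dev/null
$ awk -F'[<>]' '/ENOENT/ { sum += $(NF-1) } END { printf "%.4f s in failed opens\n", sum }' /tmp/ls.trace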

So I started investigating NFS, and nfsstat indeed shows surprisingly different stats for the two systems:

On 6.7

$ nfsstat
Client rpc stats:
calls      retrans    authrefrsh
1314991    0          1315849 

Client nfs v4:
null         read         write        commit       open         open_conf    
0         0% 3782      0% 1589      0% 1         0% 561257   42% 53        0% 
open_noat    open_dgrd    close        setattr      fsinfo       renew        
0         0% 0         0% 4750      0% 383       0% 7         0% 4094      0% 
setclntid    confirm      lock         lockt        locku        access       
2         0% 2         0% 80        0% 0         0% 80        0% 538017   40% 
getattr      lookup       lookup_root  remove       rename       link         
172506   13% 20536     1% 2         0% 112       0% 541       0% 2         0% 
symlink      create       pathconf     statfs       readlink     readdir      
0         0% 9         0% 5         0% 2057      0% 164       0% 942       0% 
server_caps  delegreturn  getacl       setacl       fs_locations rel_lkowner  
12        0% 2968      0% 0         0% 0         0% 0         0% 80        0% 
secinfo      exchange_id  create_ses   destroy_ses  sequence     get_lease_t  
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
reclaim_comp layoutget    getdevinfo   layoutcommit layoutreturn getdevlist   
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
(null)       
0         0% 

On 6.6

$ nfsstat 
Client rpc stats:
calls      retrans    authrefrsh
637725     0          637781  

Client nfs v4:
null         read         write        commit       open         open_conf    
0         0% 23782     3% 13127     2% 48        0% 41244     6% 406       0% 
open_noat    open_dgrd    close        setattr      fsinfo       renew        
0         0% 0         0% 31228     4% 14668     2% 7         0% 27319     4% 
setclntid    confirm      lock         lockt        locku        access       
1         0% 1         0% 8493      1% 2         0% 8459      1% 175320   27% 
getattr      lookup       lookup_root  remove       rename       link         
134732   21% 112688   17% 2         0% 1007      0% 6728      1% 4         0% 
symlink      create       pathconf     statfs       readlink     readdir      
11        0% 129       0% 5         0% 7624      1% 143       0% 11507     1% 
server_caps  delegreturn  getacl       setacl       fs_locations rel_lkowner  
12        0% 12732     1% 0         0% 0         0% 0         0% 6335      0% 
secinfo      exchange_id  create_ses   destroy_ses  sequence     get_lease_t  
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
reclaim_comp layoutget    getdevinfo   layoutcommit layoutreturn getdevlist   
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
(null)       
0         0% 

which seem to confirm long open times on 6.7, but I don't really know how to tackle this with mount options, etc.
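
For completeness, explicitly setting the client-side caching options would look roughly like this (standard nfs(5) options; whether they actually help here is exactly the open question):

$ sudo umount /mnt
$ sudo mount -t nfs4 -o lookupcache=all,actimeo=60,hard server:/share /mnt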

Further experiments showed that even when mounting NFS from inside a Docker container with an up-to-date OS (CentOS 7, Ubuntu 16.04), which I did to rule out issues with nfs-utils etc., 6.7 hosts always showed the same slow behavior when using NFS4. With NFS3, performance on 6.7 is as good as on 6.6.
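
The protocol-version comparison itself was done with mounts along these lines (mount points are placeholders):

$ sudo mount -t nfs -o vers=3 server:/share /mnt/nfs3   # fast on both 6.6 and 6.7
$ sudo mount -t nfs -o vers=4 server:/share /mnt/nfs4   # slow on 6.7 only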

At this point I suspect either the kernel (or one of its modules) or the VMware Tools on the underlying host to be causing the problem, but I am running out of ideas on how to dig further.

Is this a known problem? Did I miss any possible culprits? How would you dig further? Is there a way to profile the NFS client? How can I rule out the VMware drivers as the problem?

And of course: does anyone have a simple solution for me?

EDIT: I dug in a somewhat different direction this morning. With tcpdump I checked the NFS traffic (again), and it looks like no caching is happening on 6.7: every access to a (non-existent) shared library causes actual NFS traffic, and since LD_LIBRARY_PATH doesn't contain most of the libraries, each lookup usually returns reply ok 52 getattr ERROR: No such file or directory. On 6.6 only the first access causes actual traffic. Having learnt this, I was able to work around the underlying performance issue for standard commands like "ls" by moving the NFS paths out of LD_LIBRARY_PATH into an extra ld.so.conf file listing the directories needed for our compilation process. However, this is still just a workaround, and the issue now seems to be that there is no caching happening in the NFS client. So I tried again to activate the filesystem cache on NFS as suggested here, but every "open" still causes NFS traffic and compilation is still unacceptably slow.
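
The workaround itself boils down to something like the following (the file name is just an example; the directories are the ones visible in the strace above):

$ cat /etc/ld.so.conf.d/nfs-build-libs.conf
/oracle/current/lib
/opt/development/opt/gcc/gcc-5.3.0/lib64
$ sudo ldconfig   # rebuild /etc/ld.so.cache so the loader no longer probes these paths on every exec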

As requested by shodanshok:

On 6.6

server:/share /mnt nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=<clientip>,minorversion=0,local_lock=none,addr=<serverip> 0 0

On 6.7 (after activating fsc)

server:/share /mnt nfs4 ro,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<clientip>,fsc,local_lock=none,addr=<serverip> 0 0

Neither machine uses nscd. I did, however, install cachefilesd on both machines some time ago to see whether that would help with performance; it didn't. Currently the cache on 6.6 is not even active (/var/cache/fscache/* is empty). Since adding fsc to the mount options this morning, the cache on 6.7 actually contains three files, but it doesn't seem to cache lookups of non-existent shared library paths, so performance hasn't changed. For the non-existent files I would expect acregmin etc. to have an impact, and while their (default) values seem reasonable to me, they don't seem to have any effect.
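
The only quick check for fscache activity I know of is the following (the stats file is only present if the kernel was built with FSCACHE_STATS):

$ cat /proc/fs/fscache/stats
$ sudo find /var/cache/fscache -type f | wc -l   # number of objects cachefilesd has stored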

Output of mountstats

Stats for server:/share mounted on /mnt:
NFS mount options: rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.114.160.159,fsc,local_lock=none
NFS server capabilities: caps=0xfff7,wtmult=512,dtsize=32768,bsize=0,namlen=255
NFSv4 capability flags: bm0=0xfdffafff,bm1=0xf9be3e,acl=0x0,pnfs=notconfigured
NFS security flavor: 1  pseudoflavor: 0

Does anybody know what these flags (i.e. caps, bm0, bm1, ...) mean?

Sanitized ps ax output: On 6.6

PID TTY      STAT   TIME COMMAND
  1 ?        Ss     0:01 /sbin/init
  2 ?        S      0:00 [kthreadd]
  3 ?        S      0:05 [ksoftirqd/0]
  6 ?        S      0:00 [migration/0]
  7 ?        S      0:10 [watchdog/0]
 37 ?        S<     0:00 [cpuset]
 38 ?        S<     0:00 [khelper]
 39 ?        S<     0:00 [netns]
 40 ?        S      0:06 [sync_supers]
 41 ?        S      0:00 [bdi-default]
 42 ?        S<     0:00 [kintegrityd]
 43 ?        S<     0:00 [kblockd]
 50 ?        S      0:49 [kworker/1:1]
 51 ?        S<     0:00 [xenbus_frontend]
 52 ?        S<     0:00 [ata_sff]
 53 ?        S      0:00 [khubd]
 54 ?        S<     0:00 [md]
 55 ?        S      0:01 [khungtaskd]
 56 ?        S      0:04 [kswapd0]
 57 ?        SN     0:00 [ksmd]
 58 ?        S      0:00 [fsnotify_mark]
 59 ?        S<     0:00 [crypto]
 64 ?        S<     0:00 [kthrotld]
 66 ?        S<     0:00 [kpsmoused]
240 ?        S      0:00 [scsi_eh_0]
241 ?        S      0:00 [scsi_eh_1]
248 ?        S<     0:00 [mpt_poll_0]
249 ?        S<     0:00 [mpt/0]
250 ?        S      0:00 [scsi_eh_2]
313 ?        S<     0:00 [kdmflush]
325 ?        S      0:00 [kjournald]
445 ?        S<s    0:00 /sbin/udevd -d
706 ?        S<     0:00 [vmmemctl]
815 ?        S<     0:00 [kdmflush]
865 ?        S      0:08 [kjournald]
907 ?        S      0:00 [kauditd]
1091 ?        S      0:11 [flush-252:2]
1243 ?        S     26:05 /usr/sbin/vmtoolsd
1311 ?        Ssl    0:03 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
1334 ?        Ss     0:06 cachefilesd -f /etc/cachefilesd.conf
1361 ?        Ss     6:55 irqbalance --pid=/var/run/irqbalance.pid
1377 ?        Ss     0:02 rpcbind
1397 ?        Ss     0:00 rpc.statd
1428 ?        S<     0:00 [rpciod]
1433 ?        Ss     0:00 rpc.idmapd 
1507 ?        S<     0:00 [nfsiod]
1508 ?        S      0:00 [nfsv4.0-svc]  
1608 ?        Ss     0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
1783 ?        Ss     0:11 crond
1796 ?        Ss     0:00 /usr/sbin/atd
1807 ?        Ss     0:01 rhnsd
1989 ?        S     99:05 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr
4879 ?        S      0:00 /bin/sh /etc/xrdp/startwm.sh
4904 ?        Ss     0:02 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
4924 ?        S     60:14 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr

and on 6.7:

PID TTY      STAT   TIME COMMAND
  1 ?        Ss     0:01 /sbin/init
  3 ?        S      0:10 [ksoftirqd/0]
  5 ?        S<     0:00 [kworker/0:0H]
  8 ?        S      0:19 [migration/0]
 11 ?        S      0:02 [watchdog/0]
 47 ?        S<     0:00 [cpuset]
 48 ?        S<     0:00 [khelper]
 49 ?        S      0:00 [kdevtmpfs]
 50 ?        S<     0:00 [netns]
 51 ?        S      0:00 [bdi-default]
 52 ?        S<     0:00 [kintegrityd]
 53 ?        S<     0:00 [crypto]
 54 ?        S<     0:00 [kblockd]
 62 ?        S<     0:00 [ata_sff]
 63 ?        S      0:00 [khubd]
 64 ?        S<     0:00 [md]
 66 ?        S      0:00 [khungtaskd]
 67 ?        S      0:36 [kswapd0]
 68 ?        SN     0:00 [ksmd]
 69 ?        S      0:00 [fsnotify_mark]
 80 ?        S<     0:00 [kthrotld]
 84 ?        S<     0:00 [deferwq]
151 ?        S<     0:00 [ttm_swap]
273 ?        S      0:00 [scsi_eh_0]
274 ?        S      0:00 [scsi_eh_1]
281 ?        S<     0:00 [mpt_poll_0]
282 ?        S<     0:00 [mpt/0]
283 ?        S      0:00 [scsi_eh_2]
374 ?        S<     0:00 [kdmflush]
387 ?        S      0:00 [kjournald]
480 ?        S<s    0:00 /sbin/udevd -d
872 ?        S<     0:00 [kworker/2:1H]
1828 ?        S<     0:00 [kdmflush]
1834 ?        S<     0:00 [kdmflush]
1837 ?        S<     0:00 [kdmflush]
1840 ?        S<     0:00 [kdmflush]
1881 ?        S      0:00 [kjournald]
1882 ?        S      0:03 [kjournald]
1883 ?        S      0:03 [kjournald]
1884 ?        S      3:14 [kjournald]
1926 ?        S      0:00 [kauditd]
2136 ?        S      1:37 [flush-252:1]
2137 ?        S      0:02 [flush-252:2]
2187 ?        S      5:04 /usr/sbin/vmtoolsd
2214 ?        S      0:00 /usr/lib/vmware-vgauth/VGAuthService -s
2264 ?        Sl     1:54 ./ManagementAgentHost
2327 ?        Ssl    0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
2368 ?        Ss     0:00 rpcbind
2390 ?        Ss     0:00 rpc.statd
2425 ?        S<     0:00 [rpciod]
2430 ?        Ss     0:00 rpc.idmapd
2456 ?        Ss     0:00 dbus-daemon --system
2473 ?        S      0:00 [kworker/7:2]
2501 ?        S<     0:00 [nfsiod]
2504 ?        S      0:00 [nfsv4.0-svc]
2519 ?        Ss     0:00 /usr/sbin/acpid
2531 ?        Ssl    0:02 hald
2532 ?        S      0:00 hald-runner
2564 ?        S      0:00 hald-addon-input: Listening on /dev/input/event1 /dev/input/event0
2580 ?        S      0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
2618 ?        Ss     0:00 /usr/sbin/sshd
2629 ?        Ss     0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid 
2778 ?        S      0:00 qmgr -l -t fifo -u
2811 ?        S      0:56 /usr/bin/python /usr/sbin/osad --pid-file /var/run/osad.pid
2887 ?        S<     0:00 [dm_bufio_cache]
3008 ?        Ss     0:00 rhnsd
3117 ?        S      9:44 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr
3195 ?        S      0:00 /usr/libexec/polkit-1/polkitd
3825 ?        S<     0:17 [loop0]
3828 ?        S<     0:21 [loop1]
3830 ?        S<     0:00 [kdmflush]
3833 ?        S<     0:00 [kcopyd]
3834 ?        S<     0:00 [dm-thin]
6876 ?        S      0:00 (unlinkd)
19358 ?        S      0:00 [flush-0:19]
24484 ?        S<     0:00 /sbin/udevd -d
24921 ?        S<     0:00 /sbin/udevd -d
26201 ?        Ss     0:00 cachefilesd -f /etc/cachefilesd.conf
29311 ?        S<     0:00 [kdmflush]
29316 ?        S      0:00 [jbd2/dm-6-8]
29317 ?        S<     0:00 [ext4-dio-unwrit]

A few days ago I was sure I had found the issue: comparing the output of sysctl -a on both systems showed a difference in fs.nfs.idmap_cache_timeout, which is set to 600 on 6.6 and to 0 on 6.7. Unfortunately, changing it didn't have the desired effect either.
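
For the record, that change was just a one-liner (reverted again since it made no difference):

$ sudo sysctl -w fs.nfs.idmap_cache_timeout=600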

I did find another useful command, though: rpcdebug -m nfs -s all prints a lot of caching-related debug information to the system log (/var/log/messages in my case). Most entries produced by my ls runs look like the following:

Feb 27 10:45:16 host kernel: NFS: nfs_lookup_revalidate(//opt) is valid
Feb 27 10:45:16 host kernel: NFS: nfs_lookup_revalidate(opt/gcc) is valid
Feb 27 10:45:16 host kernel: NFS: nfs_lookup_revalidate(gcc/gcc-5.3.0) is valid
Feb 27 10:45:16 host kernel: NFS: nfs_lookup_revalidate(gcc-5.3.0/lib64) is valid

and multiple instances of these blocks appear per second (even with lookupcache=all).
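
In case someone wants to reproduce this, the debug logging is toggled like so (turning it off again matters, as it is quite chatty):

$ sudo rpcdebug -m nfs -s all   # enable NFS client debugging
$ ls                            # with the NFS paths in LD_LIBRARY_PATH
$ sudo rpcdebug -m nfs -c all   # clear the debug flags again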

Cheers!

Nodebody
  • I just wondered whether your /etc/ld.so.cache is up to date. Please run `sudo ldconfig` to update it and measure again. – DLade Feb 17 '17 at 13:35

2 Answers


BTW, be careful using soft mounts:

With soft-mounted filesystems, you have to worry about damaging data due to incomplete writes, losing access to the text segment of a swapped process, and making soft-mounted filesystems more tolerant of variances in server response time.

To guarantee data integrity, all filesystems mounted read-write should be hard-mounted.

Soft mount issues

Using hard,intr is recommended on all NFS mounted file systems.
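
For example, an fstab entry along these lines (the remaining options are only illustrative; note that intr has been a no-op since kernel 2.6.25, but it is harmless):

server:/share  /mnt  nfs4  hard,intr,timeo=600,retrans=2  0 0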

DLade

Do you use fs-cache or nscd on your CentOS 6.6 installation? Can you post a ps ax of both machines, as well as the output of /proc/mounts?

shodanshok
  • I cannot really dump my whole `ps ax` output. Were you looking for anything besides cachefilesd and nscd? – Nodebody Feb 17 '17 at 10:28
  • I would like to see if some caching process is running. Can you post a cleaned / sanitized `ps ax` output? – shodanshok Feb 17 '17 at 11:46
  • I can only think that some default parameters have changed. Try mounting with the explicit options `-o ac,lookupcache=all`. Does it make any difference? – shodanshok Feb 17 '17 at 14:10
  • Sorry for replying late: Explicit mount options didn't change the behavior. – Nodebody Feb 27 '17 at 08:32
  • Ok, one last thing: can you try mounting the share with the `-o async` option? Does it change anything? – shodanshok Feb 27 '17 at 09:49
  • The server specifies the share as "sync", and while changing it to async would be okay (at least on one of the shares), it is rather difficult for me to do on that specific server instance. I might have to set up my own test server to play around some more with server-side settings. Anyway, thanks a lot for your input! – Nodebody Feb 27 '17 at 10:31
  • Please note that specifying "async" client-side is different from "sync" on the server side. You can (and should) export your share with the "sync" option, but your client is free to mount it with "async" behavior. Basically, server-side "sync/async" controls server-side write caching, while the client-side "async" option controls client-side caching. – shodanshok Feb 27 '17 at 10:48