
I am running into a scenario where I'm seeing a high server load (sometimes upwards of 20 or 30) and very low CPU usage (98% idle). I'm wondering if these wait states are coming from an NFS mount. Here is what I see in vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1      0 1298784      0      0    0    0    16     5    0    9  1  1 97  2  0
 0  1      0 1308016      0      0    0    0     0     0    0 3882  4  3 80 13  0
 0  1      0 1307960      0      0    0    0   120     0    0 2960  0  0 88 12  0
 0  1      0 1295868      0      0    0    0     4     0    0 4235  1  2 84 13  0
 6  0      0 1292740      0      0    0    0     0     0    0 5003  1  1 98  0  0
 4  0      0 1300860      0      0    0    0     0   120    0 11194  4  3 93  0  0
 4  1      0 1304576      0      0    0    0   240     0    0 11259  4  3 88  6  0
 3  1      0 1298952      0      0    0    0     0     0    0 9268  7  5 70 19  0
 3  1      0 1303740      0      0    0    0    88     8    0 8088  4  3 81 13  0
 5  0      0 1304052      0      0    0    0     0     0    0 6348  4  4 93  0  0
 0  0      0 1307952      0      0    0    0     0     0    0 7366  5  4 91  0  0
 0  0      0 1307744      0      0    0    0     0     0    0 3201  0  0 100  0  0
 4  0      0 1294644      0      0    0    0     0     0    0 5514  1  2 97  0  0
 3  0      0 1301272      0      0    0    0     0     0    0 11508  4  3 93  0  0
 3  0      0 1307788      0      0    0    0     0     0    0 11822  5  3 92  0  0

From what I can tell, when the I/O goes up, the waits go up. Could NFS be the cause here, or should I be worried about something else? This is a VPS box on a Fibre Channel SAN, so I'd think the bottleneck wouldn't be the SAN. Comments?

Mech

2 Answers


You can try using iostat to pin down which device is generating the I/O wait:

# iostat -k -h -n 5

See the iostat man page for further details. NFS is often part of the problem, especially if you serve a large number of small files or have a particularly high number of file operations. You can tune NFS access with the usual mount options, like rsize=32768,wsize=32768. There's a good whitepaper by NetApp covering this topic: http://media.netapp.com/documents/tr-3183.pdf
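For example, if the share is mounted via /etc/fstab, those options go on the mount line; the server name and paths below are just placeholders, and the best values depend on your kernel and network:

nfsserver:/export/backup  /mnt/backup  nfs  rw,hard,rsize=32768,wsize=32768,timeo=600  0 0

You'll need to unmount and remount the share for changed options to take effect.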

Also make sure you have no drops on the network interface.
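Something like the following will show the drop counters (replace eth0 with your actual interface):

# ip -s link show eth0
# netstat -i

Look at the dropped/error columns in the RX and TX statistics.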

Hope this helps,

frank.

fen
  • Freaking awesome! That was just it. It shows NFS as the device, which is what I suspected (or hoped). I'm not terribly worried about the NFS, since it's a backup device for offsite backups, so if that's waiting I'm fine with that. Thanks again for the tip; that was exactly the kind of information I was searching for. – Mech Mar 10 '10 at 13:46

Adding the async option to /etc/exports helped me bring the load average back to normal.

/mnt/dir      *(rw,async,pnfs,no_root_squash,no_subtree_check)
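One caveat: async means the server acknowledges writes before they reach stable storage, so data written just before a server crash can be lost; for a backup target that trade-off may be acceptable. After editing /etc/exports, re-export the shares, for example with:

# exportfs -ra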