Apologies if I'm not using the proper jargon (although I'm a longtime Linux user, I'm not an admin) or if this is a FAQ (though searching SE got lots of hits, I didn't see anything quite like this question):
I'm a user on a science cluster (with jobs managed by PBS/Torque, on RHEL5, FWIW). I'm about to start my first really-big job, so I asked the admin some configuration questions, to avoid stupid mistakes. I was mostly right, but he added the advice to "make sure you are not hammering the disk server with too much I/O," with followup to "use top [to] see if the nfs is going nuts."
How to do that? This is a cluster, so a lot is going on "behind the scenes" that is transparent to me. Plus I have next-to-no privileges. I also am limited to CLI via SSH, but that's the least of my problems. On the plus side, I do seem to be able to shell into any of the compute nodes, including those with attached disk(s).
So I'm wondering, how best to monitor NFS from userland? I know a little bit about top and NFS, so I know I can do
top -p$(pgrep nfsd -d ',')
to get the list of NFS processes (no?). But what I'd really like to know--again, as a user (I have neither sudo nor root) on RHEL5 (yes, we're still running that)--are
- One, or a few, aggregate statistics for NFS load across all NFS processes. Is this something I can get from top or another tool, without scraping output and doing my own math? And should I be monitoring processes other than nfsd?
- Advice concerning quantification of "NFS going nuts." If I can get one/few aggregate statistics, I can presumably get a pre-my-job baseline, but that still doesn't tell me "how high is too high."
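To make the first item concrete, here is a minimal sketch of the kind of aggregate I mean. It assumes the installed ps is the procps one (so that -C and -o pcpu= work) and that the server processes really are named nfsd. The function names sum_pcpu and nfsd_cpu_total are made up for illustration. CPU is also only a rough proxy for NFS load, and ps reports CPU averaged over each process's lifetime rather than an instantaneous value.

```shell
#!/bin/sh
# Sum a column of numbers read from stdin (one per line) and print the total.
# Empty input yields 0.0.
sum_pcpu() {
    LC_ALL=C awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Total %CPU across all processes with the given command name.
# The default name "nfsd" is an assumption; pass another name as $1 if needed.
# Note: procps ps reports CPU time averaged over the process lifetime.
nfsd_cpu_total() {
    ps -C "${1:-nfsd}" -o pcpu= | sum_pcpu
}
```

Running nfsd_cpu_total before and during the job would give two numbers to compare, though it still would not say how high is too high.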
Note: top appears not to be the tool to use for this task, but at least it is available to me. The tools which are not available to me include
- nfsstat
- iostat
- iotop
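One possible fallback, which is an assumption on my part and not something I have verified on this cluster: nfsstat gets its numbers from /proc/net/rpc/nfs (client side) and /proc/net/rpc/nfsd (server side), and those files are normally readable by ordinary users. The sketch below reads the cumulative call count from the "rpc" line of the client file and turns two samples into a rate. The function names rpc_calls and rpc_rate are hypothetical, and the result only reflects RPC traffic from the node where it runs, not the total load on the disk server.

```shell
#!/bin/sh
# Print the cumulative RPC call count from an rpc stats file.
# The "rpc" line holds: calls, retransmits, auth refreshes.
# Defaults to the NFS client counters; pass another path as $1 if needed.
rpc_calls() {
    awk '$1 == "rpc" { print $2 }' "${1:-/proc/net/rpc/nfs}"
}

# Print the average RPC calls per second over an interval (default 10 s),
# using integer arithmetic.
rpc_rate() {
    interval="${1:-10}"
    before=$(rpc_calls)
    sleep "$interval"
    after=$(rpc_calls)
    echo $(( (after - before) / interval ))
}
```

Sampling rpc_rate on an idle node and again while the job runs would give a baseline and a comparison point.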