I have the following NFS-based storage setup:
The compute nodes run Linux. The NFS servers run Solaris.
A not-so-important user runs a bunch of read-intensive jobs on a subset of the compute nodes. As a result, the whole group of compute nodes becomes very slow (ls blocks for 30 seconds).
I was able to track this down to the dedicated NFS server hitting the limit of the SAN's read throughput.
How can I implement quality of service (QoS) that limits NFS bandwidth per node, per process, or per user?