This boils down to filesystem and kernel I/O tuning (mostly sysctl and sysfs knobs). What you describe as 'high' load is a situation where other normal processes on the system are starved of reads and forced to wait.
The situation is characterized by
- a high share of CPU wait time (check %wa in top)
- many processes in D state (uninterruptible sleep, waiting for reads from disk)
Using the noop scheduler does not help here, since noop is a simple FIFO scheduler and cannot bring more fairness to the disk access game. So I would suggest going with the deadline scheduler.
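To check which scheduler is active and switch to deadline (assuming the disk is /dev/sda, as in the rest of this answer; on newer blk-mq kernels the scheduler is called mq-deadline instead):
cat /sys/block/sda/queue/scheduler               # the active scheduler is shown in [brackets]
echo deadline > /sys/block/sda/queue/scheduler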
The idea of the deadline scheduler is to assign a target deadline to every disk IO operation while maintaining two simple queues, one for reads and one for writes; you can tune the preference for reads over writes and how long a read/write may sit in the queue before the current batch is interrupted and the nearly expired IO request is serviced.
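For reference, the deadline tunables live under the queue's iosched directory; the paths below again assume /dev/sda, and the values are only illustrative:
cat /sys/block/sda/queue/iosched/read_expire           # deadline for reads, in ms (usually 500)
cat /sys/block/sda/queue/iosched/write_expire          # deadline for writes, in ms (usually 5000)
echo 4 > /sys/block/sda/queue/iosched/writes_starved   # how many read batches may run before writes get a turn (default 2)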
Additionally, you want to have a lot of directory entry and file inode data cached in RAM. This kind of cache saves a great deal of disk IO while traversing such a large directory/file structure.
grep ^Slab /proc/meminfo
This tells you how much memory in total is dedicated to kernel slab caches, which include the directory entry and inode caches.
Details on how that Slab memory is split up and used can be found in
/proc/slabinfo
You can run slabtop to get interactive usage stats.
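For example, to focus on the caches relevant here (slab names can differ a bit per filesystem, e.g. ext4_inode_cache):
slabtop -s c                                   # sort by cache size
grep -E 'dentry|inode_cache' /proc/slabinfo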
Ultimately, if you decide to grow this kind of cache, you want to reduce the value of vm.vfs_cache_pressure:
sysctl -w vm.vfs_cache_pressure=20
This defaults to 100, which means that when the system is low on memory it reclaims memory used for caching dentries and file inodes at roughly the same rate as the page cache (i.e. the cache of file/program data in RAM).
By reducing the value you tell the kernel to prefer keeping the dentry/inode caches, so fewer read operations are needed to re-read that same data from disk after it would otherwise have been dropped under memory pressure.
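If the lower value works out for you, you can make it persistent across reboots; the file name under /etc/sysctl.d below is just my own choice:
echo 'vm.vfs_cache_pressure = 20' > /etc/sysctl.d/99-vfs-cache.conf
sysctl -p /etc/sysctl.d/99-vfs-cache.conf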
Further, to improve your disk read capabilities I would recommend increasing read-ahead.
blockdev --getra /dev/sda          # show current read-ahead (in 512-byte sectors)
blockdev --setra 2048 /dev/sda     # set read-ahead to 2048 sectors (1 MiB)
This should help you squeeze out some extra IOPS, especially if your system does more reads than writes (iostat can help you check that; the first report it prints is always the aggregate since boot, so it is easy to derive read/write ratios from it).
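For example (the first report below shows per-device totals since boot, so comparing the kB_read and kB_wrtn columns gives you the ratio; the second form prints extended stats every 5 seconds):
iostat -d /dev/sda
iostat -dx 5 /dev/sda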
Next, I would consider downsizing nr_requests:
echo 32 > /sys/block/sda/queue/nr_requests
By doing so you will essentially have shorter batches, which trades away some of the throughput we gained above for better latency. Systems with many processes benefit from this, since it becomes harder for a single process to dominate the IO queue while others starve.
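Note that this setting does not survive a reboot; to check the current value (the default is usually 128) or verify your change, again assuming sda:
cat /sys/block/sda/queue/nr_requests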
More about this can also be found here: hardware RAID controller tuning
Another case where you may see high load is when your normal system activity is interrupted by an intermittent large write batch, e.g. a large file download, copy or unpack operation. Writes can also easily saturate disk IO, and to combat this I would recommend tuning the following down somewhat:
vm.dirty_background_ratio
vm.dirty_ratio
Careful though, don't go too low. To get an idea, use the atop tool and check the disk stats, where you can see how much dirty data is normally held in memory and how much processes benefit from it (the WCANCL column in the disk stats), then pick values somewhat above those usage rates.
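A minimal sketch of checking and lowering them (the values below are only an example; gauge your own with atop first, as described above):
sysctl vm.dirty_background_ratio vm.dirty_ratio    # defaults are commonly 10 and 20
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10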
These settings help engage the kernel's writeback throttling mechanism, which tries to slow down processes that hurt the system's IO by doing heavy writes. For more info check writeback throttling.