
In several contexts, I've seen behavior on Linux systems where large volumes of filesystem writes (many gigabytes, written very quickly) overwhelm memory, apparently while waiting for I/O to complete so that buffered data that has already been written can be flushed to disk, freeing memory for subsequent writes. When this happens, "vmstat -s" shows the amount of free memory shrinking steadily until it reaches zero.

I see this most often when writing to very slow disks (such as USB-attached external drives with a filesystem on them), but I've also seen it with more "regular" SATA disks when large volumes of data are written very quickly. At best, write operations eventually block, waiting for memory to become available. At worst, if a high volume of writes continues once the system is in this state, the pressure on memory becomes so great that the OOM Killer runs and kills off processes more or less at random to free up memory. It doesn't even take multiple users doing writes to make this happen: I've created this situation myself, with no one else using the system, just by writing very high volumes of data to a filesystem very quickly.
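For reference, this is roughly how I watch it happen while the writes are running (a minimal sketch; the head count just trims the output down to the memory totals):

    # sample the vmstat memory counters once per second
    watch -n1 'vmstat -s | head -n 6'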

My guess (and I stress that it's only a guess) is that the system isn't particularly aggressive about flushing buffered output to disk and freeing the associated memory. But I'm not sure what I can tune, or even look at, to determine whether this is actually the case, or to make the flushing of write buffers more aggressive.

Am I on the right track here, as far as a guess at what's going on? If so, is there anything I can tune to try to make the system more aggressively flush pending I/O to disk and free up the buffer memory?

patbarron
  • If you're trying to write data faster than the disk can handle, the data is going to back up in the page cache no matter what; the kernel can't really run a "try harder to write" code path. How much swap space do you have? If there's none, or very little, you may wind up thrashing your system – see https://serverfault.com/questions/255661/linux-oom-disk-i-o-also-swap-what-is-it-good-for for a similar situation. Also, how many disks do you have? If everything is contending for a single disk drive, things will get really bad really fast once you start paging. – Andrew Henle Apr 05 '18 at 20:25
  • If you do have control over the applications/code doing the huge amounts of I/O, you can try using direct I/O to bypass the page cache; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/global_file_system/s1-manage-direct-io for an explanation. If you're writing so much data that you'd blow out the entire page cache many times over while causing memory problems for the entire system, there's really no point in using the page cache in the first place. (A minimal dd-based sketch of this appears after these comments.) – Andrew Henle Apr 05 '18 at 20:27
  • Thanks for this feedback! You're right, of course: there's no way to magically push data to disk "harder". What I'm really asking is whether there's a way to make the system more aggressive about pushing data to disk (and freeing the page-cache pages) in its "spare time", if it gets a break from the heavy write traffic, since I'm not sure it does that unless memory is short. In this context the data is basically "write once, read never (or not for a very long time)", so there's no value in keeping it in the page cache at all; it won't be referenced again soon. – patbarron Apr 05 '18 at 20:29
  • The systems I've seen this on generally have one physical disk or RAID array. – patbarron Apr 05 '18 at 20:32
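As a minimal sketch of the direct-I/O suggestion above (the destination path is only an example, and with O_DIRECT the block size must be a multiple of the device's logical sector size):

    # write 4 GiB to the slow disk, bypassing the page cache entirely
    dd if=/dev/zero of=/mnt/usb/testfile bs=1M count=4096 oflag=direct status=progress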

1 Answer


This sounds like a well-known topic (e.g. see "The pernicious USB-stick stall problem" and "Toward less-annoying background writeback" on LWN), but unfortunately you didn't tell us the versions of the software on your system (e.g. the kernel version), so it's difficult to say anything very specific to your circumstances. I've never known excessive writeback to cause OOMs; for me it resulted in poor responsiveness instead. To check whether it really is all writeback related (as opposed to cached memory that could easily be dropped in an OOM situation), you would need to monitor the Dirty and Writeback rows in /proc/meminfo and look carefully at the state of memory shown in the OOM splat.
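To make that concrete, something like the following (a sketch; adjust the interval to taste) will show whether dirty and under-writeback pages really account for the memory that disappears:

    # print the dirty and writeback counters once per second during the heavy writes
    watch -n1 "grep -E '^(Dirty|Writeback):' /proc/meminfo"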

Generally, this should be less of an issue on newer (at the time of writing) kernels, because dynamic writeback throttling was introduced in 4.10 (but note that writeback throttling is disabled by default if you are using the CFQ I/O scheduler on 4.12+).
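You can check whether writeback throttling is in effect for a particular device via sysfs (sda below is just a placeholder for your device):

    # target completion latency in microseconds; 0 means writeback throttling is off
    cat /sys/block/sda/queue/wbt_lat_usec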

On the manual end of the spectrum, Chris Siebenmann found that tuning dirty_background_bytes and dirty_bytes kept his machine responsive when writing to USB drives. Those values live in /proc/sys/vm and are mentioned, along with others, in the LWN USB-stick stall article linked earlier (also see the answers to "Limit Linux background flush (dirty pages)" for discussion of this technique). Be warned, though: setting these values incorrectly can hurt throughput.
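As a sketch only (the right numbers depend on your RAM size and device speed, so treat these values purely as placeholders to experiment with):

    # start background writeback once 16 MiB of dirty data has accumulated...
    sysctl -w vm.dirty_background_bytes=16777216
    # ...and block writers outright once 48 MiB is dirty, keeping the backlog small
    sysctl -w vm.dirty_bytes=50331648

Note that setting the *_bytes variants causes the kernel to ignore the corresponding *_ratio settings, and changes made this way do not survive a reboot unless you also add them to /etc/sysctl.conf (or a file under /etc/sysctl.d/).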

Anon