Is it possible to make the OOM killer intervene earlier?

I'm trying to tune my development system for maximum reliability. I disabled swap because, for GUI usage, it mostly renders the machine so unresponsive that it is no longer usable. Nevertheless, if aggressive applications eat up the memory, some mechanisms seem to kick in that make the most of it at the cost of speed. There is no hard-drive swap activity, but the system becomes unresponsive all the same. So I want the OOM killer to kick in before the system makes any special effort to reclaim memory. Is it possible to configure the OOM killer to act if there is, for example, less than 100 MB of free physical memory?

dronus

Posted 2012-03-29T08:43:19.600

Reputation: 1 482

I think the real issue here is that there's not enough RAM to start with. You won't use swap unless there's no RAM left. By turning off swap... you run out of RAM and have nowhere to page it to, which causes ugly things to happen. Your system seems to be set up badly, and no amount of tweaking will fix that. – Journeyman Geek – 2012-03-29T08:51:05.350

I don't agree. Development and 'power use' often involve experimental usage. For example, when using a command-line image-processing tool, there are no specs for how much memory its operation takes relative to the image size. So I just give it a run. And I don't expect it to render my whole machine useless. For a single experiment, I could use ulimit to keep it contained, but for whole-system operation with sometimes plenty of operations running, containing one process is not so useful, while a 'life insurance' for the whole machine definitely is. – dronus – 2012-03-29T11:14:21.573

Simply put: across my wide field of everyday usage there are plenty of tasks my machine is able to handle, but some it is not. – dronus – 2012-03-29T11:18:33.440

The fact that your system grinds to a halt when using swap is suspect. Your computer is using swap because it's out of memory. Swap is slowing things down because disk access is slow. Disk access is slow due to ???. It's problems all the way down. It's not just that you're low on RAM; it's that you can't use the one way to mitigate that, due to something else. – Journeyman Geek – 2012-03-29T13:35:04.993

@JourneymanGeek, you are off in left field. Disks are slow compared to RAM, period, hence heavy swapping always grinds the system to a halt. Of course he is out of memory: he tried running a program that uses a lot of memory. The question is what to do when out of memory? Kill the hog, or slow down due to having no memory left for the disk cache? – psusi – 2012-03-29T17:38:39.890

@psusi: You don't understand how swap space works; the slowdown is an assumption. We are all running page files and swap spaces worldwide; if you then come along suggesting to disable it, you have to explain a little more than that... – Tamara Wijsman – 2012-03-29T17:57:20.300

@TomWijsman, disk I/O is many orders of magnitude slower than memory I/O, so using disk swap has always meant a huge slowdown. Sometimes (especially in the old days, when RAM was expensive and most people didn't have much) that's preferable to not being able to do what you were trying at all. These days the disk is SO much slower than RAM, and RAM is cheap enough that most people have plenty, so on the rare occasion when they accidentally run something that uses more RAM than they have, it is often better to give up than to take 1000 times as long to do it. – psusi – 2012-03-29T18:33:03.610

@psusi: You don't understand how disk swap works. If you think that "your memory is waiting on your disk", you either have a badly implemented kernel at your disposal or don't know what you are talking about. Your entire comment makes no sense as a result; please come with a theoretically backed-up explanation instead of guessing... – Tamara Wijsman – 2012-03-29T18:59:41.890

I would also like the OOM killer to trigger a little earlier. Sometimes it is obvious that I have overloaded my system, but it takes minutes of swapping and mouse/keyboard jitter before the killer acts. – joeytwiddle – 2013-01-18T06:23:07.873

Answers

I also struggled with this issue. I just want my system to stay responsive, no matter what, and I prefer losing processes to waiting a few minutes. There seems to be no way to achieve this using the kernel OOM killer.

However, in user space we can do whatever we want. So I wrote the Early OOM Daemon (https://github.com/rfjakob/earlyoom), which kills the largest process (by RSS) once available RAM drops below 10%.
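
To make the idea concrete, here is a minimal Python sketch of an earlyoom-style killer. This is not earlyoom's actual implementation (earlyoom is a C program with more safeguards); the 10% threshold and one-second poll interval are assumptions mirroring its defaults:

    #!/usr/bin/env python3
    # Minimal sketch of an earlyoom-style user-space OOM killer -- not the
    # real earlyoom code. Polls /proc/meminfo and SIGKILLs the process with
    # the largest RSS when available memory drops below a threshold.
    # Needs root to be able to kill arbitrary processes.
    import os
    import signal
    import time

    THRESHOLD_PERCENT = 10  # assumed threshold, mirroring earlyoom's default
    POLL_SECONDS = 1        # assumed poll interval

    def meminfo():
        """Parse /proc/meminfo into a dict of kiB values."""
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                info[key] = int(rest.split()[0])  # values are "<n> kB"
        return info

    def largest_process():
        """Return (pid, rss_kib) for the process with the largest RSS."""
        page_kib = os.sysconf("SC_PAGE_SIZE") // 1024
        best_pid, best_rss = None, 0
        for entry in os.listdir("/proc"):
            if not entry.isdigit():
                continue
            try:
                with open(f"/proc/{entry}/statm") as f:
                    # statm field 1 is the resident set size in pages
                    rss_kib = int(f.read().split()[1]) * page_kib
            except (FileNotFoundError, PermissionError):
                continue  # process exited or is inaccessible; skip it
            if rss_kib > best_rss:
                best_pid, best_rss = int(entry), rss_kib
        return best_pid, best_rss

    while True:
        info = meminfo()
        # MemAvailable (kernel >= 3.14) estimates memory usable without
        # swapping; fall back to MemFree on older kernels.
        avail = info.get("MemAvailable", info["MemFree"])
        if avail * 100 // info["MemTotal"] < THRESHOLD_PERCENT:
            pid, rss = largest_process()
            if pid is not None and pid != os.getpid():
                print(f"low memory: killing PID {pid} (RSS {rss} kiB)")
                os.kill(pid, signal.SIGKILL)
        time.sleep(POLL_SECONDS)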

Without earlyoom, it was easy to lock up my machine (8 GB RAM) by starting http://www.unrealengine.com/html5/ a few times. Now the guilty browser tabs get killed before things get out of hand.

Jakob

Posted 2012-03-29T08:43:19.600

Reputation: 775

Thanks for scratching this itch! Loving earlyoom so far. – Thomas Ferris Nicolaisen – 2016-02-18T13:15:02.913

Just figured out that Android has done the same thing for a long time. I am not sure if it uses custom code like yours for that. – dronus – 2016-05-23T12:08:44.533

I am testing earlyoom now; it does well in a first trigger test. I just wonder why this can't be implemented through kernel configuration or system tools. – dronus – 2016-05-23T12:23:53.577

The default policy of the kernel is to allow applications to keep allocating virtual memory as long as there is free physical memory. The physical memory isn't actually used until the applications touch the virtual memory they allocated, so an application can allocate much more memory than the system has and then start touching it later, causing the kernel to run out of memory and trigger the out-of-memory (OOM) killer. Before the hogging process is killed, though, it has caused the disk cache to be emptied, which makes the system slow to respond for a while until the cache refills.

You can change the default policy to disallow memory overcommit by writing a value of 2 to /proc/sys/vm/overcommit_memory. The default value of /proc/sys/vm/overcommit_ratio is 50, so the kernel will not allow applications to allocate more than 50% of RAM + swap. If you have no swap, the kernel will not allow applications to allocate more than 50% of your RAM, leaving the other 50% free for the cache. That may be a bit excessive, so you may want to raise this value to, say, 85, so applications can allocate up to 85% of your RAM, leaving 15% for the cache.
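
To make the accounting concrete, here is a small sketch that reads the kernel's own numbers; the field names are real /proc/meminfo entries, while the 85% figure is just the example from above. Under strict accounting (mode 2), and ignoring huge pages, CommitLimit = MemTotal × overcommit_ratio / 100 + SwapTotal, so 2000 MB of RAM, no swap, and a ratio of 85 gives a 1700 MB limit:

    # Sketch: compare current commitment against the kernel's commit limit.
    def kib(field):
        """Read one field (in kiB) from /proc/meminfo."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        raise KeyError(field)

    committed = kib("Committed_AS")  # virtual memory promised to all processes
    limit = kib("CommitLimit")       # enforced when overcommit_memory = 2
    print(f"Committed_AS {committed // 1024} MB / CommitLimit {limit // 1024} MB")

    # Switching to strict accounting (needs root); equivalent to
    # `sysctl vm.overcommit_memory=2 vm.overcommit_ratio=85`.
    # Left commented out because, as noted below, this can fail running
    # programs immediately -- save your work first.
    # with open("/proc/sys/vm/overcommit_memory", "w") as f:
    #     f.write("2")
    # with open("/proc/sys/vm/overcommit_ratio", "w") as f:
    #     f.write("85")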

psusi

Posted 2012-03-29T08:43:19.600

Reputation: 7 195

Save your work before trying this! :P I had immediate failures from everything (bash, window manager, etc.). – jozxyqk – 2015-03-12T05:18:11.160

Changing these values from their defaults without theoretical background is not going to result in a more reliable system; you can only justify such a change with proper statistics. Just because you can change it doesn't mean you should. If you are constantly in low-memory conditions, that means you are using more memory than you have and should buy more memory; it doesn't mean you should fiddle with your settings and kill random applications. Interrupting your daily work or introducing corruption is really not the way to go... – Tamara Wijsman – 2012-03-29T18:56:07.200

@TomWijsman, the question makes it clear that he isn't constantly in low-memory conditions; he just sometimes runs a command that takes an unexpectedly large amount of memory. Buying more memory is not the only solution when you run out. Other potential solutions include finding better ways to make use of the memory you have, or just not doing whatever needs that much memory. The question makes it clear that the latter is more acceptable than going out and buying more RAM. – psusi – 2012-03-29T19:02:10.670

Which line in the question makes this clear? I see the opposite in "I disabled swap, because for GUI usage it mostly renders the machine unresponsive in such a way not useable anymore". He mentioned the GUI, while you are assuming he runs a command. Buying more memory is the first solution, using less memory yourself is the second, and making your system unstable by fiddling with the stable defaults is the last. The question doesn't have to be answered literally, so I don't see why you have to bother both of us in the comments. Ranting doesn't help... – Tamara Wijsman – 2012-03-29T19:06:45.330

Hey, this answer sounded quite cool. Unfortunately, the 'commit' seems to refer to virtual memory demand, which application programmers estimate quite badly. For example, with my (no-swap) desktop running, about 400 of 2000 MB of physical memory is used, but 1600 MB is 'commit'ted, as /proc/meminfo's Committed_AS states. With some applications running, this value easily exceeds the physical memory, so it's hard to set a feasible limit this way. – dronus – 2012-03-29T22:17:11.963

@dronus, yes, it is tricky trying to get it "just right". – psusi – 2012-03-30T13:49:26.263

For me, setting vm.admin_reserve_kbytes=262144 does exactly this. The OOM killer intervenes before the system becomes completely unresponsive.
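
As a sketch of how to apply it (writing the /proc file is what sysctl does under the hood; adding vm.admin_reserve_kbytes = 262144 to /etc/sysctl.conf would persist it across reboots):

    # Apply the setting at runtime (needs root); equivalent to
    # `sysctl vm.admin_reserve_kbytes=262144`.
    with open("/proc/sys/vm/admin_reserve_kbytes", "w") as f:
        f.write("262144")  # 262144 kiB = 256 MiB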

Michael Vigovsky

Posted 2012-03-29T08:43:19.600

Reputation: 91

I like the idea, but does it mean you have 256 MiB of physical memory that is never used? – Jérôme Pouiller – 2018-06-14T13:05:49.887

256 MiB will be used for caches. Caches are really important; it's not just about running faster, the system wouldn't work at all if there were not enough memory for caches. The code of every running program can be unloaded from memory because it's mmapped and can be read back from disk. Without caches, every task switch would require a disk read and the system would become completely unresponsive. – Michael Vigovsky – 2018-06-15T14:41:08.743
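
As a small illustration of this point, executable code lives in file-backed, read-only mappings that the kernel may drop under memory pressure and page back in from disk later; a sketch that lists them for the current process:

    # Sketch: list the file-backed executable mappings of this process.
    # These pages can be evicted under memory pressure and must be re-read
    # from disk on the next use -- the thrashing described above.
    with open("/proc/self/maps") as f:
        for line in f:
            fields = line.split()
            # fields: address, perms, offset, dev, inode, [pathname]
            if len(fields) >= 6 and "x" in fields[1] and fields[5].startswith("/"):
                print(fields[1], fields[5])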

@MichaelVigovsky Do you have any proof of the caching statement? It isn't mentioned at all in the kernel docs: "The amount of free memory in the system that should be reserved for users with the capability cap_sys_admin." – PF4Public – 2020-02-22T15:58:54.107

The other answers offer good automatic solutions, but I find it helpful to also enable the SysRq key for when things get out of hand. With the SysRq key you send requests to the kernel manually, and you can do things like a safe reboot (with SysRq + REISUB) even if user space has completely frozen.

To allow the kernel to listen to these requests, set kernel.sysrq = 1, or enable just the functions you're likely to use with a bitmask (documented in the kernel's Documentation/admin-guide/sysrq.rst). For example, kernel.sysrq = 244 enables all the combos needed for the safe reboot above, as well as manual invocation of the OOM killer with SysRq + F.
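
As a sketch, the same can be done from a root-owned script; writing these /proc files is what sysctl and the key combination do under the hood:

    # Enable the SysRq functions named above (needs root); equivalent to
    # `sysctl kernel.sysrq=244` (4+16+32+64+128: keyboard control, sync,
    # remount read-only, process signals, reboot/poweroff).
    with open("/proc/sys/kernel/sysrq", "w") as f:
        f.write("244")

    # Invoke the OOM killer once, as SysRq + F would. Writes to
    # /proc/sysrq-trigger are honoured for root regardless of the
    # kernel.sysrq mask, which only gates the keyboard combination.
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("f")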

timuzhti

Posted 2012-03-29T08:43:19.600

Reputation: 294

Reliability isn't achieved through low-memory conditions and an OOM killer.

It is wrong to organize a party in a closet and place "cleaning out my closet" on your small playlist.

Is it possible to make the OOM killer intervene earlier?

Doing this will have unintended side effects, because you have no control over what is killed.

I'm trying to tune my development system for maximum reliability.

Maximum reliability involves testing your system and improving it based on those tests.

Just tweaking random things won't get you anywhere...

I disabled swap because, for GUI usage, it mostly renders the machine so unresponsive that it is no longer usable. Nevertheless, if aggressive applications eat up the memory, some mechanisms seem to kick in that make the most of it at the cost of speed.

Given low-memory conditions, disabling the swap won't improve the behavior; it does the opposite.

To increase reliability in this situation, add more memory so that your system is more responsive and no random processes are killed against the user's intention. You shouldn't rely on low-memory conditions and a mechanism like this, especially not in a development environment...

There is no hard-drive swap activity, but the system becomes unresponsive all the same.

Low-memory conditions indeed result in unresponsiveness, whether you have swap or not.

So I want the OOM killer to kick in before the system makes any special effort to reclaim memory.

Special efforts that will do more harm than good, as I explained above. Instead, you could kill the processes you don't need yourself, but I guess you can't do that, so the OOM killer will kill processes that you need.

Is it possible to configure the OOM killer to act if there is, for example, less than 100 MB of free physical memory?

Maybe, but you get a higher return on investment if you just buy some extra memory, which doesn't cost much these days. Consider that you're going to shoot yourself in the foot in the long run if you continue to work under low-memory conditions. The OOM killer is like a bailiff: it doesn't assist you, it assists the OS...

Tamara Wijsman

Posted 2012-03-29T08:43:19.600

Reputation: 54 163

Of course disabling swap improves the behavior: instead of thrashing the disk, the OOM killer kicks in and kills the memory hog. Running out of RAM isn't the problem (and adding more just means you have to try harder to run out). The problem is what to do when you DO run out. You want the OOM killer to kill the hog, and thus relieve the low-memory condition. – psusi – 2012-03-29T17:43:18.933

@psusi: How is it an improvement in behavior when an application I might need gets killed? I'd rather just have some more memory and/or swap space so that my application keeps running. You want to have space for your other applications (either by moving them to swap or through extra memory), not your current application getting killed. *That's a bad UX...* – Tamara Wijsman – 2012-03-29T18:01:02.150

Because killing an application that is trying to use more memory than you have is preferable to bringing the entire system to its knees. In a perfect world you would have unlimited memory and never run out, but in reality you sometimes run out by accident and would rather be told "not enough memory" than have the system grind to a halt. – psusi – 2012-03-29T18:25:44.637

I'm assuming no such thing; it was explicitly stated in the question. – psusi – 2012-03-29T18:56:02.453

@psusi: You are just assuming it brings the system to its knees. You are also assuming that people hit their memory limit very often; I'm quite a heavy developer / gamer / ... and have yet to hit my memory limit. My page file gets many thanks from me! I'm talking about people in general, not about the low-memory OP. Hence it's just an assumption... – Tamara Wijsman – 2012-03-29T19:08:08.107

Buying some extra memory might solve some problems, depending on the amount bought. But it doesn't change the fact that there may be unexpected usages, off by orders of magnitude. So I want the application to fail, but NOT the system, under those conditions. Some examples: processing a folder full of compressed images, most of them "normal" size but some of them really large. A small mistake could create an endless loop with a memory runaway eating 1 GB/s. Accidentally opening a video file in a text editor. Usually this ends with symptoms like a jerky mouse and an almost-dead UI until the OOM killer kicks in. – dronus – 2012-03-29T22:03:56.910

@dronus: You can just send a kill signal. Endless loops eating memory mean you aren't doing defensive programming, aren't using proper patterns, and aren't preferring for and foreach over while. ;) – Tamara Wijsman – 2012-03-29T22:09:05.373

@TomWijsman there are also almost-endless loops, as there are algorithms that behave linearly in the mean case but exponentially in the worst case, depending on the input data. And I cannot send a kill signal if the mouse is jerky and clicks as well as keyboard input show a one-minute latency. I usually switch to a text-mode terminal then and wait minutes for the login to proceed, just to issue a blindly typed kill. – dronus – 2012-03-29T22:23:47.947

I have no problem with killing applications that would run dead either. Consider a system with 2 GB physical + 2 GB swap. An application that quickly exhausts the physical memory can easily eat the swap too. It would just die later, after rendering the system unresponsive for minutes to hours. So why not kill it quickly, before GUI operation gets flaky? Most processes do all their work with 10 MB, some take 1 GB, and some rare ones would need 10 GB; that's life. – dronus – 2012-03-29T22:43:55.363

@dronus: Why not introduce a debug safeguard into your application, or write something generic that monitors it and kills it? I almost never experience this myself and am still able to kill the process when it does eventually happen. Actually, there are people who have been there and done that... – Tamara Wijsman – 2012-03-29T22:53:06.753

@dronus: Could it be that you are experiencing a bug? Consider enabling DMA, looking into hdparm, or updating your kernel, which might make the experience better. – Tamara Wijsman – 2012-03-29T22:57:48.653

@dronus: This article might also be interesting if you want to look further into OOM: Taming the OOM killer. Although the simpler solution would be something like ulimit -v. – Tamara Wijsman – 2012-03-29T23:00:16.653

@TomWijsman: I don't like building anything into the application, because this is a general problem that can be caused by several applications. That's why I still see the OOM killer as the appropriate tool. ulimit -v is unreliable because it can only limit one application and doesn't account for the total available resources. – dronus – 2012-04-05T08:22:54.347

I think this answer is a bit like "don't do anything really bad to solve a problem", but in my opinion a system that can be made unresponsive by a bunch of seemingly harmless user actions is worse than a system that kills them early. – dronus – 2013-09-13T19:12:07.237

@dronus: Compared to back in April 2012, these days we have cgroups to manage this; so you might be able to set up cgroups in a way that does what you want. I still feel that dealing more actively with the program will yield better results than dealing with it passively; the goal would be to get the programs to run within the memory conditions you can provide, and programs that do not fit could be configured to be less needy, or could have a bug (or feature request) reported so the developers could look into the high memory usage. It's just that OOM doesn't feel reliable to me... – Tamara Wijsman – 2013-09-13T20:12:01.780