40

Today I (accidentally) ran some program on my Linux box that quickly used a lot of memory. My system froze, became unresponsive and thus I was unable to kill the offender.

How can I prevent this in the future? Can't it at least keep a responsive core or something running?

voretaq7
johv
  • Duplicate of [System hanging when it runs out of memory](http://unix.stackexchange.com/questions/28175/system-hanging-when-it-runs-out-of-memory/289945), and it's a well-known [bug](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356) – Dan Dascalescu Jun 16 '16 at 13:03

6 Answers

16

I'll bet that the system didn't actually "freeze" (in the sense that the kernel hung), but rather was just very unresponsive. Chances are it was just swapping very hard, causing interactive performance and system throughput to drop like a stone.

You could turn off swap, but that just changes the problem from poor performance to OOM-killed processes (and all the fun that causes), along with decreased performance due to less available disk cache.

Alternately, you could use per-process resource limits (commonly referred to as rlimit and/or ulimit) to remove the possibility of a single process taking a ridiculous amount of memory and causing swapping, but that just pushes you into entertaining territory with processes that die at inconvenient moments because they wanted a little more memory than the system was willing to give them.
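
A minimal sketch of the rlimit approach, assuming Linux and Python's standard `resource` module. The helper names (`run_with_memory_limit`, `_limit_address_space`) are made up for illustration. It caps the child's virtual address space (`RLIMIT_AS`), so allocations beyond the cap fail inside that process instead of pushing the whole machine into swap.

```python
import resource
import subprocess


def _limit_address_space(max_bytes):
    """Build a function that caps RLIMIT_AS; it runs in the child before exec."""
    def apply_limit():
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process cannot raise its hard limit, so clamp to it.
        if hard != resource.RLIM_INFINITY:
            limit = min(max_bytes, hard)
        else:
            limit = max_bytes
        # Lowering both soft and hard limits means the child cannot undo the cap.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    return apply_limit


def run_with_memory_limit(cmd, max_bytes, **kwargs):
    """Run cmd (a list of arguments) with its virtual address space capped.

    Extra keyword arguments are passed through to subprocess.run().
    """
    return subprocess.run(cmd, preexec_fn=_limit_address_space(max_bytes), **kwargs)
```

For example, `run_with_memory_limit(["some-program"], 2 * 1024**3)` would start a hypothetical `some-program` with a 2 GiB cap. In an interactive shell, `ulimit -v` (value in KiB) sets the same limit for everything started from that shell. As the answer says, the trade-off is that the limited process may die when it asks for slightly more than the cap.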

If you knew you were going to do something that was likely to cause massive memory usage, you could probably write a small rescue program (a minimal shell, say) that calls mlockall() on itself and stays running; that'd keep it in memory, and would be the closest thing to "keep a responsive core" you're likely to get (because the problem isn't that the CPU is being overutilised). Note that a wrapper which calls mlockall() and then exec's your regular shell won't achieve this, because memory locks are dropped across execve().

Personally, I subscribe to the "don't do stupid things" method of resource control. If you've got root, you can do all sorts of damage to a system, and so doing anything that you don't know the likely results of is a risky business.

womble
  • 2
    Unfortunately, "don't do stupid things" doesn't help users who run memory-hogging applications like Chrome (see issues [134612](https://bugs.chromium.org/p/chromium/issues/detail?id=134612), [393395](https://bugs.chromium.org/p/chromium/issues/detail?id=393395)). – Dan Dascalescu Jun 16 '16 at 12:08
  • 1
    @DanDascalescu And it's not always obvious that you are doing something stupid. My machine hung the other day because I changed a "UNION" in a (complicated) SQLite query to "UNION ALL". – Michael Jul 30 '19 at 21:39
  • Known-buggy programs can (and should) be run in a resource-constrained configuration -- `ulimit`, or even cgroups these days, if you're a hip youngster, does the job quite well. If you're making changes to queries in production without validating their effects in a non-critical environment, that's your root cause problem. – womble Jul 30 '19 at 23:10
15

As Tronic mentions in a comment, it is possible to invoke the OOM killer (out-of-memory killer) directly with the keyboard combination SysRq-F (Alt+SysRq+F).

The SysRq key usually shares a key with PrtSc on keyboards.

The OOM killer kills one or more processes and the system becomes responsive again. Direct access to the OOM killer may not be enabled by default; see this question to find out how to check its status and/or enable it.

PS: This helped me a lot. I agree that this is the most useful advice for this problem when it is caused by Chrome or any other memory-greedy software. But keep in mind that the OOM killer could kill a really important process, so use it carefully.
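
A small sketch for checking whether the keyboard shortcut is permitted, assuming a Linux system that exposes `/proc/sys/kernel/sysrq`. The function names are made up for illustration. The bitmask follows the kernel's SysRq documentation: `1` enables every function, and otherwise bit `0x40` (64) covers process signalling, which includes the OOM kill.

```python
SYSRQ_PATH = "/proc/sys/kernel/sysrq"
SYSRQ_TRIGGER_PATH = "/proc/sysrq-trigger"
SYSRQ_ENABLE_SIGNAL = 0x40  # permits the term, kill and oom-kill functions


def oom_kill_key_enabled(sysrq_value):
    """Return True if the given kernel.sysrq value allows SysRq-F."""
    if sysrq_value == 1:  # 1 means "all functions enabled"
        return True
    return bool(sysrq_value & SYSRQ_ENABLE_SIGNAL)


def read_sysrq_value(path=SYSRQ_PATH):
    """Read the current kernel.sysrq setting as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def trigger_oom_killer(path=SYSRQ_TRIGGER_PATH):
    """Ask the kernel to run the OOM killer once, like SysRq-F. Needs root."""
    with open(path, "w") as f:
        f.write("f")
```

`oom_kill_key_enabled(read_sysrq_value())` reports the current state. If it is `False`, running `sysctl kernel.sysrq=1` as root enables all SysRq functions until the next reboot. Writing to `/proc/sysrq-trigger` as root works regardless of that setting, which can help over SSH when no keyboard is attached.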

Arkemlar
6

This is a bug known since 2007 - see System freeze on high memory usage.

In this situation, Windows displays a dialog warning the user to close one or more applications.

Dan Dascalescu
5

You can use a daemon like earlyoom, which monitors available RAM and free swap. You configure how much of each must remain available; when that threshold is crossed, it kills the largest memory consumer, which is normally the culprit. You can also set up an exception list if you wish.
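
To illustrate the idea (this is not earlyoom's actual code), here is a rough sketch of the check such a daemon performs, assuming Linux's `/proc/meminfo` and `/proc/<pid>/status`. The function names and the 10% default thresholds are assumptions chosen for the example. The sketch treats memory as low only when both available RAM and free swap fall below their thresholds.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def percent(part, total):
    """Percentage of total; 0.0 when total is zero (e.g. no swap configured)."""
    return 100.0 * part / total if total else 0.0


def memory_is_low(info, min_mem_percent=10.0, min_swap_percent=10.0):
    """True when both available RAM and free swap are below their thresholds."""
    mem_low = percent(info["MemAvailable"], info["MemTotal"]) < min_mem_percent
    swap_low = percent(info.get("SwapFree", 0), info.get("SwapTotal", 0)) < min_swap_percent
    return mem_low and swap_low


def largest_rss_pid(proc="/proc"):
    """Return the PID with the largest resident set size, or None."""
    best_pid, best_kb = None, -1
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        kb = int(line.split()[1])
                        if kb > best_kb:
                            best_pid, best_kb = int(entry), kb
                        break
        except OSError:
            continue  # the process exited or is not readable
    return best_pid
```

A daemon would poll `memory_is_low(parse_meminfo(open("/proc/meminfo").read()))` a few times per second and signal `largest_rss_pid()` when it returns `True`. For real use, the packaged earlyoom is the better choice.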

  • Thank you so much! I've been looking for something like this for years now! It's been a thing for six years now, but I only learned about it from your answer. – silviot May 17 '21 at 13:21
2

If you feel like recompiling the kernel, you could try the patch from the EDIT section of this question: https://stackoverflow.com/q/52067753/10239615
With the patch, the kernel does not evict Active(file) pages under high memory pressure, so the OOM killer triggers almost instantly: the kernel no longer spends minutes constantly re-reading every process's executable code pages from disk, which is what makes the OS appear frozen.

0

This is particularly difficult to prevent. It happens because the kernel starts swapping. One solution is to turn swap off. When the system runs out of memory, the kernel will then kill some processes rather than start swapping; it usually picks the right process to kill, but in any case it is better to kill a random process than to have an unresponsive system.

This can be a particularly good solution for servers, because servers often have enough RAM, and when they start to use swap space it means something's wrong anyway. However, desktops usually need the swap space, so I think there's no good solution for desktops. I often turn swap space off on servers, especially when there is suspicion of a memory leak.

Antonis Christofides
  • 4
    Turning off swap on any system is a bad idea, because it doesn't allow the unused pages to be swapped out and the free space used for disk cache. This is *especially* true when there's a memory leak. – womble May 19 '12 at 10:55
  • 2
    And with swap off, the system can still get slow due to paging. It will just be paging clean pages madly instead of dirty ones. (Since, without swap, it can never evict a dirty page, it will always have to evict clean ones.) – David Schwartz May 19 '12 at 11:35
  • 1
    I have a server which has a memory leak. The first time it happened, I had to press the reset button, because the server became unresponsive. But now that I've turned swap off, the server just kills the apache child if it becomes too large (it's a safeguard in addition to MaxRequestsPerChild). The result is that the server runs without problem. It doesn't have many unused pages anyway, and it certainly isn't paging clean pages madly. – Antonis Christofides May 19 '12 at 11:52
  • @AntonisChristofides: I'm not sure what you think the takeaway lesson from that is. Your solution is certainly a bad one because it hampers performance due to the inability to evict rarely-accessed dirty pages from physical memory, it didn't solve the underlying problem, and you run the risk that the OOM killer might kill a critical process. You happened not to encounter the particular hazard I was warning about, but you're still at risk for it because you have no swap. – David Schwartz May 19 '12 at 13:33
  • Some general advice with memory problems of that kind: Leave swap on. look at what the slub/slab allocators are doing, they can get you into memory problems from behind. Check whether tuning "swappiness" and "vfs_cache_pressure" helps. – rackandboneman May 19 '12 at 15:50
  • 15
    With or without swap it still freezes before the OOM killer gets run automatically. This is really a kernel bug that should be fixed (i.e. run OOM killer earlier, before dropping all disk cache). Unfortunately kernel developers and a lot of other folk fail to see the problem. Common suggestions such as disable/enable swap, buy more RAM, run less processes, set limits etc. do not address the underlying problem that the kernel's low memory handling sucks camel's balls. Meanwhile, I suggest running the OOM killer manually (SysRq-F) when the system freezes as that will make it recover faster. – Tronic Jun 20 '12 at 16:52
  • What up @Tronic. Long time no see. Your tip for manually running oom-killer via `SysRq-F` is the best tip I've come across for this problem (y) – lkraav Oct 16 '16 at 20:09
  • If the unresponsiveness is caused by some program you just opened and if not much time has passed since then, closing it will work fine without letting the OOM killer choose which process to kill – golimar Nov 26 '19 at 17:37