7

I'm having problems with the OOM killer on one of my Linux (2.6.37) installs. The computer has 4GB of memory which I sometimes utilize fully. In those cases, I expect the OOM handler to come in and do its job by killing a process or two. Instead of doing this, or perhaps while attempting to do it, the system locks up, doing disk I/O like there is no tomorrow. Here's the thing: I DON'T have any swap enabled. For some reason my swapless system is still locking up with massive amounts of disk I/O, even though the appropriate course of action is to just kill a process or two. Thoughts?

The whole issue makes me wonder whether Linux requires swap in some way which I am not aware of. An explanation of whether this is the case and why would be greatly appreciated. I am familiar with the ideas of swap on a conceptual level (i.e. virtual memory, paging, overcommit), but I wonder if there is any implementation detail that I may have missed.

extramuros
  • What exactly is it that you're doing? Perhaps the application you're running deals with running out of memory by making up its own "swap" and writing things to disk (most database servers will write temporary tables to disk if they won't fit in RAM)? If you have no swap, then the IO is almost certainly unconnected to memory usage. Could it be that you have some other issue (eg failing drive) that is causing the system to lock up while it attempts to read the drive over and over? Have you checked the system logs, or in the case of a hard lock, the console for messages? – DerfK Apr 04 '11 at 21:43
  • 1
    Next time this happens, check `iostat` to see what volume the system is writing and/or reading to. – EEAA Apr 04 '11 at 21:45
  • 6
    Relying on the (controversial) OOM killer for normal operation is adventurous. I wouldn't expect anything reliable from it. Why don't you simply add some swap (or even RAM) so that this OOM situation doesn't happen? – jlliagre Apr 04 '11 at 23:33
  • @ErikA I'd recommend trying `iotop` in addition to `iostat` – Jason Axelson Apr 05 '11 at 00:12
  • 4
    Linux uses virtual memory for a lot more than userspace memory (heap or stack). If you use all your RAM, for instance as heap in a program, then you "crowd out" the page cache, which also caches executable pages (programs and libraries). In such a scenario it is quite possible that a program or library gets evicted from the cache and has to be re-read from disk as soon as its pages are needed again. – AndreasM Jun 29 '11 at 06:21
  • @AndreasM - sounds like an answer to the question. So iostat should show massive reads in this situation. – Nils Jul 06 '11 at 19:42
  • This is an old post, but I recently came across this page, [In defence of swap](https://chrisdown.name/2018/01/02/in-defence-of-swap.html), which did a good job of explaining to me why I need to enable swap, regardless of how much RAM I have – Daniel Lawson Jan 08 '18 at 07:27

2 Answers

7

The real question is, why are you running with no swap? Especially if you are seeing (serious) performance issues related to running out of RAM? You know not having swap can actually make your system slower, right?

The obvious solution is to add some swap space, and not have your system crap out on you. Considering how cheap disk space is, I can't think of any common situations [1] where you should ever build a system without swap.

As to answering your question, I don't remember all of the low-level details on why swap is important even on systems where you aren't going to exhaust the memory, but there have been arguments on the Linux Kernel mailing list about whether it's reasonable to run systems without swap (and there haven't been a lot of conclusive answers). The general consensus is typically to always have swap, and adjust the swappiness as needed.

Also, I think you're misunderstanding some important caveats regarding the Linux OOM killer. First of all, relying on it to handle your Out of Memory issues is a Very Bad Idea (tm). It can be very indiscriminate about what it kills, and it is entirely possible that you will be left with an unstable or even unusable system. Yes, it attempts to kill recent processes that are eating lots of memory (a minor safeguard to try to catch a runaway process), but there are no guarantees. I've seen it kill ssh, kill Xen processes (on a Xen virtual host server, causing VMs to crash), and in one case it killed NFS.

As for the IO. . . I don't know for sure what would be causing it. Perhaps a filesystem or disk related process got killed? Perhaps a process has some sort of "cache to disk" functionality built in when it can't allocate enough memory?

Another note, if this is a desktop, swap is required for Suspend to Disk. If it's a server, relying on OOM is never a good idea, as it compromises stability for, well, no good reason at all.

[1] Embedded systems are about the only obvious exception, and they aren't particularly common (and if you're dealing with embedded systems, you're already going to be aware of the requirements).

Christopher Cashell
  • Relying on OOM killer may be okay if you actively adjust `oom_adj` for important and possibly non-important processes, too. Even if you have swap the system may hit OOM situation and OOM killer will kill some process so you cannot just ignore `oom_adj` unless you turn off overcommit and deal with the consequences of that (basically really poor performance when system is near OOM). – Mikko Rantalainen Jan 07 '18 at 19:22
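As a rough illustration of the adjustment mentioned in the comment above: kernels from 2.6.36 onward expose `/proc/<pid>/oom_score_adj` (range -1000 to 1000) alongside the older `oom_adj`. The sketch below assumes that interface; the function name and the `proc_root` parameter are made up for illustration, and lowering a value normally needs root privileges.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for one process.

    -1000 exempts the process from the OOM killer, 1000 makes it the
    preferred victim. proc_root is a hypothetical parameter that exists
    only so the function can be pointed at a test directory.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

For example, `set_oom_score_adj(os.getpid(), 500)` would mark the calling process as a more likely victim, which an unprivileged process is generally allowed to do to itself.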
5

I think AndreasM has hit it on the head (the reason for the disk going all thrashy). Executables are demand paged, so in normal operation you will have nearly all of your executables and libraries sitting in good ol' physical RAM. But when RAM runs low, though not yet low enough for the out-of-memory killer to run, these pages are evicted from RAM. They are file-backed, so the kernel can drop them without any swap and simply re-read them from disk later. At first that's no problem, because pages are evicted least-recently-used first and it kicks out pages you aren't using anyway. But then it kicks out the ones you are using, just to have to page them right back in moments later. Thrash city.
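One way to check whether this is what's happening is to watch the major page fault counter, which counts faults that had to go to disk. A minimal sketch, assuming a kernel that exposes `pgmajfault` in `/proc/vmstat`; the helper names and the five-second default interval are invented for illustration:

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def major_fault_rate(before, after, seconds):
    """Major page faults per second between two counter snapshots."""
    return (after["pgmajfault"] - before["pgmajfault"]) / seconds


def sample(interval=5.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart and return the rate."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return major_fault_rate(before, after, interval)
```

Calling `sample()` while the machine is grinding should return a rate far above what it reports when idle, if evicted program pages are being read back in. What counts as "high" depends on the machine, so compare against a baseline rather than a fixed number.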

Basically, if something had used just a bit more RAM, the OOM killer probably would have kicked in, but you weren't there yet. As a few have said, the OOM killer is indiscriminate; it's really more of a last resort to avoid a kernel panic than something you should rely on in normal operation. If you have some custom setup, I'd consider writing a daemon to monitor free memory and kill processes, using the policy of your choosing, when memory approaches full.
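A sketch of what such a daemon's core could look like, not a finished tool. It assumes the usual `/proc/meminfo` and `/proc/<pid>/status` layouts, treats `MemFree + Buffers + Cached` as a rough stand-in for available memory on kernels without `MemAvailable`, and uses "largest resident set that isn't on a protected list" as a purely illustrative policy. All function names and the threshold are hypothetical.

```python
import os
import signal


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kiB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reclaimable_kib(meminfo):
    """Rough estimate of memory obtainable without thrashing (assumption)."""
    if "MemAvailable" in meminfo:  # only present on newer kernels
        return meminfo["MemAvailable"]
    return (meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0)
            + meminfo.get("Cached", 0))


def read_status(text):
    """Extract (name, rss_kib) from /proc/<pid>/status-style text."""
    name, rss = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])
    return name, rss


def choose_victim(processes, protected):
    """Pick the (pid, name, rss_kib) tuple with the largest RSS whose
    name is not in `protected`; return None if nothing qualifies."""
    candidates = [p for p in processes if p[1] not in protected]
    return max(candidates, key=lambda p: p[2]) if candidates else None


def list_processes(proc="/proc"):
    """Collect (pid, name, rss_kib) for every readable process."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                name, rss = read_status(f.read())
        except OSError:
            continue  # process exited or is not readable
        found.append((int(entry), name, rss))
    return found


def check_once(threshold_kib, protected, proc="/proc"):
    """Send SIGTERM to the chosen victim if memory is below the threshold."""
    with open(os.path.join(proc, "meminfo")) as f:
        meminfo = parse_meminfo(f.read())
    if reclaimable_kib(meminfo) >= threshold_kib:
        return None
    victim = choose_victim(list_processes(proc), protected)
    if victim is not None:
        os.kill(victim[0], signal.SIGTERM)
    return victim
```

A loop calling something like `check_once(200000, {"init", "sshd"})` every second or so would be the simplest driver. The threshold has to sit well above zero, since the whole point is to act before the page cache is squeezed down to nothing; a real daemon would also need to stay responsive under memory pressure, which this sketch does not address.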

Mikko Rantalainen
hwertz