Is there any way to get a core dump of, or otherwise debug, a process that has been killed by the oom-killer?

Or even configure the oom-killer to kill the process with SIGABRT instead?

TrapAlice

2 Answers

Another approach is to disable memory overcommit.

To restore some semblance of sanity to your memory management:

  1. Disable the OOM killer (put vm.oom-kill = 0 in /etc/sysctl.conf; note that this sysctl exists only on some older vendor kernels, not in mainline Linux)
  2. Disable memory overcommit (put vm.overcommit_memory = 2 in /etc/sysctl.conf)

These settings make Linux behave in the traditional way: if a process requests more memory than is available, malloc() fails, and the requesting process is expected to cope with that failure.

Note that vm.overcommit_memory is a ternary value:
  • 0 = "estimate if we have enough RAM"
  • 1 = "Always say yes"
  • 2 = "say no if we don't have the memory"
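
As a rough sketch of the three modes listed above, the following maps a vm.overcommit_memory value to its meaning and, where the file is readable, reports the running kernel's current setting. The function name `describe_overcommit` and the short labels ("heuristic", "always", "strict") are made up for this illustration.

```shell
# Map a vm.overcommit_memory value to the behaviour described above.
# describe_overcommit is a hypothetical helper name, not a system tool.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic: estimate if we have enough RAM" ;;
        1) echo "always: always say yes" ;;
        2) echo "strict: say no if we don't have the memory" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the running kernel's setting, if the file is readable (Linux only).
f=/proc/sys/vm/overcommit_memory
if [ -r "$f" ]; then
    describe_overcommit "$(cat "$f")"
fi
```

Reading the value needs no special privileges; changing it does.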

This forces the application to handle running out of memory itself, and its logs, core dump, etc. may then give you something useful.
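
A core dump is only written if the process ends on a core-dumping signal (for example by calling abort() when malloc() fails) and the core size limit allows it. The sketch below checks and tries to lift that limit for processes started from the current shell; `core_limit_status` is a hypothetical helper name, and whether the limit can be raised depends on the hard limit configured on your system.

```shell
# Report whether processes started from this shell may write core files.
# core_limit_status is a hypothetical helper name.
core_limit_status() {
    if [ "$(ulimit -c)" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Try to lift the soft limit; this fails if the hard limit is lower.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

echo "core dumps: $(core_limit_status)"

# core_pattern decides where the kernel writes (or pipes) the core file.
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

Start the application from the same shell afterwards so that it inherits the raised limit.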

UPDATE #1

NOTE: When your system runs out of memory, you will not be able to spawn new processes! You may be locked out of the system.

sourcejedi
nishantjr
  • This is a terrible idea. Most software running on your system probably doesn't handle the return value from memory allocation failure correctly. Doing this will cause code-paths that practically never get executed by anyone to run and in the worst case could even introduce security vulnerabilities on your system from running these untested and unexpected code paths. – KJ Tsanaktsidis Apr 22 '19 at 23:51
  • Does this behave differently than what you get if you do a `mlockall` at startup? – Joseph Garvin Oct 20 '20 at 19:59
echo 1 > /proc/sys/vm/oom_dump_tasks

This seems to be about the most that you can get the kernel to display on out-of-memory errors.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs an OOM-killing and includes such information as pid, uid, tgid, vm size, rss, nr_ptes, swapents, oom_score_adj score, and name. This is helpful to determine why the OOM killer was invoked, to identify the rogue task that caused it, and to determine why the OOM killer chose the task it did to kill.

If this is set to zero, this information is suppressed. On very large systems with thousands of tasks it may not be feasible to dump the memory state information for each one. Such systems should not be forced to incur a performance penalty in OOM conditions when the information may not be desired.

If this is set to non-zero, this information is shown whenever the OOM killer actually kills a memory-hogging task.
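
Once the task dump is enabled, the report ends up in the kernel log. As a minimal sketch, the filter below keeps only the lines that mention the OOM killer; `oom_lines` is a hypothetical helper name, and the patterns are an assumption based on commonly seen kernel messages, whose exact wording varies between kernel versions.

```shell
# Keep only kernel log lines that mention the OOM killer.
# oom_lines is a hypothetical helper; the patterns are a best-effort guess
# at common kernel wording and may need adjusting for your kernel.
oom_lines() {
    grep -Ei 'oom-killer|out of memory|killed process'
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | oom_lines
#   journalctl -k | oom_lines
```

The per-task table itself is printed next to these lines, so it is usually easiest to locate them first and then read the surrounding log.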

HBruijn