30

How do I get the Linux OOM killer to not kill my processes when physical memory is low but there is plenty of swap space?

I have disabled OOM killing and overcommit with sysctl vm.overcommit_memory=2.

The VM has 3 GB of absolutely free, unfragmented swap, and the process that is being OOM killed has a maximum memory usage of less than 200 MB.

I know that long-term swapping will be horrible for performance, but I need to use the swap right now to do functional testing under valgrind, where memory requirements are much greater.
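
For reference, this is a minimal sketch of how the setting above can be applied and verified (the values match what is described here; persisting them via /etc/sysctl.conf is an assumption about the setup):

sysctl -w vm.overcommit_memory=2      # strict accounting, no overcommit
cat /proc/sys/vm/overcommit_memory    # should print 2
swapon -s                             # confirm the ~3 GB swap device is active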

Mar  7 02:43:11 myhost kernel: memcheck-amd64- invoked oom-killer: gfp_mask=0x24002c2, order=0, oom_score_adj=0
Mar  7 02:43:11 myhost kernel: memcheck-amd64- cpuset=/ mems_allowed=0
Mar  7 02:43:11 myhost kernel: CPU: 0 PID: 3841 Comm: memcheck-amd64- Not tainted 4.4.0-x86_64-linode63 #2
Mar  7 02:43:11 myhost kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Mar  7 02:43:11 myhost kernel: 0000000000000000 0000000000000000 ffffffff8158cbcc ffff880032d7bc18
Mar  7 02:43:11 myhost kernel: ffffffff811c6a55 00000015118e701d ffffffff81044a8d 00000000000003e2
Mar  7 02:43:11 myhost kernel: ffffffff8110f5a1 0000000000000000 00000000000003e2 ffffffff81cf15cc
Mar  7 02:43:11 myhost kernel: Call Trace:
Mar  7 02:43:11 myhost kernel: [<ffffffff8158cbcc>] ? dump_stack+0x40/0x50
Mar  7 02:43:11 myhost kernel: [<ffffffff811c6a55>] ? dump_header+0x59/0x1dd
Mar  7 02:43:11 myhost kernel: [<ffffffff81044a8d>] ? kvm_clock_read+0x1b/0x1d
Mar  7 02:43:11 myhost kernel: [<ffffffff8110f5a1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Mar  7 02:43:11 myhost kernel: [<ffffffff81183316>] ? oom_kill_process+0xc0/0x34f
Mar  7 02:43:11 myhost kernel: [<ffffffff811839b2>] ? out_of_memory+0x3bf/0x406
Mar  7 02:43:11 myhost kernel: [<ffffffff81187bbd>] ? __alloc_pages_nodemask+0x8ba/0x9d8
Mar  7 02:43:11 myhost kernel: [<ffffffff811b82e8>] ? alloc_pages_current+0xbc/0xe0
Mar  7 02:43:11 myhost kernel: [<ffffffff811b096c>] ? __vmalloc_node_range+0x12d/0x20a
Mar  7 02:43:11 myhost kernel: [<ffffffff811e0e62>] ? alloc_fdtable+0x6a/0xd8
Mar  7 02:43:11 myhost kernel: [<ffffffff811b0a83>] ? __vmalloc_node+0x3a/0x3f
Mar  7 02:43:11 myhost kernel: [<ffffffff811e0e62>] ? alloc_fdtable+0x6a/0xd8
Mar  7 02:43:11 myhost kernel: [<ffffffff811b0ab0>] ? vmalloc+0x28/0x2a
Mar  7 02:43:11 myhost kernel: [<ffffffff811e0e62>] ? alloc_fdtable+0x6a/0xd8
Mar  7 02:43:11 myhost kernel: [<ffffffff811e1338>] ? dup_fd+0x103/0x1f0
Mar  7 02:43:11 myhost kernel: [<ffffffff810dd143>] ? copy_process+0x5aa/0x160d
Mar  7 02:43:11 myhost kernel: [<ffffffff8110f5a1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Mar  7 02:43:11 myhost kernel: [<ffffffff810de2fc>] ? _do_fork+0x7d/0x291
Mar  7 02:43:11 myhost kernel: [<ffffffff810ea186>] ? __set_current_blocked+0x47/0x52
Mar  7 02:43:11 myhost kernel: [<ffffffff810ea1f2>] ? sigprocmask+0x61/0x6a
Mar  7 02:43:11 myhost kernel: [<ffffffff81998eae>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Mar  7 02:43:11 myhost kernel: Mem-Info:
Mar  7 02:43:11 myhost kernel: active_anon:15 inactive_anon:18 isolated_anon:0
Mar  7 02:43:11 myhost kernel: active_file:7 inactive_file:8 isolated_file:0
Mar  7 02:43:11 myhost kernel: unevictable:0 dirty:3 writeback:26 unstable:0
Mar  7 02:43:11 myhost kernel: slab_reclaimable:1798 slab_unreclaimable:3674
Mar  7 02:43:11 myhost kernel: mapped:8 shmem:1 pagetables:752 bounce:0
Mar  7 02:43:11 myhost kernel: free:1973 free_pcp:0 free_cma:0
Mar  7 02:43:11 myhost kernel: Node 0 DMA free:3944kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:28kB inactive_file:32kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:72kB slab_unreclaimable:236kB kernel_stack:48kB pagetables:60kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:360 all_unreclaimable? yes
Mar  7 02:43:11 myhost kernel: lowmem_reserve[]: 0 972 972 972
Mar  7 02:43:11 myhost kernel: Node 0 DMA32 free:3948kB min:3956kB low:4944kB high:5932kB active_anon:60kB inactive_anon:72kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1032064kB managed:999552kB mlocked:0kB dirty:12kB writeback:104kB mapped:32kB shmem:4kB slab_reclaimable:7120kB slab_unreclaimable:14460kB kernel_stack:2112kB pagetables:2948kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:792 all_unreclaimable? yes
Mar  7 02:43:11 myhost kernel: lowmem_reserve[]: 0 0 0 0
Mar  7 02:43:11 myhost kernel: Node 0 DMA: 20*4kB (UM) 17*8kB (UM) 13*16kB (M) 14*32kB (UM) 8*64kB (UM) 4*128kB (M) 4*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 3944kB
Mar  7 02:43:11 myhost kernel: Node 0 DMA32: 934*4kB (UM) 28*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3960kB
Mar  7 02:43:11 myhost kernel: 71 total pagecache pages
Mar  7 02:43:11 myhost kernel: 42 pages in swap cache
Mar  7 02:43:11 myhost kernel: Swap cache stats: add 245190, delete 245148, find 77026/136093
Mar  7 02:43:11 myhost kernel: Free swap  = 3118172kB
Mar  7 02:43:11 myhost kernel: Total swap = 3334140kB
Mar  7 02:43:11 myhost kernel: 262014 pages RAM
Mar  7 02:43:11 myhost kernel: 0 pages HighMem/MovableOnly
Mar  7 02:43:11 myhost kernel: 8149 pages reserved
Mar  7 02:43:11 myhost kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Mar  7 02:43:11 myhost kernel: [ 2054]     0  2054     5101        1      15       4      283             0 upstart-udev-br
Mar  7 02:43:11 myhost kernel: [ 2063]     0  2063    12362        1      28       4      184         -1000 systemd-udevd
Mar  7 02:43:11 myhost kernel: [ 3342]   102  3342     9780        1      23       3       89             0 dbus-daemon
Mar  7 02:43:11 myhost kernel: [ 3423]     0  3423    10864        1      26       3       85             0 systemd-logind
Mar  7 02:43:11 myhost kernel: [ 3441]     0  3441    15344        0      34       3      184         -1000 sshd
Mar  7 02:43:11 myhost kernel: [ 3450]     0  3450     4786        0      14       3       43             0 atd
Mar  7 02:43:11 myhost kernel: [ 3451]     0  3451     5915        0      17       4       65             0 cron
Mar  7 02:43:11 myhost kernel: [ 3457]   101  3457    63962        0      28       3      202             0 rsyslogd
Mar  7 02:43:11 myhost kernel: [ 3516]     0  3516     3919        1      13       3      156             0 upstart-file-br
Mar  7 02:43:11 myhost kernel: [ 3518]     0  3518     4014        0      13       3      265             0 upstart-socket-
Mar  7 02:43:11 myhost kernel: [ 3557]     0  3557    66396        0      32       3     1802             0 fail2ban-server
Mar  7 02:43:11 myhost kernel: [ 3600]     0  3600     3956        1      13       3       39             0 getty
Mar  7 02:43:11 myhost kernel: [ 3601]     0  3601     3198        1      12       3       37             0 getty
Mar  7 02:43:11 myhost kernel: [ 3673]     0  3673    26411        1      55       3      252             0 sshd
Mar  7 02:43:11 myhost kernel: [ 3740]  1000  3740    26411        1      52       3      253             0 sshd
Mar  7 02:43:11 myhost kernel: [ 3741]  1000  3741     5561        0      16       3      431             0 bash
Mar  7 02:43:11 myhost kernel: [ 3820]   103  3820     7863        1      21       3      152             0 ntpd
Mar  7 02:43:11 myhost kernel: [ 3837]  1000  3837    31990        0      58       4    12664             0 memcheck-amd64-
Mar  7 02:43:11 myhost kernel: [ 3841]  1000  3841    32006        0      59       4    12812             0 memcheck-amd64-
Mar  7 02:43:11 myhost kernel: [ 3844]  1000  3844    31950        0      57       4    12035             0 memcheck-amd64-
Mar  7 02:43:11 myhost kernel: [ 3849]  1000  3849    31902        0      56       4    11482             0 memcheck-amd64-
Mar  7 02:43:11 myhost kernel: [ 3853]  1000  3853     1087        0       7       3       27             0 lsof
Mar  7 02:43:11 myhost kernel: [ 3854]     0  3854    26140        5      55       3      230             0 sshd
Mar  7 02:43:11 myhost kernel: [ 3855]   104  3855    15699        0      33       3      202             0 sshd
Mar  7 02:43:11 myhost kernel: Out of memory: Kill process 3841 (memcheck-amd64-) score 11 or sacrifice child
Mar  7 02:43:11 myhost kernel: Killed process 3841 (memcheck-amd64-) total-vm:128024kB, anon-rss:0kB, file-rss:0kB

This is /proc/meminfo:

MemTotal:        1015460 kB
MemFree:          277508 kB
MemAvailable:     322032 kB
Buffers:            8336 kB
Cached:            42208 kB
SwapCached:        46088 kB
Active:            58844 kB
Inactive:         116100 kB
Active(anon):      34784 kB
Inactive(anon):    89620 kB
Active(file):      24060 kB
Inactive(file):    26480 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3334140 kB
SwapFree:        3215756 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:        121128 kB
Mapped:            15072 kB
Shmem:                 4 kB
Slab:              22668 kB
SReclaimable:       8028 kB
SUnreclaim:        14640 kB
KernelStack:        2016 kB
PageTables:         2532 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3841868 kB
Committed_AS:     380460 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
DirectMap4k:       14208 kB
DirectMap2M:     1034240 kB
DirectMap1G:           0 kB
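
A side note on the numbers above (a hedged calculation, assuming the default vm.overcommit_ratio of 50): with vm.overcommit_memory=2 the commit limit is SwapTotal + MemTotal * overcommit_ratio / 100, and Committed_AS is far below it, so the strict-overcommit accounting itself is not what refused memory here:

echo $(( 3334140 + 1015460 * 50 / 100 ))      # -> 3841870 kB, matching CommitLimit above to within rounding
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
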
  • I don't know if you can exclude a certain process but in this blog post you'll find a little script that allows you to constantly watch the "victim score" via the proc filesystem: [https://blogs.oracle.com/ksplice/entry/solving_problems_with_proc](https://blogs.oracle.com/ksplice/entry/solving_problems_with_proc). Maybe this helps to gain further insight into what happens in your VM. – initall Mar 07 '16 at 08:19
  • @initall Thank you, however that doesn't help because the oom killer shouldn't run at all since there is 3GB free swap. – Coder Mar 07 '16 at 08:30
  • Your process was in the middle of forking and the kernel was trying to allocate duplicate internal data structures for the child process. This requires specific amounts of kernel memory which weren't available at the time. Linux has many internal data structures which are sized up or down according to the amount of physical RAM in the system. The sensible solution to this is to increase the total amount of RAM. – Michael Hampton Mar 07 '16 at 08:36
  • @MichaelHampton How do you know that the problem is that kernel memory was not available? How much kernel memory was needed? Since there was 3GB swap available why wasn't that virtual memory used? If what you're saying is true then how could any process ever fork after all physical memory is used? – Coder Mar 07 '16 at 08:52
  • @MichaelHampton If the problem is that there is not enough memory to fork, fork() should fail with errno=12. The process should not be OOM killed. – Coder Mar 07 '16 at 08:54
  • It's blatantly obvious from the call trace that the kernel didn't have enough memory. As for why it didn't swap, that can have many different causes, all of which are too long to explain fully in 500 characters. But yours looks like it's that there were no reclaimable pages (`all_unreclaimable` is yes). These are pages locked into RAM, generally because they're pinned or in use by the kernel. Nothing you had left in RAM was swappable at the time; everything that _could_ have been swapped out already had been. Again, the solution is more RAM. – Michael Hampton Mar 07 '16 at 09:00
  • @MichaelHampton Do you have pointers to anything I can read that does explain it? – Coder Mar 07 '16 at 09:02
  • @MichaelHampton The rest of the memory is being used by regular applications. Why can't the kernel push them to swap? Please respond to my question "If what you're saying is true then how could any process ever fork after all physical memory is used?" – Coder Mar 07 '16 at 09:04
  • https://www.google.com/search?q=linux+unreclaimable+pages – Michael Hampton Mar 07 '16 at 09:05
  • @MichaelHampton I disabled the forking and now fail2ban invokes the oom killer, causing my processes to be killed. What is the point of having swap if the kernel won't use it? More importantly, how do I configure the kernel so that it stops running out of memory? – Coder Mar 07 '16 at 09:15
  • For the last time, you spend the $10 and add more RAM. Swap is not magic "RAM-on-disk" and as has been thoroughly explained, simply having it does not mean it is usable. – Michael Hampton Mar 07 '16 at 09:17
  • @MichaelHampton This is not $10. I will have to do this for many many machines. You're saying that it should not be possible to write code to handle low memory situations even though I have disabled over commit? Instead of code handling malloc() failures and handling errno=12 the code should check to see if kernel memory is low before allocating or forking? Beyond that if any process on the host allocates or forks while kernel memory is low I should expect my high priority processes to be killed? – Coder Mar 07 '16 at 09:22
  • If you have "many many machines" then Linode might not be your best bet. Consider an on-premise "private" cloud, or a service like AWS where it's easier to scale up and down on demand, so that you are only using resources when you really need them. Regardless, you need enough memory, and you don't have it. This is not a situation where writing more code is a viable solution. – Michael Hampton Mar 07 '16 at 09:34
  • @MichaelHampton It's very easy to scale up on Linode for me. Regardless, I want to be able to fix my system so that it doesn't crash when there is low physical memory. Is that not a reasonable goal? – Coder Mar 07 '16 at 09:36
  • @MichaelHampton I am not dramatically exceeding my physical memory. I am going over physical limit by 210MB when there is 3GB more of free swap. – Coder Mar 07 '16 at 09:44
  • @MatthewIfe: If you know the answer, please post it here. Stack Exchange sites are for the benefit of everyone who reads, not just the OP who asked the question. – R.. GitHub STOP HELPING ICE Mar 08 '16 at 00:46
  • I see you're running Linode's kernel - is it possible that they've intentionally patched it not to use swap because they don't want you using swap on their servers? If so, it looks like they broke no-overcommit mode in the process... – R.. GitHub STOP HELPING ICE Mar 08 '16 at 00:48
  • Swapping in a VM is not considered Best Practice. Allocate more real memory to your VM. If you can't add more memory, bring it in-house to physical hardware rather than leaving it in an undersized rental. – Criggie Mar 08 '16 at 03:10
  • "There are several things that might cause an OOM event other than the system running out of RAM and available swap space due to the workload. The kernel might not be able to utilize swap space optimally due to the type of workload on the system. Applications that utilize mlock() or HugePages have memory that can't be swapped to disk when the system starts to run low on physical memory. Kernel data structures can also take up too much space exhausting memory on the system and causing an OOM situation. Many NUMA architecture–based systems can experience OOM conditions because of one node" – Michael Martinez Dec 02 '17 at 01:03

7 Answers

41

This appears to be a problem caused by a combination of two factors:

  • Using a virtual machine.
  • A possible kernel bug.

This is one of the lines that partly describes why this happens:

Mar  7 02:43:11 myhost kernel: memcheck-amd64- invoked oom-killer: gfp_mask=0x24002c2, order=0, oom_score_adj=0

The other line is this:

Mar  7 02:43:11 myhost kernel: 0 pages HighMem/MovableOnly

The first line is the GFP mask assigned for the allocation. It basically describes what the kernel is allowed/not allowed to do to satisfy this request. The mask indicates a bunch of standard flags. The last bit, '2', however, indicates the memory allocation should come from the HighMem zone.
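
As a quick sanity check, the zone selector lives in the low bits of the mask. Assuming the flag values from 4.4-era include/linux/gfp.h (___GFP_DMA=0x01, ___GFP_HIGHMEM=0x02, ___GFP_DMA32=0x04, ___GFP_MOVABLE=0x08), you can extract it like this:

printf '0x%x\n' $(( 0x24002c2 & 0xf ))      # -> 0x2, i.e. ___GFP_HIGHMEM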

If you look closely at the OOM output, you'll see no HighMem/Normal zone actually exists.

Mar  7 02:43:11 myhost kernel: Node 0 DMA: 20*4kB (UM) 17*8kB (UM) 13*16kB (M) 14*32kB (UM) 8*64kB (UM) 4*128kB (M) 4*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 3944kB
Mar  7 02:43:11 myhost kernel: Node 0 DMA32: 934*4kB (UM) 28*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3960kB

HighMem (generally also called Normal on x86_64) is used to map memory in zones outside of the standard 896 MiB range that is directly kernel-accessible on 32-bit systems. On x86_64, HighMem/Normal seems to cover all memory above 3 GiB.

DMA32 is a zone used for memory that would be accessible to 32-bit DMA devices, that is, memory you can address with 4-byte pointers. I believe DMA is for 16-bit DMA devices.

Generally speaking, on low memory systems Normal wouldn't exist, given that DMA32 covers all available virtual addresses already.
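
You can check which zones actually exist on a running system, and how many free pages each has at every allocation order, without waiting for an OOM report; both files below are standard procfs interfaces:

grep '^Node' /proc/zoneinfo      # one line per zone present on this system
cat /proc/buddyinfo              # free pages per zone, grouped by allocation order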

The reason you get OOM killed is that there is a memory allocation for the HighMem zone, which has 0 pages available. Given that the out-of-memory handler has absolutely no way of making that zone have pages to use -- by swapping, killing other processes or any other trick -- the OOM killer just kills a process.

I believe this is caused by the host ballooning the VM on boot-up. On KVM systems, there are two values you can set.

  • The current memory.
  • The available memory.

The way this works is that you can hot-add memory to your server up to the available memory. Your system however is actually given the current memory.

When a KVM VM boots up, it starts with the maximum allotment of memory possible to be given (the available memory). Gradually during the boot phase of the system KVM claws back this memory using its ballooning, leaving you instead with the current memory setting you have.

It's my belief that's what happened here. Linode allows you to expand the memory, giving you much more at system start.

This means that there is a Normal/HighMem zone at the beginning of the system's lifetime. When the hypervisor balloons it away, the Normal zone rightly disappears from the memory manager. But I suspect that the flag setting whether said zone is available to allocate from is not cleared when it should be. This leads the kernel to attempt to allocate from a zone that does not exist.

In terms of resolving this you have two options.

  1. Bring this up on the kernel mailing lists to see if this really is a bug, expected behaviour, or nothing at all to do with what I'm saying.

  2. Request that Linode set the 'available memory' on the system to be the same 1GiB assignment as the 'current memory'. Thus the system never balloons and never gets a Normal zone at boot, keeping the flag clear. Good luck getting them to do that!

You should be able to test whether this is the case by setting up your own VM in KVM with 'available' set to 6GiB and 'current' to 1GiB, then running your test using the same kernel to see whether the behaviour you see above occurs. If it does, change the 'available' setting to equal the 1GiB 'current' and repeat the test.
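
With libvirt/KVM, for example, the 'available' and 'current' values correspond to the <memory> and <currentMemory> elements of the domain XML; a rough sketch of the test setup described above (testvm is a hypothetical domain name, sizes in KiB, domain assumed shut off while reconfiguring):

virsh setmaxmem testvm 6291456 --config      # 6 GiB 'available' memory
virsh setmem    testvm 1048576 --config      # 1 GiB 'current' memory (balloon target)
virsh dumpxml   testvm | grep -iE '<memory|<currentMemory'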

I'm making a bunch of educated guesses here and reading between the lines somewhat to come up with this answer, but what I'm saying seems to fit the facts outlined already.

I suggest testing my hypothesis and letting us all know the outcome.

Matthew Ife
33

To answer your headline question, use oom_score_adj (kernel >= 2.6.36) or, for earlier kernels (>= 2.6.11), oom_adj; see man proc.

/proc/[pid]/oom_score_adj (since Linux 2.6.36) This file can be used to adjust the badness heuristic used to select which process gets killed in out-of-memory conditions...

/proc/[pid]/oom_adj (since Linux 2.6.11) This file can be used to adjust the score used to select which process should be killed in an out-of-memory (OOM) situation...

There's lots more to read but setting oom_score_adj to -1000 or oom_adj to -17 is going to achieve what you want.
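
For example, something along these lines should exempt the valgrind processes (matching them with pidof is just one way to find the PIDs; writing -1000 requires root):

for pid in $(pidof memcheck-amd64-); do
    echo -1000 > /proc/$pid/oom_score_adj     # kernel >= 2.6.36
    # echo -17 > /proc/$pid/oom_adj           # older kernels (>= 2.6.11)
done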

The trouble is something else will be killed. Perhaps it would be better to determine why OOM is being invoked and deal with that.

user9517
  • Since there is 3 gigs free swap I think that the OOM killer should not be triggered. I want it to not be triggered. For this test I can't have any of the processes on the host be killed, and they shouldn't be killed because there is so much free swap. – Coder Mar 07 '16 at 08:27
  • +1 for "solve the underlying problem". Is it possible that the offending piece of software (or something else) has just tried to malloc a big chunk of core? It's requests for *more* memory, that are going to bring things into red-alert territory, that tend to trigger the OOM killer. – MadHatter Mar 07 '16 at 09:30
  • @Coder: The Linux kernel programmers and the Linux kernel clearly know more than you. Your process was killed because (despite your protestations) there was insufficient memory. If you think this is incorrect then file a [bug report](https://bugzilla.kernel.org/). If you're not going to listen to what people who are clearly knowledgeable have to say then perhaps you should pay for your support because advice is worth what you pay for it. The advice won't be different but you'll have paid for it so will value it more. – user9517 Mar 07 '16 at 10:02
  • @madhatter No, in this case another process tried to allocate a small chunk of memory. For the moment give me the benefit of the doubt that my process has no error. – Coder Mar 07 '16 at 10:11
  • @Iain It is clear that there was insufficient physical memory to satisfy the kernel. My question is "how do I prevent that situation from resulting in my processes being killed." Buying more memory is not a general solution to the problem because there will always be an upper limit to physical memory and I don't want my process to be killed when there is plenty of swap left. – Coder Mar 07 '16 at 10:13
  • @Iain I would happily pay for advice. Do you know any places where I can do so? – Coder Mar 07 '16 at 10:14
  • @Coder and you're confident it's not trying to allocate more memory that's pinned in-core, yes? – MadHatter Mar 07 '16 at 10:19
  • @MadHatter I'm not confident. I don't know how to check that. Do you have any advice? – Coder Mar 07 '16 at 10:35
  • @Coder I'm no programmer, sadly. It's just that, caught between two possibilities: that the kernel doesn't know how to use VM, and that a programmer has made an error, I know which one my money's on. – MadHatter Mar 07 '16 at 11:57
  • @coder I have asked someone to contact you. – user9517 Mar 07 '16 at 19:23
  • @coder I'm the 'someone'. Let me know how to get in touch. – Matthew Ife Mar 07 '16 at 19:35
  • @MadHatter from running 1000s of linux systems I can tell you: it is NOT the case that one could assume there are no issues in the memory management or any other part of the kernel. This isn't like a high-grade unix platform and while everything normally works just fine it is *not* sensible to take either side in any dogmatic way. – Florian Heigl Nov 24 '17 at 16:09
  • @FlorianHeigl what makes you think I'm taking a dogmatic position? I'm making a bet ("*I know which one my money's on*"), that's all. While we're waving our experience around, in my 23 years of administering Linux systems, which surely also run into the thousands, I have seen many genuine errors in the VM subsystem - but I have seen many, **many** more bad programmers. Once the programmer can confirm (s)he's not asking the VM subsystem for anything crazy, we can move on to doubting the subsystem itself. The author of the answer we're writing under clearly also feels the same way. – MadHatter Nov 24 '17 at 16:37
12

Several thoughts (from my comments above), and links to interesting reads about your situation:

Olivier Dulac
  • oom_adj is only valid for older kernels; newer ones use oom_score_adj. – user9517 Mar 07 '16 at 12:06
  • disclaimer: I cannot give more detailed info than the few links above, as I can't access a linux system at the moment... and there are so many things to check. Maybe someone will step in and provide nice step-by-step procedures... (the serverfault answer, the last of the "good reads" links in my answer, was like that, and is an incredible read.) – Olivier Dulac Mar 07 '16 at 12:17
6

Besides the already-mentioned oom_score_adj tuning for the process in question (which probably won't help much -- it would make it less likely that that process would be killed FIRST, but as that is the only memory-intensive process, the system probably won't recover until it is finally killed), here are a few ideas to tweak (see the sketch after the list):

  • if you set vm.overcommit_memory=2, also tweak vm.overcommit_ratio to maybe 90 (alternatively, set vm.overcommit_memory=0 - see kernel overcommit docs)
  • increase vm.min_free_kbytes in order to always keep some physical RAM free and thus reduce chances of OOM needing to kill something (but do not overdo it, as it will OOM instantly).
  • increase vm.swappiness to 100 (to make kernel swap more readily)
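
A minimal sketch of those tweaks as one-off sysctl commands (the overcommit_ratio and swappiness values are the ones suggested above; the min_free_kbytes figure is only an example and should be sized to your system; put them in /etc/sysctl.conf to persist):

sysctl -w vm.overcommit_ratio=90      # only relevant with vm.overcommit_memory=2
sysctl -w vm.min_free_kbytes=65536    # example value; keep some RAM always free
sysctl -w vm.swappiness=100           # make the kernel swap more readily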

Note that if you have too little memory to accomplish the task at hand, even if you do not OOM, it may (or may not) become EXTREMELY slow - a half-hour job (on a system with enough RAM) can easily take several weeks (when RAM is replaced with swap) to complete in extreme cases, or even hang the whole VM. That is especially the case if swap is on classical rotational disks (as opposed to SSDs), due to massive random reads/writes which are very expensive on them.

Matija Nalis
4

I would try enabling overcommit and see if that helps. Your process seems to fail inside a fork call, which requires as much virtual memory as the initial process had. overcommit_memory=2 doesn't make your process immune to the OOM killer; it just prevents your process from triggering it by allocating too much. Other processes may produce unrelated allocation errors (e.g. getting a contiguous memory block), which still trigger the OOM killer and get your process disposed of.
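
Concretely, "enable overcommit" here means switching the mode back from strict accounting (a small sketch; mode 0, the heuristic default, is what most distributions ship with):

sysctl -w vm.overcommit_memory=0      # heuristic overcommit (kernel default)
# sysctl -w vm.overcommit_memory=1    # always overcommit, never refuse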

Alternatively (and more to the point), as several comments suggest, buy more RAM.

Dmitry Grigoryev
0

I ran into this on a VM with no swap and 2 GB RAM, running Debian stretch (latest),

but only when running a script from userland cron. The same script worked from the command line!

It took a full-upgrade to buster to fix the problem. Stretch has a bug in the kernel, or I suppose in cron, setting different memory limits.

0

Short story - try a different kernel version. I have a system that showed OOM errors with 4.2.0-x and 4.4.0-x kernels, but not with 3.19.0-x.

Long story (not too long!): I've got a Compaq DC5000 still in service here -- currently with 512MB of RAM (and some portion of that, like 32-128MB, being given to the onboard video). Mostly serving NFS; I do have a monitor hooked up to it so occasionally I'll log into it (Ubuntu Classic, no Unity).

Via Ubuntu HWE I was running a 3.19.x kernel for a good while; it'd end up swapping out like 200-300MB of stuff, but apparently it was unused stuff; there wouldn't be any swap activity from it having to swap things back in later, as far as I could tell.

With the 4.2.0-x kernel, and now the 4.4.0-x kernel, I can start a chunky NFS write to it, get only 220MB into the swap (i.e. 1.3GB free), and it'll start OOM killing things. I won't claim whether it's a kernel bug or a "tuning issue" (like a 64MB reserve that's normally fine, but too high on a ~400MB or so system?).

No disrespect to those who are saying it's somehow broken just because he expects to use swap; with all due respect, you're wrong. It won't be fast, but I used to go 1 or 2GB into the swap on a few 512MB-1GB systems. Of course some types of software mlock() a bunch of RAM, but in my case (since I'm running the same software, just on a different kernel) this is clearly not the case.