I have a server where the OOM killer has been invoked once or twice almost every night for the past two weeks. The server should have more than enough memory it could free (cache/buffers, see inactive_file below) as well as more than enough free swap space, and I cannot make heads or tails of the numbers printed by the kernel. I've read multiple posts about this all around the internet, and I'm well aware of what the numbers printed by free mean, but I just can't make any headway on analyzing this particular issue.
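The Mem-Info block in the kernel messages counts 4 kB pages, not kB, so the raw numbers look smaller than they are. A quick conversion sketch, assuming 4 kB base pages (x86_64); the page counts and swap figures are copied from the OOM report, and the helper name is made up for this example:

```python
PAGE_KB = 4  # assumed base page size (x86_64)

# Page counts copied from the "Mem-Info:" block of the OOM report.
MEM_INFO_PAGES = {
    "active_file": 1683401,
    "inactive_file": 3767879,
    "active_anon": 939644,
    "inactive_anon": 396161,
    "free": 196889,
}

def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count to GiB."""
    return pages * page_kb / (1024 * 1024)

if __name__ == "__main__":
    for name, pages in MEM_INFO_PAGES.items():
        print(f"{name:14} {pages:>8} pages = {pages_to_gib(pages):5.1f} GiB")
    # "Free swap" and "Total swap" from the same report, already in kB.
    swap_free_kb, swap_total_kb = 22195940, 23064572
    print(f"swap free: {swap_free_kb / 1024**2:.1f} of {swap_total_kb / 1024**2:.1f} GiB")
```

inactive_file alone comes to about 14.4 GiB, and swap is about 21.2 of 22.0 GiB free, which is why it looks as if plenty of memory was reclaimable in principle.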
Here's some more info:
- It's a virtual machine (Ubuntu 16.04), Ubuntu's kernel 4.4.0-59-generic.
- The host is VMware ESXi 6.5.
- The VM runs several containers via LXC, so the number of processes can be rather high.
- The VM has 28 GB of memory assigned and uses an additional swap file of ~ 20 GB.
Why is the OOM killer invoked? What can I do (apart from blindly adding more memory — I'd like to understand if and why these numbers actually indicate that more memory would help)?
Here are the kernel messages from the last time the OOM killer was invoked:
Feb 01 00:37:02 akira kernel: php invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Feb 01 00:37:02 akira kernel: php cpuset=lakota mems_allowed=0
Feb 01 00:37:02 akira kernel: CPU: 1 PID: 31693 Comm: php Not tainted 4.4.0-59-generic #80-Ubuntu
Feb 01 00:37:02 akira kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Feb 01 00:37:02 akira kernel: 0000000000000286 00000000eaaf82b8 ffff88052d9afaf0 ffffffff813f7583
Feb 01 00:37:02 akira kernel: ffff88052d9afcc8 ffff88071bf9aa00 ffff88052d9afb60 ffffffff8120ad5e
Feb 01 00:37:02 akira kernel: ffff88073fd1a870 ffff88073fd1a860 ffffea000419f440 0000000100000001
Feb 01 00:37:02 akira kernel: Call Trace:
Feb 01 00:37:02 akira kernel: [<ffffffff813f7583>] dump_stack+0x63/0x90
Feb 01 00:37:02 akira kernel: [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
Feb 01 00:37:02 akira kernel: [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
Feb 01 00:37:02 akira kernel: [<ffffffff81192b49>] out_of_memory+0x219/0x460
Feb 01 00:37:02 akira kernel: [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
Feb 01 00:37:02 akira kernel: [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
Feb 01 00:37:02 akira kernel: [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
Feb 01 00:37:02 akira kernel: [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
Feb 01 00:37:02 akira kernel: [<ffffffff81391bcc>] ? apparmor_file_alloc_security+0x5c/0x220
Feb 01 00:37:02 akira kernel: [<ffffffff811ed05a>] ? kmem_cache_alloc+0x1ca/0x1f0
Feb 01 00:37:02 akira kernel: [<ffffffff81347bd3>] ? security_file_alloc+0x33/0x50
Feb 01 00:37:02 akira kernel: [<ffffffff810805a0>] _do_fork+0x80/0x360
Feb 01 00:37:02 akira kernel: [<ffffffff81080929>] SyS_clone+0x19/0x20
Feb 01 00:37:02 akira kernel: [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
Feb 01 00:37:02 akira kernel: Mem-Info:
Feb 01 00:37:02 akira kernel: active_anon:939644 inactive_anon:396161 isolated_anon:0
active_file:1683401 inactive_file:3767879 isolated_file:0
unevictable:1481 dirty:902 writeback:0 unstable:0
slab_reclaimable:155382 slab_unreclaimable:15433
mapped:71733 shmem:15843 pagetables:19280 bounce:0
free:196889 free_pcp:19 free_cma:0
Feb 01 00:37:02 akira kernel: Node 0 DMA free:15900kB min:36kB low:44kB high:52kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 01 00:37:02 akira kernel: lowmem_reserve[]: 0 2940 28091 28091 28091
Feb 01 00:37:02 akira kernel: Node 0 DMA32 free:121140kB min:7068kB low:8832kB high:10600kB active_anon:284776kB inactive_anon:330268kB active_file:701204kB inactive_file:1373280kB unevictable:1828kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:3048656kB mlocked:1828kB dirty:276kB writeback:0kB mapped:36756kB shmem:4548kB slab_reclaimable:212192kB slab_unreclaimable:5196kB kernel_stack:1056kB pagetables:7168kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 01 00:37:02 akira kernel: lowmem_reserve[]: 0 0 25150 25150 25150
Feb 01 00:37:02 akira kernel: Node 0 Normal free:650516kB min:60476kB low:75592kB high:90712kB active_anon:3473800kB inactive_anon:1254376kB active_file:6032400kB inactive_file:13698236kB unevictable:4096kB isolated(anon):0kB isolated(file):0kB present:26214400kB managed:25754528kB mlocked:4096kB dirty:3332kB writeback:0kB mapped:250176kB shmem:58824kB slab_reclaimable:409336kB slab_unreclaimable:56528kB kernel_stack:9296kB pagetables:69952kB unstable:0kB bounce:0kB free_pcp:72kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 01 00:37:02 akira kernel: lowmem_reserve[]: 0 0 0 0 0
Feb 01 00:37:02 akira kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Feb 01 00:37:02 akira kernel: Node 0 DMA32: 20703*4kB (UME) 4794*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121164kB
Feb 01 00:37:02 akira kernel: Node 0 Normal: 146130*4kB (UMEH) 7997*8kB (UMEH) 3*16kB (H) 3*32kB (H) 3*64kB (H) 3*128kB (H) 3*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 650496kB
Feb 01 00:37:02 akira kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 01 00:37:02 akira kernel: 5489461 total pagecache pages
Feb 01 00:37:02 akira kernel: 21519 pages in swap cache
Feb 01 00:37:02 akira kernel: Swap cache stats: add 390089, delete 368570, find 24240772/24285246
Feb 01 00:37:02 akira kernel: Free swap = 22195940kB
Feb 01 00:37:02 akira kernel: Total swap = 23064572kB
Feb 01 00:37:02 akira kernel: 7339918 pages RAM
Feb 01 00:37:02 akira kernel: 0 pages HighMem/MovableOnly
Feb 01 00:37:02 akira kernel: 135145 pages reserved
Feb 01 00:37:02 akira kernel: 0 pages cma reserved
Feb 01 00:37:02 akira kernel: 0 pages hwpoisoned
… (snip process list) …
Feb 01 00:37:02 akira kernel: Out of memory: Kill process 12508 (mysqld) score 51 or sacrifice child
Feb 01 00:37:02 akira kernel: Killed process 12508 (mysqld) total-vm:3794008kB, anon-rss:2625732kB, file-rss:5980kB
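Note that php hit the OOM killer with order=2, i.e. it needed 2^2 = 4 physically contiguous 4 kB pages (16 kB) while inside fork() (SyS_clone, copy_process, alloc_kmem_pages_node; presumably the new task's kernel stack). Total free memory says nothing about whether such a block exists; the per-zone buddy lists ("Node 0 DMA/DMA32/Normal: N*SIZEkB ...") do. The following is a minimal Python sketch that tallies those three lines from this report. It assumes 4 kB base pages and treats a size whose only migratetype letter is "H" as the high-order atomic reserve, which, as far as I know, 4.4 kernels only hand out to atomic requests; the helper names are hypothetical.

```python
import re

PAGE_KB = 4  # assumed base page size (x86_64)
# "count*SIZEkB" optionally followed by " (MIGRATETYPES)", e.g. "3*16kB (H)"
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")

# Buddy free lists copied from the OOM report.
BUDDY = {
    "DMA": "1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) "
           "1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M)",
    "DMA32": "20703*4kB (UME) 4794*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB "
             "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB",
    "Normal": "146130*4kB (UMEH) 7997*8kB (UMEH) 3*16kB (H) 3*32kB (H) 3*64kB (H) "
              "3*128kB (H) 3*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB",
}

def parse_buddy(line):
    """Return (count, block_kb, migratetype_letters) for each block size."""
    return [(int(n), int(kb), types) for n, kb, types in ENTRY.findall(line)]

def free_kb(entries, min_order=0, skip_highatomic_only=False):
    """Sum free kB held in blocks of at least 2**min_order pages.

    With skip_highatomic_only=True, sizes whose only migratetype letter is
    "H" are left out. Mixed entries such as "(UMEH)" cannot be split from
    this output, so they are counted in full.
    """
    min_kb = PAGE_KB << min_order
    total = 0
    for count, block_kb, types in entries:
        if block_kb < min_kb:
            continue
        if skip_highatomic_only and types == "H":
            continue
        total += count * block_kb
    return total

if __name__ == "__main__":
    for zone, line in BUDDY.items():
        entries = parse_buddy(line)
        print(f"{zone:7} free={free_kb(entries):>7} kB  "
              f"order>=2={free_kb(entries, 2):>6} kB  "
              f"order>=2 without H-only={free_kb(entries, 2, True):>6} kB")
```

This reports 650496 kB free in Normal but only 2000 kB in blocks of order 2 or higher, all of it in H-only sizes, and nothing at order 2 or above in DMA32. DMA does have larger blocks, but its lowmem_reserve entry for Normal-capable requests (28091 pages, about 110 MB) is far more than its 15900 kB free, so it is off-limits here. If this reading is right, the OOM was triggered by the lack of a free 16 kB contiguous block rather than by a lack of memory overall.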
I'm also logging various values from /proc/meminfo each minute. Here's the graph from midnight to 01:00:
[Graph: /proc/meminfo values, 00:00 to 01:00]
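A per-minute /proc/meminfo graph only shows totals, so it cannot show fragmentation of free memory, which is what matters for an order=2 request. One option is to log /proc/buddyinfo (free block counts per order and zone) next to the meminfo values. A minimal Python sketch, assuming a 60-second interval and a hand-picked set of meminfo fields; the function names and output format are hypothetical:

```python
import time
from pathlib import Path

# Hypothetical selection of /proc/meminfo fields to record.
WATCH = ("MemFree", "Cached", "Inactive(file)", "SwapFree")

def parse_meminfo(text):
    """Map each /proc/meminfo field to its first number
    (kB for most fields, plain counts for HugePages_*)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values

def parse_buddyinfo(text):
    """Map "node/zone" to the free block counts for orders 0, 1, 2, ..."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()  # e.g. ['Node', '0,', 'zone', 'Normal', '146130', ...]
        if len(parts) > 4 and parts[2] == "zone":
            zones[f"{parts[1].rstrip(',')}/{parts[3]}"] = [int(n) for n in parts[4:]]
    return zones

def sample():
    """One log line: the watched meminfo fields plus, per zone,
    the number of free blocks of order 2 or higher."""
    mem = parse_meminfo(Path("/proc/meminfo").read_text())
    buddy = parse_buddyinfo(Path("/proc/buddyinfo").read_text())
    mem_part = " ".join(f"{k}={mem.get(k)}" for k in WATCH)
    buddy_part = " ".join(f"{z}:order>=2={sum(c[2:])}" for z, c in buddy.items())
    return f"{time.strftime('%Y-%m-%d %H:%M:%S')} {mem_part} {buddy_part}"

def log_forever(interval=60):
    """Print one sample per interval until interrupted."""
    while True:
        print(sample(), flush=True)
        time.sleep(interval)
```

Calling log_forever() with stdout redirected to a file yields one line per minute, so a drop in order>=2 blocks for a zone can be lined up with the nightly OOM events.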
Output of sysctl -a | grep '^vm':
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.compact_unevictable_allowed = 1
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32 1
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 65536
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 10000
vm.zone_reclaim_mode = 0
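The min/low/high values in the zone lines of the OOM report derive from vm.min_free_kbytes above. A quick cross-check in Python, assuming 4 kB pages and the 4.4-era rule low = min + min/4 and high = min + min/2 (in pages, integer division; kernels from 4.6 on size that gap with vm.watermark_scale_factor instead). The zone minimums are copied from the log; the helper name is made up for this example:

```python
PAGE_KB = 4  # assumed base page size (x86_64)

# Per-zone "min:" watermarks in kB, copied from the OOM report.
ZONE_MIN_KB = {"DMA": 36, "DMA32": 7068, "Normal": 60476}
MIN_FREE_KBYTES = 67584  # vm.min_free_kbytes from the sysctl output

def watermarks_from_min(min_pages):
    """Return (min, low, high) in pages, using the assumed 4.4-era rule."""
    return min_pages, min_pages + min_pages // 4, min_pages + min_pages // 2

if __name__ == "__main__":
    total = sum(ZONE_MIN_KB.values())
    print(f"sum of zone min watermarks: {total} kB "
          f"(vm.min_free_kbytes = {MIN_FREE_KBYTES} kB)")
    for zone, min_kb in ZONE_MIN_KB.items():
        mn, low, high = (p * PAGE_KB for p in watermarks_from_min(min_kb // PAGE_KB))
        print(f"{zone:7} min={mn} kB low={low} kB high={high} kB")
```

The zone minimums add up to 67580 kB, within one page of min_free_kbytes, and the derived low/high values reproduce the log (for Normal: 75592 kB and 90712 kB). Normal's 650496 kB free is well above its high watermark, another hint that the failed allocation was about contiguity rather than the amount of free memory.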
Thanks!