I am aware that there are many threads like this one, but none of them actually give a methodology on how to read an OOM message.
I have a db that is being monitored through NRPE for available memory, using a shell script check that calls free -m ...
underneath grepping for available memory and checking if the threshold is fine.
Mysql was killed by OOM without that check ever being triggered. The message was:
[45948878.475805] master invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
[45948878.483847] master cpuset=/ mems_allowed=0
[45948878.488366] CPU: 9 PID: 1095 Comm: master Not tainted 3.10.0-514.16.1.el7.x86_64 #1
[45948878.496464] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[45948878.506124] ffff880ee42d5e20 0000000076e51669 ffff880ee4b43ae8 ffffffff81686ac3
[45948878.514480] ffff880ee4b43b78 ffffffff81681a6e ffffffff812aedab ffff880f02e96ad0
[45948878.522887] ffff880ee4b43bc0 ffffffff00000206 fffeefff00000000 0000000000000008
[45948878.531320] Call Trace:
[45948878.534171] [<ffffffff81686ac3>] dump_stack+0x19/0x1b
[45948878.539728] [<ffffffff81681a6e>] dump_header+0x8e/0x225
[45948878.545437] [<ffffffff812aedab>] ? cred_has_capability+0x6b/0x120
[45948878.552019] [<ffffffff81184c7e>] oom_kill_process+0x24e/0x3c0
[45948878.558306] [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
[45948878.565052] [<ffffffff811854b6>] out_of_memory+0x4b6/0x4f0
[45948878.571022] [<ffffffff81682577>] __alloc_pages_slowpath+0x5d7/0x725
[45948878.577769] [<ffffffff8118b5c5>] __alloc_pages_nodemask+0x405/0x420
[45948878.584562] [<ffffffff8108353d>] copy_process+0x1dd/0x1960
[45948878.590536] [<ffffffff81084e71>] do_fork+0x91/0x2c0
[45948878.595893] [<ffffffff81085126>] SyS_clone+0x16/0x20
[45948878.601405] [<ffffffff816974d9>] stub_clone+0x69/0x90
[45948878.606940] [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
[45948878.613511] Mem-Info:
[45948878.616212] active_anon:14681394 inactive_anon:285983 isolated_anon:92
active_file:2380 inactive_file:3203 isolated_file:76
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:94372 slab_unreclaimable:33256
mapped:11621 shmem:768108 pagetables:32080 bounce:0
free:93898 free_pcp:64 free_cma:0
[45948878.652587] Node 0 DMA free:15908kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[45948878.694520] lowmem_reserve[]: 0 2977 60382 60382
[45948878.700271] Node 0 DMA32 free:230644kB min:3328kB low:4160kB high:4992kB active_anon:2476892kB inactive_anon:24492kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129332kB managed:3049456kB mlocked:0kB dirty:0kB writeback:0kB mapped:156kB shmem:72884kB slab_reclaimable:156940kB slab_unreclaimable:34796kB kernel_stack:19120kB pagetables:5252kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[45948878.746051] lowmem_reserve[]: 0 0 57404 57404
[45948878.751495] Node 0 Normal free:63356kB min:64236kB low:80292kB high:96352kB active_anon:56248684kB inactive_anon:1119440kB active_file:11540kB inactive_file:19792kB unevictable:0kB isolated(anon):368kB isolated(file):304kB present:59768832kB managed:58782432kB mlocked:0kB dirty:0kB writeback:0kB mapped:46328kB shmem:2999548kB slab_reclaimable:220548kB slab_unreclaimable:98228kB kernel_stack:25488kB pagetables:123068kB unstable:0kB bounce:0kB free_pcp:5208kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5126 all_unreclaimable? no
[45948878.800106] lowmem_reserve[]: 0 0 0 0
[45948878.804880] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[45948878.821646] Node 0 DMA32: 2212*4kB (UEM) 1679*8kB (UEM) 845*16kB (UEM) 1332*32kB (UEM) 648*64kB (UEM) 330*128kB (UEM) 192*256kB (UEM) 38*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 230744kB
[45948878.841497] Node 0 Normal: 16437*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65748kB
[45948878.855460] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[45948878.864552] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[45948878.873384] 773511 total pagecache pages
[45948878.877730] 0 pages in swap cache
[45948878.881437] Swap cache stats: add 0, delete 0, find 0/0
[45948878.887060] Free swap = 0kB
[45948878.890329] Total swap = 0kB
[45948878.893592] 15728539 pages RAM
[45948878.897028] 0 pages HighMem/MovableOnly
[45948878.901251] 266590 pages reserved
Am I monitoring memory usage incorrectly? What does this message say? I don't know how to interpret the values for memory. What can I be checking on the system to prevent this from happening again?