
We're running a Linux server hard, but it seems OK: out of 6 GiB RAM the kernel sees fit to allocate 1.6 GiB to cache, and we have 9.9 GiB of free swap. Yet when we ran out of memory, instead of freeing up cache or swapping out processes, the kernel OOM killer decided to kill the database server. We have swappiness set to 0 (a longer side conversation in itself). We also have 1.7 GiB of database server memory pinned in RAM, presumably allocated in large contiguous chunks.

While it's quite nice to have spare RAM used for cache, we'd much prefer it be freed for applications before the kernel kneecaps the server. We were caught quite by surprise.

Any help in predicting and managing this (aside from adding RAM to the system) would be much appreciated! (Our immediate response was to raise swappiness from 0, resulting in increased free memory, the same cache allocation, and concomitant swap I/O and increased CPU I/O wait. It seems wasteful to have free memory while users wait on swap, but it's better than a dead system.)
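
For reference, these are the knobs we're poking at (the values shown are only what we're experimenting with, not a recommendation):

sysctl vm.swappiness                 # check the current value (we had 0)
sysctl -w vm.swappiness=1            # our current workaround: swap only under real pressure
free -m                              # watch free vs. cached
sar -r 1 10                          # memory utilisation, ten 1-second samples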

Relevant stats.

sar -s 15:00:00 -e 16:00:00
Linux 2.6.32-358.2.1.el6.x86_64 (system)     04/23/2014      _x86_64_        (3 CPU)

03:00:02 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:10:01 PM     all     17.90      0.00     35.27      3.29      0.12     43.42
03:20:02 PM     all     19.94      0.00     36.40      8.73      0.12     34.82
03:30:02 PM     all     20.18      0.00     42.10     20.37      0.16     17.19
03:40:01 PM     all      5.75      0.00     19.93      5.05      0.06     69.20
03:50:01 PM     all     22.21      0.00     34.31      8.72      0.16     34.59
Average:        all     17.17      0.00     33.58      9.23      0.12     39.90

sar -r -s 15:00:00 -e 16:00:00
Linux 2.6.32-358.2.1.el6.x86_64 (system)    04/23/2014  _x86_64_    (3 CPU)

03:00:02 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
03:10:01 PM    125396   5843216     97.90     22160   1989352  23282984    141.51
03:20:02 PM     44788   5923824     99.25     19092   1759124  23759172    144.40
03:30:02 PM     34400   5934212     99.42     11864   1649488  23856192    144.99
03:40:01 PM   4139748   1828864     30.64     22972    341264   9558784     58.10
03:50:01 PM   1364632   4603980     77.14     30784   2576044  11827612     71.88
Average:      1141793   4826819     80.87     21374   1663054  18456949    112.18

sar -R -s 15:00:00 -e 16:00:00
Linux 2.6.32-358.2.1.el6.x86_64 (system)    04/23/2014  _x86_64_    (3 CPU)

03:00:02 PM   frmpg/s   bufpg/s   campg/s
03:10:01 PM    -24.49     -0.54     26.00
03:20:02 PM    -34.14     -1.30    -97.51
03:30:02 PM     -4.43     -3.08    -46.70
03:40:01 PM   1732.71      4.69   -552.15
03:50:01 PM  -1188.87      3.35    957.39
Average:       100.36      0.62     55.05

sar -S -s 15:00:00 -e 16:00:00
Linux 2.6.32-358.2.1.el6.x86_64 (system)    04/23/2014  _x86_64_    (3 CPU)

03:00:02 PM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
03:10:01 PM  10471148     13892      0.13      4300     30.95
03:20:02 PM  10471148     13892      0.13      4300     30.95
03:30:02 PM  10471148     13892      0.13      4300     30.95
03:40:01 PM  10475820      9220      0.09      3484     37.79
03:50:01 PM  10475820      9220      0.09      3484     37.79
Average:     10473017     12023      0.11      3973     33.04

And the kernel log:

Apr 23 15:31:01 kernel: proc_count invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
Apr 23 15:31:01 kernel: proc_count cpuset=/ mems_allowed=0
Apr 23 15:31:01 kernel: Pid: 1713, comm: proc_count Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Apr 23 15:31:01 kernel: Call Trace:
Apr 23 15:31:01 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Apr 23 15:31:01 kernel: [] ? dump_header+0x90/0x1b0
Apr 23 15:31:01 kernel: [] ? oom_kill_process+0x82/0x2a0
Apr 23 15:31:01 kernel: [] ? select_bad_process+0xe1/0x120
Apr 23 15:31:01 kernel: [] ? out_of_memory+0x220/0x3c0
Apr 23 15:31:01 kernel: [] ? __alloc_pages_nodemask+0x8ac/0x8d0
Apr 23 15:31:01 kernel: [] ? xen_force_evtchn_callback+0xd/0x10
Apr 23 15:31:01 kernel: [] ? alloc_pages_current+0xaa/0x110
Apr 23 15:31:01 kernel: [] ? __get_free_pages+0xe/0x50
Apr 23 15:31:01 kernel: [] ? copy_process+0xe4/0x1450
Apr 23 15:31:01 kernel: [] ? kmem_cache_alloc+0x182/0x190
Apr 23 15:31:01 kernel: [] ? do_fork+0x94/0x460
Apr 23 15:31:01 kernel: [] ? alloc_fd+0x92/0x160
Apr 23 15:31:01 kernel: [] ? fd_install+0x47/0x90
Apr 23 15:31:01 kernel: [] ? _spin_lock_irq+0x10/0x40
Apr 23 15:31:01 kernel: [] ? sys_clone+0x28/0x30
Apr 23 15:31:01 kernel: [] ? stub_clone+0x13/0x20
Apr 23 15:31:01 kernel: [] ? system_call_fastpath+0x16/0x1b
Apr 23 15:31:01 kernel: Mem-Info:
Apr 23 15:31:01 kernel: Node 0 DMA per-cpu:
Apr 23 15:31:01 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Apr 23 15:31:01 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Apr 23 15:31:01 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Apr 23 15:31:01 kernel: Node 0 DMA32 per-cpu:
Apr 23 15:31:01 kernel: CPU    0: hi:  186, btch:  31 usd:  29
Apr 23 15:31:01 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Apr 23 15:31:01 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Apr 23 15:31:01 kernel: Node 0 Normal per-cpu:
Apr 23 15:31:01 kernel: CPU    0: hi:  186, btch:  31 usd:  58
Apr 23 15:31:01 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Apr 23 15:31:01 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Apr 23 15:31:01 kernel: active_anon:657282 inactive_anon:234243 isolated_anon:0
Apr 23 15:31:01 kernel: active_file:1582 inactive_file:1577 isolated_file:0
Apr 23 15:31:01 kernel: unevictable:415617 dirty:0 writeback:4 unstable:0
Apr 23 15:31:01 kernel: free:7393 slab_reclaimable:13322 slab_unreclaimable:36105
Apr 23 15:31:01 kernel: mapped:410321 shmem:409696 pagetables:86995 bounce:0
Apr 23 15:31:01 kernel: Node 0 DMA free:7760kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7756kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 23 15:31:01 kernel: lowmem_reserve[]: 0 4024 6044 6044
Apr 23 15:31:01 kernel: Node 0 DMA32 free:17736kB min:6620kB low:8272kB high:9928kB active_anon:2256636kB inactive_anon:564276kB active_file:2088kB inactive_file:2180kB unevictable:624700kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:16kB mapped:624832kB shmem:624832kB slab_reclaimable:32124kB slab_unreclaimable:90036kB kernel_stack:32696kB pagetables:206100kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 23 15:31:01 kernel: lowmem_reserve[]: 0 0 2020 2020
Apr 23 15:31:01 kernel: Node 0 Normal free:4076kB min:3320kB low:4148kB high:4980kB active_anon:372492kB inactive_anon:372696kB active_file:4240kB inactive_file:4128kB unevictable:1037768kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:24776kB dirty:0kB writeback:0kB mapped:1016452kB shmem:1013952kB slab_reclaimable:21164kB slab_unreclaimable:54384kB kernel_stack:4328kB pagetables:141880kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 23 15:31:01 kernel: lowmem_reserve[]: 0 0 0 0
Apr 23 15:31:01 kernel: Node 0 DMA: 2*4kB 1*8kB 2*16kB 1*32kB 0*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7760kB
Apr 23 15:31:01 kernel: Node 0 DMA32: 4083*4kB 12*8kB 4*16kB 3*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 17740kB
Apr 23 15:31:01 kernel: Node 0 Normal: 615*4kB 11*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 4132kB
Apr 23 15:31:01 kernel: 414562 total pagecache pages
Apr 23 15:31:01 kernel: 1075 pages in swap cache
Apr 23 15:31:01 kernel: Swap cache stats: add 22900328, delete 22899253, find 492200942/494634739
Apr 23 15:31:01 kernel: Free swap  = 10471148kB
Apr 23 15:31:01 kernel: Total swap = 10485040kB
Apr 23 15:31:01 kernel: 1572863 pages RAM
Apr 23 15:31:01 kernel: 80710 pages reserved
Apr 23 15:31:01 kernel: 2926057 pages shared
Apr 23 15:31:01 kernel: 1058807 pages non-shared
Apr 23 15:31:01 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name

skipping the long process lists

Apr 23 15:31:01 kernel: Out of memory: Kill process 1357 (oninit) score 99 or sacrifice child
Apr 23 15:31:01 kernel: Killed process 1359, UID 501, (oninit) total-vm:2056000kB, anon-rss:116kB, file-rss:10132kB
Apr 23 15:31:20 kernel: awk invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Apr 23 15:31:20 kernel: awk cpuset=/ mems_allowed=0
Apr 23 15:31:20 kernel: Pid: 22899, comm: awk Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Apr 23 15:31:20 kernel: Call Trace:
Apr 23 15:31:20 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Apr 23 15:31:20 kernel: [] ? dump_header+0x90/0x1b0
Apr 23 15:31:20 kernel: [] ? oom_kill_process+0x82/0x2a0
Apr 23 15:31:20 kernel: [] ? select_bad_process+0xe1/0x120
Apr 23 15:31:20 kernel: [] ? out_of_memory+0x220/0x3c0
Apr 23 15:31:20 kernel: [] ? __alloc_pages_nodemask+0x8ac/0x8d0
Apr 23 15:31:20 kernel: [] ? alloc_pages_current+0xaa/0x110
Apr 23 15:31:20 kernel: [] ? __page_cache_alloc+0x87/0x90
Apr 23 15:31:20 kernel: [] ? find_get_page+0x1e/0xa0
Apr 23 15:31:20 kernel: [] ? filemap_fault+0x1a7/0x500
Apr 23 15:31:20 kernel: [] ? xen_set_pte_at+0xaf/0x170
Apr 23 15:31:20 kernel: [] ? __do_fault+0x54/0x530
Apr 23 15:31:20 kernel: [] ? handle_pte_fault+0xf7/0xb50
Apr 23 15:31:20 kernel: [] ? retint_restore_args+0x5/0x6
Apr 23 15:31:20 kernel: [] ? xen_hypervisor_callback+0x1b/0x20
Apr 23 15:31:20 kernel: [] ? error_exit+0x2a/0x60
Apr 23 15:31:20 kernel: [] ? retint_restore_args+0x5/0x6
Apr 23 15:31:20 kernel: [] ? xen_hypervisor_callback+0x1b/0x20
Apr 23 15:31:20 kernel: [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Apr 23 15:31:20 kernel: [] ? handle_mm_fault+0x23a/0x310
Apr 23 15:31:20 kernel: [] ? __do_page_fault+0x139/0x480
Apr 23 15:31:20 kernel: [] ? xen_force_evtchn_callback+0xd/0x10
Apr 23 15:31:20 kernel: [] ? check_events+0x12/0x20
Apr 23 15:31:20 kernel: [] ? xen_force_evtchn_callback+0xd/0x10
Apr 23 15:31:20 kernel: [] ? check_events+0x12/0x20
Apr 23 15:31:20 kernel: [] ? do_page_fault+0x3e/0xa0
Apr 23 15:31:20 kernel: [] ? page_fault+0x25/0x30
Apr 23 15:31:20 kernel: Mem-Info:
Apr 23 15:31:20 kernel: Node 0 DMA per-cpu:
Apr 23 15:31:20 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Apr 23 15:31:20 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Apr 23 15:31:20 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Apr 23 15:31:20 kernel: Node 0 DMA32 per-cpu:
Apr 23 15:31:20 kernel: CPU    0: hi:  186, btch:  31 usd:  32
Apr 23 15:31:20 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Apr 23 15:31:20 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Apr 23 15:31:20 kernel: Node 0 Normal per-cpu:
Apr 23 15:31:20 kernel: CPU    0: hi:  186, btch:  31 usd:  90
Apr 23 15:31:20 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Apr 23 15:31:20 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Apr 23 15:31:20 kernel: active_anon:659392 inactive_anon:234908 isolated_anon:0
Apr 23 15:31:20 kernel: active_file:1093 inactive_file:1129 isolated_file:0
Apr 23 15:31:20 kernel: unevictable:415617 dirty:0 writeback:0 unstable:0
Apr 23 15:31:20 kernel: free:6348 slab_reclaimable:11215 slab_unreclaimable:36129
Apr 23 15:31:20 kernel: mapped:410358 shmem:409696 pagetables:86756 bounce:0
Apr 23 15:31:20 kernel: Node 0 DMA free:7760kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7756kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 23 15:31:20 kernel: lowmem_reserve[]: 0 4024 6044 6044
Apr 23 15:31:20 kernel: Node 0 DMA32 free:14596kB min:6620kB low:8272kB high:9928kB active_anon:2264272kB inactive_anon:566132kB active_file:1476kB inactive_file:1444kB unevictable:624700kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:624944kB shmem:624832kB slab_reclaimable:24184kB slab_unreclaimable:89940kB kernel_stack:32688kB pagetables:206172kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1620 all_unreclaimable? no
Apr 23 15:31:20 kernel: lowmem_reserve[]: 0 0 2020 2020
Apr 23 15:31:20 kernel: Node 0 Normal free:3036kB min:3320kB low:4148kB high:4980kB active_anon:373296kB inactive_anon:373500kB active_file:2896kB inactive_file:3072kB unevictable:1037768kB isolated(anon):0kB isolated(file):2568kB present:2068480kB mlocked:24776kB dirty:0kB writeback:0kB mapped:1016488kB shmem:1013952kB slab_reclaimable:20676kB slab_unreclaimable:54576kB kernel_stack:4368kB pagetables:140852kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9666 all_unreclaimable? no
Apr 23 15:31:20 kernel: lowmem_reserve[]: 0 0 0 0
Apr 23 15:31:20 kernel: Node 0 DMA: 2*4kB 1*8kB 2*16kB 1*32kB 0*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7760kB
Apr 23 15:31:20 kernel: Node 0 DMA32: 3297*4kB 12*8kB 4*16kB 3*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 14596kB
Apr 23 15:31:20 kernel: Node 0 Normal: 356*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3000kB
Apr 23 15:31:20 kernel: 414714 total pagecache pages
Apr 23 15:31:20 kernel: 1097 pages in swap cache
Apr 23 15:31:20 kernel: Swap cache stats: add 22900397, delete 22899300, find 492200999/494634809
Apr 23 15:31:20 kernel: Free swap  = 10471368kB
Apr 23 15:31:20 kernel: Total swap = 10485040kB
Apr 23 15:31:20 kernel: 1572863 pages RAM
Apr 23 15:31:20 kernel: 80710 pages reserved
Apr 23 15:31:20 kernel: 2926158 pages shared
Apr 23 15:31:20 kernel: 1059316 pages non-shared
Apr 23 15:31:20 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name

again skipping the long process list

Apr 23 15:31:20 kernel: Out of memory: Kill process 1357 (oninit) score 99 or sacrifice child
Apr 23 15:31:20 kernel: Killed process 1357, UID 501, (oninit) total-vm:2056000kB, anon-rss:648kB, file-rss:1637852kB
user3546411
  • Yes, if Linux runs out of memory the OOM killer starts killing "randomly". It didn't swap in your case because you had swappiness=0, and in any case, as you noticed, once it does hit swap the server grinds to a halt with I/O; you can't win here. The way to go IMHO is to baseline your applications, understand their memory needs, and cap them via settings (number of children spawned, Java Xmx for example), ulimit, etc., plus monitor with an RRD tool what's going on in the server and what causes the RAM spikes. – LinuxDevOps Apr 23 '14 at 21:10
  • Swappiness of 0 is not supposed to stop swapping; it's supposed to swap only when needed. Out of memory is "needed". That's what swap's for, isn't it? Also, we have 1.6 GiB of RAM used for cache. That's supposed to be freed when required by processes. – user3546411 Apr 23 '14 at 21:21
  • Yes, right: it's supposed to swap only if needed, and if you have free memory Linux is not supposed to swap, but that does happen. The thing is, I suggest you forget about swap, profile your apps, and do capacity planning for memory. – LinuxDevOps Apr 23 '14 at 21:29
  • We have done capacity planning; the applications use ~ 3.5 GiB RAM, on a 6 GiB system, of which ~ 1 GiB is typically quiescent and can be swapped (which is what happens when we have high swappiness). But then we have ~ 1 GiB free memory, 1.6 GiB cache, and users waiting (wait i/o) while swap happens. None of this gets to the point, though ... the kernel should free cache or swap, not kill processes. – user3546411 Apr 23 '14 at 22:58
  • Do you have RRD-type system resource history (munin/cacti/nagios)? If you had an OOM kill then you ran out of memory at some point (a spike?), and IMHO it's better to look at your processes than to try to control kernel swap/cache, since that is somewhat unpredictable (at least in my experience; as I said, Linux can swap with memory available, and yours is another example of unexpected behaviour). Others may disagree. – LinuxDevOps Apr 24 '14 at 00:51
  • Your memory looks pretty fragmented. What is your workload? – Michael Hampton Apr 24 '14 at 01:07
  • LinuxDevOps - yes, we have 20-second sample interval statistics from collectd. Nothing odd at the time, just the usual daily build-up of load. – user3546411 Apr 24 '14 at 17:43
  • Michael - yep - it does get fragmented. Lots of small processes (roughly 350) being run by a hundred users, doing lots of different stuff against the DB server. – user3546411 Apr 24 '14 at 17:44
  • Again, nothing here is to the point: this is about the OOM killer, and whether it should consider the system out of memory when there's 1.6 GiB of cache and 9.9 GiB of swap, or whether it is expected to free cache (as is broadly documented) or use swap (again, as is expected). – user3546411 Apr 24 '14 at 17:45
  • I do realize the OOM killer will kill processes with swap free if swap is not responsive enough, but given that the system runs fine using swap with a swappiness of 1, I'd assert that's evidence operational swap is sufficiently responsive that the OOM killer should allow it to work. – user3546411 Apr 24 '14 at 17:47
  • Is it a CentOS server? See http://www.mysqlperformanceblog.com/2014/04/28/oom-relation-vm-swappiness0-new-kernel/ for an example where vm.swappiness=0 does avoid swapping, although I think 'swappiness' is unreliable at best. – LinuxDevOps Apr 28 '14 at 17:08

1 Answer


I have a strong suspicion this is due to memory overcommit, which is not typically tracked by Linux toolsets:

The current overcommit limit and amount committed are viewable in /proc/meminfo as CommitLimit and Committed_AS respectively.
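
A quick way to eyeball this by hand (sar -r already reports the same thing as kbcommit/%commit, which in the output above was running at roughly 23 GB, i.e. around 140% of RAM plus swap):

grep -E 'CommitLimit|Committed_AS' /proc/meminfo
# Committed_AS is the total address space the kernel has promised to processes.
# CommitLimit (roughly swap + RAM * vm.overcommit_ratio/100) is only enforced
# when vm.overcommit_memory=2; in the default heuristic mode (0), Committed_AS
# can sail well past it, which is the situation that ends in an OOM kill.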

When this happens, even though there is plenty of unused memory, the kernel's heuristics will often cause it to pick the database server to kill (because of its large memory allocation).

We had to develop a plugin for our collectd monitoring to see this in action.
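
Our actual plugin is specific to our setup, but a minimal sketch of the idea, written as a collectd exec-plugin script (the name overcommit.sh and the plugin instance names are made up for the example), would look something like this:

#!/bin/bash
# overcommit.sh - feed overcommit accounting to collectd via its exec plugin.
# collectd's exec plugin exports COLLECTD_HOSTNAME and COLLECTD_INTERVAL.
HOST="${COLLECTD_HOSTNAME:-$(hostname -f)}"
INTERVAL="${COLLECTD_INTERVAL:-60}"

while sleep "$INTERVAL"; do
    committed=$(awk '/^Committed_AS:/ {print $2}' /proc/meminfo)   # in kB
    limit=$(awk '/^CommitLimit:/ {print $2}' /proc/meminfo)        # in kB
    echo "PUTVAL \"$HOST/exec-overcommit/gauge-committed_as\" interval=$INTERVAL N:$committed"
    echo "PUTVAL \"$HOST/exec-overcommit/gauge-commit_limit\" interval=$INTERVAL N:$limit"
done

Graphing Committed_AS against CommitLimit and against physical RAM is the kind of trend that the usual memory-use graphs miss.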

Our system runs a Java process which had a memory leak that did not show up in normal monitoring because the memory was never actually used. Hence we did not see memory use grow; however, we did see memory commit grow. We now believe this was the root cause of the problem.

I have seen online references to this problem on Java app/DB servers, where large numbers of Java applications running alongside a database server can cause the same issue (high overcommit, followed by the kernel killing the DB server).
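
As an aside, independent of addressing the overcommit itself: the oom_score_adj field visible in the kernel log above can be used to bias the OOM killer away from the database engine, so that when a kill does happen it lands on something less critical. A rough example (oninit is the database engine process from the log; the -500 value is an arbitrary bias):

# run as root; lower scores make a process less attractive to the OOM killer
# (-1000 would exempt the process from OOM killing entirely)
for pid in $(pgrep -x oninit); do
    echo -500 > /proc/$pid/oom_score_adj
done

This doesn't prevent the OOM condition, it only changes the victim, so it's a stopgap rather than a fix.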

user3546411