
I am trying to understand why the OOM killer starts killing processes when there is plenty of memory free on the server.

Result of uname -a:

Linux hostname 2.6.32.43-0.4.1.xs1.8.0.835.170778xen #1 SMP Wed May 29 18:06:30 EDT 2013 i686 i686 i386 GNU/Linux

Here is the output of the /var/log/messages file at the time:

Oct 19 10:59:13 hostname kernel: [86864613.667317] DMA free:2884kB min:76kB low:92kB high:112kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16256kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:112kB slab_unreclaimable:4968kB kernel_stack:1616kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 10:59:13 hostname kernel: [86864613.667329] lowmem_reserve[]: 0 699 4021 4021
Oct 19 10:59:13 hostname kernel: [86864613.667337] Normal free:11300kB min:3424kB low:4280kB high:5136kB active_anon:0kB inactive_anon:0kB active_file:116kB inactive_file:104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:715992kB mlocked:0kB dirty:0kB writeback:0kB mapped:180kB shmem:0kB slab_reclaimable:10700kB slab_unreclaimable:560156kB kernel_stack:2928kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
Oct 19 10:59:13 hostname2 kernel: [86864613.667350] lowmem_reserve[]: 0 0 26574 26574
Oct 19 10:59:13 hostname kernel: [86864613.667357] HighMem free:2983676kB min:512kB low:4564kB high:8616kB active_anon:224476kB inactive_anon:69692kB active_file:47640kB inactive_file:55096kB unevictable:38204kB isolated(anon):0kB isolated(file):0kB present:3401572kB mlocked:38204kB dirty:32kB writeback:0kB mapped:36896kB shmem:2716kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 10:59:13 hostname kernel: [86864613.667370] lowmem_reserve[]: 0 0 0 0
Oct 19 10:59:13 hostname kernel: [86864613.667375] DMA: 691*4kB 8*8kB 6*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2924kB
Oct 19 10:59:13 hostname kernel: [86864613.667386] Normal: 2751*4kB 37*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 11300kB
Oct 19 10:59:13 hostname kernel: [86864613.667397] HighMem: 26993*4kB 32975*8kB 16252*16kB 5804*32kB 1288*64kB 227*128kB 108*256kB 51*512kB 38*1024kB 14*2048kB 472*4096kB = 2983676kB
Oct 19 10:59:13 hostname kernel: [86864613.667410] 27660 total pagecache pages
Oct 19 10:59:13 hostname kernel: [86864613.667412] 0 pages in swap cache
Oct 19 10:59:13 hostname kernel: [86864613.667415] Swap cache stats: add 0, delete 0, find 0/0
Oct 19 10:59:13 hostname kernel: [86864613.667417] Free swap  = 524280kB
Oct 19 10:59:13 hostname kernel: [86864613.667419] Total swap = 524280kB
Oct 19 10:59:13 hostname kernel: [86864613.674877] 1050624 pages RAM
Oct 19 10:59:13 hostname kernel: [86864613.674885] 857090 pages HighMem
Oct 19 10:59:13 hostname kernel: [86864613.674887] 39051 pages reserved
Oct 19 10:59:13 hostname kernel: [86864613.674892] 74281 pages shared
Oct 19 10:59:13 hostname kernel: [86864613.674894] 235220 pages non-shared
Oct 19 10:59:13 hostname kernel: [86864613.674898] Out of memory: kill process 1729 (fe) score 52596 or a child
Oct 19 10:59:13 hostname kernel: [86864613.674902] Killed process 1730 (xapi)
Oct 19 10:59:13 hostname mpathalert: [error|hostname|1||http] Failed to parse HTTP response status line []
Oct 19 10:59:13 hostname xapi: [ info|hostname|0 thread_zero||watchdog] received signal: SIGKILL
Oct 19 10:59:13 hostname xapi: [ info|hostname|0 thread_zero||watchdog] xapi watchdog exiting.
Oct 19 10:59:13 hostname xapi: [ info|hostname|0 thread_zero||watchdog] Fatal: xapi died with signal -7: not restarting (watchdog never restarts on this signal)

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:          4069       1236       2832          0          5        228
-/+ buffers/cache:       1002       3066
Swap:          511          0        511

As you can see, there is plenty of memory free. How can I investigate why the OOM killer is killing processes when there is enough usable memory?

Also, if there really is not enough memory, how would I check which process is causing the out-of-memory condition?

W Khan

1 Answer


Looking at the free -m output after the process was killed does not mean much.
The output is dynamic: the fact that it shows no memory shortage right now does not mean there was none at the moment the OOM condition was triggered.
I suggest reading this excellent Oracle article on configuring the OOM killer: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

The article points out a way to exclude a PID from the OOM killer completely, although this is not recommended; a sketch of what that looks like is below.
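
A minimal sketch, assuming a 2.6.32-era kernel like yours, where the per-process interface is /proc/<pid>/oom_adj (kernels 2.6.36 and later use /proc/<pid>/oom_score_adj with a -1000..1000 range instead); <pid> is a placeholder for the process you want to protect:

# -17 is OOM_DISABLE: the kernel will never select this process
echo -17 > /proc/<pid>/oom_adj

# confirm the adjustment and the resulting badness score
cat /proc/<pid>/oom_adj /proc/<pid>/oom_score

Note that the setting is per-process and is lost when the process restarts, so it typically has to be reapplied from an init script if you go that route.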
If this happens in your environment on a regular basis, I would also adjust sar to run every minute instead of the default of every 10 minutes; that gives you a much better view of your RAM consumption dynamics leading up to the kill. A possible crontab tweak is sketched below.
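
A minimal sketch, assuming a RHEL/CentOS-style sysstat installation where the sa1 collector is driven from /etc/cron.d/sysstat (the exact path, e.g. /usr/lib/sa/sa1 or /usr/lib64/sa/sa1, varies by distribution, so check your own file):

# default entry: collect a sample every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1

# change it to collect a sample every minute
* * * * * root /usr/lib/sa/sa1 1 1

Afterwards sar -r will show per-minute memory figures, so you can see what memory looked like right before the OOM killer fired.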

Dmitry Zayats