
Greetings, fellow admins. I'm posting in the hope of shedding some light on the OOM kills that haunt one of my company's machines. I cannot decide whether they are legitimate OOMs or not.

It's CentOS 6.x with kernel 2.6.32-279.1.1.el6.x86_64.

RAM: 8 GB; CPU: Athlon II X4.

The big processes are MySQL and VMware Player 4, which constantly keep a maximum of 2 GB and 4 GB of RAM allocated respectively, plus some overhead. (VMware is the one that gets killed, because it is the bigger of the two.)

There are other daemons running besides these two, but they are very small and very lightly loaded, so I don't understand where the remaining 1 to 1.5 GB of RAM disappears, not counting the huge swap...

Also, the kills happen only while the backup cron jobs are running at night. (These are simple shell scripts using standard tools that dump some databases and zip some web and mailbox directories.)

In this example it was doing a mysqldump, and it's the first time this has happened with that particular task. It used to happen almost every time, together with 'page allocation failure' messages, when I ran rsync or zip on a big directory tree (~1 million small files). BUT I moved all that to another machine with ZFS, and after that the killer left me alone, for a while.

I hate that after juggling the issue for months, reading and rereading every thread on the internet, I still cannot relate the information to my case. There is swap, so why doesn't it swap instead of killing? And who takes all the RAM anyway? (In the beginning there were a couple of genuine memory leakers, and I got them.) It can't be fragmentation either, as the failed requests are of order zero.

Here is some data, before the actual kill logs:

vm.swappiness = 100
vm.vfs_cache_pressure = 5000
vm.min_free_kbytes = 262144

(I added these to try to fix it; they're probably a bit extreme, but the machine runs smoothly anyway.)

I also experimented, in vain, with vm.overcommit_memory=2. Isn't that supposed to disable the killer?

This is the normal memory status of the system. Note that VMware's RAM counts as cache, because of the mmap-ed vmem. And by the way, VMware is set to allow reclaiming/swapping of VM memory, and it never does so.

           total       used       free     shared    buffers     cached
Mem:       7800792    7400032     400760          0      61100    4449196
-/+ buffers/cache:    2889736    4911056
Swap:      8388600     761588    7627012

SwapCached:       286648 kB
PageTables:        40200 kB
CommitLimit:    15409312 kB
Committed_AS:    8099460 kB
AnonHugePages:    192512 kB
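For reference, with vm.overcommit_memory=2 the kernel refuses allocations that would push Committed_AS above CommitLimit, which it derives as RAM × vm.overcommit_ratio / 100 + swap (ignoring hugetlb pages). The sketch below only redoes that arithmetic on the numbers above; the ratio of 90 is an inference (it is the value that reproduces CommitLimit exactly), not a setting stated in this post.

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int, overcommit_ratio: int) -> int:
    """CommitLimit for vm.overcommit_memory=2, ignoring hugetlb pages (computed in kB)."""
    return mem_total_kb * overcommit_ratio // 100 + swap_total_kb


MEM_TOTAL_KB = 7800792    # "total" column of the Mem: line above
SWAP_TOTAL_KB = 8388600   # "total" column of the Swap: line above
COMMITTED_AS_KB = 8099460

# Assumption: overcommit_ratio=90 is inferred because it reproduces the
# reported CommitLimit exactly; the kernel default is 50.
limit = commit_limit_kb(MEM_TOTAL_KB, SWAP_TOTAL_KB, 90)
print(limit)                       # 15409312, same as CommitLimit above
print(limit - COMMITTED_AS_KB)     # 7309852 kB still committable
```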

Node 0, zone      DMA      4      1      1      3      1      1      0      0      1      1      3 
Node 0, zone    DMA32    378   1476   2541   1491    328    240     74     28      8      0      0 
Node 0, zone   Normal   1555    124    956   1825    659    175     54     31     15      0      0 
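Each column of /proc/buddyinfo counts free blocks of order 0, 1, 2, and so on, where a block of order n is 2^n pages (4 kB each on x86_64). A small sketch with hypothetical helper names turns the snapshot above into free kB per zone, optionally counting only blocks at or above a given order; order 0 means any free page will do, which is what the failed requests ask for.

```python
from __future__ import annotations

PAGE_KB = 4  # base page size on x86_64

SNAPSHOT = """\
Node 0, zone      DMA      4      1      1      3      1      1      0      0      1      1      3
Node 0, zone    DMA32    378   1476   2541   1491    328    240     74     28      8      0      0
Node 0, zone   Normal   1555    124    956   1825    659    175     54     31     15      0      0
"""


def parse_buddyinfo(text: str) -> dict[str, list[int]]:
    """Map zone name -> free block counts, indexed by order (0, 1, 2, ...)."""
    zones = {}
    for line in text.splitlines():
        if "zone" not in line:
            continue
        _, _, rest = line.partition("zone")
        name, *counts = rest.split()
        zones[name] = [int(c) for c in counts]
    return zones


def free_kb_at_or_above(counts: list[int], order: int) -> int:
    """Free kB held in blocks of `order` or larger; a block of order n is 2**n pages."""
    return sum(count * PAGE_KB * 2**o for o, count in enumerate(counts) if o >= order)


zones = parse_buddyinfo(SNAPSHOT)
for name, counts in zones.items():
    # order 0 = all free memory in the zone; order 4 = only blocks of 64 kB or more
    print(f"{name:7} any: {free_kb_at_or_above(counts, 0):>7} kB  "
          f"order>=4: {free_kb_at_or_above(counts, 4):>7} kB")
```

The DMA total, 15688 kB, matches the DMA free figure in the OOM report, and the three zones together give 401100 kB, close to the 400760 kB free column of the free output.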

Finally, the OOM log:


Jan  2 21:37:38  : vmware-vmx invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Jan  2 21:37:38  : vmware-vmx cpuset=/ mems_allowed=0
Jan  2 21:37:38  : Pid: 19635, comm: vmware-vmx Not tainted 2.6.32-279.1.1.el6.x86_64 #1
Jan  2 21:37:38  : Call Trace:
Jan  2 21:37:38  : [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Jan  2 21:37:38  : [] ? dump_header+0x90/0x1b0
Jan  2 21:37:38  : [] ? __delayacct_freepages_end+0x2e/0x30
Jan  2 21:37:38  : [] ? security_real_capable_noaudit+0x3c/0x70
Jan  2 21:37:38  : [] ? oom_kill_process+0x82/0x2a0
Jan  2 21:37:38  : [] ? select_bad_process+0xe1/0x120
Jan  2 21:37:38  : [] ? out_of_memory+0x220/0x3c0
Jan  2 21:37:38  : [] ? __alloc_pages_nodemask+0x89e/0x940
Jan  2 21:37:38  : [] ? alloc_pages_current+0xaa/0x110
Jan  2 21:37:38  : [] ? __get_free_pages+0xe/0x50
Jan  2 21:37:38  : [] ? __pollwait+0xb4/0xf0
Jan  2 21:37:38  : [] ? eventfd_poll+0x7d/0x80
Jan  2 21:37:38  : [] ? do_sys_poll+0x29b/0x520
Jan  2 21:37:38  : [] ? __pollwait+0x0/0xf0
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? pollwake+0x0/0x60
Jan  2 21:37:38  : [] ? read_tsc+0x9/0x20
Jan  2 21:37:38  : [] ? ktime_get_ts+0xa9/0xe0
Jan  2 21:37:38  : [] ? poll_select_set_timeout+0x8d/0xa0
Jan  2 21:37:38  : [] ? sys_ppoll+0x4c/0x180
Jan  2 21:37:38  : [] ? system_call_fastpath+0x16/0x1b
Jan  2 21:37:38  : Mem-Info:
Jan  2 21:37:38  : Node 0 DMA per-cpu:
Jan  2 21:37:38  : CPU    0: hi:    0, btch:   1 usd:   0
Jan  2 21:37:38  : CPU    1: hi:    0, btch:   1 usd:   0
Jan  2 21:37:38  : CPU    2: hi:    0, btch:   1 usd:   0
Jan  2 21:37:38  : CPU    3: hi:    0, btch:   1 usd:   0
Jan  2 21:37:38  : Node 0 DMA32 per-cpu:
Jan  2 21:37:38  : CPU    0: hi:  186, btch:  31 usd: 175
Jan  2 21:37:38  : CPU    1: hi:  186, btch:  31 usd:  40
Jan  2 21:37:38  : CPU    2: hi:  186, btch:  31 usd: 180
Jan  2 21:37:38  : CPU    3: hi:  186, btch:  31 usd: 152
Jan  2 21:37:38  : Node 0 Normal per-cpu:
Jan  2 21:37:38  : CPU    0: hi:  186, btch:  31 usd: 170
Jan  2 21:37:38  : CPU    1: hi:  186, btch:  31 usd:  67
Jan  2 21:37:38  : CPU    2: hi:  186, btch:  31 usd: 108
Jan  2 21:37:38  : CPU    3: hi:  186, btch:  31 usd:  63
Jan  2 21:37:38  : active_anon:1467089 inactive_anon:263165 isolated_anon:64
Jan  2 21:37:38  : active_file:12404 inactive_file:65792 isolated_file:96
Jan  2 21:37:38  : unevictable:2 dirty:66080 writeback:1 unstable:0
Jan  2 21:37:38  : free:73888 slab_reclaimable:8971 slab_unreclaimable:10661
Jan  2 21:37:38  : mapped:780904 shmem:1035969 pagetables:10566 bounce:0
Jan  2 21:37:38  : Node 0 DMA free:15688kB min:500kB low:624kB high:748kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan  2 21:37:38  : lowmem_reserve[]: 0 3254 7799 7799
Jan  2 21:37:38  : Node 0 DMA32 free:127608kB min:109180kB low:136472kB high:163768kB active_anon:2209600kB inactive_anon:441980kB active_file:35784kB inactive_file:207948kB unevictable:8kB isolated(anon):0kB isolated(file):256kB present:3333024kB mlocked:8kB dirty:211888kB writeback:0kB mapped:955900kB shmem:1376604kB slab_reclaimable:20276kB slab_unreclaimable:6460kB kernel_stack:488kB pagetables:8856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:580800 all_unreclaimable? no
Jan  2 21:37:38  : lowmem_reserve[]: 0 0 4545 4545
Jan  2 21:37:38  : Node 0 Normal free:152256kB min:152456kB low:190568kB high:228684kB active_anon:3658756kB inactive_anon:610680kB active_file:13832kB inactive_file:55220kB unevictable:0kB isolated(anon):256kB isolated(file):128kB present:4654080kB mlocked:0kB dirty:52432kB writeback:4kB mapped:2167716kB shmem:2767272kB slab_reclaimable:15608kB slab_unreclaimable:36184kB kernel_stack:3016kB pagetables:33408kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:410848 all_unreclaimable? no
Jan  2 21:37:38  : lowmem_reserve[]: 0 0 0 0
Jan  2 21:37:38  : Node 0 DMA: 4*4kB 1*8kB 1*16kB 3*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
Jan  2 21:37:38  : Node 0 DMA32: 240*4kB 925*8kB 1553*16kB 748*32kB 253*64kB 152*128kB 56*256kB 28*512kB 6*1024kB 0*2048kB 0*4096kB = 127608kB
Jan  2 21:37:38  : Node 0 Normal: 1155*4kB 1362*8kB 2962*16kB 860*32kB 330*64kB 113*128kB 23*256kB 18*512kB 9*1024kB 1*2048kB 0*4096kB = 152380kB
Jan  2 21:37:38  : 1162322 total pagecache pages
Jan  2 21:37:38  : 48040 pages in swap cache
Jan  2 21:37:38  : Swap cache stats: add 3148787, delete 3100747, find 1726176/2015458
Jan  2 21:37:38  : Free swap  = 7750696kB
Jan  2 21:37:38  : Total swap = 8388600kB
Jan  2 21:37:38  : 2031600 pages RAM
Jan  2 21:37:38  : 81402 pages reserved
Jan  2 21:37:38  : 1987047 pages shared
Jan  2 21:37:38  : 707527 pages non-shared
Jan  2 21:37:38  : [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jan  2 21:37:38  : [  457]     0   457     2673       90   1     -17         -1000 udevd
Jan  2 21:37:38  : [ 1261]     0  1261    62271      293   0       0             0 rsyslogd
Jan  2 21:37:38  : [ 1275]     0  1275     1171       91   0       0             0 mdadm
Jan  2 21:37:38  : [ 1284]    81  1284     5382      159   1       0             0 dbus-daemon
Jan  2 21:37:38  : [ 1295]    70  1295     6946      160   2       0             0 avahi-daemon
Jan  2 21:37:38  : [ 1296]    70  1296     6914       27   1       0             0 avahi-daemon
Jan  2 21:37:38  : [ 1344]    68  1344     6292      346   0       0             0 hald
Jan  2 21:37:38  : [ 1345]     0  1345     4524      152   0       0             0 hald-runner
Jan  2 21:37:38  : [ 1391]     0  1391     5053       87   2       0             0 hald-addon-inpu
Jan  2 21:37:38  : [ 1406]     0  1406     6014      103   2       0             0 vmware-usbarbit
Jan  2 21:37:38  : [ 1420]     0  1420    16028      119   1     -17         -1000 sshd
Jan  2 21:37:38  : [ 1439]    38  1439     7539      146   0       0             0 ntpd
Jan  2 21:37:38  : [ 1483]     0  1483    29309      135   2       0             0 crond
Jan  2 21:37:38  : [ 1494]     0  1494     5362       47   2       0             0 atd
Jan  2 21:37:38  : [ 1511]   501  1511    25371     5751   1       0             0 Xvnc
Jan  2 21:37:38  : [ 1521]   501  1521    26513       66   0       0             0 sh
Jan  2 21:37:38  : [ 1532]   512  1532    21634      461   0       0             0 Xvnc
Jan  2 21:37:38  : [ 1541]   501  1541     5006       82   1       0             0 dbus-launch
Jan  2 21:37:38  : [ 1542]   501  1542     5382      141   2       0             0 dbus-daemon
Jan  2 21:37:38  : [ 1546]   501  1546     5629      283   0       0             0 xfconfd
Jan  2 21:37:38  : [ 1551]   501  1551    28034       71   3       0             0 gpg-agent
Jan  2 21:37:38  : [ 1561]   501  1561    57668      351   1       0             0 xfce4-session
Jan  2 21:37:38  : [ 1565]   501  1565    49604      290   2       0             0 xfsettingsd
Jan  2 21:37:38  : [ 1567]   501  1567    55385      549   1       0             0 xfwm4
Jan  2 21:37:38  : [ 1569]   501  1569    59719     1148   2       0             0 xfce4-panel
Jan  2 21:37:38  : [ 1571]   501  1571    56642      311   1       0             0 Thunar
Jan  2 21:37:38  : [ 1573]   501  1573    81823      695   3       0             0 xfdesktop
Jan  2 21:37:38  : [ 1582]   501  1582    56799      339   1       0             0 xfce4-settings-
Jan  2 21:37:38  : [ 1584]     0  1584   520550      188   1       0             0 console-kit-dae
Jan  2 21:37:38  : [ 1650]   501  1650    55456      487   3       0             0 panel-6-systray
Jan  2 21:37:38  : [ 1654]   512  1654    26513       65   0       0             0 sh
Jan  2 21:37:38  : [ 1669]   512  1669     5006       68   2       0             0 dbus-launch
Jan  2 21:37:38  : [ 1670]   512  1670     5383      135   0       0             0 dbus-daemon
Jan  2 21:37:38  : [ 1674]   512  1674     5629      264   3       0             0 xfconfd
Jan  2 21:37:38  : [ 1680]   512  1680    28034       70   3       0             0 gpg-agent
Jan  2 21:37:38  : [ 1683]   500  1683    27549     6909   2       0             0 Xvnc
Jan  2 21:37:38  : [ 1694]   512  1694    57667      346   2       0             0 xfce4-session
Jan  2 21:37:38  : [ 1699]   512  1699    55386      461   1       0             0 xfwm4
Jan  2 21:37:38  : [ 1701]   512  1701    66152     1404   2       0             0 xfce4-panel
Jan  2 21:37:38  : [ 1703]   512  1703    56617      235   0       0             0 Thunar
Jan  2 21:37:38  : [ 1705]   512  1705    85537      529   1       0             0 xfdesktop
Jan  2 21:37:38  : [ 1707]   512  1707    49604      285   1       0             0 xfsettingsd
Jan  2 21:37:38  : [ 1715]   512  1715    56799      312   0       0             0 xfce4-settings-
Jan  2 21:37:38  : [ 1717]   512  1717    55456      438   3       0             0 panel-4-systray
Jan  2 21:37:38  : [ 1721]   500  1721    26513       66   0       0             0 sh
Jan  2 21:37:38  : [ 1740]   500  1740     5006       68   1       0             0 dbus-launch
Jan  2 21:37:38  : [ 1741]   500  1741     5383      167   2       0             0 dbus-daemon
Jan  2 21:37:38  : [ 1745]   500  1745     5629      275   1       0             0 xfconfd
Jan  2 21:37:38  : [ 1757]   500  1757    28070      139   3       0             0 gpg-agent
Jan  2 21:37:38  : [ 1758]     0  1758     1542       83   3       0             0 pptpd
Jan  2 21:37:38  : [ 1774]   500  1774    57667      356   0       0             0 xfce4-session
Jan  2 21:37:38  : [ 1779]   500  1779    55674      785   2       0             0 xfwm4
Jan  2 21:37:38  : [ 1781]   500  1781    65790     1363   1       0             0 xfce4-panel
Jan  2 21:37:38  : [ 1783]   500  1783    82194      451   0       0             0 Thunar
Jan  2 21:37:38  : [ 1785]   500  1785    85642      813   2       0             0 xfdesktop
Jan  2 21:37:38  : [ 1790]   500  1790    49604      283   2       0             0 xfsettingsd
Jan  2 21:37:38  : [ 1800]   500  1800    38863      313   3       0             0 xterm
Jan  2 21:37:38  : [ 1807]   500  1807    56798      353   3       0             0 xfce4-settings-
Jan  2 21:37:38  : [ 1808]   500  1808    55456      470   1       0             0 panel-6-systray
Jan  2 21:37:38  : [ 1811]   500  1811    27074       69   2       0             0 bash
Jan  2 21:37:38  : [ 1823]     0  1823     4704      145   0       0             0 smartd
Jan  2 21:37:38  : [ 1831]     0  1831     1014       48   2       0             0 mingetty
Jan  2 21:37:38  : [ 1833]     0  1833     1014       48   0       0             0 mingetty
Jan  2 21:37:38  : [ 1835]     0  1835     1014       48   1       0             0 mingetty
Jan  2 21:37:38  : [ 1837]     0  1837     1014       48   2       0             0 mingetty
Jan  2 21:37:38  : [ 1839]     0  1839     1014       49   3       0             0 mingetty
Jan  2 21:37:38  : [ 1843]     0  1843     1014       48   0       0             0 mingetty
Jan  2 21:37:38  : [ 2025]     0  2025    25340       59   2       0             0 vmnet-bridge
Jan  2 21:37:38  : [ 2033]     0  2033    25333       15   1       0             0 vmnet-netifup
Jan  2 21:37:38  : [ 2058]     0  2058    27069      101   0       0             0 vmnet-natd
Jan  2 21:37:38  : [ 2060]     0  2060    25333       15   1       0             0 vmnet-netifup
Jan  2 21:37:38  : [ 2097]     0  2097    30105       82   2       0             0 vmware-authdlau
Jan  2 21:37:38  : [ 2981]   500  2981    36335       76   0       0             0 su
Jan  2 21:37:38  : [ 2984]     0  2984    27074      233   1       0             0 bash
Jan  2 21:37:38  : [ 6347]   500  6347    39207      406   2       0             0 xterm
Jan  2 21:37:38  : [ 6349]   500  6349    27074       70   0       0             0 bash
Jan  2 21:37:38  : [ 6407]   500  6407    36335       77   0       0             0 su
Jan  2 21:37:38  : [ 6410]     0  6410    27074      251   0       0             0 bash
Jan  2 21:37:38  : [ 6481]     0  6481    57857      154   0       0             0 mysql
Jan  2 21:37:38  : [ 6911]     0  6911    19820      120   1       0             0 master
Jan  2 21:37:38  : [ 6914]    89  6914    19889      122   0       0             0 qmgr
Jan  2 21:37:38  : [ 6918]    89  6918    19839      141   0       0             0 tlsmgr
Jan  2 21:37:38  : [17572]     0 17572   103460     2142   3       0             0 Thunar
Jan  2 21:37:38  : [21227]   500 21227    38801      594   0       0             0 xterm
Jan  2 21:37:38  : [21229]   500 21229    27074       73   0       0             0 bash
Jan  2 21:37:38  : [29713]   500 29713    36870      214   3       0             0 lftp
Jan  2 21:37:38  : [32170]   500 32170    38815      184   0       0             0 xterm
Jan  2 21:37:38  : [32172]   500 32172    27074       77   1       0             0 bash
Jan  2 21:37:38  : [32189]   500 32189    36335       86   1       0             0 su
Jan  2 21:37:38  : [32197]     0 32197    27074       93   1       0             0 bash
Jan  2 21:37:38  : [16025]     0 16025     2070       89   3       0             0 pptpctrl
Jan  2 21:37:38  : [16026]     0 16026     5544      108   1       0             0 pppd
Jan  2 21:37:38  : [31174]     0 31174    27073      175   1       0             0 mysqld_safe
Jan  2 21:37:38  : [31909]    27 31909  1143356   587238   1       0             0 mysqld
Jan  2 21:37:38  : [32037]     0 32037    26546      169   0       0             0 mysqld_safe
Jan  2 21:37:38  : [32437]   495 32437   136524     7673   1       0             0 mysqld
Jan  2 21:37:38  : [32449]     0 32449    26546      169   2       0             0 mysqld_safe
Jan  2 21:37:38  : [  368]   493   368   211813     3831   0       0             0 mysqld
Jan  2 21:37:38  : [  884]   500   884    27074      310   1       0             0 bash
Jan  2 21:37:38  : [ 1065]   501  1065   122130     2881   3       0             0 vmplayer
Jan  2 21:37:38  : [ 2031]   500  2031    38570      281   0       0             0 xterm
Jan  2 21:37:38  : [ 2034]   500  2034    27074      180   0       0             0 bash
Jan  2 21:37:38  : [ 2051]   500  2051    36335      140   0       0             0 su
Jan  2 21:37:38  : [ 2055]     0  2055    27074      181   2       0             0 bash
Jan  2 21:37:38  : [16591]   501 16591    77851      712   3       0             0 vmware-unity-he
Jan  2 21:37:38  : [16803]     0 16803    26883      237   1       0             0 watch
Jan  2 21:37:38  : [19635]   501 19635  1693624   793343   1       0             0 vmware-vmx
Jan  2 21:37:38  : [ 2186]     0  2186    38139      158   0       0             0 proftpd
Jan  2 21:37:38  : [ 5289]   500  5289    38992      979   3       0             0 xterm
Jan  2 21:37:38  : [ 5291]   500  5291    27074      188   2       0             0 bash
Jan  2 21:37:38  : [ 5344]   500  5344    36335      148   1       0             0 su
Jan  2 21:37:38  : [ 5361]     0  5361    27074      350   0       0             0 bash
Jan  2 21:37:38  : [18529]   500 18529    26514      227   0       0             0 mysql-workbench
Jan  2 21:37:38  : [18534]   500 18534    26514      226   1       0             0 catchsegv
Jan  2 21:37:38  : [18536]   500 18536    26514       77   2       0             0 catchsegv
Jan  2 21:37:38  : [18537]   500 18537   227088     7571   2       0             0 mysql-workbench
Jan  2 21:37:38  : [  409]     0   409   131527     1556   1       0             0 geany
Jan  2 21:37:38  : [  410]     0   410     2054       92   2       0             0 gnome-pty-helpe
Jan  2 21:37:38  : [  411]     0   411    27074      238   1       0             0 bash
Jan  2 21:37:38  : [ 5750]     0  5750     2672       92   0     -17         -1000 udevd
Jan  2 21:37:38  : [ 5753]     0  5753     2672       87   0     -17         -1000 udevd
Jan  2 21:37:38  : [ 5788]     0  5788    10640      594   3       0             0 openvpn
Jan  2 21:37:38  : [ 5792]     0  5792    10640      598   3       0             0 openvpn
Jan  2 21:37:38  : [ 5800]    99  5800    11135      587   3       0             0 openvpn
Jan  2 21:37:38  : [21552]     0 21552   110137     2111   1       0             0 httpd
Jan  2 21:37:38  : [21555]    48 21555   139593     7684   3       0             0 httpd
Jan  2 21:37:38  : [21558]    48 21558   140002     8513   3       0             0 httpd
Jan  2 21:37:38  : [23283]   497 23283     9846      193   1       0             0 dkim-filter
Jan  2 21:37:38  : [23284]   497 23284    33979      524   1       0             0 dkim-filter
Jan  2 21:37:38  : [ 6819]     0  6819     2070      152   3       0             0 pptpctrl
Jan  2 21:37:38  : [ 6820]     0  6820     5544      237   1       0             0 pppd
Jan  2 21:37:39  : [17208]    48 17208   112903     4566   0       0             0 httpd
Jan  2 21:37:39  : [17209]    48 17209   138359     5895   0       0             0 httpd
Jan  2 21:37:39  : [17210]    48 17210   138693     7341   3       0             0 httpd
Jan  2 21:37:39  : [ 1255]     0  1255    24571      713   0       0             0 sshd
Jan  2 21:37:39  : [ 1278]     0  1278    13874      396   1       0             0 sftp-server
Jan  2 21:37:39  : [14064]    48 14064   138202     6622   3       0             0 httpd
Jan  2 21:37:39  : [14065]    48 14065   139625     7776   1       0             0 httpd
Jan  2 21:37:39  : [16899]    48 16899   138543     7523   3       0             0 httpd
Jan  2 21:37:39  : [32639]    89 32639    19924      722   0       0             0 pickup
Jan  2 21:37:39  : [ 4973]    48  4973   136179     4973   3       0             0 httpd
Jan  2 21:37:39  : [ 4976]    48  4976   138478     7371   0       0             0 httpd
Jan  2 21:37:39  : [ 4977]    48  4977   136173     4777   3       0             0 httpd
Jan  2 21:37:39  : [ 5662]     0  5662    35030      336   0       0             0 crond
Jan  2 21:37:39  : [ 5663]     0  5663     2297      282   2       0             0 sh
Jan  2 21:37:39  : [ 5664]     0  5664     2298      302   0       0             0 bash
Jan  2 21:37:39  : [ 5665]     0  5665    15910      437   0       0             0 mutt
Jan  2 21:37:39  : [ 5947]     0  5947     2298      337   1       0             0 bash
Jan  2 21:37:39  : [ 6416]    48  6416   110170     2070   3       0             0 httpd
Jan  2 21:37:39  : [ 6625]    48  6625   110170     1895   0       0             0 httpd
Jan  2 21:37:39  : [ 6642]     0  6642    32679     1632   2       0             0 mysqldump
Jan  2 21:37:39  : Out of memory: Kill process 19635 (vmware-vmx) score 199 or sacrifice child
Jan  2 21:37:39  : Killed process 19635, UID 501, (vmware-vmx) total-vm:6774496kB, anon-rss:74020kB, file-rss:3099352kB
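To see who actually holds the memory, the task list in the report can be ranked by its rss column, which is counted in 4 kB pages. A rough sketch: the regular expression assumes the syslog row layout shown above, and `top_rss` is a hypothetical helper. Keep in mind that RSS includes shared, file-backed pages, so the figures of different processes can overlap.

```python
from __future__ import annotations
import re

PAGE_KB = 4  # rss and total_vm in the task dump are counted in 4 kB pages

# One task-dump row, e.g.
# "[19635]   501 19635  1693624   793343   1       0             0 vmware-vmx"
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<cpu>\d+)\s+(?P<oom_adj>-?\d+)\s+(?P<score_adj>-?\d+)\s+(?P<name>\S+)"
)


def top_rss(log_lines: list[str], n: int = 5) -> list[tuple[str, int, int]]:
    """Return up to n (name, pid, rss_kB) tuples, largest RSS first."""
    rows = []
    for line in log_lines:
        m = ROW.search(line)
        if m:  # header and non-task lines have no bracketed number, so they are skipped
            rows.append((m["name"], int(m["pid"]), int(m["rss"]) * PAGE_KB))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


sample = [
    "Jan  2 21:37:38  : [31909]    27 31909  1143356   587238   1       0             0 mysqld",
    "Jan  2 21:37:38  : [19635]   501 19635  1693624   793343   1       0             0 vmware-vmx",
    "Jan  2 21:37:39  : [ 6642]     0  6642    32679     1632   2       0             0 mysqldump",
]
for name, pid, kb in top_rss(sample):
    print(f"{name:12} {pid:>6} {kb:>9} kB")
```

Fed the whole excerpt instead of the three sample rows, it lists vmware-vmx (3173372 kB) and mysqld (2348952 kB) at the top.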


Leo

1 Answer


Well, I think your min_free_kbytes is really high. I have a 16 GB machine and my min_free_kbytes is 67584 kB.

You wrote: "Note that vmware's ram counts as cache, because of the mmap-ed vmem."

That's not always true. It is only true if the file is mmap()ed with MAP_SHARED; otherwise dirty pages are swap-backed, which seems to be the case for you. If you add up the memory reported for that process at the bottom of your output and convert it into 4 kB pages, it equals the RSS reported in the task dump for that process:

anon-rss:74020kB, file-rss:3099352kB
74020 + 3099352 = 3173372 kB
3173372 / 4 = 793343 pages

which is exactly the rss column for that process:

[19635]   501 19635  1693624   793343   1       0        0 vmware-vmx
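The same check can be scripted: pull anon-rss and file-rss out of the "Killed process" line and convert the sum from kB to 4 kB pages. A minimal sketch; `rss_pages_from_kill_line` is a hypothetical name.

```python
import re

PAGE_KB = 4


def rss_pages_from_kill_line(line: str) -> int:
    """Convert anon-rss + file-rss (kB) from a 'Killed process' line into 4 kB pages."""
    anon_kb = int(re.search(r"anon-rss:(\d+)kB", line).group(1))
    file_kb = int(re.search(r"file-rss:(\d+)kB", line).group(1))
    return (anon_kb + file_kb) // PAGE_KB


kill_line = ("Killed process 19635, UID 501, (vmware-vmx) total-vm:6774496kB, "
             "anon-rss:74020kB, file-rss:3099352kB")
print(rss_pages_from_kill_line(kill_line))  # 793343, the rss column for pid 19635
```

The same conversion maps total-vm:6774496kB to the total_vm column, 1693624 pages.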

As for why you get OOM-killed: well, that's a little trickier.

When free memory in a zone drops to the min watermark, the kernel wants to reclaim memory until the zone is back up to its high watermark. The kernel therefore has a check: if the memory available to reclaim from the file cache is not enough to put the zone back above its high watermark, it won't bother freeing file cache and goes straight to reclaiming anonymous memory.

The kernel never reclaims directly from the active lists, so only inactive file pages count here:

if (file_inactive > zone_high - free_mem) then
   reclaim (zone_high - free_mem) file inactive pages
else
   reclaim from anonymous pool

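In your case (the Normal zone), inactive_file is 55220 kB, which is not greater than high - free = 228684 - 152256 = 76428 kB. Note also that free (152256 kB) is already below min (152456 kB).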
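The rule and the numbers above can be checked with a short Python sketch. It encodes the simplified rule as described in this answer, not the kernel's actual reclaim code, and `reclaim_source` is a hypothetical name; all values are the Normal zone figures from the OOM report, in kB.

```python
def reclaim_source(inactive_file_kb: int, free_kb: int, high_kb: int) -> str:
    """Simplified rule from this answer, not the kernel's actual reclaim code."""
    shortfall_kb = high_kb - free_kb   # what is needed to climb back to the high watermark
    if inactive_file_kb > shortfall_kb:
        return "file"                  # inactive page cache alone can cover the shortfall
    return "anon"                      # otherwise reclaim anonymous memory, i.e. swap


# Normal zone in the OOM report: free:152256kB high:228684kB inactive_file:55220kB
print(228684 - 152256)                        # 76428
print(reclaim_source(55220, 152256, 228684))  # anon
```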

The reason this is an OOM kill and not swapping is that when you breach the min watermark, the kernel goes into direct reclaim mode. In this mode, doing I/O to free memory cannot be done, because it could cause a deadlock.

Your host would have been swapping at the time, but it was allocating memory faster than it could swap out.

The best way to fix this would be to lower your min watermark (vm.min_free_kbytes), or better still, get more memory and/or reduce the number of things you run on the machine.
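To see what lowering vm.min_free_kbytes would change, here is a sketch of how 2.6.32-era kernels split it into per-zone watermarks, as I understand that code: each zone's min is proportional to its present pages, low = min + min/4 and high = min + min/2, all in integer page arithmetic. Fed 262144 and the present sizes from the OOM report, it reproduces the logged min/low/high values for all three zones. `zone_watermarks` is a hypothetical helper; it assumes no highmem, and RHEL backports or later kernels may compute this differently.

```python
from __future__ import annotations

PAGE_KB = 4

# "present" size of each zone from the OOM report, in kB
PRESENT_KB = {"DMA": 15284, "DMA32": 3333024, "Normal": 4654080}


def zone_watermarks(min_free_kbytes: int,
                    present_kb: dict[str, int]) -> dict[str, tuple[int, int, int]]:
    """Split min_free_kbytes into per-zone (min, low, high) watermarks, in kB.

    Mimics the 2.6.32-era calculation for a machine without highmem.
    """
    pages_min = min_free_kbytes // PAGE_KB
    present_pages = {zone: kb // PAGE_KB for zone, kb in present_kb.items()}
    lowmem_pages = sum(present_pages.values())
    marks = {}
    for zone, pages in present_pages.items():
        zmin = pages_min * pages // lowmem_pages   # share proportional to zone size
        low = zmin + (zmin >> 2)                    # min + 25%
        high = zmin + (zmin >> 1)                   # min + 50%
        marks[zone] = (zmin * PAGE_KB, low * PAGE_KB, high * PAGE_KB)
    return marks


# current setting vs. the value from the 16 GB machine above (comparison only)
for setting in (262144, 67584):
    print(setting, zone_watermarks(setting, PRESENT_KB)["Normal"])
```

With 262144 the Normal zone keeps about 150 MB as its min watermark and wants about 220 MB free before reclaim stops; smaller settings shrink all three thresholds proportionally.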

Matthew Ife