
I have a 1GB RAM Linode running a CentOS LAMP stack with Drupal Aegir, just 3 Open Atrium sites, and about 10-20 concurrent users. At random times it triggers an Out of Memory kill, and I can't figure out what's causing it. I'm not sure whether I need to tune memory usage on the LAMP stack itself; it appears Apache and/or PHP is the culprit. Apache is running the prefork MPM. I need to get this under control quickly. A couple of hours after an OOM crash and burn, here's some info. To my eye nothing looks obviously wrong, so I'm hoping someone smarter than me can shed some light. Config and performance info is included below. First, the Linode OOM kills...

OOM Kill#1 screenshot

http://i1099.photobucket.com/albums/g396/awhomer/screenshots/OOM1.png

OOM Kill#2 screenshot

http://i1099.photobucket.com/albums/g396/awhomer/screenshots/OMM2.png
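
The same details also show up in the kernel log; a rough sketch of pulling them out (assuming the default CentOS syslog location):

# Sketch: find the kernel's OOM reports and which process got killed
dmesg | grep -i oom
grep -iE 'out of memory|oom-killer|killed process' /var/log/messages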

Type of MPM in use by Apache

httpd -V | grep 'MPM'
Server MPM: Prefork
-D APACHE_MPM_DIR="server/mpm/prefork"

Current settings in my httpd.conf

<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>
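
If my math is right, with prefork each Apache child is a full copy of PHP, so the worst case is roughly MaxClients times the per-child RSS; a back-of-the-envelope sketch (the ~35 MB per child comes from the ps output below; the ~300 MB reserved for MySQL and the OS is just my assumption):

# Worst case with the settings above: 256 children x ~35 MB each is roughly 9 GB on a 1 GB box.
# Headroom-based ceiling: (1024 - ~300 MB for MySQL/OS) / ~35 MB per child is roughly 20 children.
# Measuring the actual per-child average:
ps -C httpd -o rss= | awk '{sum+=$1; n++} END {if (n) printf "avg httpd RSS: %.0f MB over %d children\n", sum/n/1024, n}'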

Now, here are my running processes

ps aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   2208   568 ?        Ss   10:20   0:01 init [3]     
root         2  0.0  0.0      0     0 ?        S    10:20   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    10:20   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    10:20   0:00 [kworker/u:0]
root         6  0.0  0.0      0     0 ?        S    10:20   0:00 [migration/0]
root         7  0.0  0.0      0     0 ?        S    10:20   0:00 [migration/1]
root         9  0.0  0.0      0     0 ?        S    10:20   0:00 [ksoftirqd/1]
root        10  0.0  0.0      0     0 ?        S    10:20   0:00 [migration/2]
root        12  0.0  0.0      0     0 ?        S    10:20   0:00 [ksoftirqd/2]
root        13  0.0  0.0      0     0 ?        S    10:20   0:00 [migration/3]
root        15  0.0  0.0      0     0 ?        S    10:20   0:00 [ksoftirqd/3]
root        16  0.0  0.0      0     0 ?        S<   10:20   0:00 [cpuset]
root        17  0.0  0.0      0     0 ?        S<   10:20   0:00 [khelper]
root        18  0.0  0.0      0     0 ?        S    10:20   0:00 [kdevtmpfs]
root        19  0.0  0.0      0     0 ?        S    10:20   0:00 [kworker/u:1]
root        21  0.0  0.0      0     0 ?        S    10:20   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S    10:20   0:00 [xenbus]
root       162  0.0  0.0      0     0 ?        S    10:20   0:00 [sync_supers]
root       164  0.0  0.0      0     0 ?        S    10:20   0:00 [bdi-default]
root       166  0.0  0.0      0     0 ?        S<   10:20   0:00 [kblockd]
root       174  0.0  0.0      0     0 ?        S    10:20   0:00 [kworker/3:1]
root       178  0.0  0.0      0     0 ?        S<   10:20   0:00 [md]
root       262  0.0  0.0      0     0 ?        S<   10:20   0:00 [rpciod]
root       275  0.0  0.0      0     0 ?        S    10:20   0:01 [kswapd0]
root       276  0.0  0.0      0     0 ?        SN   10:20   0:00 [ksmd]
root       277  0.0  0.0      0     0 ?        S    10:20   0:00 [fsnotify_mark]
root       281  0.0  0.0      0     0 ?        S    10:20   0:00 [ecryptfs-kthr]
root       283  0.0  0.0      0     0 ?        S<   10:20   0:00 [nfsiod]
root       284  0.0  0.0      0     0 ?        S<   10:20   0:00 [cifsiod]
root       287  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsIO]
root       288  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsCommit]
root       289  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsCommit]
root       290  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsCommit]
root       291  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsCommit]
root       292  0.0  0.0      0     0 ?        S    10:20   0:00 [jfsSync]
root       293  0.0  0.0      0     0 ?        S<   10:20   0:00 [xfsalloc]
root       294  0.0  0.0      0     0 ?        S<   10:20   0:00 [xfs_mru_cache]
root       295  0.0  0.0      0     0 ?        S<   10:20   0:00 [xfslogd]
root       296  0.0  0.0      0     0 ?        S<   10:20   0:00 [glock_workque]
root       297  0.0  0.0      0     0 ?        S<   10:20   0:00 [delete_workqu]
root       298  0.0  0.0      0     0 ?        S<   10:20   0:00 [gfs_recovery]
root       299  0.0  0.0      0     0 ?        S<   10:20   0:00 [crypto]
root       862  0.0  0.0      0     0 ?        S    10:20   0:00 [khvcd]
root       978  0.0  0.0      0     0 ?        S<   10:20   0:00 [kpsmoused]
root       979  0.0  0.0      0     0 ?        S    10:20   0:01 [kworker/1:1]
root       982  0.0  0.0      0     0 ?        S    10:20   0:00 [kworker/2:1]
root      1017  0.0  0.0      0     0 ?        S<   10:20   0:00 [deferwq]
root      1020  0.0  0.0      0     0 ?        S    10:20   0:00 [kjournald]
root      1044  0.0  0.0      0     0 ?        S    10:20   0:00 [kauditd]
root      1077  0.0  0.0   2424   364 ?        S<s  10:20   0:00 /sbin/udevd -d
root      2734  0.0  0.0      0     0 ?        S    10:20   0:00 [flush-202:0]
root      2780  0.0  0.0   2452    40 ?        Ss   10:20   0:00 /sbin/dhclient
root      2847  0.0  0.0  10624   464 ?        S<sl 10:20   0:00 auditd
root      2849  0.0  0.0  11184   572 ?        S<sl 10:20   0:00 /sbin/audispd
root      2869  0.0  0.0   1964   628 ?        Ss   10:20   0:00 syslogd -m 0
root      2872  0.0  0.0   1808   292 ?        Ss   10:20   0:00 klogd -x
named     2913  0.0  0.1  58936  1752 ?        Ssl  10:20   0:00 /usr/sbin/named
dbus      2935  0.0  0.0   2896   808 ?        Ss   10:20   0:00 dbus-daemon --s
root      2971  0.0  0.0  23268   828 ?        Ssl  10:20   0:01 automount
root      2990  0.0  0.0   7256   748 ?        Ss   10:20   0:00 /usr/sbin/sshd
ntp       3004  0.0  0.4   4548  4544 ?        SLs  10:20   0:01 ntpd -u ntp:ntp
root      3015  0.0  0.0   5344   176 ?        Ss   10:20   0:00 /usr/sbin/vsftp
root      3051  0.0  0.0   4676   956 ?        S    10:20   0:00 /bin/sh /usr/bi
mysql     3143 13.5  1.2 124592 12888 ?        Sl   10:20  56:33 /usr/libexec/my
root      3181  0.0  0.0   9372  1020 ?        Ss   10:20   0:00 sendmail: accep
smmsp     3189  0.0  0.1   8280  1152 ?        Ss   10:20   0:00 sendmail: Queue
root      3198  0.0  0.0   2044   224 ?        Ss   10:20   0:00 gpm -m /dev/inp
root      3215  0.0  0.1   5384  1092 ?        Ss   10:21   0:00 crond
xfs       3233  0.0  0.0   3308   780 ?        Ss   10:21   0:00 xfs -droppriv -
root      3349  0.0  0.0   2408   332 ?        Ss   10:21   0:00 /usr/sbin/atd
root      3372  0.0  1.0  26696 10704 ?        SN   10:21   0:00 /usr/bin/python
root      3374  0.0  0.0   2704   832 ?        SN   10:21   0:01 /usr/libexec/ga
root      3375  0.0  1.3  19420 13676 ?        Ss   10:21   0:02 /usr/bin/perl /
root      3378  0.0  0.0   1792   428 hvc0     Ss+  10:21   0:00 /sbin/mingetty
apache    5161  0.1  3.5  53992 36252 ?        S    14:28   0:11 /usr/sbin/httpd
apache    5162  0.0  3.5  53880 36104 ?        S    14:28   0:09 /usr/sbin/httpd
apache    5163  0.1  3.5  54128 36424 ?        S    14:28   0:13 /usr/sbin/httpd
root     18629  0.0  0.9  27828  9596 ?        Ss   12:09   0:01 /usr/sbin/httpd
apache   18631  0.0  3.4  53064 35476 ?        S    12:09   0:15 /usr/sbin/httpd
apache   18632  0.0  3.5  53636 35984 ?        S    12:09   0:15 /usr/sbin/httpd
apache   18633  0.1  3.4  53340 35816 ?        S    12:09   0:19 /usr/sbin/httpd
apache   18634  0.1  3.6  54936 37544 ?        S    12:09   0:20 /usr/sbin/httpd
apache   18635  0.0  3.5  53928 36328 ?        S    12:09   0:14 /usr/sbin/httpd
apache   18636  0.1  3.4  53080 35636 ?        S    12:09   0:20 /usr/sbin/httpd
apache   18637  0.0  3.4  53072 35364 ?        S    12:09   0:12 /usr/sbin/httpd
apache   18638  0.0  3.5  53680 36336 ?        S    12:09   0:15 /usr/sbin/httpd
apache   18751  0.1  3.4  53492 35924 ?        S    12:10   0:22 /usr/sbin/httpd
root     19122  0.0  0.0      0     0 ?        S    16:08   0:00 [kworker/3:2]
root     21015  0.0  0.0      0     0 ?        S    16:22   0:00 [kworker/2:2]
root     22764  0.0  0.0      0     0 ?        S    16:36   0:00 [kworker/0:2]
apache   23494  0.1  3.5  53884 36288 ?        S    12:45   0:17 /usr/sbin/httpd
apache   23498  0.1  4.1  60572 42756 ?        S    12:45   0:19 /usr/sbin/httpd
root     23996  0.0  0.0      0     0 ?        S    16:44   0:00 [kworker/1:0]
root     27059  0.0  0.2  10108  2940 ?        Rs   17:06   0:00 sshd: root@pts/
root     27168  0.0  0.1   4812  1456 pts/0    Ss   17:07   0:00 -bash
root     27464  0.0  0.0      0     0 ?        S    17:09   0:00 [kworker/0:1]
root     28565  0.0  0.0   4400   928 pts/0    R+   17:17   0:00 ps aux

Free Usage stats

free -m

              total       used       free     shared    buffers     cached
Mem:          1003        655        347          0         29        151
-/+ buffers/cache:        474        528
Swap:          511         13        498

List of running processes sorted by memory use

ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -r | less

%MEM %CPU   RSS    VSZ COMMAND
4.1  0.1 42756  60572 /usr/sbin/httpd
3.6  0.1 37544  54936 /usr/sbin/httpd
3.5  0.1 36424  54128 /usr/sbin/httpd
3.5  0.1 36288  53884 /usr/sbin/httpd
3.5  0.1 36252  53992 /usr/sbin/httpd
3.5  0.0 36336  53680 /usr/sbin/httpd
3.5  0.0 36328  53928 /usr/sbin/httpd
3.5  0.0 36104  53880 /usr/sbin/httpd
3.5  0.0 35984  53636 /usr/sbin/httpd
3.4  0.1 35924  53492 /usr/sbin/httpd
3.4  0.1 35816  53340 /usr/sbin/httpd
3.4  0.1 35636  53080 /usr/sbin/httpd
3.4  0.0 35628  53328 /usr/sbin/httpd
3.4  0.0 35476  53064 /usr/sbin/httpd
1.3 13.5 13792 125496 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
1.3  0.0 13676  19420 /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
1.0  0.0 10708  26696 /usr/bin/python -tt /usr/sbin/yum-updatesd
0.9  0.0  9596  27828 /usr/sbin/httpd
0.4  0.0  4544   4548 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
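
And a quick way to total up what the httpd children cost together (a sketch; -C httpd assumes the command name is httpd as shown above, and the sum overcounts memory shared between children):

ps -C httpd -o rss= | awk '{sum+=$1} END {printf "total httpd RSS: %.0f MB\n", sum/1024}'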

iostat

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          6.72    0.03    2.39    0.50    0.49   89.89

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvda              6.00       138.91       123.01    3397138    3008160
xvdb              0.06         0.15         1.11       3576      27040

iostat -d -x 2 5

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.07    12.07  2.72  3.27   138.36   122.77    43.61     0.38   62.71   4.77   2.86
xvdb              0.00     0.09  0.01  0.05     0.15     1.10    20.51     0.00   48.04   3.10   0.02

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  3.00  0.00    36.00     0.00    12.00     0.02    6.00   6.00   1.80
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

If PHP and/or Apache use a huge amount of memory, in most cases that comes from the application running inside the LAMP stack. I think this is a problem with Drupal, not Apache or PHP – Thomas Berger Oct 12 '12 at 20:10

2 Answers


I had an almost identical problem: a 1GB Linode with Apache, MySQL, PHP, and Yii (rather than Drupal), on Ubuntu rather than CentOS.

I suggest you check your MySQL error log to make sure you don't have any tables that need to be repaired. Additionally, try turning on the MySQL slow query log. In my case it turned out that spiders were occasionally crawling a couple of pages that had very slow queries (somehow the indexes had gotten whacked), and eventually the OOM killer would kick in. While this may not be the cause of your issue, and you may have corrected it by now, it's an easy thing to check.
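
If it helps, a minimal sketch of those two checks (option names vary by MySQL version; the log path and the 2-second threshold are just assumptions):

# Quick check for tables that need repair
mysqlcheck --all-databases --check -u root -p

# /etc/my.cnf, [mysqld] section (MySQL 5.1+)
slow_query_log = 1
slow_query_log_file = /var/log/mysql-slow.log
long_query_time = 2
# On MySQL 5.0 the equivalent option is: log-slow-queries = /var/log/mysql-slow.log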

Worst comes to worst, you can use oom_score_adj (or the older oom_adj) to at least mostly control which processes get whacked by the OOM killer, which usually lessens the issue.
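
A rough sketch of what that looks like (the -500 value is illustrative, the command assumes a single mysqld PID, and older kernels use /proc/<pid>/oom_adj with a -17..15 range instead):

# Bias the OOM killer away from mysqld; valid range is -1000 (never kill) to 1000
echo -500 > /proc/$(pidof mysqld)/oom_score_adj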


Disable the OOM Killer, then debug normally: look for the fattest processes, or ones that grow over time, and watch what they do to locate the memory leak, then correct it.
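
A simple sketch of that kind of watching (the interval and log path are arbitrary): snapshot the biggest processes periodically and compare snapshots later to see which ones grow.

# Snapshot the top memory consumers once a minute; diff the snapshots later to spot growth
while true; do
    date
    ps -eo pid,pmem,rss,args --sort=-rss | head -n 10
    sleep 60
done >> /var/tmp/mem-snapshots.log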

It is entirely possible (I would say probable) that the OOM Killer is doing the Wrong Thing, but you'll discover whether or not that's true when you can do some debugging without having processes killed out from under you.


Disclaimer: I believe the OOM Killer is THE WRONG THING in general.

POSIX is explicit in stating that it is possible for malloc() to fail (returning NULL), and that programs calling malloc() should expect such a failure and deal with that appropriately. The OOM Killer at its core is a crutch that tries to make one program's malloc() calls succeed by arbitrarily killing another program -- seems to me the guy that had the memory first should keep it, and the interloper should be told their request can't be satisfied.
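
In practice, the closest thing to "disabling" the OOM killer is strict overcommit accounting, which makes allocations fail at malloc() time rather than inviting the killer; a sketch (test carefully, since with strict accounting allocations can start failing much earlier than you expect):

# Strict overcommit: commits beyond swap + overcommit_ratio% of RAM fail at allocation time
sysctl -w vm.overcommit_memory=2
# To make it permanent, add to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio = 50    # kernel default; percent of physical RAM counted toward the limit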

voretaq7