I have a new server that has been frequently raising Out of memory
errors.
This has resulted in a several processes being killed by oom-killer
, and I can't work out why.
It's on AWS (EC2), with 4GB RAM (t2.medium), running a simple Apache web server with MySQL and PHP, on Ubuntu 16.04.2 LTS.
I've added a 4GB SWAP partition, using an EBS Volume (GP2).
The amount of available memory seems fine:
free -h
total used free shared buff/cache available
Mem: 3.9G 201M 49M 13M 3.6G 3.6G
Swap: 4.0G 79M 3.9G
This does not seem to have changed much throughout the day (highest used
amount from randomly checking was 250M).
So far I have not changed the vm.overcommit_
settings, as I'd like to know what the problem is before doing this:
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
The output from dmesg -T
is:
[Mon Feb 13 13:09:00 2017] sessionclean invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[Mon Feb 13 13:09:00 2017] sessionclean cpuset=/ mems_allowed=0
[Mon Feb 13 13:09:00 2017] CPU: 1 PID: 22353 Comm: sessionclean Not tainted 4.4.0-59-generic #80-Ubuntu
[Mon Feb 13 13:09:00 2017] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/09/2016
[Mon Feb 13 13:09:00 2017] 0000000000000286 0000000031d57236 ffff88005ce33af0 ffffffff813f7583
[Mon Feb 13 13:09:00 2017] ffff88005ce33cc8 ffff880107ac0000 ffff88005ce33b60 ffffffff8120ad5e
[Mon Feb 13 13:09:00 2017] ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[Mon Feb 13 13:09:00 2017] Call Trace:
[Mon Feb 13 13:09:00 2017] [<ffffffff813f7583>] dump_stack+0x63/0x90
[Mon Feb 13 13:09:00 2017] [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[Mon Feb 13 13:09:00 2017] [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[Mon Feb 13 13:09:00 2017] [<ffffffff811920ee>] ? oom_unkillable_task+0x9e/0xd0
[Mon Feb 13 13:09:00 2017] [<ffffffff81192b49>] out_of_memory+0x219/0x460
[Mon Feb 13 13:09:00 2017] [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[Mon Feb 13 13:09:00 2017] [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[Mon Feb 13 13:09:00 2017] [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[Mon Feb 13 13:09:00 2017] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[Mon Feb 13 13:09:00 2017] [<ffffffff811c1670>] ? handle_mm_fault+0xce0/0x1820
[Mon Feb 13 13:09:00 2017] [<ffffffff810805a0>] _do_fork+0x80/0x360
[Mon Feb 13 13:09:00 2017] [<ffffffff81080929>] SyS_clone+0x19/0x20
[Mon Feb 13 13:09:00 2017] [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[Mon Feb 13 13:09:00 2017] Mem-Info:
[Mon Feb 13 13:09:00 2017] active_anon:17107 inactive_anon:20026 isolated_anon:0
active_file:424384 inactive_file:433967 isolated_file:0
unevictable:914 dirty:36 writeback:0 unstable:0
slab_reclaimable:71336 slab_unreclaimable:6907
mapped:12390 shmem:3580 pagetables:4224 bounce:0
free:25546 free_pcp:0 free_cma:0
[Mon Feb 13 13:09:00 2017] Node 0 DMA free:15872kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:8kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:20kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 3745 3934 3934 3934
[Mon Feb 13 13:09:00 2017] Node 0 DMA32 free:66512kB min:7528kB low:9408kB high:11292kB active_anon:63604kB inactive_anon:72976kB active_file:1691880kB inactive_file:1723756kB unevictable:3244kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:3244kB dirty:124kB writeback:0kB mapped:48740kB shmem:13476kB slab_reclaimable:156484kB slab_unreclaimable:23920kB kernel_stack:2784kB pagetables:15124kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 189 189 189
[Mon Feb 13 13:09:00 2017] Node 0 Normal free:19800kB min:380kB low:472kB high:568kB active_anon:4824kB inactive_anon:7128kB active_file:5648kB inactive_file:12108kB unevictable:412kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:193908kB mlocked:412kB dirty:20kB writeback:0kB mapped:812kB shmem:844kB slab_reclaimable:128840kB slab_unreclaimable:3708kB kernel_stack:816kB pagetables:1772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 0 0 0
[Mon Feb 13 13:09:00 2017] Node 0 DMA: 4*4kB (MEH) 4*8kB (MEH) 1*16kB (H) 2*32kB (ME) 4*64kB (UME) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 3*1024kB (MEH) 1*2048kB (E) 2*4096kB (M) = 15872kB
[Mon Feb 13 13:09:00 2017] Node 0 DMA32: 16314*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66632kB
[Mon Feb 13 13:09:00 2017] Node 0 Normal: 4161*4kB (UE) 147*8kB (UME) 0*16kB 0*32kB 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 19804kB
[Mon Feb 13 13:09:00 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Feb 13 13:09:00 2017] 864746 total pagecache pages
[Mon Feb 13 13:09:00 2017] 2212 pages in swap cache
[Mon Feb 13 13:09:00 2017] Swap cache stats: add 62420, delete 60208, find 121128/130192
[Mon Feb 13 13:09:00 2017] Free swap = 4103680kB
[Mon Feb 13 13:09:00 2017] Total swap = 4194300kB
[Mon Feb 13 13:09:00 2017] 1081245 pages RAM
[Mon Feb 13 13:09:00 2017] 0 pages HighMem/MovableOnly
[Mon Feb 13 13:09:00 2017] 70004 pages reserved
[Mon Feb 13 13:09:00 2017] 0 pages cma reserved
[Mon Feb 13 13:09:00 2017] 0 pages hwpoisoned
[Mon Feb 13 13:09:00 2017] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Mon Feb 13 13:09:00 2017] [ 379] 0 379 10183 1130 22 3 771 0 systemd-journal
[Mon Feb 13 13:09:00 2017] [ 435] 0 435 25742 48 18 3 0 0 lvmetad
[Mon Feb 13 13:09:00 2017] [ 478] 0 478 10704 531 23 3 247 -1000 systemd-udevd
[Mon Feb 13 13:09:00 2017] [ 906] 0 906 4030 662 11 3 27 0 dhclient
[Mon Feb 13 13:09:00 2017] [ 1032] 0 1032 1306 30 8 3 0 0 iscsid
[Mon Feb 13 13:09:00 2017] [ 1033] 0 1033 1431 877 8 3 0 -17 iscsid
[Mon Feb 13 13:09:00 2017] [ 1035] 0 1035 1100 308 8 3 0 0 acpid
[Mon Feb 13 13:09:00 2017] [ 1037] 104 1037 64099 599 27 3 196 0 rsyslogd
[Mon Feb 13 13:09:00 2017] [ 1043] 0 1043 58710 315 17 4 41 0 lxcfs
[Mon Feb 13 13:09:00 2017] [ 1047] 107 1047 10724 334 24 3 25 -900 dbus-daemon
[Mon Feb 13 13:09:00 2017] [ 1055] 0 1055 6511 409 19 3 25 0 atd
[Mon Feb 13 13:09:00 2017] [ 1065] 0 1065 71269 64 41 3 2782 0 accounts-daemon
[Mon Feb 13 13:09:00 2017] [ 1069] 0 1069 7155 502 18 3 27 0 systemd-logind
[Mon Feb 13 13:09:00 2017] [ 1071] 0 1071 16380 545 36 3 153 -1000 sshd
[Mon Feb 13 13:09:00 2017] [ 1073] 0 1073 6932 502 19 3 38 0 cron
[Mon Feb 13 13:09:00 2017] [ 1101] 0 1101 3344 51 11 3 1 0 mdadm
[Mon Feb 13 13:09:00 2017] [ 1114] 0 1114 69272 148 39 3 45 0 polkitd
[Mon Feb 13 13:09:00 2017] [ 1202] 0 1202 4868 428 14 3 37 0 irqbalance
[Mon Feb 13 13:09:00 2017] [ 1221] 113 1221 27509 437 24 3 97 0 ntpd
[Mon Feb 13 13:09:00 2017] [ 1236] 0 1236 3619 402 11 3 0 0 agetty
[Mon Feb 13 13:09:00 2017] [ 1238] 0 1238 3665 361 12 3 0 0 agetty
[Mon Feb 13 13:09:00 2017] [ 1389] 0 1389 100324 4842 154 3 1167 0 apache2
[Mon Feb 13 13:09:00 2017] [ 7047] 114 7047 88165 925 64 3 437 0 opendkim
[Mon Feb 13 13:09:00 2017] [30354] 0 30354 16352 526 23 3 82 0 master
[Mon Feb 13 13:09:00 2017] [30356] 112 30356 16999 306 25 3 169 0 qmgr
[Mon Feb 13 13:09:00 2017] [30361] 112 30361 20306 889 30 3 174 0 tlsmgr
[Mon Feb 13 13:09:00 2017] [27212] 0 27212 6011 664 17 3 23 0 vsftpd
[Mon Feb 13 13:09:00 2017] [30271] 0 30271 4981 711 14 3 73 0 mysqld_safe
[Mon Feb 13 13:09:00 2017] [20955] 0 20955 23842 1213 51 3 229 0 sshd
[Mon Feb 13 13:09:00 2017] [20957] 1001 20957 11312 608 26 3 218 0 systemd
[Mon Feb 13 13:09:00 2017] [20959] 1001 20959 52158 95 36 4 391 0 (sd-pam)
[Mon Feb 13 13:09:00 2017] [21053] 1001 21053 23908 629 49 3 309 0 sshd
[Mon Feb 13 13:09:00 2017] [21054] 1001 21054 5382 862 15 3 434 0 bash
[Mon Feb 13 13:09:00 2017] [21565] 0 21565 23842 1305 51 3 136 0 sshd
[Mon Feb 13 13:09:00 2017] [21595] 1001 21595 23842 728 48 3 143 0 sshd
[Mon Feb 13 13:09:00 2017] [21596] 1001 21596 5378 820 15 3 452 0 bash
[Mon Feb 13 13:09:00 2017] [21844] 0 21844 23842 1424 50 3 20 0 sshd
[Mon Feb 13 13:09:00 2017] [21874] 1001 21874 23924 840 50 3 137 0 sshd
[Mon Feb 13 13:09:00 2017] [21875] 1001 21875 5378 724 16 3 524 0 bash
[Mon Feb 13 13:09:00 2017] [21891] 0 21891 13972 727 32 3 149 0 sudo
[Mon Feb 13 13:09:00 2017] [21892] 0 21892 12751 612 30 3 105 0 su
[Mon Feb 13 13:09:00 2017] [21893] 0 21893 5030 861 15 3 74 0 bash
[Mon Feb 13 13:09:00 2017] [22092] 116 22092 154426 22725 107 4 608 0 mysqld
[Mon Feb 13 13:09:00 2017] [22093] 0 22093 6203 277 18 3 10 0 logger
[Mon Feb 13 13:09:00 2017] [22166] 112 22166 16869 1066 25 3 0 0 pickup
[Mon Feb 13 13:09:00 2017] [22256] 33 22256 100903 4062 155 3 1028 0 apache2
[Mon Feb 13 13:09:00 2017] [22272] 33 22272 101113 6090 157 3 919 0 apache2
[Mon Feb 13 13:09:00 2017] [22273] 33 22273 101016 4352 156 3 1034 0 apache2
[Mon Feb 13 13:09:00 2017] [22285] 33 22285 100915 4301 155 3 1030 0 apache2
[Mon Feb 13 13:09:00 2017] [22290] 33 22290 100994 3732 153 3 1036 0 apache2
[Mon Feb 13 13:09:00 2017] [22291] 33 22291 101505 5021 157 3 1031 0 apache2
[Mon Feb 13 13:09:00 2017] [22292] 33 22292 100991 3457 153 3 1048 0 apache2
[Mon Feb 13 13:09:00 2017] [22313] 33 22313 100886 3437 153 3 1049 0 apache2
[Mon Feb 13 13:09:00 2017] [22314] 33 22314 100374 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22315] 33 22315 100890 3193 151 3 1054 0 apache2
[Mon Feb 13 13:09:00 2017] [22316] 33 22316 101015 5788 160 3 986 0 apache2
[Mon Feb 13 13:09:00 2017] [22317] 33 22317 100380 1658 144 3 1127 0 apache2
[Mon Feb 13 13:09:00 2017] [22318] 33 22318 100901 3771 155 3 1046 0 apache2
[Mon Feb 13 13:09:00 2017] [22319] 33 22319 100374 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22321] 33 22321 100900 3686 155 3 1045 0 apache2
[Mon Feb 13 13:09:00 2017] [22326] 33 22326 101024 4417 155 3 1039 0 apache2
[Mon Feb 13 13:09:00 2017] [22327] 33 22327 100378 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22328] 33 22328 101533 6627 161 3 853 0 apache2
[Mon Feb 13 13:09:00 2017] [22329] 0 22329 12235 727 29 3 11 0 cron
[Mon Feb 13 13:09:00 2017] [22330] 0 22330 1127 195 8 3 0 0 sh
[Mon Feb 13 13:09:00 2017] [22331] 0 22331 1127 202 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22332] 0 22332 1127 26 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22334] 0 22334 4144 185 9 3 0 0 sort
[Mon Feb 13 13:09:00 2017] [22335] 0 22335 4144 186 10 3 0 0 sort
[Mon Feb 13 13:09:00 2017] [22336] 0 22336 1127 26 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22342] 0 22342 1127 301 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22353] 0 22353 1127 29 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] Out of memory: Kill process 22092 (mysqld) score 11 or sacrifice child
[Mon Feb 13 13:09:00 2017] Killed process 22092 (mysqld) total-vm:617704kB, anon-rss:76544kB, file-rss:14356kB
And the content of /proc/meminfo
:
MemTotal: 4044964 kB
MemFree: 69100 kB
MemAvailable: 3665524 kB
Buffers: 1509532 kB
Cached: 1959440 kB
SwapCached: 6288 kB
Active: 2181068 kB
Inactive: 1524292 kB
Active(anon): 133556 kB
Inactive(anon): 119404 kB
Active(file): 2047512 kB
Inactive(file): 1404888 kB
Unevictable: 3656 kB
Mlocked: 3656 kB
SwapTotal: 4194300 kB
SwapFree: 4112600 kB
Dirty: 184 kB
Writeback: 0 kB
AnonPages: 235104 kB
Mapped: 59700 kB
Shmem: 14192 kB
Slab: 220372 kB
SReclaimable: 192240 kB
SUnreclaim: 28132 kB
KernelStack: 3936 kB
PageTables: 17116 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6216780 kB
Committed_AS: 1189376 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 14336 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 137216 kB
DirectMap2M: 4188160 kB
I'm not too familiar with oom-killer
, so I'm wondering if this is due to total_vm
, where that column adds up to 2937127 - which, if multiplied by 4 kB pages, gives 11.2 GB.
Where I have 4 GB of physical RAM, and 4GB of SWAP space (total of 8), which if multiplied by 1.5 (due to vm.overcommit_ratio 50), would give 12GB.
For comparison, the old server never had any memory problems, and it was running a few more services as well.
It also has 4GB RAM and 4GB of SWAP, and exactly the same vm.overcommit_
settings... the only real difference is that it's a physical server, and running CentOS (6.8)
Am I missing something? because this server looks like it should have plenty of RAM to spare, and hasn't touched the SWAP space.