3

Setup

EC2 t2.large instance with Ubuntu 16.04 (2 Cores, 8GB RAM)
Apache/2.4.18 (Ubuntu)
5.7.20-0ubuntu0.16.04.1 (Ubuntu)

Currently this instance hosts multiple Drupal 6 & 7 sites, around 150 in total. The majority of the sites have a DB size of 10-20 MB.

Until there has been no tuning of the mysql options and the mysql was keep restarting with error message errno: 24 - Too many open files.

So I set the following values
DefaultLimitNOFILE=13000 DefaultLimitMEMLOCK=infinity

my.cnf

key_buffer_size     = 16M      
max_allowed_packet  = 24M  
thread_stack        = 192K  
thread_cache_size       = 8  
sort_buffer_size = 2M  
table_open_cache = 4096  
wait_timeout = 28  
query_cache_limit = 4M  
query_cache_size  = 16M  

After those changes mysql restarted cause oom-killer shut it down during taking a complete backup of each database. Below error log from syslog file

Jan 10 12:52:43 ip-172 kernel: [29734812.469619] sh invoked 

oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 12:52:43 ip-172 kernel: [29734812.469622] sh cpuset=/ mems_allowed=0
Jan 10 12:52:43 ip-172 kernel: [29734812.469628] CPU: 1 PID: 23932 Comm: sh Not tainted 4.4.0-59-generic #80-Ubuntu
Jan 10 12:52:43 ip-172 kernel: [29734812.469629] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/09/2016
Jan 10 12:52:43 ip-172 kernel: [29734812.469631]  0000000000000286 0000000064581ff2 ffff880002767af0 ffffffff813f7583
Jan 10 12:52:43 ip-172 kernel: [29734812.469634]  ffff880002767cc8 ffff880203b60e00 ffff880002767b60 ffffffff8120ad5e
Jan 10 12:52:43 ip-172 kernel: [29734812.469636]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
Jan 10 12:52:43 ip-172 kernel: [29734812.469638] Call Trace:
Jan 10 12:52:43 ip-172 kernel: [29734812.469645]  [<ffffffff813f7583>] dump_stack+0x63/0x90
Jan 10 12:52:43 ip-172 kernel: [29734812.469649]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
Jan 10 12:52:43 ip-172 kernel: [29734812.469653]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
Jan 10 12:52:43 ip-172 kernel: [29734812.469654]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
Jan 10 12:52:43 ip-172 kernel: [29734812.469657]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
Jan 10 12:52:43 ip-172 kernel: [29734812.469659]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
Jan 10 12:52:43 ip-172 kernel: [29734812.469661]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
Jan 10 12:52:43 ip-172 kernel: [29734812.469665]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
Jan 10 12:52:43 ip-172 kernel: [29734812.469668]  [<ffffffff811c164d>] ? handle_mm_fault+0xcbd/0x1820
Jan 10 12:52:43 ip-172 kernel: [29734812.469670]  [<ffffffff810805a0>] _do_fork+0x80/0x360
Jan 10 12:52:43 ip-172 kernel: [29734812.469674]  [<ffffffff81092477>] ? SyS_rt_sigaction+0x77/0xd0
Jan 10 12:52:43 ip-172 kernel: [29734812.469675]  [<ffffffff81080929>] SyS_clone+0x19/0x20
Jan 10 12:52:43 ip-172 kernel: [29734812.469680]  [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
Jan 10 12:52:43 ip-172 kernel: [29734812.469682] Mem-Info:
Jan 10 12:52:43 ip-172 kernel: [29734812.469685] active_anon:462537 inactive_anon:14666 isolated_anon:0
Jan 10 12:52:43 ip-172 kernel: [29734812.469685]  active_file:224584 inactive_file:1250340 isolated_file:0
Jan 10 12:52:43 ip-172 kernel: [29734812.469685]  unevictable:914 dirty:10554 writeback:0 unstable:0
Jan 10 12:52:43 ip-172 kernel: [29734812.469685]  slab_reclaimable:52958 slab_unreclaimable:11418
Jan 10 12:52:43 ip-172 kernel: [29734812.469685]  mapped:26870 shmem:37547 pagetables:3663 bounce:0
Jan 10 12:52:43 ip-172 kernel: [29734812.469685]  free:14204 free_pcp:0 free_cma:0
Jan 10 12:52:43 ip-172 kernel: [29734812.469688] Node 0 DMA free:15904kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 10 12:52:43 ip-172 kernel: [29734812.469692] lowmem_reserve[]: 0 3745 7966 7966 7966
Jan 10 12:52:43 ip-172 kernel: [29734812.469695] Node 0 DMA32 free:25704kB min:5332kB low:6664kB high:7996kB active_anon:914364kB inactive_anon:27464kB active_file:421060kB inactive_file:2320460kB unevictable:1612kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:1612kB dirty:8928kB writeback:0kB mapped:55892kB shmem:72692kB slab_reclaimable:92924kB slab_unreclaimable:19012kB kernel_stack:1120kB pagetables:6128kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 10 12:52:43 ip-172 kernel: [29734812.469699] lowmem_reserve[]: 0 0 4221 4221 4221
Jan 10 12:52:43 ip-172 kernel: [29734812.469701] Node 0 Normal free:15208kB min:6008kB low:7508kB high:9012kB active_anon:935784kB inactive_anon:31200kB active_file:477276kB inactive_file:2680900kB unevictable:2044kB isolated(anon):0kB isolated(file):0kB present:4587520kB managed:4322664kB mlocked:2044kB dirty:33288kB writeback:0kB mapped:51588kB shmem:77496kB slab_reclaimable:118908kB slab_unreclaimable:26660kB kernel_stack:2480kB pagetables:8524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 10 12:52:43 ip-172 kernel: [29734812.469705] lowmem_reserve[]: 0 0 0 0 0
Jan 10 12:52:43 ip-172 kernel: [29734812.469707] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469716] Node 0 DMA32: 2514*4kB (UME) 1970*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25816kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469723] Node 0 Normal: 3193*4kB (UME) 56*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 0*4096kB = 15268kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469731] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469732] 1513086 total pagecache pages
Jan 10 12:52:43 ip-172 kernel: [29734812.469733] 0 pages in swap cache
Jan 10 12:52:43 ip-172 kernel: [29734812.469735] Swap cache stats: add 0, delete 0, find 0/0
Jan 10 12:52:43 ip-172 kernel: [29734812.469735] Free swap  = 0kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469736] Total swap = 0kB
Jan 10 12:52:43 ip-172 kernel: [29734812.469737] 2129821 pages RAM
Jan 10 12:52:43 ip-172 kernel: [29734812.469738] 0 pages HighMem/MovableOnly
Jan 10 12:52:43 ip-172 kernel: [29734812.469739] 86391 pages reserved
Jan 10 12:52:43 ip-172 kernel: [29734812.469739] 0 pages cma reserved
Jan 10 12:52:43 ip-172 kernel: [29734812.469740] 0 pages hwpoisoned
Jan 10 12:52:43 ip-178 kernel: [29734812.469741] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 10 12:52:43 ip-172 kernel: [29734812.469746] [  389]     0   389    11155     1921      22       3        0             0 systemd-journal
Jan 10 12:52:43 ip-172 kernel: [29734812.469748] [  433]     0   433    25742      208      17       4        0             0 lvmetad
Jan 10 12:52:43 ip-172 kernel: [29734812.469750] [  465]     0   465    10706      830      24       3        0         -1000 systemd-udevd
Jan 10 12:52:43 ip-172 kernel: [29734812.469751] [  653]   100   653    25081      448      18       3        0             0 systemd-timesyn
Jan 10 12:52:43 ip-172 kernel: [29734812.469753] [  840]   107   840    10763      721      25       3        0          -900 dbus-daemon
Jan 10 12:52:43 ip-172 kernel: [29734812.469755] [  857]     0   857     4030      629      12       3        0             0 dhclient
Jan 10 12:52:43 ip-1728 kernel: [29734812.469756] [  886]     0   886    74155     1025      32       5        0             0 snapd
Jan 10 12:52:43 ip-172 kernel: [29734812.469758] [  888]   104   888    65158      888      29       3        0             0 rsyslogd
Jan 10 12:52:43 ip-172 kernel: [29734812.469759] [  890]     0   890    68654     1293      35       4        0             0 accounts-daemon
Jan 10 12:52:43 ip-172 kernel: [29734812.469761] [  892]     0   892     6932      532      19       3        0             0 cron
Jan 10 12:52:43 ip-172 kernel: [29734812.469762] [  895]     0   895   226585     1792      37       3        0             0 lxcfs
Jan 10 12:52:43 ip-172 kernel: [29734812.469764] [  898]     0   898     7330      810      18       3        0             0 systemd-logind
Jan 10 12:52:43 ip-1728 kernel: [29734812.469766] [  905]     0   905     1100      295       8       3        0             0 acpid
Jan 10 12:52:43 ip-172 kernel: [29734812.469767] [  913]     0   913     6511      365      17       3        0             0 atd
Jan 10 12:52:43 ip-172 kernel: [29734812.469769] [  965]     0   965    69847      897      39       3        0             0 polkitd
Jan 10 12:52:43 ip-178 kernel: [29734812.469770] [  966]     0   966     3344       37      10       3        0             0 mdadm
Jan 10 12:52:43 ip-172 kernel: [29734812.469772] [ 1087]     0  1087    16380      669      35       3        0         -1000 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469773] [ 1095]     0  1095     1306       31       9       3        0             0 iscsid
Jan 10 12:52:43 ip-172kernel: [29734812.469775] [ 1096]     0  1096     1431      879       9       3        0           -17 iscsid
Jan 10 12:52:43 ip-172 kernel: [29734812.469776] [ 1137]     0  1137     4868      366      15       3        0             0 irqbalance
Jan 10 12:52:43 ip-172 kernel: [29734812.469778] [ 1178]     0  1178     3619      402      12       3        0             0 agetty
Jan 10 12:52:43 ip-172 kernel: [29734812.469779] [ 1180]     0  1180     3665      358      12       3        0             0 agetty
Jan 10 12:52:43 ip-172 kernel: [29734812.469781] [16514]  1000 16514    11313      482      27       3        0             0 systemd
Jan 10 12:52:43 ip-172 kernel: [29734812.469783] [16516]  1000 16516    15283      465      34       3        0             0 (sd-pam)
Jan 10 12:52:43 ip-172 kernel: [29734812.469784] [13008]     0 13008   103511     6515     158       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469786] [13470]     0 13470    23843     1691      49       4        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469787] [13526]  1000 13526    24005     1126      48       4        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469789] [13527]  1000 13527     5429     1382      15       3        0             0 bash
Jan 10 12:52:43 ip-172 kernel: [29734812.469790] [13738]     0 13738    23843     1691      51       3        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469792] [13770]  1000 13770    23843      794      48       3        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469793] [13771]  1000 13771     5371     1300      15       3        0             0 bash
Jan 10 12:52:43 ip-172 kernel: [29734812.469795] [14616]   112 14616   602139   352341     763       5        0             0 mysqld
Jan 10 12:52:43 ip-172 kernel: [29734812.469796] [17818]     0 17818    13937      967      31       3        0             0 sudo
Jan 10 12:52:43 ip-172 kernel: [29734812.469798] [17819]     0 17819     5365     1286      15       3        0             0 bash
Jan 10 12:52:43 ip-172 kernel: [29734812.469799] [18780]     0 18780    23843     1671      51       3        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469801] [18812]  1000 18812    23973     1110      52       3        0             0 sshd
Jan 10 12:52:43 ip-172 kernel: [29734812.469802] [18813]  1000 18813     5383     1329      15       3        0             0 bash
Jan 10 12:52:43 ip-172 kernel: [29734812.469804] [19787]  1000 19787     6413     1144      17       3        0             0 htop
Jan 10 12:52:43 ip-172 kernel: [29734812.469805] [21371]    33 21371   129645    27202     202       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469807] [21933]    33 21933   129300    16608     182       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469808] [21958]    33 21958   132075    14133     177       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469810] [22573]     0 22573    13937      966      32       3        0             0 sudo
Jan 10 12:52:43 ip-172 kernel: [29734812.469811] [22574]     0 22574     2831      746      10       3        0             0 mysqlbackup.sh
Jan 10 12:52:43 ip-172 kernel: [29734812.469813] [23150]    33 23150   128754    11201     169       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469814] [23314]    33 23314   113538    14182     174       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469816] [23317]    33 23317   128350    10826     168       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469817] [23817]    33 23817   103633     3157     148       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469819] [23832]    33 23832   109993    10857     166       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469821] [23839]    33 23839   128636    11425     168       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469822] [23891]    33 23891   114121    15098     175       3        0             0 apache2
Jan 10 12:52:43 ip-172 kernel: [29734812.469824] [23930]     0 23930     7253     1011      18       3        0             0 mysqldump
Jan 10 12:52:43 ip-172 kernel: [29734812.469825] [23931]     0 23931     1160      172       7       3        0             0 gzip
Jan 10 12:52:43 ip-172 kernel: [29734812.469826] [23932]    33 23932     1128      189       8       3        0             0 sh
Jan 10 12:52:43 ip-172 kernel: [29734812.469828] Out of memory: Kill process 14616 (mysqld) score 172 or sacrifice child
Jan 10 12:52:43 ip-172 kernel: [29734812.487189] Killed process 14616 (mysqld) total-vm:2408556kB, anon-rss:1400092kB, file-rss:9272kB

I would really need some help reading this log.
htop says that I utilized 1.4GB to 1.9GB RAM without buffers/cache. Also CloudWatch metric memUse --without buffers/cache reports a utilization of ~2.2GB

So my questions are:

  1. What caused the oom-killer triggering?
  2. Was mySQL consuming the entire RAM according to the log?
  3. If no how can I increase mySQL RAM utilization safely

mysqltuner report

MySQLTuner 1.7.4 - Major Hayden <major@mhtx.net>
 Bug reports, feature requests, and downloads at http://mysqltuner.com/
 Run with '--help' for additional options and output filtering

[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.7.20-0ubuntu0.16.04.1
[OK] Operating on 64-bit architecture

-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mysql/error.log(175K)
[OK] Log file /var/log/mysql/error.log exists
[OK] Log file /var/log/mysql/error.log is readable.
[OK] Log file /var/log/mysql/error.log is not empty
[OK] Log file /var/log/mysql/error.log is smaller than 32 Mb
[!!] /var/log/mysql/error.log contains 90 warning(s).
[!!] /var/log/mysql/error.log contains 984 error(s).
[--] 3 start(s) detected in /var/log/mysql/error.log
[--] 1) 2018-01-10T14:30:44.424281Z 0 [Note] /usr/sbin/mysqld: ready for connections.
[--] 2) 2018-01-10T14:26:40.542815Z 0 [Note] /usr/sbin/mysqld: ready for connections.
[--] 3) 2018-01-10T12:53:12.946215Z 0 [Note] /usr/sbin/mysqld: ready for connections.
[--] 2 shutdown(s) detected in /var/log/mysql/error.log
[--] 1) 2018-01-10T14:30:43.313896Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 2) 2018-01-10T14:26:38.720171Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA 
[--] Data in MyISAM tables: 47G (Tables: 14150)
[--] Data in InnoDB tables: 4G (Tables: 22995)
[OK] Total fragmented tables: 0

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[--] Bug #80860 MySQL 5.7: Avoid testing password when validate_password is activated

-------- CVE Security Recommendations --------------------------------------------------------------
[--] Skipped due to --cvefile option undefined

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 2h 15m 51s (3M q [374.920 qps], 4K conn, TX: 8G, RX: 638M)
[--] Reads / Writes: 98% / 2%
[--] Binary logging is disabled
[--] Physical Memory     : 7.8G
[--] Max MySQL memory    : 616.7M
[--] Other process memory: 651.1M
[--] Total buffers: 192.0M global + 2.8M per thread (151 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 208.9M (2.62% of installed RAM)
[OK] Maximum possible memory usage: 616.7M (7.73% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (0/3M)
[OK] Highest usage of available connections: 3% (6/151)
[OK] Aborted connections: 0.02%  (1/4959)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 3M selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (4 temp sorts / 85K sorts)
[!!] Joins performed without indexes: 1794
[OK] Temporary tables created on disk: 13% (6K on disk / 50K total)
[OK] Thread cache hit rate: 99% (6 created / 4K connections)
[!!] Table cache hit rate: 3% (4K open / 135K opened)
[OK] Open file limit used: 26% (3K/13K)
[OK] Table locks acquired immediately: 100% (5M immediate / 5M locks)

-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 72B
[--] Sys schema is installed.

-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is disabled.

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 72.9% (12M used / 16M cache)
[OK] Key buffer size / total MyISAM indexes: 16.0M/1.1G
[OK] Read Key buffer hit rate: 99.8% (110M cached / 272K reads)
[!!] Write Key buffer hit rate: 94.4% (26K cached / 25K writes)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 128.0M/5.0G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (75 %): 48.0M * 2/128.0M should be equal 25%
[OK] InnoDB buffer pool instances: 1
[--] Number of InnoDB Buffer Pool Chunk : 1 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 97.24% (7756838 hits/ 7976792 total)
[OK] InnoDB Write log efficiency: 90.12% (265094 hits/ 294146 total)
[OK] InnoDB log waits: 0.00% (0 waits / 29052 writes)

-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is disabled.

-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.

-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.

-------- RocksDB Metrics ---------------------------------------------------------------------------
[--] RocksDB is disabled.

-------- Spider Metrics ----------------------------------------------------------------------------
[--] Spider is disabled.

-------- Connect Metrics ---------------------------------------------------------------------------
[--] Connect is disabled.

-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.

-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] This is a standalone server.

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Control warning line(s) into /var/log/mysql/error.log file
    Control error line(s) into /var/log/mysql/error.log file
    MySQL started within last 24 hours - recommendations may be inaccurate
    Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    Adjust your join queries to always utilize indexes
    Increase table_open_cache gradually to avoid file descriptor limits
    Read this before increasing table_open_cache over 64: 
    Beware that open_files_limit (13000) variable 
    should be greater than table_open_cache (4096)
    Read this before changing innodb_log_file_size and/or innodb_log_files_in_group: 
Variables to adjust:
    query_cache_size (=0)
    query_cache_type (=0)
    query_cache_limit (> 4M, or use smaller result sets)
    join_buffer_size (> 256.0K, or always use indexes with joins)
    table_open_cache (> 4096)
    innodb_buffer_pool_size (>= 4G) if possible.
    innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.

Thanks in advance and I hope I included all the relevant information.

Angelos Ar
  • 31
  • 1
  • 2
  • you have no swap? – EvilTorbalan Jan 10 '18 at 18:45
  • It seems so.. I didn't set up the instance, so I don't know why there is no swap. Is it possible this to be some EC2 constraint? – Angelos Ar Jan 10 '18 at 20:15
  • Not really familiar with amazon's aws but if you just started standard instance its probably the default. – EvilTorbalan Jan 10 '18 at 20:17
  • When this happens on a machine running both MySQL and Apache, the problem is usually Apache. MySQL is using a lot of memory in one process, as it is designed to, while Apache is using a lot of memory across a lot of different processes. MySQL is a sitting duck for the oom-killer, the victim rather than the cause. https://dba.stackexchange.com/a/25171/11651 – Michael - sqlbot Jan 11 '18 at 01:53
  • @AngelosAr Could you post from Ubunto ulimit -a so we can see your OS limits. We can see you have ~ 14,000 MyISAM tables and ~ 23,000 Innodb tables. Are most of them dormant from one day to another (this includes NO -warming up- of the caches)?. – Wilson Hauck Feb 26 '18 at 13:15

2 Answers2

1

Try increasing vm.min_free_kbytes, this will ensure you have enough free memory when needed. Some(kernel) processes require memory to be immediately available (and cannot wait for the MM to free some) and if its not the kernel can trigger the OOM killer. Don't set to to more than 5% of your total memory, this may also make your system unstable.

If it happens during complete backup, probably the backup takes a lot of memory in buffers/cache. Increasing vm.min_free_kbytes helped me in similar situations where daemon was killed although it didn't use that much.

The way OOM killer works, it kills the process that used most memory lately and sometimes it may be wonky.

EvilTorbalan
  • 605
  • 4
  • 10
  • Thanks for the suggestion. I will try it out. To give you some more information. the database backup is a for loop of all databases, doing a `mysqldump` of one database at a time. I rerun the backup script calling `free` in every loop and memory usage never passed the 2GB.. Will vm.min_free_kbytes actually allow mysql server to utilize more memory? – Angelos Ar Jan 10 '18 at 20:25
  • What is the cached and buffers tho, linux will use all free memory as cache. You you are doing a lot of file operations it will eventually fill all the available memory with disk cache ;) Your server will not utilize more memory, what min_free_kbytes does is ensure that you always have at least that amount of memory thats not used for anything and is ready to be allocated. – EvilTorbalan Jan 10 '18 at 20:30
  • I know that linux uses almost all available memory as cache. As you correctly imagined the buffer/cache were over 6GB constantly. `systemctl status mysql` reported ~4.4GB memory usage. However I don't know, how it fits with the values I get from `free` & `htop` – Angelos Ar Jan 10 '18 at 20:37
1

Suggestions for your my.cnf/ini, [mysqld] section

key_buffer_size=64M             # from 16M to support your ~ 14,000 MyISAM tables
query_cache_size=0              # from 16M to conserve CPU and encourage good coding
innodb_buffer_pool_size=2G      # from 128M default to support your 4G DB better
innodb_buffer_pool_instances=8  # from 1 to reduce mutex contention, even w 2G
max_connections=50              # from 151 default until you need more
query_cache_limit=1K            # from 4M to conserve RAM and QC will be OFF
query_cache_type=0              # to ensure is OFF 
#max_allowed_packet=24M         # lead with # for default size of 1M

when you need larger M_A_P, in your SESSION, SET @max_allowed_packet=nnnnnnn; up to 1 GB and 1GB is the load infile size limit.

Pablo A
  • 169
  • 9
Wilson Hauck
  • 426
  • 4
  • 10