I have a Debian 9 server (12 CPUs, 80GB of RAM), running Percona server for MySQL 8.0, about 1100 clients per second, the load is not very high around 0.3 to 2.30
Latested numbers from show engine innodb status: 183.82 inserts/s, 169.69 updates/s, 5.79 deletes/s, 2179444.29 reads/s
It runs really nice most of the time, but lately time to time it suddenly refuses new connections (when a very busy table gets locked for a long time or some query takes too long).
The logs shows:
[ERROR] [MY-000000] [connection_h] Error log throttle: 1912 'Can't create thread to handle new connection' error(s) suppressed
[ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
[ERROR] [MY-010053] [Server] Event_scheduler::execute_top: Can not create event worker thread (errno=11). Stopping event scheduler
So, basically I have two questions:
1. How to tweak the server to accept more connections (I tried a lot of things which I will list below), it cant handle more than 4821 to be exact
2. Whenever that happens, the event scheduler stops and I have to manually enable it again, its really bad because I have some MEMORY tables that gets full, they handle the heavy traffic and dumps to other tables every 5 seconds
Until now I've tried setting the soft and hard limits in /etc/security/limits.conf for *, mysql and root users:
* hard nofile 400000
* soft nofile 400000
* hard nproc 400000
* soft nproc 400000
mysql hard nofile 400000
mysql soft nofile 400000
root hard nofile 400000
root soft nofile 400000
I also have made many adjusments to /etc/sysctl.conf listing some below:
kernel.pid_max = 262144
vm.max_map_count = 262144
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 32768
fs.file-max = 450000
net.core.netdev_max_backlog = 450000
net.core.somaxconn = 32768
I also made changes to systemd conf files: /lib/systemd/system/mysql.service - setting:
LimitNOFILE=220000
TasksMax=32768
mysqld.conf
bind-address = 0.0.0.0
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/lib/mysql/mysql.pid
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
skip-name-resolve
sysdate-is-now = 1
innodb = FORCE
wait-timeout = 600
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
sync-binlog = 0
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
max-connections = 20000
thread-cache-size = 300
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 0
innodb-file-per-table = 1
innodb-buffer-pool-size = 64G
innodb-fast-shutdown = 0
innodb-buffer-pool-dump-pct = 75
innodb-buffer-pool-dump-at-shutdown = 1
innodb-buffer-pool-load-at-startup = 1
innodb-io-capacity = 400
innodb-io-capacity-max = 2000
# LOGGING #
log-error = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long-query-time = 5
event_scheduler = 1
general_log_file = /var/log/mysql/general.log
general_log = 0
local-infile = 1
I don't know what else I can do or where else I can look, tried to search for the problems and couldn't find anything that works, for the event scheduler its even worse, there's almost nothing to be found about that (try to search in google MY-010053 or Can not create event worker thread almost no results)
The server does not get unresponsive or slow, if I lock the tables on purpose, I start getting the error in about 10 seconds, and when I unlock it recover immediatly
As Wilson Hauck requested:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 326193
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 400000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 326193
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
iostat -xm 5 3
Linux 4.9.0-8-amd64 (zelda) 03/07/2019 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
11.24 0.00 1.56 0.76 0.00 86.44
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.41 44.17 2.41 67.61 0.05 1.61 48.55 0.37 5.23 4.74 5.25 1.37 9.59
avg-cpu: %user %nice %system %iowait %steal %idle
9.77 0.00 1.91 0.82 0.00 87.51
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 52.20 0.00 81.60 0.00 1.85 46.51 0.45 5.51 0.00 5.51 1.14 9.28
avg-cpu: %user %nice %system %iowait %steal %idle
5.83 0.00 1.41 0.77 0.00 91.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 132.00 0.00 161.40 0.00 8.12 103.00 0.88 5.47 0.00 5.47 0.66 10.72
show full processlist there's nothing special:
4087783 event_scheduler localhost Daemon 1 Waiting for next activation 0 0
275050068 xgh 179.191.66.174:51143 xgh Sleep 3 0 0
275050130 xgh 179.191.66.174:51144 xgh Query 0 starting show full processlist 0 0
300324788 xgh 179.191.66.174:40708 xgh Sleep 12595 5 61
304505269 xgh 179.191.66.174:51680 xgh Sleep 72 0 0
304505986 xgh 179.191.66.174:51706 xgh Sleep 72 0 0
305818676 xgh 172.30.5.2:57288 xgh Query 0 Sending data SELECT *
FROM (`noticias`.`noticia`)
WHERE `texto` = 'Carlos Ghosn deixa prisão após mais de 100 dias detido em Tóquio' 0 0
305818680 xgh 172.30.5.2:57296 xgh Sleep 57 0 0
305818682 xgh 172.30.5.2:57302 xgh Sleep 57 0 0
305818689 xgh 172.30.5.2:57316 xgh Sleep 57 0 0
305818692 xgh 172.30.5.2:57324 xgh Sleep 57 0 0
305842475 xgh 172.30.5.2:49326 xgh Sleep 1 94 550698
305842479 xgh 172.30.5.2:49334 xgh Sleep 1 0 0
305842481 xgh 172.30.5.2:49340 xgh Sleep 1 0 0
305842486 xgh 172.30.5.2:49350 xgh Sleep 1 0 0
305842489 xgh 172.30.5.2:49358 xgh Sleep 1 0 0
305842492 xgh 172.30.5.2:49364 xgh Sleep 1 0 0
305842496 xgh 172.30.5.2:49372 xgh Sleep 1 0 0
305842498 xgh 172.30.5.2:49376 xgh Sleep 1 0 0
305842501 xgh 172.30.5.2:49382 xgh Sleep 1 0 0
305842505 xgh 172.30.5.2:49392 xgh Sleep 1 0 0
305842508 xgh 172.30.5.2:49398 xgh Sleep 1 0 0
SHOW GLOBAL STATUS: https://pastebin.com/cg1v5bHD
SHOW GLOBAL VARIABLES: https://pastebin.com/8mkTzpr0