0

after migrating to new VPS my server occassionally experiences database crashes. This may happen 1-2x in 2-3 days. Server runs MariaDB 10.3 on CentOS 7 with nginx 1.16. Virtual server has 1 GB memory, no swap. During normal day-peak traffic I see cca 600 - 700 MB of memory occupied, 300 - 400 MB free.

I found logs in /var/log/messages (bellow) from which I understood that InnoDB crashes due to memory exhaust (pls correct me if I misread). But I did not find any DDoS attack or other suspicious activity at the time of crash.

Is there any way to figure out, what's the reason of memory exhaust? Should I tune up some InnoDB buffer/setting .. ? What should I do to prevent from crashes?

On previous VPS was server running with this configuration for 4-5 years without crash (Maria 10.1 / nginx / CentOS 6).

I am not an database expert, so I will appreciate any advice. Thank you.

Here is /var/log/messages: (crash started at 22:20)

May 27 22:20:01 mysite systemd: Created slice User Slice of root.
May 27 22:20:01 mysite systemd: Started Session 28800 of user root.
May 27 22:20:01 mysite systemd: Created slice User Slice of nginx.
May 27 22:20:01 mysite systemd: Started Session 28801 of user nginx.
May 27 22:20:11 mysite kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
May 27 22:20:11 mysite kernel: mysqld cpuset=/ mems_allowed=0
May 27 22:20:11 mysite kernel: CPU: 0 PID: 26756 Comm: mysqld Not tainted 3.10.0-957.12.1.el7.x86_64 #1
May 27 22:20:11 mysite kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.1-1ubuntu1~cloud0 04/01/2014
May 27 22:20:11 mysite kernel: Call Trace:
May 27 22:20:11 mysite kernel: [<ffffffff96363021>] dump_stack+0x19/0x1b
May 27 22:20:11 mysite kernel: [<ffffffff9635da4a>] dump_header+0x90/0x229
May 27 22:20:11 mysite kernel: [<ffffffff95d01092>] ? ktime_get_ts64+0x52/0xf0
May 27 22:20:11 mysite kernel: [<ffffffff95d582df>] ? delayacct_end+0x8f/0xb0
May 27 22:20:11 mysite kernel: [<ffffffffc033f71a>] ? virtballoon_oom_notify+0x2a/0x70 [virtio_balloon]
May 27 22:20:11 mysite kernel: [<ffffffff95dba634>] oom_kill_process+0x254/0x3d0
May 27 22:20:11 mysite kernel: [<ffffffff95dba0dd>] ? oom_unkillable_task+0xcd/0x120
May 27 22:20:11 mysite kernel: [<ffffffff95dba186>] ? find_lock_task_mm+0x56/0xc0
May 27 22:20:11 mysite kernel: [<ffffffff95dbae76>] out_of_memory+0x4b6/0x4f0
May 27 22:20:11 mysite kernel: [<ffffffff9635e54e>] __alloc_pages_slowpath+0x5d6/0x724
May 27 22:20:11 mysite kernel: [<ffffffff95dc1254>] __alloc_pages_nodemask+0x404/0x420
May 27 22:20:11 mysite kernel: [<ffffffff95e0e148>] alloc_pages_current+0x98/0x110
May 27 22:20:11 mysite kernel: [<ffffffff95db6497>] __page_cache_alloc+0x97/0xb0
May 27 22:20:11 mysite kernel: [<ffffffff95db90f8>] filemap_fault+0x298/0x490
May 27 22:20:11 mysite kernel: [<ffffffffc01fdd0e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
May 27 22:20:11 mysite kernel: [<ffffffff95cc2e00>] ? wake_bit_function+0x40/0x40
May 27 22:20:11 mysite kernel: [<ffffffffc01fdf0c>] xfs_filemap_fault+0x2c/0x30 [xfs]
May 27 22:20:11 mysite kernel: [<ffffffff95de462a>] __do_fault.isra.59+0x8a/0x100
May 27 22:20:11 mysite kernel: [<ffffffff95de4bdc>] do_read_fault.isra.61+0x4c/0x1b0
May 27 22:20:11 mysite kernel: [<ffffffff95de9584>] handle_pte_fault+0x2f4/0xd10
May 27 22:20:11 mysite kernel: [<ffffffff95dec0bd>] handle_mm_fault+0x39d/0x9b0
May 27 22:20:11 mysite kernel: [<ffffffff963705e3>] __do_page_fault+0x203/0x4f0
May 27 22:20:11 mysite kernel: [<ffffffff963709b6>] trace_do_page_fault+0x56/0x150
May 27 22:20:11 mysite kernel: [<ffffffff9636ff42>] do_async_page_fault+0x22/0xf0
May 27 22:20:11 mysite kernel: [<ffffffff9636c788>] async_page_fault+0x28/0x30
May 27 22:20:11 mysite kernel: Mem-Info:
May 27 22:20:11 mysite kernel: active_anon:187693 inactive_anon:17396 isolated_anon:0#012 active_file:1781 inactive_file:3687 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:3639 slab_unreclaimable:7716#012 mapped:10218 shmem:27944 pagetables:13399 bounce:0#012 free:12232 free_pcp:30 free_cma:0
May 27 22:20:11 mysite kernel: Node 0 DMA free:4592kB min:708kB low:884kB high:1060kB active_anon:8412kB inactive_anon:996kB active_file:0kB inactive_file:216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:472kB shmem:1420kB slab_reclaimable:60kB slab_unreclaimable:380kB kernel_stack:16kB pagetables:728kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:48 all_unreclaimable? yes
May 27 22:20:11 mysite kernel: lowmem_reserve[]: 0 972 972 972
May 27 22:20:11 mysite kernel: Node 0 DMA32 free:44212kB min:44344kB low:55428kB high:66516kB active_anon:742360kB inactive_anon:68588kB active_file:7124kB inactive_file:14532kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1032048kB managed:998980kB mlocked:0kB dirty:0kB writeback:0kB mapped:40400kB shmem:110356kB slab_reclaimable:14496kB slab_unreclaimable:30484kB kernel_stack:2704kB pagetables:52868kB unstable:0kB bounce:0kB free_pcp:240kB local_pcp:240kB free_cma:0kB writeback_tmp:0kB pages_scanned:15611 all_unreclaimable? yes
May 27 22:20:11 mysite kernel: lowmem_reserve[]: 0 0 0 0
May 27 22:20:11 mysite kernel: Node 0 DMA: 2*4kB (EM) 3*8kB (EM) 5*16kB (UE) 4*32kB (UE) 2*64kB (UM) 1*128kB (E) 2*256kB (UE) 3*512kB (UEM) 2*1024kB (UE) 0*2048kB 0*4096kB = 4592kB
May 27 22:20:11 mysite kernel: Node 0 DMA32: 403*4kB (UEM) 597*8kB (UEM) 288*16kB (UEM) 116*32kB (UEM) 41*64kB (UEM) 30*128kB (UEM) 18*256kB (UEM) 18*512kB (UEM) 7*1024kB (UEM) 1*2048kB (U) 0*4096kB = 44212kB
May 27 22:20:11 mysite kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 27 22:20:11 mysite kernel: 33433 total pagecache pages
May 27 22:20:11 mysite kernel: 0 pages in swap cache
May 27 22:20:11 mysite kernel: Swap cache stats: add 0, delete 0, find 0/0
May 27 22:20:11 mysite kernel: Free swap  = 0kB
May 27 22:20:11 mysite kernel: Total swap = 0kB
May 27 22:20:11 mysite kernel: 262010 pages RAM
May 27 22:20:11 mysite kernel: 0 pages HighMem/MovableOnly
May 27 22:20:11 mysite kernel: 8288 pages reserved
May 27 22:20:11 mysite kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
May 27 22:20:11 mysite kernel: [ 1321]     0  1321     9404      357      23        0             0 systemd-journal
May 27 22:20:11 mysite kernel: [ 1349]     0  1349    11119      141      24        0         -1000 systemd-udevd
May 27 22:20:11 mysite kernel: [ 1573]     0  1573    13880      114      28        0         -1000 auditd
May 27 22:20:11 mysite kernel: [ 1933]    81  1933    14524      138      34        0          -900 dbus-daemon
May 27 22:20:11 mysite kernel: [ 1961]     0  1961    48774      118      35        0             0 gssproxy
May 27 22:20:11 mysite kernel: [ 2001]     0  2001     6594       83      17        0             0 systemd-logind
May 27 22:20:11 mysite kernel: [ 2027]   999  2027   153158     1754      59        0             0 polkitd
May 27 22:20:11 mysite kernel: [ 2116]   998  2116    29446      115      30        0             0 chronyd
May 27 22:20:11 mysite kernel: [ 2385]     0  2385    89521     5530      94        0             0 firewalld
May 27 22:20:11 mysite kernel: [ 2961]     0  2961    26866      500      49        0             0 dhclient
May 27 22:20:11 mysite kernel: [ 3022]     0  3022   143481     2781      98        0             0 tuned
May 27 22:20:11 mysite kernel: [ 3189]     0  3189    22577      280      44        0             0 master
May 27 22:20:11 mysite kernel: [ 3191]    89  3191    22647      286      43        0             0 qmgr
May 27 22:20:11 mysite kernel: [ 3228]     0  3228    28216      255      58        0         -1000 sshd
May 27 22:20:11 mysite kernel: [ 3229]     0  3229    66328     1182      59        0             0 rsyslogd
May 27 22:20:11 mysite kernel: [ 3235]     0  3235    31579      170      18        0             0 crond
May 27 22:20:11 mysite kernel: [ 3236]     0  3236    27526       33      10        0             0 agetty
May 27 22:20:11 mysite kernel: [ 3237]     0  3237    27526       33      11        0             0 agetty
May 27 22:20:11 mysite kernel: [15958]     0 15958   130428     6710     147        0             0 php-fpm
May 27 22:20:11 mysite kernel: [15982]     0 15982    14721      288      25        0             0 nginx
May 27 22:20:11 mysite kernel: [15983]   997 15983    15403      975      31        0             0 nginx
May 27 22:20:11 mysite kernel: [21323]   997 21323   245378    11185     347        0             0 php-fpm
May 27 22:20:11 mysite kernel: [21327]   997 21327   246079    12329     348        0             0 php-fpm
May 27 22:20:11 mysite kernel: [21331]   997 21331   244508     9902     345        0             0 php-fpm
May 27 22:20:11 mysite kernel: [21582]   997 21582   225663    12179     340        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 4073]   997  4073   224018     8808     337        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 4076]   997  4076   223517     8433     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5510]   997  5510   244374     7954     342        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5517]   997  5517   244782     8575     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5519]   997  5519   244994     9463     345        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5520]   997  5520   244763     8907     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5521]   997  5521   223316     8418     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5522]   997  5522   244493     8212     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5523]   997  5523   223509     9244     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5524]   997  5524   223996     8921     336        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5527]   997  5527   224017     8661     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5536]   997  5536   244945     9254     346        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5546]   997  5546   223528     8214     334        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5551]   997  5551   244849     8735     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5557]   997  5557   223438     8154     333        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5558]   997  5558   244862     8565     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5559]   997  5559   244877     8601     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5561]   997  5561   223924     8706     338        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5562]   997  5562   224018     8866     336        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5563]   997  5563   224008     9187     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5564]   997  5564   244965     8904     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5566]   997  5566   244403     8290     348        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5567]   997  5567   244920     9447     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5568]   997  5568   244514     8736     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5569]   997  5569   223477     7876     334        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5570]   997  5570   224004     9054     335        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5574]   997  5574   223319     7945     333        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5575]   997  5575   244729     8709     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5576]   997  5576   223435     8296     336        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5577]   997  5577   244711     8679     343        0             0 php-fpm
May 27 22:20:11 mysite kernel: [ 5578]   997  5578   244467     8119     344        0             0 php-fpm
May 27 22:20:11 mysite kernel: [26393]   996 26393   302816    41380     170        0             0 mysqld
May 27 22:20:11 mysite kernel: [10107]    89 10107    22603      271      45        0             0 pickup
May 27 22:20:11 mysite kernel: [11273]     0 11273    45598      242      45        0             0 crond
May 27 22:20:11 mysite kernel: [11274]     0 11274    45598      242      45        0             0 crond
May 27 22:20:11 mysite kernel: [11275]     0 11275    28294       45      11        0             0 sh
May 27 22:20:11 mysite kernel: [11276]   997 11276    28294       43      11        0             0 sh
May 27 22:20:11 mysite kernel: [11277]   997 11277    96384     2413     139        0             0 php
May 27 22:20:11 mysite kernel: [11278]     0 11278    50559     1613      54        0             0 python
May 27 22:20:11 mysite kernel: Out of memory: Kill process 26393 (mysqld) score 163 or sacrifice child
May 27 22:20:11 mysite kernel: Killed process 26393 (mysqld) total-vm:1211264kB, anon-rss:165520kB, file-rss:0kB, shmem-rss:0kB
May 27 22:20:11 mysite systemd: mariadb.service: main process exited, code=killed, status=9/KILL
May 27 22:20:11 mysite systemd: Unit mariadb.service entered failed state.
May 27 22:20:11 mysite systemd: mariadb.service failed.
May 27 22:20:11 mysite systemd: Removed slice User Slice of nginx.
May 27 22:20:16 mysite systemd: mariadb.service holdoff time over, scheduling restart.
May 27 22:20:16 mysite systemd: Stopped MariaDB 10.3.14 database server.
May 27 22:20:16 mysite systemd: Starting MariaDB 10.3.14 database server...
May 27 22:20:16 mysite mysqld: 2019-05-27 22:20:16 0 [Note] /usr/sbin/mysqld (mysqld 10.3.14-MariaDB-log) starting as process 11329 ...
May 27 22:20:17 mysite systemd: Started MariaDB 10.3.14 database server.
May 27 22:20:27 mysite systemd: Removed slice User Slice of root.
May 27 22:21:02 mysite systemd: Created slice User Slice of nginx.
May 27 22:21:02 mysite systemd: Started Session 28802 of user nginx.

And here is corresponding /var/log/mysql/errors.log from that crash:

Version: '10.3.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2019-05-27 22:20:16 0 [Note] InnoDB: Using Linux native AIO
2019-05-27 22:20:16 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-05-27 22:20:16 0 [Note] InnoDB: Uses event mutexes
2019-05-27 22:20:16 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2019-05-27 22:20:16 0 [Note] InnoDB: Number of pools: 1
2019-05-27 22:20:16 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-05-27 22:20:16 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2019-05-27 22:20:16 0 [Note] InnoDB: Completed initialization of buffer pool
2019-05-27 22:20:16 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-05-27 22:20:16 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=699238373
2019-05-27 22:20:17 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-05-27 22:20:17 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-05-27 22:20:17 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-05-27 22:20:17 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-05-27 22:20:17 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-05-27 22:20:17 0 [Note] InnoDB: Waiting for purge to start
2019-05-27 22:20:17 0 [Note] InnoDB: 10.3.14 started; log sequence number 699238382; transaction id 395866
2019-05-27 22:20:17 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-05-27 22:20:17 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2019-05-27 22:20:17 0 [Note] Recovering after a crash using tc.log
2019-05-27 22:20:17 0 [Note] InnoDB: Buffer pool(s) load completed at 190527 22:20:17
2019-05-27 22:20:17 0 [Note] Starting crash recovery...
2019-05-27 22:20:17 0 [Note] Crash recovery finished.
2019-05-27 22:20:17 0 [Note] Server socket created on IP: '::'.
2019-05-27 22:20:17 0 [Note] Reading of all Master_info entries succeded
2019-05-27 22:20:17 0 [Note] Added new Master_info '' to hash table
2019-05-27 22:20:17 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.3.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server

I use default settings, nothing special in etc/my.cnf.d/server.cnf:

# this is only for the mysqld standalone daemon
[mysqld]
key_buffer_size=16M
max_allowed_packet=8M
# log also slow queries:
slow_query_log = 1
long_query_time = 3
slow_query_log_file=/var/log/mysql/mysite-slow-query.log
#log_queries_not_using_indexes
log_error=/var/log/mysql/mysite-errors.log
lubosdz
  • 145
  • 1
  • 1
  • 5
  • 1
    As always, run mysqltuner.pl and examine the results. – Michael Hampton May 28 '19 at 00:32
  • @MichaelHampton I didn't know about such a script. Script suggested increasing `innodb_buffer_pool_size` from 128M to 256M, decreasing `innodb_log_file_size` from 48M to 32M, increasing `tmp_table_size` and `max_heap_table_size` from 16M to 32M and turning off DNS resolving `skip-name-resolve=1`. Will follow for a few days whether helped .. thank you. – lubosdz May 28 '19 at 10:16

1 Answers1

0

It is hard to analyze the memory requirements for such a tiny (1GB) system. Especially since you have the entire stack in the single VM.

The obvious culprit is so many PHP threads. They are probably sitting idle. Does nginx have a way to tune the number of threads it will run? If so, set it much lower, say 6. This may be the best quick-fix.

The suggestions from mysqltuner were more aimed at performance than memory. So, don't make any of those changes; most of them will add to the problem by allocating more RAM.

On the other hand, 128M for the buffer_pool is dangerously high in a mere 1GB of RAM. Still, don't change it (up or down) unless you get something specifically pointing to that in the error log.

Rick James
  • 2,058
  • 5
  • 11
  • You mean probably php-fpm threads rather than nginx .. ? I was also wondering why are there so many. When I restarted php-fpm memory usage dropped from 850 MB down to enjoyable 450 - 500 MB :-). I also reverted back innodb_pool from 256M to 128M because tuning script threw warnings. Must figure out those php-fpm strange threads ... thank you. – lubosdz May 28 '19 at 22:18
  • In nginx, `pm.max_children = 6` perhaps in `/etc/php-fpm.d/site.conf`. @lubosdz – Rick James May 28 '19 at 23:02