First things first: I've read a lot about the "hung task timeout" kernel warning and know that it often appears when the server runs out of resources.
Error messages, which appear only in the VNC console and not in any log file:
[264240.505133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[264240.505359] INFO: task nginx:2333 blocked for more than 120 seconds.
[264240.505454] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[264240.505658] INFO: task nginx:2334 blocked for more than 120 seconds.
[264240.505752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[264240.505946] INFO: task nginx:2335 blocked for more than 120 seconds.
[264240.506038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[264240.506251] INFO: task php5-fpm:2415 blocked for more than 120 seconds.
...
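These messages come from the kernel's hung-task watchdog, which flags tasks stuck in uninterruptible sleep for longer than the configured timeout. As a minimal sketch (the `/proc` paths exist only on kernels built with hung-task detection), the current settings can be inspected like this:

```shell
# Inspect the hung-task watchdog settings; the files exist only on
# kernels built with CONFIG_DETECT_HUNG_TASK.
for f in /proc/sys/kernel/hung_task_timeout_secs \
         /proc/sys/kernel/hung_task_panic; do
    [ -r "$f" ] && printf '%s = %s\n' "$f" "$(cat "$f")"
done
# Note: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" only hides the
# warning; the tasks are still stuck in D state, usually waiting on I/O.
echo "done"
```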
Server specs:
8-core Intel® Xeon® E5-2660 v3
24 GB DDR4
320GB SSD
The machine is KVM-virtualized. It runs Debian Wheezy with PHP5-FPM, NGINX, MySQL and some other smaller services. It mainly hosts a website and a large MySQL database with around 25 GB of data.
Disk usage is around 12%.
I've installed Munin for monitoring, which shows no anomaly.
Since the last crash I have also installed sysstat,
but I don't really know which of the log files could be useful for you, so please request the ones you think are needed.
The crash happened around 10 March 2015, 17:37 GMT.
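If it helps, the sysstat samples around the crash window can be pulled from the daily binary file like this (a sketch; the file name `sa10`, i.e. day of month 10, and the Debian path `/var/log/sysstat/` are assumptions):

```shell
# Extract CPU, memory and I/O samples around the crash window from
# sysstat's daily binary file. sa10 = day of month 10 (assumed).
SAFILE=/var/log/sysstat/sa10
if command -v sar >/dev/null && [ -r "$SAFILE" ]; then
    sar -u -f "$SAFILE" -s 17:00:00 -e 18:00:00   # CPU usage
    sar -r -f "$SAFILE" -s 17:00:00 -e 18:00:00   # memory utilization
    sar -b -f "$SAFILE" -s 17:00:00 -e 18:00:00   # I/O and transfer rates
else
    echo "sar or $SAFILE not available"
fi
```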
In my opinion this has something to do with MySQL. Here is my my.cnf:
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 127.0.0.1
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
myisam-recover-options = BACKUP
max_connections = 50
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
expire_logs_days = 10
max_binlog_size = 100M
innodb_buffer_pool_size = 18G
innodb_log_file_size = 256M
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
[isamchk]
key_buffer = 16M
!includedir /etc/mysql/conf.d/
As you can see, I configured MySQL so that it can use roughly 75% of the total RAM (an 18 GB buffer pool out of 24 GB). The MySQL server performs on average 2k queries/second with a 50/50 read/write split.
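A rough worst-case estimate of the MySQL footprint from the my.cnf above can be sketched like this (the per-connection buffer sizes are MySQL 5.5 defaults and an assumption here, since they are not set in the config):

```shell
# Rough worst-case MySQL memory estimate from the my.cnf above.
buffer_pool_mb=$((18 * 1024))   # innodb_buffer_pool_size = 18G
key_buffer_mb=16                # key_buffer = 16M
query_cache_mb=16               # query_cache_size = 16M
max_conn=50                     # max_connections = 50
# Assumed per-connection buffers (MySQL 5.5 defaults): sort_buffer (2M)
# + read_buffer (128K) + read_rnd_buffer (256K) + join_buffer (128K)
# + thread_stack (192K) ~= 3 MB per connection
per_conn_mb=3
total_mb=$((buffer_pool_mb + key_buffer_mb + query_cache_mb + max_conn * per_conn_mb))
echo "worst-case MySQL footprint: ${total_mb} MB"   # ~18.2 GB of 24 GB
```

That leaves only about 5 GB for nginx, php5-fpm, the OS page cache and everything else, which is tight on a busy box.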
Right before the crash, htop showed
around 21 GB of 24 GB RAM and 500 MB of the 1.5 GB swap in use; CPU usage was normal.
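Next time it happens, it may be worth capturing which tasks are actually blocked. A minimal sketch:

```shell
# List tasks stuck in uninterruptible sleep (state D) -- the ones the
# hung-task watchdog complains about -- with the kernel function they
# are waiting in (wchan).
ps -eo state,pid,comm,wchan | awk '$1 == "D"'
# With sysrq enabled, the kernel can dump all blocked tasks' stacks:
#   echo w > /proc/sysrq-trigger && dmesg | tail -n 100
echo ok
```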
EDIT:
sar -u
output for the period directly before the crash:
18:27:01 CPU %user %nice %system %iowait %steal %idle
18:29:01 all 8,28 0,00 1,31 5,61 0,02 84,77
18:31:01 all 7,65 0,41 1,41 5,73 0,03 84,78
18:33:01 all 7,95 0,00 1,25 5,51 0,02 85,27
18:35:01 all 8,87 0,00 1,42 5,53 0,03 84,15
18:37:01 all 8,99 0,42 1,40 5,94 0,03 83,22
Average: all 8,65 0,16 1,35 5,08 0,03 84,73
EDIT:
Munin images
EDIT:
I contacted my ISP and they said that nothing abnormal happened at the time of the crash, so it must be something in my setup.
Now I will check what happens if I reduce innodb_buffer_pool_size
to 14 GB and add innodb_flush_method = O_DIRECT.
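For reference, that planned change would look like this in the [mysqld] section of my.cnf (a sketch of the edit described above, not yet tested on this server):

```ini
[mysqld]
innodb_buffer_pool_size = 14G
# O_DIRECT makes InnoDB bypass the OS page cache for data files, so
# pages are not buffered twice (once in the pool, once in the page cache).
innodb_flush_method     = O_DIRECT
```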