3

One of my server (Ubuntu 12.04, 16CPU, 32GB RAM) runs the app section of my website (Nginx 1.1.19, PHP5-FPM, Php 5.3.10, Symfony2 Web Framework)

Suddenly, users started getting HTTP 5** Errors. When I went on the server, there was huge disk IO and the culprit was [flush].

When I went through the kernel logs using dmesg, it showed the reason for [flush] was nginx processes taking too much time to flush. (I think)

Kernel Log entry from the time of Incident:

Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.048088] INFO: task nginx:17876 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.050180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057484] nginx           D ffffffff81806240     0 17876  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057488]  ffff88075deeda68 0000000000000082 ffff88063383be38 ffff880427d450d0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057491]  ffff88075deedfd8 ffff88075deedfd8 ffff88075deedfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057494]  ffffffff81c0d020 ffff880734362de0 ffff88075deeda78 ffff88075deedad8
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057501] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057508]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057513]  [<ffffffff8126163d>] do_get_write_access+0x27d/0x4f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057519]  [<ffffffff8108b920>] ? autoremove_wake_function+0x40/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057523]  [<ffffffff8129f1fe>] ? security_inode_alloc+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057526]  [<ffffffff81261a00>] jbd2_journal_get_write_access+0x30/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057531]  [<ffffffff812439fe>] __ext4_journal_get_write_access+0x3e/0x80
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057536]  [<ffffffff8121453a>] ext4_new_inode+0x19a/0xb10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057540]  [<ffffffff81220745>] ? ext4_lookup.part.27+0x75/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057543]  [<ffffffff81221e58>] ext4_create+0xb8/0x140
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057547]  [<ffffffff81185da4>] vfs_create+0xb4/0x120
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057551]  [<ffffffff81187a79>] do_last+0x5c9/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057554]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057559]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057563]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057567]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057570]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057575]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057578]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057583]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057586]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057591]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057594] INFO: task nginx:17877 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.061709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066548] nginx           D ffffffff81806240     0 17877  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066551]  ffff880703c07c78 0000000000000082 ffff880746bb37b0 ffff880703c07e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066554]  ffff880703c07fd8 ffff880703c07fd8 ffff880703c07fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066556]  ffff88075e5896f0 ffff880734360000 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066559] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066562]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066564]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066566]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066568]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066570]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066573]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066575]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066577]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066579]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066581]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066584]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066586]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066588]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066590]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066592] INFO: task nginx:17878 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.070611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075090] nginx           D ffffffff81806240     0 17878  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075093]  ffff88075d68fc78 0000000000000082 ffff880746bb37b0 ffff88075d68fe38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075098]  ffff88075d68ffd8 ffff88075d68ffd8 ffff88075d68ffd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075103]  ffffffff81c0d020 ffff880734365bc0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075109] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075113]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075117]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075120]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075123]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075127]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075130]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075134]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075137]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075140]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075143]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075146]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075149]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075152]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075156]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075158] INFO: task nginx:17879 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.078900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083527] nginx           D ffffffff81806240     0 17879  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083530]  ffff880680f29c78 0000000000000082 ffff880746bb37b0 ffff880680f29e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083536]  ffff880680f29fd8 ffff880680f29fd8 ffff880680f29fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083542]  ffffffff81c0d020 ffff8807343644d0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083548] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083551]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083554]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083558]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083562]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083565]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083568]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083572]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083575]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083578]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083581]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083585]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083588]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083591]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083594]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083597] INFO: task php5-fpm:15541 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.087245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091289] php5-fpm        D 0000000000000008     0 15541  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091291]  ffff880670031c28 0000000000000082 ffff88024683e001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091294]  ffff880670031fd8 ffff880670031fd8 ffff880670031fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091299]  ffff88013cd75bc0 ffff88075ddc96f0 ffff880670031c78 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091305] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091309]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091313]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091316]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091319]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091323]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091326]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091330]  [<ffffffff8114f6be>] ? free_pages_and_swap_cache+0x9e/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091334]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091337]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091341]  [<ffffffff812d32cd>] ? aa_dup_task_context+0x3d/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091345]  [<ffffffff812d86bb>] ? apparmor_cred_prepare+0x3b/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091349]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091352]  [<ffffffff81178212>] sys_faccessat+0xa2/0x1e0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091356]  [<ffffffff81095c3a>] ? do_gettimeofday+0x1a/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091359]  [<ffffffff81178368>] sys_access+0x18/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091363]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091368] INFO: task php5-fpm:15643 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.094542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098519] php5-fpm        D ffffffff81806240     0 15643  17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098522]  ffff880680d3dbb8 0000000000000086 ffff880557f73001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098527]  ffff880680d3dfd8 ffff880680d3dfd8 ffff880680d3dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098533]  ffffffff81c0d020 ffff88013cd72de0 ffff880680d3dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098539] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098542]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098545]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098549]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098552]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098555]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098558]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098561]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098564]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098568]  [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098572]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098575]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098578]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098582]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098585]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098588]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098591]  [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098595]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098598] INFO: task php5-fpm:15775 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.102580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107285] php5-fpm        D ffffffff81806240     0 15775  17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107287]  ffff880680f69bb8 0000000000000086 ffff88074bdce001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107292]  ffff880680f69fd8 ffff880680f69fd8 ffff880680f69fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107298]  ffff88075e5a96f0 ffff88074bbf8000 ffff880680f69c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107303] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107307]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107311]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107314]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107317]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107320]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107323]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107327]  [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107331]  [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107334]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107337]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107340]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107344]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107347]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107351]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107354]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107357]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107361]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107364] INFO: task php5-fpm:16184 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.110831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115362] php5-fpm        D ffffffff81806240     0 16184  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115364]  ffff88070499dbb8 0000000000000082 ffff880037452001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115370]  ffff88070499dfd8 ffff88070499dfd8 ffff88070499dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115376]  ffff88075e6196f0 ffff8807309cc4d0 ffff88070499dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115381] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115385]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115388]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115391]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115394]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115397]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115400]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115404]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115407]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115410]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115413]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115416]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115420]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115423]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115427]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115431]  [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115434]  [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115437]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115440] INFO: task php5-fpm:16213 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.118940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123356] php5-fpm        D ffffffff81806240     0 16213  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123358]  ffff880730811bb8 0000000000000086 ffff8804f99e0001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123363]  ffff880730811fd8 ffff880730811fd8 ffff880730811fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123369]  ffff88075e5a96f0 ffff8806d47f44d0 ffff880730811c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123374] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123378]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123381]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123384]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123387]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123390]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123393]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123397]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123400]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123403]  [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123406]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123410]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123413]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123416]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123420]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123423]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123426]  [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123429]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123434] INFO: task php5-fpm:17429 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.127444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131732] php5-fpm        D ffffffff81806240     0 17429  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131735]  ffff88063481dbb8 0000000000000086 ffff880634995001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131740]  ffff88063481dfd8 ffff88063481dfd8 ffff88063481dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131746]  ffff88075e7796f0 ffff88075d0896f0 ffff88063481dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131751] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131755]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131758]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131762]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131765]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131768]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131771]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131774]  [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131777]  [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131780]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131784]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131787]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131790]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131794]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131797]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131800]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131803]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131807]  [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131810]  [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131814]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b

How do I ensure that such cases, doesn't create HIGH DISK IO / Server Unstable?

user268083
  • 51
  • 1
  • 1
  • 3

1 Answers1

0

I just ran across this issue on my NVR (Network Video Recorder).

This link: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ indicates it's an out of memory condition:

By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data within 120 seconds. As IO subsystem responds slowly and more requests are served, System Memory gets filled up resulting in the above error, thus serving HTTP requests.

They suggest:

someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_ratio=10
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_background_ratio=5

On my system (Ubuntu 16.04.5), that's changing the values from dirty_ratio=20, and dirty_background_ratio=10.

These two questions:

Indicate the kernel values mean:

dirty_ratio - Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.

dirty_background_ratio - Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which the background kernel flusher threads will start writing out dirty data.

I haven't proved this solution, but it seems reasonable. I'll try to remember to update this if the machine stays stable forever. In my case, the machine seems to occasionally enter this state while the machine's doing it's ZFS scrub Sunday evenings

Rob Paisley
  • 296
  • 2
  • 9