This is a standard Apache web server on the AWS Linux AMI with EBS. We are seeing a high load average (8+), and `iotop -a` shows:

Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.37 M/s

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND             
 3730 be/4 root          0.00 B      0.00 B  0.00 % 91.98 % [kworker/u8:1]
  774 be/3 root          0.00 B   1636.00 K  0.00 % 15.77 % [jbd2/xvda1-8]
 3215 be/4 apache        0.00 B     40.39 M  0.00 %  0.88 % httpd
 3270 be/4 apache        0.00 B     38.20 M  0.00 %  0.93 % httpd
 2770 be/4 apache        0.00 B     46.86 M  0.00 %  0.71 % httpd

When Apache is stopped, the kworker and jbd2 activity stops as well.

The server is not swapping; we have plenty of RAM available. I've seen this issue reported for database servers, but never isolated to Apache.

Any idea on how to diagnose this further and prevent it?

UPDATE 1: perf report (perf record -g -a sleep 10)

Samples: 114K of event 'cpu-clock', Event count (approx.): 28728500000
-  83.58%          swapper  [kernel.kallsyms]         [k] xen_hypercall_sched_op
   + xen_hypercall_sched_op
   + default_idle
   + arch_cpu_idle
   - cpu_startup_entry
        70.16% cpu_bringup_and_idle
      - 29.84% rest_init
           start_kernel
           x86_64_start_reservations
           xen_start_kernel
+   1.73%            httpd  [kernel.kallsyms]         [k] __d_lookup_rcu
+   1.08%            httpd  [kernel.kallsyms]         [k] xen_hypercall_xen_version
+   0.38%            httpd  [vdso]                    [.] 0x0000000000000d7c
+   0.36%            httpd  libphp5.so                [.] zend_hash_find
+   0.33%            httpd  libphp5.so                [.] _zend_hash_add_or_update
+   0.25%            httpd  libc-2.17.so              [.] __memcpy_ssse3
+   0.24%            httpd  libphp5.so                [.] _zval_ptr_dtor
+   0.24%            httpd  [kernel.kallsyms]         [k] __audit_syscall_entry
+   0.22%            httpd  [kernel.kallsyms]         [k] pvclock_clocksource_read
user2383712
  • You may want to [use perf to find out what kworker is doing](http://askubuntu.com/a/422151) as a troubleshooting step. – David Schwartz Jan 13 '15 at 18:40
  • kworker's behaviour is technically interesting, but I wonder why Apache threads are writing megabytes to the disk. Assuming that explains the 2MB/s, isn't that high for a web server? Then one could identify the files being written, e.g. `strace -p` (and maybe lsof) and see if that shows anything interesting. – sourcejedi Jan 13 '15 at 19:18
  • Thanks, @DavidSchwartz. perf warns about high IO. I've added the report to the question. – user2383712 Jan 14 '15 at 12:00
  • Is it swapping by any chance? – Grizly Jan 14 '15 at 22:22
  • No @Grizly. We have 10GB of free RAM.... – user2383712 Jan 15 '15 at 12:45
  • Try enabling `sendfile` in Apache to take advantage of zero-copy. – fgbreel Jun 01 '15 at 16:19
  • @user2383712 This issue may be related to your cloud "neighbor". Can you contact AWS about it? If not, try shutting down your AWS instance so that it changes hypervisor. I had this problem in the past. – Alin Andrei Apr 11 '16 at 06:26
  • I suddenly have this problem on my home-built server. It's ruining my database performance. Ugh! – sudo Apr 14 '16 at 04:43
  • Details about instance type, AMI version (or kernel id) would help. – Mike Fiedler May 10 '16 at 13:02
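
Following up on the `strace`/`lsof` suggestion in the comments, here is a minimal Python sketch of an `lsof`-like check that lists the files a process holds open for writing. It assumes a Linux `/proc` filesystem and enough privileges to read another user's `/proc/<pid>/fdinfo` (typically root). The function names `open_for_write` and `files_open_for_write` are hypothetical helpers, not part of any existing tool.

```python
import os

def open_for_write(fdinfo_text):
    """Return True if a /proc/<pid>/fdinfo/<fd> entry shows write access."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            flags = int(line.split()[1], 8)  # the kernel prints the flags in octal
            # Low two bits are the access mode:
            # 0 = read-only, 1 = write-only, 2 = read-write.
            return (flags & 0o3) in (1, 2)
    return False

def files_open_for_write(pid):
    """List paths that the process currently holds open for writing (Linux only)."""
    paths = set()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                if open_for_write(f.read()):
                    paths.add(os.readlink(f"{fd_dir}/{fd}"))
        except OSError:
            continue  # the descriptor was closed while we were looking
    return sorted(paths)
```

For example, `files_open_for_write(3215)` would inspect one of the httpd PIDs from the iotop output above. Log files and temporary upload or session directories are the kind of thing one might look for in the result, but that is a guess about this server, not something the question confirms.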

2 Answers

100% in the IO column doesn't mean the thread is using all of your I/O capacity. It means the thread did nothing but wait on I/O during the sampling interval. Therefore, a high IO% with low or zero disk bandwidth can be normal.

man iotop:

[...] It also displays the percentage of time the thread/process spent while swapping in and while waiting on I/O.

It may be a different issue if your kworker is waiting on IO forever, but I don't know. Maybe it's supposed to be waiting on a pipe or something. I see kworker doing the same on my server sometimes, and it doesn't seem to be a problem. (I also panicked the first time I saw it.)
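
To make the "time spent waiting, not bandwidth" point concrete, here is a small Python sketch of that arithmetic. It assumes the per-task block I/O delay counter documented in proc(5) as field 42 of `/proc/<pid>/stat` (`delayacct_blkio_ticks`, in clock ticks), which is only populated when the kernel has delay accounting enabled. As far as I know iotop reads an equivalent counter through a different kernel interface, so treat this as an illustration of the calculation rather than iotop's actual code. The function names are hypothetical.

```python
def blkio_delay_ticks(stat_line):
    """Extract field 42 (delayacct_blkio_ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ')' before tokenizing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 42 sits at index 39.
    return int(rest[39])

def io_wait_percent(ticks_before, ticks_after, interval_s, hz=100):
    """Share of the interval spent blocked on block I/O, in percent.

    hz is the number of clock ticks per second (os.sysconf("SC_CLK_TCK"),
    commonly 100). Note that no byte count enters this calculation.
    """
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)
```

A thread whose counter advances by 92 ticks over a 1-second interval at 100 ticks per second was blocked for 92% of that second, regardless of whether it transferred megabytes or nothing at all.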

sudo
  • This is also in a shared environment, where all tenants access the same storage arrays. This is a sign of a busy disk (which the VM may know nothing about, because it's effectively isolated). On dedicated hardware, it would more likely be a failing disk with lots of retries. On network-mounted storage, it can mean a bad link as well as NAS / target-side congestion. – Spooler Sep 25 '16 at 10:25
  • This also happens when writing to slower storage in general. I notice kworker freaking out in iotop when writing an ISO or IMG file to an SD card, for example. Sometimes it's the SD card reader, other times it's the card itself. I once thought that it couldn't be the reader, because the system should know the maximum capacity of that reader, but it doesn't seem to matter... – Tmanok Mar 03 '21 at 19:53

I had the problem of the drive being written to every 5 seconds. I used the `pidstat` command below and found that x2goserver was running every 5 seconds and triggering kworker, so I removed it. Note that Google Chrome will also write to the drive if it's open.

sudo apt remove x2goserver

sudo pidstat -dvl 5

04:52:35 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command

04:52:40 PM     0      2318    539.42      0.01      0.00       0  /usr/bin/perl /usr/sbin/x2gocleansessions
04:52:40 PM     0   1920632      0.00    251.20      0.00       0  kworker/u64:3-events_unbound



04:52:40 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:52:45 PM     0      2318    809.12      0.01      0.00       0  /usr/bin/perl /usr/sbin/x2gocleansessions



04:52:45 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:52:50 PM     0      2318    539.42      0.01      0.00       0  /usr/bin/perl /usr/sbin/x2gocleansessions
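
When there are many intervals to read through, output like the above can be summarized per process. The following is a minimal Python sketch, assuming the `pidstat -d` column layout shown here (time, optional AM/PM, UID, PID, kB_rd/s, kB_wr/s, kB_ccwr/s, iodelay, Command). Other sysstat versions may print different columns. The function names are hypothetical.

```python
from collections import defaultdict

def parse_pidstat_disk(text):
    """Yield (pid, kB_rd/s, kB_wr/s, command) from `pidstat -d` style output."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[1] in ("AM", "PM"):
            del parts[1]  # 12-hour locales add an AM/PM column
        if len(parts) < 8 or not parts[2].isdigit():
            continue  # blank line, header row, or banner
        _time, _uid, pid, rd, wr, _ccwr, _iodelay = parts[:7]
        # With -l the command includes its arguments, so rejoin the rest.
        yield int(pid), float(rd), float(wr), " ".join(parts[7:])

def average_rates(text):
    """Average the read and write rates per (pid, command) over all intervals."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, rd, wr, cmd in parse_pidstat_disk(text):
        entry = totals[(pid, cmd)]
        entry[0] += rd
        entry[1] += wr
        entry[2] += 1
    return {key: (rd / n, wr / n) for key, (rd, wr, n) in totals.items()}
```

One way to use it is to save the `pidstat` output to a file, pass the file's contents to `average_rates`, and sort the result by the write rate to see which processes account for most of the periodic activity.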