
Since yesterday, a vServer that I have been using for over a year without any problems has suddenly been experiencing very long wait times (10, 20, or 30 seconds) or even timeouts on basic requests. This is my configuration:

  • 8 CPU vCores, 32 GB memory, 800 GB SSD
  • Standard Plesk Obsidian with latest updates

The server runs a couple of websites with PHP and MariaDB via Apache: nothing too fancy, not a huge amount of incoming or outgoing traffic, and not much processing on the server. The load average on this vServer has usually been between 1 and 3, but now it suddenly jumps to 20-100 as soon as I start either the Apache or the MariaDB service.

Via htop I can see:

  • up to 30 processes in the "D" state (Uninterruptible Sleep)
  • very low CPU use (<5% or even 0% on most cores)
  • plenty of free memory available (disk space is available as well)
  • no unknown/unusual processes (mostly Plesk-related, MariaDB and Apache)
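A minimal Python sketch (Linux only) that lists the processes currently in state D together with their kernel wait channel, i.e. the kernel function they are sleeping in. It only reads /proc/<pid>/stat and /proc/<pid>/wchan; the function names are made up for this example, and wchan may show 0 for processes you are not allowed to inspect, so it is best run as root. Running it a few times while the problem occurs may show whether the blocked tasks all wait in the same place (for example a filesystem or block-layer function).

```python
import os


def parse_stat(line):
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm may itself contain spaces or parentheses, so split on the LAST ')'.
    """
    left, _, right = line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def read_text(path):
    """Read a small /proc file, returning '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:  # process exited meanwhile, or access denied
        return ""


def blocked_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for every process in uninterruptible sleep (D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if not stat:
            continue
        comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc, entry, "wchan")) or "?"
            yield int(entry), comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7} {comm:<20} {wchan}")
```

For a single blocked PID, reading /proc/<pid>/stack as root gives the full kernel stack on most kernels, which is more detailed than wchan.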

Via iotop I can see:

  • very limited disk activity, both read/write are 0 or close to 0 most of the time

And vmstat 1 5 gives the following output:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 13      0 1055820      0 25834552    2    2   128    59    0   93  8  1 90  0  0
 2 14      0 975180      0 25870484    0    0    16     0    0  586 31  4 65  0  0
 1 16      0 910584      0 25873184    0    0   100    48    0  374  7  2 92  0  0
 0 16      0 920048      0 25883484    0    0    16    64    0  415  8  1 91  0  0
 0 15      0 954344      0 25883472    0    0    96  1432    0  383  1  0 99  0  0
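In vmstat's output the b column counts processes in uninterruptible sleep (the D state seen in htop), while wa is the percentage of CPU time spent waiting for I/O. The small Python sketch below parses the sample above and averages a few columns; it hardcodes the sample instead of running vmstat, and the helper name is illustrative. According to the vmstat man page, the CPU figures on the first data line are averages since boot, so the sketch skips that line.

```python
from statistics import mean

# The vmstat 1 5 sample from above, pasted verbatim (header in lowercase).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 13      0 1055820      0 25834552    2    2   128    59    0   93  8  1 90  0  0
 2 14      0 975180      0 25870484    0    0    16     0    0  586 31  4 65  0  0
 1 16      0 910584      0 25873184    0    0   100    48    0  374  7  2 92  0  0
 0 16      0 920048      0 25883484    0    0    16    64    0  415  8  1 91  0  0
 0 15      0 954344      0 25883472    0    0    96  1432    0  383  1  0 99  0  0
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} rows.

    The 'procs ---memory---' group line is ignored; the line starting
    with 'r' is used as the column header.
    """
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            header = [f.lower() for f in fields]
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


rows = parse_vmstat(SAMPLE)
recent = rows[1:]  # first data line: CPU columns are averages since boot
for col in ("r", "b", "id", "wa"):
    print(f"{col:>2}: {mean(r[col] for r in recent):g}")
```

On this sample the later lines average about 15 processes in the b column, while wa stays at 0 and the CPUs are mostly idle.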

So it looks like something is blocking these processes from running, and only every minute or so do they actually get executed. At that point I can load a few pages on one of the websites and the processes disappear from htop, but after a few more clicks the same situation returns...

Interactions with this vServer via, e.g., SFTP or SSH are also considerably slower than before because of the high load average. I have already checked the health of the MariaDB databases and couldn't find any problems; moreover, the load issue also occurs when the MariaDB service isn't running.

My questions:

  • What can I do or use to find the specific reason why these processes cannot be executed, i.e. what is blocking them?
  • Is it possible that either the memory or the disk has a problem? Should I run fsck (which would require taking the server offline)?

Anything that helps to identify or document, e.g., hardware-related problems would be really helpful. I have checked other posts about a high load average but couldn't find a solution to my problem.

UPDATE

I've noticed that both the buff and swpd columns above are always 0. Here is the output of cat /proc/meminfo. Could this be the reason, or one of them?

MemTotal:       33554432 kB
MemFree:          639036 kB
MemAvailable:   25227064 kB
Cached:         24259912 kB
Buffers:               0 kB
Active:         19847944 kB
Inactive:       12315884 kB
Active(anon):    7664604 kB
Inactive(anon):   572316 kB
Active(file):   12183340 kB
Inactive(file): 11743568 kB
Unevictable:       11228 kB
Mlocked:           28388 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:           8485104 kB
Writeback:             8 kB
AnonPages:       8236920 kB
Shmem:            328712 kB
Slab:             683440 kB
SReclaimable:     661120 kB
SUnreclaim:        22320 kB
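The swpd column is 0 simply because SwapTotal is 0, i.e. no swap is configured. The more unusual value is Dirty: page-cache data that has been modified but not yet written back to disk, here about 8 GiB or roughly a quarter of MemTotal. A minimal Python sketch that computes these ratios from a hardcoded subset of the values above (the helper names are illustrative):

```python
# Subset of the /proc/meminfo output above; values are in kB.
MEMINFO_SAMPLE = """\
MemTotal:       33554432 kB
MemFree:          639036 kB
MemAvailable:   25227064 kB
Buffers:               0 kB
SwapTotal:             0 kB
Dirty:           8485104 kB
Writeback:             8 kB
"""


def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def summarize(info):
    """Express the interesting fields relative to total memory."""
    total = info["MemTotal"]
    return {
        "available_pct": 100 * info["MemAvailable"] / total,
        "dirty_pct": 100 * info["Dirty"] / total,
        "dirty_gib": info["Dirty"] / 1024 ** 2,
        "swap_configured": info.get("SwapTotal", 0) > 0,
    }


if __name__ == "__main__":
    # On a live system: parse_meminfo(open("/proc/meminfo").read())
    print(summarize(parse_meminfo(MEMINFO_SAMPLE)))
```

Sampling /proc/meminfo repeatedly during a stall would show whether Dirty keeps growing while Writeback stays near 0, which might hint that writeback is not making progress.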

Output of iostat -d (the header shows "40 CPU", so does this cover the whole host server rather than just my vServer?):

Device       tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
somename     13805,99     41065,30   2557318,07 2443426623 152162982648
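Without an interval argument, iostat prints a single report whose rates are averages since boot. Dividing each cumulative total by its per-second rate should therefore give the same elapsed time for both columns, which is a quick way to confirm that the figures above are long-term averages rather than current activity. A sketch assuming the comma is a decimal separator (German locale), using the numbers copied from the line above:

```python
def de_float(s):
    """Parse a number printed with a decimal comma, e.g. '41065,30'."""
    return float(s.replace(",", "."))


# Values copied from the iostat -d line above.
kb_read_s, kb_wrtn_s = de_float("41065,30"), de_float("2557318,07")
kb_read, kb_wrtn = 2443426623, 152162982648

# Total / rate = seconds covered by the report.
read_secs = kb_read / kb_read_s
wrtn_secs = kb_wrtn / kb_wrtn_s
print(f"{read_secs:.0f} s, {wrtn_secs:.0f} s (~{read_secs / 3600:.1f} h)")
```

Both ratios come out at roughly 59,500 seconds (about 16.5 hours), so this line says little about what happens during a stall; an interval run such as iostat -d 5 5 reports each 5-second period separately after the first report.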

UPDATE

Sample of blocked processes: [screenshot]

UPDATE

Ongoing discussions with other customers of this hoster point to widespread problems with the virtualization platform used for these vServers (e.g. resources not being made available to the vServers). Some customers have had problems for days now. I'll update once more information is available.

UPDATE

Here is a news report in German about the ongoing problems with this hoster: Lang anhaltende Störung bei Stratos V-Servern ("Prolonged outage of Strato's V-Servers")

FINAL UPDATE

The problem with this vServer now seems to have been resolved by the hoster after about a week (almost two weeks for some customers with other vServers). Main cause: communication issues between switches, which led to delays in I/O operations. Details can be found here: Strato: Massive V-Server-Störung bald behoben ("Strato: massive V-Server outage to be fixed soon")

JamesApril
  • If there were a HW problem, you would find something in the logs. A lock or wait is more likely. A recent change in the database? Does the application need anything remote, like a web request? Concerning the iostat, run "iostat -d 5 5". – Gerard H. Pille May 21 '20 at 16:41
  • One of the websites requests data from external APIs, yes. I log every request, including its response time, so a slow external API was my initial guess. But even when I disable those requests, restart the server, and then start the Apache/MariaDB services, the load issue appears immediately. – JamesApril May 21 '20 at 16:45
  • Try "echo w > /proc/sysrq-trigger" and see your logs (eg. /var/log/kern.log). cfr. https://support.microfocus.com/kb/doc.php?id=7002725 – Gerard H. Pille May 21 '20 at 17:53
  • "between 1 and 3 now it is suddenly 20-100" What are these figures? – Gerard H. Pille May 21 '20 at 17:55
  • What if there is no hardware problem, but another vServer is using too many resources? – Gerard H. Pille May 21 '20 at 18:08
  • Your vmstat shows 90+ idle and 0 wait. – Gerard H. Pille May 21 '20 at 18:14
  • "What are these figures?" That's the load average as shown by htop. "What if there is no hardware problem, but another vServer is using too many resources?" I guess that's possible, but it could probably only be checked by the hoster? – JamesApril May 21 '20 at 18:22
  • So, which processes are blocked? – Gerard H. Pille May 21 '20 at 18:38
  • It could be basically anything that's supposed to run. I've added an image to the post with a few of these processes. – JamesApril May 21 '20 at 19:01
  • How many Apache processes are there? Nothing useful in /var/log (messages, kern.log, apache2/error.log, mysql, php) ? – Gerard H. Pille May 21 '20 at 20:05
  • Please show the output of `iostat -x -k 1` when issue happens. – shodanshok May 22 '20 at 17:37
