I'm writing here after weeks spent fighting an issue that causes Apache to stop responding until it is restarted. It happens 3-4 times a day: sometimes after hours, sometimes after a few minutes, sometimes after a day. There's no relation (or at least no evidence of one) with the number of concurrent connections to the server: it happens both during heavy-traffic periods (between 8:00 and 18:00) and during the night, when traffic is very low.

Configuration: VM on VMware ESXi 7. OS: Ubuntu 20.04, Apache 2.4.41, PHP 8.0.15, MSSQL drivers 17.8.1.1-1. 6 CPUs ("Xeon(R) Gold 5218"), 12 GB RAM. 3 websites running in "pure" PHP (no CMS or framework such as WordPress, Drupal, Ruby on Rails, etc.). AWStats shows that the intranet site, which has no external access, serves < 10k pages a day; the others serve about 200k pages a day. Most of the time CPU usage sits at about 1% and memory usage at about 2 GB. When the issue happens, no CPU/memory/network "spikes" are detected.

At the moment I have Monit installed and configured to test this minimal PHP page with curl every 20 seconds:

<?php
echo "ok";
?>

Normally it prints "ok". During the "freeze", even this simple page isn't served: curl ends with a timeout error, which triggers Monit to run "service apache2 restart". After 2-3 seconds the websites come back to normal operation (until the next freeze).
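For reference, the check Monit performs boils down to something like the sketch below. It is not my actual Monit configuration: the URL, the timeout value and the function names are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical values: adjust the URL and timeout to the real setup.
CHECK_URL="${CHECK_URL:-http://127.0.0.1/ok.php}"
CHECK_TIMEOUT="${CHECK_TIMEOUT:-10}"

# Succeeds only when the response body is exactly "ok".
body_is_ok() {
    [ "$1" = "ok" ]
}

# Fetches the test page. Fails on a timeout, a connection error,
# an HTTP error status, or an unexpected body.
apache_is_healthy() {
    body=$(curl --silent --fail --max-time "$CHECK_TIMEOUT" "$CHECK_URL") || return 1
    body_is_ok "$body"
}

# Example use (not executed here):
#   apache_is_healthy || service apache2 restart
```

Keeping the body comparison in its own function means a page that answers but returns something other than "ok" is also treated as a failure, not only a timeout.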

Here is a list of unsuccessful remediation attempts (not in chronological order):

  • Removed certbot/Let's Encrypt and used a purchased Sectigo SSL certificate
  • Switched Apache from mpm_worker to mpm_event
  • Disabled a bunch of unused Apache modules
  • Disabled a bunch of unused PHP modules
  • Disabled most non-critical cron jobs (even though there's no evidence that the freeze happens while cron jobs are running).
  • Changed the virtual network adapter from VMXNET3 to E1000
  • Enabled verbose logging: no useful information or errors are recorded; there's simply a 25-30 second gap between the last page served just before the hang and the first one served once the restart completes.
  • Enabled mod_log_forensic for some days: no (!) errors are reported by the check_forensic utility
  • Double-checked the few rewrite rules in .conf and in .htaccess
  • Changed Apache's configuration; the relevant values are:
    StartServers 10
    MinSpareThreads 40
    MaxSpareThreads 120
    ThreadLimit 100
    ThreadsPerChild 75
    MaxRequestWorkers 450
    MaxConnectionsPerChild 1000
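As a sanity check on these MPM values, the arithmetic can be sketched as below. The function names are made up for illustration; the two rules it checks (MaxRequestWorkers should be a multiple of ThreadsPerChild, and ThreadsPerChild cannot exceed ThreadLimit) come from the Apache MPM documentation as I understand it.

```shell
#!/bin/sh
# Values copied from the configuration listed above.
THREADS_PER_CHILD=75
MAX_REQUEST_WORKERS=450
THREAD_LIMIT=100

# Number of child processes needed to reach MaxRequestWorkers.
max_children() {
    echo $(( $1 / $2 ))
}

# Prints "consistent" when MaxRequestWorkers is an exact multiple of
# ThreadsPerChild and ThreadsPerChild does not exceed ThreadLimit.
config_is_consistent() {
    workers=$1
    per_child=$2
    limit=$3
    if [ $(( workers % per_child )) -eq 0 ] && [ "$per_child" -le "$limit" ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}

echo "child processes: $(max_children "$MAX_REQUEST_WORKERS" "$THREADS_PER_CHILD")"
echo "check: $(config_is_consistent "$MAX_REQUEST_WORKERS" "$THREADS_PER_CHILD" "$THREAD_LIMIT")"
```

With the values above, the worker pool is spread over 6 child processes of 75 threads each.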

There's no evident correlation with the "last" page/file served before the issue: sometimes it is a PHP page (obviously not always the same one), sometimes a PNG/JPEG image. Reading the logs, I cannot find abnormal, malformed, or excessive client requests.

The issue is 99.99% Apache related: the PHP-FPM service works perfectly and does not need to be restarted after a freeze. None of the other services running on the server are affected.

Before writing here I read tons of web pages, but I didn't find any hint that was useful for me.

Thanks in advance

Ciao

JYD

  • When Apache hangs, check process status with `ps`. Check Apache `mod_status`. Use `strace` to find out what the processes are doing. – AlexD Feb 04 '22 at 10:22
  • Maybe the number of httpd threads is influencing this? As it is a virtual machine, maybe it is running on a hypervisor with RAM ballooning? – kakaz Feb 04 '22 at 11:36
  • @AlexD I added an strace writing to a file and I'll post the results here – JYD Feb 08 '22 at 14:41
  • @kakaz No "ballooned" memory, the ESX monitor always shows 0 KB. All 12 GB are reserved for this VM – JYD Feb 08 '22 at 14:46
  • Finally I got it!!!! – JYD Mar 16 '22 at 13:04
  • The problem was the filesystem-event daemon "incron", which was misconfigured and had its log disabled. When I enabled incron's log file, it started growing by hundreds of lines per second and quickly reached dozens of MB. This strange behaviour was caused by a wrongly escaped character in its configuration file: in the command of one of the watched events there was a "$\" instead of a "\$", causing a very unpredictable race condition. Once I fixed it, Apache's freezes were gone. – JYD Mar 16 '22 at 13:23
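
The symptom that exposed incron (a log growing by hundreds of lines per second) can be checked for with a small script. This is only a sketch: the function names, the sampling interval, the threshold and the log path in the usage comment are assumptions, not values from this thread.

```shell
#!/bin/sh
# Prints how many lines were appended to file $1 during $2 seconds.
lines_added() {
    file=$1
    interval=$2
    before=$(wc -l < "$file")
    sleep "$interval"
    after=$(wc -l < "$file")
    echo $(( after - before ))
}

# Succeeds when the observed growth ($1) exceeds the threshold ($2).
is_runaway() {
    [ "$1" -gt "$2" ]
}

# Example use (hypothetical log path and threshold, not executed here):
#   added=$(lines_added /var/log/incron.log 5)
#   is_runaway "$added" 500 && echo "log is growing abnormally fast"
```

Sampling the line count twice avoids having to parse the log itself, so the same check works for any daemon whose log suddenly starts growing.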
