We have a fairly heavily loaded server hosting 6 websites on nginx and PHP-FPM. The software is all vBulletin 3.8 and WordPress. Databases are on a separate server.

Because these are highly popular websites, we normally have 7,000-8,000 visitors online at one time, and most page loads hit the database. I believe this is the cause of our problems.

Because we have so many large databases on the MySQL server, and because the queries could, honestly, be a lot better in the software, I think MySQL will occasionally fail to return results to PHP in a timely manner, creating a cascade effect that eventually causes everything to just stop until we reload PHP-FPM. After we do that, things begin working fine again.

I'm having trouble troubleshooting this because I can't really discern anything from the logs. In the MySQL slow query log, I see nothing of interest when downtime occurs. In the nginx logs, I see thousands of entries saying that the read request timed out or the connection (to PHP-FPM) timed out. And in the PHP-FPM logs, I see a lot of lines that say "execution timed out (31 sec), terminating".

So at this point I just completely don't know where to look for the problem. Obviously, whatever is happening is happening because these scripts aren't executing quickly enough sometimes (Normally they load in under a second, but something happens that causes the load time to skyrocket). This happens many times a day and has become quite an issue for us.

For now I simply have a cron job that runs `service php5-fpm reload` every 10 minutes, which takes care of the crashing problem. Of course, when PHP reloads, nginx throws a 502 gateway error, so it's not much of a solution.

PHP is running APC cache, if that matters. I've read in a few spots that APC can cause hanging under certain circumstances.

Any pointers would be helpful. I'd really like to not have to worry about this machine all the time.

More info can be provided of course. Just let me know what you need.

Update: I just copied over apc.php to a web root and accessed it to look at our stats. Things looked good. Then I clicked the link to go to User stats and BOOM the server instantly hung. I reloaded php-fpm and then reloaded the user stats page and it went through fine. Waited a minute, reloaded again, server hung again.

So this definitely seems to be APC related. The question is - How do we fix it?

APC Config:

[apc]
apc.enabled="1"
apc.stat = "1"
apc.max_file_size = "2M"
apc.localcache = "1"
apc.localcache.size = "256"
apc.shm_segments = "1"
apc.ttl = "3600"
apc.user_ttl = "7200"
apc.gc_ttl = "3600"
apc.cache_by_default = "1"
apc.filters = ""
apc.write_lock = "1"
apc.num_files_hint= "10000"
apc.user_entries_hint="10000"
apc.shm_size = "1G"
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.include_once_override = "0"
apc.file_update_protection="2"
apc.canonicalize = "1"
apc.report_autofilter="0"
apc.stat_ctime="0"

Update 2: We've made some progress on this here. It turns out that the WordPress caching plugin (W3 Total Cache) is what was causing the crashes. We still don't know why, but with it disabled, we've been running PHP for nearly 4 hours now with no reloads, no slowdowns, no crashes. We're still using APC on the vBulletin forums and no issues there at all. Is there any way we can determine WHY APC is crashing? I'd love to use it on our WordPress installations, but not at the cost of a fragile system.

BenMorel
Kevin
  • Can you post any APC related settings you have? – Kyle Feb 13 '14 at 22:10
  • Yeah, good idea. Done. – Kevin Feb 13 '14 at 22:14
  • How much ram and swap do you have on this machine? How much is used when it starts to die? – Kyle Feb 13 '14 at 22:21
  • 16 GB Ram, 16 GB Swap. Under normal load, there is 13GB free in RAM and swap unused. Similar statistics when things freeze up. When the freeze happens, the server is still up and functioning fine. Load is down, etc...Even nginx is loading static files. Just PHP is frozen and requires a reload. – Kevin Feb 13 '14 at 22:24
  • APC is a horribly buggy nightmare, and was the sole source of crashes like this on one of my web sites for _years_. I finally got rid of it entirely; and PHP is now solid. If you want caching, try Zend Opcache, which is also the default cache from PHP 5.5. – Michael Hampton Feb 16 '14 at 13:34
  • I have the similar issue, did you find the solution? – Kostanos Dec 03 '14 at 23:06
  • Yes, it ended up being APC which was crashing PHP. When we disabled APC, we stopped having to restart PHP constantly. – Kevin Dec 04 '14 at 04:42
  • Use the Opcache built into PHP, as Michael said, performance improvements are huge - 250% in some cases. You could try PHP 7.0, which is significantly faster than PHP 5.6. I would previously have suggested trying HHVM but I find it unreliable even at low load, but it works for Facebook which is massive load. – Tim Apr 20 '16 at 05:43
  • Do not use crontabs.. use monit for restart – Ravi Soni Jun 21 '19 at 18:00

1 Answer

You're using php-fpm, so I suggest being more aggressive about how long php-fpm's children are allowed to live. You need to find the sweet spot between short-lived children and stability. The php-fpm defaults are way too generous for any production system, IMHO.

I'd set a low pm.max_requests for your production pools. As far as I know, the built-in default is 0, which means children are never recycled. I'd start from 50 and see where that takes you.
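As a rough sketch of that suggestion, the pool-level setting could look like the fragment below. The pool name `www` and the file location are assumptions based on a stock install; use whatever pool files you actually have.

```ini
; Pool file, e.g. pool.d/www.conf (file name and pool name are assumptions)
[www]
; Respawn each child after it has served 50 requests.
; A value of 0 would mean children are never recycled.
pm.max_requests = 50
```

The value 50 is only a starting point. Raise it if the respawn overhead becomes noticeable, lower it if workers still degrade before they are recycled.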

Failing that, or as a complement, you could also try these global options (AFAIK they are all disabled by default). They belong in the [global] section of php-fpm.conf, not in a pool file:

[global]
emergency_restart_threshold=3
emergency_restart_interval=1m
process_control_timeout=5s

What does this mean? If 3 PHP-FPM child processes exit with SIGSEGV or SIGBUS (i.e. crash) within 1 minute then PHP-FPM is supposed to restart automatically. The child processes wait 5s for a reaction on signals from master.

This should keep your pool of PHP worker processes nice, fresh and clean. The longer a worker is allowed to serve requests, the more unstable it tends to get. There's also a higher risk of memory leaks.

Here's a nice overview of all the config options I mentioned here, as well as others: http://myjeeva.com/php-fpm-configuration-101.html

Hope these tips help you! Remember to tweak and observe. Unfortunately, there doesn't seem to be a rule of thumb for all this; too many variables affect PHP's behaviour and stability.

Rouben
  • What's your opinion on just using cron to restart php5-fpm every hour? – CMCDragonkai Sep 18 '15 at 16:38
  • That's a rather kludgy way of doing it, and it may not work at all. PHP-FPM has a number of tweaks built-in, so it's better to use that tweakability. – Rouben Sep 19 '15 at 03:14
  • This answer pointed me in the right direction. I saw a similar issue like this myself, the solution for me was to change `pm` from `dynamic` to `ondemand` and all seems to be working grand now with all other default values. – llanato Jul 01 '16 at 17:37
  • (in php-fpm.conf) it should be '=' instead of ' ' separating the key and value. emergency_restart_threshold = 3 emergency_restart_interval = 1m process_control_timeout = 5s – justyy Jan 16 '17 at 13:42
  • I'm getting `ERROR: [/etc/php/7.0/fpm/pool.d/www.conf:135] unknown entry 'emergency_restart_threshold'` – deweydb Mar 21 '17 at 21:00
  • @deweydb are you using = signs (proper syntax)? I messed up the syntax originally and fixed it just now. – Rouben Mar 21 '17 at 22:19
  • @Rouben, yes i am. maybe this 'emergency_restart_threshold' parameter is for older versions of php-fpm? i'm on latest with ubuntu 16.04 from default repo's – deweydb Mar 22 '17 at 23:27
  • @deweydb Did you try to add this in the php-fpm.conf? Maybe it doesn't work in the www.conf. – ptf Jun 01 '17 at 09:53
  • @deweydb you have to set these settings in the [global] part of the config file and not in the pool part – Julien Dec 21 '18 at 11:37
  • when above fails to keep php-fpm up I've added `RemainAfterExit=no Restart=on-failure RestartSec=5s` to systemd service unit – DKebler Mar 05 '19 at 21:54
  • FYI, `pm.` dynamic/ondemand settings in `./pool.d/www.conf` whereas the others in post are global and are in `php-fpm.conf` all under `/etc/php/7.x/fpm` at least on ubuntu. You'll need to stop service to edit files – DKebler Mar 05 '19 at 22:03
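
For anyone following the `ondemand` suggestion in the comments above, a pool file might look roughly like this. It is only a sketch: the pool name `www` comes from the file name mentioned in the comments, and all numeric values are illustrative assumptions, not settings anyone in this thread reported using.

```ini
; pool.d/www.conf (values are illustrative assumptions)
[www]
; Spawn children only when requests arrive
pm = ondemand
; Upper bound on simultaneously running children
pm.max_children = 20
; Stop a child once it has been idle this long (only used with ondemand)
pm.process_idle_timeout = 10s
; Still recycle busy children periodically
pm.max_requests = 50
```

With `ondemand`, idle workers are reaped instead of lingering, which may be why it helped in llanato's case, but that is a guess rather than something established here.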