2

We run a learning platform for primary schools here in the UK and it's all been running extremely well. However at around 4PM Monday to Friday we see the same issue arise -- 1-2 PHP threads will spike to 100% CPU and gradually start eating up RAM until the server(s) fall over.

98%+ of our requests are HTTPS; these come into our Layer 7 load balancer, which decrypts the SSL traffic, adds the X-Forwarded-For header and forwards the request onto an application server (we have 2 of those at the moment) on port 80. Our application servers run Varnish on port 80, which takes the request from the load balancer and passes it through to Nginx on port 81. Nginx then works out which 'vhost' it needs to use and hands any PHP processing to PHP-CGI, which is listening on a socket (managed through spawn-fcgi). There's an instance of Memcached running too, and MySQL runs on a separate server in a master/slave setup.

Throughout the day the load will typically go no higher than 0.8 on either of the application servers, however at around 4PM our problem arises. I've managed to run strace on a few of the actual threads when they cause the problem and I always see the same thing:

stat("/usr/share/zoneinfo/Europe/London", {st_mode=S_IFREG|0644,st_size=3661, ...}) = 0
stat("/usr/share/zoneinfo/Europe/London", {st_mode=S_IFREG|0644,st_size=3661, ...}) = 0

This repeats endlessly and never stops until you SIGKILL the process or the OOM killer gets to it. There are no cron jobs scheduled to run at that time, and I don't have any way of seeing which Nginx request is associated with the PHP process that is spinning.
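For what it's worth, on distro PHP builds patched to use the system tzdata (as CentOS packages are), date/time calls can stat() the zoneinfo file when no default timezone is pinned. This is only an assumption about where the loop lives, but a minimal sketch of ruling that variable out:

```php
<?php
// Assumption: the stat() loop comes from a date/time call re-resolving the
// timezone on every invocation. Pinning it explicitly (here, or via
// date.timezone in php.ini) avoids repeated zoneinfo lookups per call.
date_default_timezone_set('Europe/London');
echo date('c'), "\n";
```

Setting `date.timezone` in php.ini would apply this globally without a code change.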

We are running PHP 5.3.14, which we upgraded to from 5.3.8 last week to rule out the older version being the problem. This issue has been going on for a few months now and we have no idea what is causing it. We deploy our software very frequently, so it's difficult to track down a specific release which may have introduced the problem, especially as we do not know the date of the first occurrence. Varnish is version 3.0.1, Nginx is 1.0.6 (which I understand is about a year old now), and our servers run CentOS release 5.7 (Final) with Intel i3 540s at 3.07GHz and 8GB of RAM.

There's a discussion on the Debian mailing list about something very similar, you can find that here.

Has anyone seen anything like this in the past? Does anyone have any ideas or suggestions? Is there a way of linking an Nginx request directly to a PHP process? Is there a better way of seeing what the PHP process is doing? (I've seen GDB mentioned, though I'll have to recompile PHP.)
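For inspecting a stuck worker without recompiling PHP, a rough sketch (the PID 12345 below is a placeholder): spot the busy process first, then attach to it.

```shell
# List processes by CPU usage; the spinning php-cgi worker will be at the top.
ps -eo pid,pcpu,pmem,etime,comm --sort=-pcpu | head -n 10

# Once you have the PID (say 12345), watch its syscalls live; Ctrl-C detaches:
#   strace -tt -p 12345
# A C-level backtrace often names the looping function even on a non-debug build:
#   gdb -p 12345 -batch -ex bt -ex detach
```

A full PHP-level backtrace in gdb does need debug symbols (and the `.gdbinit` from the PHP source tree), but the C-level trace alone is often enough to see which extension or function is looping.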

Thanks!

  • Does it happen every single day mon-fri? Does it happen at exactly same time (around 4PM is a bit vague) - is it +/- 30 minutes or exact same minute every day? Have you checked MySQL server load during app server load spike? – c2h5oh Jun 27 '12 at 20:55
  • @c2h5oh: Yes, it's every day Monday to Friday. I do believe it happens on occasion at the weekend too - but seemingly not as frequently as on weekdays. It's not usually at one specific time, one day it might be at 16:07 then the next day it's at 16:45 and the next day some other time. I've seen it happen perhaps once or twice around 8AM too, but between 4PM and 5PM it happens consistently - so much so that we can log onto the servers at 4 and kill the processes as soon as we see it start to occur. MySQL load never goes above 0.75 when these incidents occur. – Daniel Samuels Jun 28 '12 at 07:46

2 Answers

1

I found out what the issue was, it was Internet Explorer. There was a bad reference to a .htc file in our CSS which, for some reason, was being sent to PHP to process. PHP didn't know what to do with a .htc file and just ended up going crazy and consuming all of the available resources on the server.
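For anyone hitting the same thing: one way to keep such requests away from PHP is an Nginx rule that serves `.htc` files as static content. A sketch only; the exact `location` pattern is an assumption about how the vhosts route requests to the FastCGI backend:

```nginx
# Serve .htc files directly so they never reach the PHP backend.
location ~ \.htc$ {
    default_type text/x-component;
    # Or, if the files shouldn't exist at all, simply: return 404;
}
```

Because regex `location` blocks take precedence over a generic PHP handler block, this stops a stray stylesheet reference from ever spawning PHP work.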

0

With the extra info from the comments I think we can safely assume the problem occurs during a load spike, i.e. the daily peak(s) in online users. No fixed exact time, plus occasional occurrences at other times and on other days, effectively rules out things like a cron job hogging resources.

It might sound crazy, but start by increasing the MySQL max connections limit; I've seen strange things happen to PHP running as FastCGI when the connection limit was exceeded, not unlike the problem you are experiencing.
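A sketch of the change, assuming a stock my.cnf (151 is the MySQL 5.x default; 400 is an arbitrary illustrative value):

```ini
# /etc/my.cnf sketch -- illustrative value only
[mysqld]
max_connections = 400
# Verify after restart with: SHOW VARIABLES LIKE 'max_connections';
```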

c2h5oh
  • 1,489
  • 10
  • 13
  • Our MySQL max connections is currently set to 151, watching the `SHOW PROCESSLIST;` in the MySQL console I see perhaps 1-2 queries being made at a time at the most -- a lot of our data is stored in Memcached and so we don't hit the database very frequently. I just don't understand what the timezone stats are and why they are looping infinitely, I don't know what causes them to happen, that's the main thing I would like to try and work out. I have added some code to my main PHP routing file to try to associate a PID with a request, so I'll see if that helps today. – Daniel Samuels Jun 28 '12 at 09:26
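The PID-to-request logging mentioned in the comment above could be sketched like this in the front controller (the log path and function name are hypothetical):

```php
<?php
// Hypothetical helper for the main routing file: record which URL each
// php-cgi worker PID is serving, so a spinning PID seen in top/strace can
// be matched to the request that triggered it.
function log_request_pid($logfile = '/tmp/php-request-pids.log')
{
    $uri    = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '(cli)';
    $method = isset($_SERVER['REQUEST_METHOD']) ? $_SERVER['REQUEST_METHOD'] : '-';
    $line   = sprintf("%s pid=%d %s %s\n", date('c'), getmypid(), $method, $uri);
    file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);
    return $line;
}

log_request_pid();
```

Tailing the log while killing a runaway worker then shows the last URL that PID handled, which is usually enough to reproduce the request by hand.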