I have been stuck on this for a while now: how can I get more detail on where the response time is spent?
My problem is the extreme variance in response times. Sometimes the server takes 5 or 10 seconds or more to respond (especially for the first call). Firebug shows most of this time as "Waiting". When I check localhost/server-status (where the delay occurs as well), most slots are occupied, but half a second later they are free again. I can hardly imagine that there are enough load spikes to explain this behavior.
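To see what the occupied slots are actually doing during such a spike, one option would be to poll the machine-readable status page and count worker states. This is only a minimal sketch: it assumes mod_status answers at http://localhost/server-status?auto and prints a "Scoreboard:" line; the function names and the 0.5 second interval are my own choices.

```python
import time
import urllib.request
from collections import Counter

# Scoreboard key as printed by mod_status.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_scoreboard(status_text):
    """Count worker states found in the output of /server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return {STATES.get(ch, ch): n for ch, n in Counter(board).items()}
    return {}  # no Scoreboard line found


def poll(url="http://localhost/server-status?auto", interval=0.5):
    """Print a timestamped state count until interrupted (URL is an assumption)."""
    while True:
        with urllib.request.urlopen(url, timeout=30) as resp:
            text = resp.read().decode("ascii", "replace")
        print(time.strftime("%H:%M:%S"), parse_scoreboard(text))
        time.sleep(interval)
```

Calling poll() in a terminal while reproducing a slow request should show whether the busy slots are mostly in "sending reply", "keepalive (read)", or "reading request", which would point to quite different causes.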
Another strange thing: according to server-status, requests for 100K JPG images sometimes take 1, 2, or even 10 seconds to complete (column Req). At the same time, PHP scripts that involve some CPU load are handled in 100 ms or less (although others also need 1 or 2 seconds). Requests for other (smaller) GIF or PNG images are even listed with a time of 0 ms.
This is where I am stuck: is there any way to see what takes 10 seconds when sending a simple JPG image?
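From the client side, I could at least split a single request into its phases (DNS, TCP connect, TLS handshake, wait for the first byte, download) to see which one the 10 seconds fall into. The following is a rough sketch using only the Python standard library; the function names are made up for illustration, and it does not show what happens inside Apache, only where the time goes on the wire.

```python
import socket
import ssl
import time


def phase_durations(marks):
    """Turn ordered (name, timestamp) marks into seconds spent per phase."""
    return {name: now - prev
            for (_, prev), (name, now) in zip(marks, marks[1:])}


def time_request(host, path="/", port=443, timeout=60):
    """Fetch one URL over HTTPS and return (durations, bytes_received)."""
    marks = [("start", time.perf_counter())]

    addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    marks.append(("dns", time.perf_counter()))

    raw = socket.create_connection(addr[:2], timeout=timeout)
    marks.append(("tcp_connect", time.perf_counter()))

    # wrap_socket performs the TLS handshake on the connected socket.
    tls = ssl.create_default_context().wrap_socket(raw, server_hostname=host)
    marks.append(("tls_handshake", time.perf_counter()))

    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    tls.sendall(request.encode("ascii"))
    data = tls.recv(4096)  # blocks until the server starts answering
    marks.append(("wait_first_byte", time.perf_counter()))

    size = len(data)
    while data:  # read until the server closes the connection
        data = tls.recv(65536)
        size += len(data)
    marks.append(("download", time.perf_counter()))

    tls.close()
    return phase_durations(marks), size
```

Running time_request("www.soscisurvey.de", "/example/?debug&password=demo") in a loop and logging the slow cases would show whether the delay sits in the handshake (SSL), in the wait for the first byte (a free slot or the script), or in the transfer itself.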
Thanks for your good ideas!
-
System: This is an Apache 2 web server on Debian Linux (Squeeze) that mostly delivers PHP-generated pages and images. The server runs on a VPS at a professional German hosting provider. There is no swapping on the server (as far as I can see from the stats) and CPU load is not especially high (uptime reports a load average around 3 that can rise to about 32 under extreme load; I think it is an 8-CPU system). Of course, I can never be sure what the other VPSs on the host are doing.
Special Settings: Notably, the server sends all data via SSL. I also reduced the keep-alive timeout to 1 second, because users typically spend a lot of time on each page (30-60 sec.), and keeping these connections alive after the image(s) are retrieved would quickly exhaust the server's memory (or rather the 2 GB I may use on the VPS). Due to larger PHP scripts, a typical thread takes up 20 MB of RAM. Therefore there are only 50 server slots (MaxClients), of which 35 support keep-alive.
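For reference, the arithmetic behind that slot limit, as a rough sketch. It treats the 20 MB per thread as fully private memory (in practice part of it is shared), and the reserve_mb figure for everything else on the VPS is a hypothetical placeholder, not a measured value.

```python
def worker_memory_mb(max_clients, mb_per_worker):
    """Worst-case memory if every slot is busy at the same time."""
    return max_clients * mb_per_worker


def max_workers_for(budget_mb, mb_per_worker, reserve_mb=0):
    """How many slots fit into the budget after setting aside a reserve."""
    return (budget_mb - reserve_mb) // mb_per_worker


print(worker_memory_mb(50, 20))                   # 50 slots at 20 MB each
print(max_workers_for(2048, 20))                  # slots that fit into 2 GB
print(max_workers_for(2048, 20, reserve_mb=512))  # with a hypothetical reserve
```

So the 50 slots account for about 1000 MB of the 2 GB in the worst case, which leaves some room, but not enough to simply double MaxClients.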
Material: I created a test page (https://www.soscisurvey.de/example/?debug&password=demo) that is monitored by site24x7.com (it usually responds in 1.4 seconds, but there are regular spikes of up to 20 or 30 seconds). To cross-check the results, I submitted it to Load Impact as well: http://loadimpact.com/load-test/www.soscisurvey.de-35648bef3b84d3269e1fc7cb11bf1721