I have been stuck on this for a while now: how can I get more detail on where the response time is being spent?

My problem is the extreme variance in response times. Sometimes the server takes 5 or 10 seconds or more to respond (especially for the first call). Firebug marks most of this time as "waiting". When I check localhost/server-status (where the delay occurs as well), most slots are occupied, but half a second later they are free again. I can hardly imagine that there are enough load spikes to explain this behavior.

Another strange thing: requests for 100K JPG images sometimes take 1, 2, or even 10 seconds according to server-status (column Req). At the same time, PHP scripts that involve some CPU load are handled in 100 ms or less (although others also need 1 or 2 seconds). Requests for other (smaller) GIF or PNG images are even listed with a time of 0 ms.

This is where I am stuck: Is there any way to see what takes 10 seconds to send a simple JPG image?

Thanks for your good ideas!

-

System: This is an Apache 2 web server on Debian Linux (Squeeze) that mostly delivers PHP-scripted pages and images. The server runs on a VPS at a professional German hosting provider. There is no memory swapping on the server (as far as I can see from the stats), and CPU load is not especially high (uptime reports a load average around 3 that can rise to about 32 under extreme load; I think it should be an 8-CPU system). Of course, I can never be sure what the other VPSs on the host are doing.

Special settings: Notably, the server sends all data via SSL. I also reduced the keep-alive timeout to 1 second, because users typically spend a long time on each page (30-60 sec.), and keeping these connections alive after the images are retrieved would quickly exhaust the server's memory (or rather the 2 GB I may use on the VPS). Due to larger PHP scripts, a typical thread takes up 20 MB of RAM. Therefore there are only 50 server slots (MaxClients), of which 35 support keep-alive.

Material: I created a test page (https://www.soscisurvey.de/example/?debug&password=demo) that is monitored by site24x7.com (it usually responds in 1.4 seconds, but there are regular spikes of up to 20 or 30 seconds). To cross-check the results, I submitted it to Load Impact as well: http://loadimpact.com/load-test/www.soscisurvey.de-35648bef3b84d3269e1fc7cb11bf1721

BurninLeo
  • This sounds like a disk latency issue. – brent Apr 25 '12 at 18:02
  • Hmm - I had not thought about that yet. Interesting. Any idea how I could test for this on a VPS system? – BurninLeo Apr 25 '12 at 18:40
  • It's hard to say since the problem is spurious. As you're running on a VPS, you don't know what the underlying hardware is. You could talk to your provider to get statistics (if they feel like sharing.) You could try to check the system responsiveness when there's an issue viewing an image. If you have a problem doing anything filesystem related like an `ls`, it's related to the disk. – brent Apr 25 '12 at 18:50
  • I have not noticed a delay when using `ls` yet, but I have never actually looked for it. Thanks for the idea! – BurninLeo Apr 25 '12 at 20:48
  • It seems you hit the nail on the head! Listing a directory with 20 files (which usually takes <20ms) just took 1, 2, or 5 seconds (randomly). I would never have thought of that... – BurninLeo Apr 26 '12 at 07:56
  • It does not solve the problem yet ... but enabling the Apache module mod_mem_cache with 10 MB of memory cache for the images reduced the response times to about one quarter. – BurninLeo Apr 26 '12 at 09:28

2 Answers

This is where I am stuck: Is there any way to see what takes 10 seconds to send a simple JPG image?

The TamperData plugin for Firefox will show you explicitly what you're downloading from the server and how long each item is taking:

https://addons.mozilla.org/en-US/firefox/addon/tamper-data/

However, you may also have a separate issue resolving DNS if it's taking 10 seconds to download.

You may also want to check into apachetop. Install it on your Apache webserver. I have it installed on mine and check it from time to time. It will show you the pages with the highest load:

http://www.howtogeek.com/howto/ubuntu/monitor-your-website-in-real-time-with-apachetop/
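To check whether DNS, connection setup, or the server's "waiting" time dominates a slow request, the phases can also be timed from a client machine. The following is only a minimal sketch using Python's standard library; the helper names `timed` and `probe` are made up for illustration, and it assumes the site is reachable over HTTPS on port 443.

```python
import socket
import ssl
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def probe(host, path="/", port=443, timeout=30.0):
    """Time DNS lookup, TCP connect, TLS handshake and wait for the first byte.

    Hypothetical helper: returns a dict of phase name -> seconds.
    """
    timings = {}
    infos, timings["dns"] = timed(
        socket.getaddrinfo, host, port, type=socket.SOCK_STREAM
    )
    family, socktype, proto, _, address = infos[0]

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        _, timings["connect"] = timed(sock.connect, address)
        context = ssl.create_default_context()
        # The TLS handshake runs inside wrap_socket on a connected socket.
        tls, timings["tls"] = timed(
            context.wrap_socket, sock, server_hostname=host
        )
        sock = tls  # make sure the wrapped socket is the one that gets closed
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        tls.sendall(request.encode("ascii"))
        # Blocks until the server sends the first byte of its response.
        _, timings["wait"] = timed(tls.recv, 1)
    finally:
        sock.close()
    return timings
```

Calling `probe("www.soscisurvey.de", "/example/?debug&password=demo")` repeatedly and comparing the `dns` and `wait` values should show fairly quickly whether name resolution plays any role, or whether the time is spent on the server after the request has been sent.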

Jason Huntley
  • Hello! Thank you for the ideas. Actually, Firebug already shows each file and the time components individually - and the main share of the loading time is "waiting" on the server for the HTML document. Regarding apachetop: Thanks for the hint - I have not tried that one yet! – BurninLeo Apr 25 '12 at 18:28
  • Update: Apachetop is great - thanks! However, it won't help me solve the basic problem: Why does the response time vary so much (sometimes for the same file), and why does delivery of a single 100K JPG take so long? It shouldn't be a DNS issue if it can also be the 5th request for the same file that takes so long, right? – BurninLeo Apr 25 '12 at 18:38
  • Do you have access to the firewall? Are there any QOS rules set on your firewall? Also, do you have compression turned on? Is the delay specific to a single JPG or all? – Jason Huntley Apr 25 '12 at 19:06
  • I have a minimal iptables configuration (IP addresses are automatically blocked by fail2ban if they try too many invalid URLs), but nothing about QOS. Thanks for the hint about checking different JPGs: There are JPGs from one directory (50-100 K) that consistently take 250ms and more, while others (5-10 K) only require 1 ms to send. Could it be that the small files are simply cached in memory? – BurninLeo Apr 25 '12 at 20:45
  • I take it all the images are stored on disk and are not external URL references? Are the PHP scripts referencing images on the local filesystem? – Jason Huntley Apr 25 '12 at 20:53
  • Yes, images and PHP scripts are both stored on the file system. However I use APC which caches the bytecode-translations of the PHP files (I guess in memory). – BurninLeo Apr 25 '12 at 21:18
  • hmm, i'm not very familiar with APC. Is this an apache mod? Just curious, what apache mods do you have enabled? – Jason Huntley Apr 25 '12 at 21:28
  • Hi! APC is just a cache for the bytecode that saves plenty of time if the same scripts are used again and again. Will probably become part of PHP in the next releases. – BurninLeo Apr 26 '12 at 06:50
  • About the modules: core, log_config, logio, mpm_prefork, http, so, alias, auth_basic, authn_file, authz_default, authz_groupfile, authz_host, authz_user, cgi, deflate, dir, env, include, mime, negotiation, php5, python, reqtimeout, rewrite, setenvif, ssl, status, suexec -- it seems I should check for modules that are not required... – BurninLeo Apr 26 '12 at 07:08

Adding this as an answer rather than just a comment, since this is what it turned out to be.

The issue sounded like a disk latency issue. There were several reasons I suspected this:

  • Response times varied wildly with no warning signs from the standard load indicators.
  • The server is hosted on a VPS; these are frequently oversold and backed by NAS/SAN disks.
  • Other attempts to squash the problem were fruitless.

As you are not in control of the hardware, you have limited ways to solve this problem. You can contact the provider to have them try to fix it, use a RAM backed filesystem or in-memory cache (which you experimented with), or switch providers.
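To gather evidence before contacting the provider, the `ls` test from the comments can be repeated automatically. The following is only a rough sketch under a few assumptions: the function names are invented for illustration, the probed directory must be writable for the write test, and directory listings may be served from the kernel's cache, which is why a small synced write is timed as well.

```python
import os
import statistics
import tempfile
import time


def time_listing(directory):
    """Seconds needed to list a directory and stat every entry (like `ls -l`)."""
    start = time.perf_counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)
    return time.perf_counter() - start


def time_synced_write(directory, size=4096):
    """Seconds needed to write a small temporary file and force it to storage."""
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(b"\0" * size)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to flush the data to the device
    return time.perf_counter() - start


def collect(probe, directory, samples=120, interval=0.5):
    """Run one of the probes repeatedly, pausing `interval` seconds in between."""
    durations = []
    for _ in range(samples):
        durations.append(probe(directory))
        time.sleep(interval)
    return durations


def summarize(durations):
    """Median, approximate 95th percentile and maximum of the samples."""
    ordered = sorted(durations)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

For example, `summarize(collect(time_synced_write, "/var/www"))` (the path is just a placeholder) samples for about a minute. A median of a few milliseconds combined with a maximum of several seconds would match the spikes described in the question, and logging the values with timestamps gives the provider something concrete to look at.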

brent