
I don't understand the performance I'm seeing from Apache. I would expect that more concurrent Apache requests would perform better than fewer, up to a point, but beyond 3 concurrent requests, overall performance is flat. For example, I see the same requests / sec whether I've got 3 or 4 concurrent requests. With each additional concurrent request, the average response time increases so that the overall request handling rate stays the same.

To test this, I created a new Ubuntu 10.04 VM on Slicehost. It's a 4-core VM. I set it up with:

aptitude update
aptitude install apache2 apache2-utils curl
curl localhost/ # verify hello world static page works

Then I benchmarked the response time and reqs / sec.

Edit 4: I benchmarked with something like "for x in $(seq 1 40); do ab -n 10000 -c $x -q localhost/ | grep whatever; done".
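
Spelled out, that loop might look something like the following; the grep pattern here is only a stand-in for pulling out the "Requests per second" and "Time per request" lines that ab prints, not the literal pattern used:

for x in $(seq 1 40); do
  echo "=== concurrency $x ==="
  # -n: total requests, -c: concurrency level, -q: suppress progress output
  ab -n 10000 -c "$x" -q http://localhost/ | grep -E 'Requests per second|Time per request'
done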

The exact commands and data are at https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGtiLUc1SWdOeWQ4dGo3VDI5Yk8zbWc&output=html

CPU usage was about 25% on each core while running the tests.

Edit 2: Memory usage was at 45 / 245 MB according to htop.

Edit 1: I just tried the same thing on an Ubuntu 11.04 VM and the overall issue is the same, but the performance is even worse: it gets around 2100 reqs / sec for most levels of concurrency and uses about 50% CPU on each core.

Edit 3: I tried it on real hardware and saw a peak reqs / sec around 4 or 5 concurrent requests and then it dropped a little and flattened out.

Can anyone explain why this is happening, how I can figure out what the bottleneck is, or how I can improve this? I've done some searching and haven't found any answers.

Dan Benamy
  • Do you have anything in your Apache config file to tune performance? Specifically `StartServers`, `SpareServers`, `MaxClients`, etc? – Chris S Dec 28 '11 at 19:50
  • Running the same test on my tiny VPS yields a nice curve that spikes up to 5kr/s at `-c 4` and levels around 5600r/s in the 20-30 concurrency range. My install is actually limited to 5 workers for memory reasons. – Chris S Dec 28 '11 at 19:59
  • How you test will impact performance almost as much as what you are testing. How are you testing performance? – jeffatrackaid Dec 28 '11 at 20:37
  • Just saw your GoogleDocs file. When opening a question, try to put all relevant info in your question and put supplementary information in external links. Please edit your question to include your testing procedures. – jeffatrackaid Dec 28 '11 at 20:38
  • @ChrisS: I used the defaults which are mpm-worker, StartServers 2, MinSpareThreads 25, MaxClients 150. If I understand this right it means that at startup, apache has enough workers sitting around to handle 40 concurrent requests. You're seeing increasing performance until you get to the 20 - 30 concurrent requests range. That's certainly better than capping out at 3 or 4. When it caps out, did you notice if your cpu is maxed out? – Dan Benamy Dec 29 '11 at 18:06
  • @ChrisS Hmm. Now that you told me that your performance jumped up and then slowly edged up further, I took another look at my graph and it kinda looks like that too. It's hard to tell for sure with all the noise but I think it's edging up slightly until around 15 concurrent requests and then flattening out. Anyway, I'm still dying to know what the bottleneck is. – Dan Benamy Dec 29 '11 at 18:17

3 Answers

1

> I don't understand the performance I'm seeing from Apache. I would expect that more concurrent Apache requests would perform better than fewer, up to a point, but beyond 3 concurrent requests, overall performance is flat.

It sounds like you're seeing exactly what you said you expected: more concurrent requests make Apache perform better, up to a point, and then performance is flat. What seems to have surprised you is that the point occurs at a low number of concurrent requests.

I'm not sure why you find that surprising. There's no real disk I/O, since the page is surely in RAM, so this is a purely CPU-bound and network-bound activity. Once you have enough requests in flight to tie up all the cores and keep the network busy pushing one response down while another request is coming up, there's no reason that more connections waiting around would make things any better.

So that really only leaves the question of what the limiting factor is. It's hard to tell without more details, but I'd look at the amount of system CPU usage and the network bandwidth. Most likely, either the CPU or the network interface is maxing out.
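
A quick way to watch both of those while the benchmark runs, as a sketch: mpstat and sar come from the sysstat package on Ubuntu.

aptitude install sysstat   # provides mpstat and sar

# Per-core CPU breakdown: watch %usr, %sys, %soft (softirq) and,
# since this is a VM, %steal (CPU time the hypervisor gave to other guests).
mpstat -P ALL 1

# Per-interface traffic, including lo, since the benchmark runs over loopback.
sar -n DEV 1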

David Schwartz
  • Thanks David. That's a good way of phrasing it. Maybe I should revise my question: What's the bottleneck here? There's no disk i/o, the cpu isn't close to maxed out, and I'm running the benchmark over loopback, so network bandwidth isn't a factor. – Dan Benamy Dec 29 '11 at 17:03
  • Are you sure you're measuring the CPU usage accurately? System CPU usage counts too. – David Schwartz Dec 30 '11 at 02:25
  • I'm looking at the numbers at the top of htop. Do you know if that includes everything? Would you recommend a different tool? – Dan Benamy Dec 31 '11 at 03:57
0

You are likely seeing the impact of overhead in the network stack. With increased concurrency you will have more simultaneous connections open, so the system and Apache have to work harder to open and close these connections. This typically degrades Apache performance and results in a longer average time per request at higher concurrency levels.

I also suspect you had more Apache child processes running at higher concurrency levels, and spinning those up and down takes time.

Network issues can be further complicated if you are running the test on the same system as the web server.

Tuning your TCP/IP stack, KeepAlive settings (if on), and Timeouts could improve this.
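
Purely as illustration, some of the usual Linux knobs look like this; the values are examples, not recommendations, and whether any of them help this particular workload is untested:

# Larger listen backlog for the socket Apache accepts on
sysctl -w net.core.somaxconn=1024
# Allow sockets stuck in TIME_WAIT to be reused for new outgoing connections
sysctl -w net.ipv4.tcp_tw_reuse=1
# Widen the ephemeral port range so a local benchmark doesn't run out of ports
sysctl -w net.ipv4.ip_local_port_range="15000 65000"
# KeepAlive itself is set in the Apache config (KeepAlive, KeepAliveTimeout),
# not via sysctl.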

However, this is a long-known issue with scaling Apache.

Here's a classic article on the topic. PDF: http://www.stdlib.net/~colmmacc/Apachecon-EU2005/scaling-apache-handout.pdf

jeffatrackaid
  • [Even older systems network stack scale much better than a few thousand per second](http://bulk.fefe.de/scalability/). Apache reuses child workers, it doesn't fork a new one each time, forking has been O(1) since Linux 2.6. KeepAlive doesn't apply to the `ab` test. Timeouts shouldn't matter until the system runs out of ephemeral ports available (which *should* return an error to ab). The article is certainly interesting, but clearly aimed at concurrent connections, not new requests rate. – Chris S Dec 29 '11 at 04:59
  • jeffatrackaid, thanks for the reminder. Colm's paper, while quite old, is still very relevant. tbh, the wiki page I link is based on Sander S. Temme's ApacheCon presentation, which isn't that much younger either ;) – Igor Galić Dec 29 '11 at 19:38
0

Please check out the (not yet official) Performance documentation in the Apache httpd wiki:

http://wiki.apache.org/httpd/PerformanceScalingUp

A closing word: I don't know what "VM" implies in your case, but it could be a performance bottleneck.

  • Thanks for the link! I read through it and I've already looked at those points or they don't apply to the benchmark. By VM I meant virtual machine, and Slicehost is a virtual machine hosting company. The VM could be part of the problem but I also saw a similar phenomenon on real hardware. – Dan Benamy Dec 29 '11 at 17:12
  • Well, essentially it boils down to: are you CPU, memory, or I/O bound? I/O splits into disk and network. – Igor Galić Dec 29 '11 at 19:42