
We have a LAMP setup that had been working well for half a year when the Apache server (the MySQL servers are not on this box) just started to die. Over time it spawns more and more processes; eventually they consume all the memory and the server dies. We are using the prefork MPM.

In the meantime, we've just kept adding more RAM and increasing the MaxClients and ServerLimit parameters, now up to 512. But that only postpones the crash: the process count still climbs slowly, and in about a day it reaches the limit again.

What is going on? We only handle around 15-20 requests per second. We have 1 GB of memory and it's not even half used. There's no swapping going on.

Why is Apache creating more and more processes? It's almost like there's a leak somewhere!

The database boxes are fine and aren't delaying requests; we tested some queries and everything is quick.

random
erotsppa

3 Answers


[For the benefit of others stumbling across this older question ... ]

Quick Answer:

Check the KeepAlive settings in your apache2.conf or httpd.conf file, and set KeepAliveTimeout to somewhere between 2 and 5 seconds.
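A minimal sketch of the relevant directives (the directive names are the stock Apache ones; the 3-second value is just one middle-of-the-road choice from the 2-5 second range):

```apache
# apache2.conf / httpd.conf -- keep persistent connections,
# but give idle clients far less time to hold a worker.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 3
```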

Details:

I've found that by default Apache's KeepAlive is on and the KeepAliveTimeout is set to 15 seconds. That means that after serving one user's page hit, a worker process will wait up to 15 seconds for that same user to request another page or resource before it gives up and handles someone else's request.

This setup is VERY useful when a user requests the initial index.html file and then requests the linked CSS, JavaScript, and image files a second or two later. However, modern computers and network/internet connections mean that a browser typically asks for the linked resources in under 2 seconds. Apache serves those subsequent requests and then waits another 15 seconds in case that user wants something else, which is highly inefficient in a high-traffic environment.

If you're receiving 15 unique connections per second, and each connection stays alive for 15 seconds, you can see how things will bunch up pretty severely pretty quickly. You'll have 225 Apache processes spun up, with 90+% of them completely idle, waiting for another page request on their open connections.
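That 225 figure is just Little's law: average concurrent workers is roughly the new-connection rate times how long each connection is held. A quick back-of-envelope check, using the numbers from this answer (swap in your own traffic figures):

```shell
# Little's law sizing: workers ~= arrival rate x hold time.
rate=15   # new connections per second
hold=15   # KeepAliveTimeout: seconds an idle connection lingers
echo "expected busy/idle workers: $((rate * hold))"
# -> expected busy/idle workers: 225
```

Dropping `hold` to 3 seconds cuts the same arithmetic down to 45 workers, which is why shortening the timeout helps so dramatically.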

I've seen a number of suggestions to set your KeepAliveTimeout to somewhere between 2 and 5 seconds. Myself, I've got some servers set as low as 2 and others as high as 5, and I no longer see the same system slowdowns when I get traffic spikes.

Philbert

In your httpd.conf file, you'll likely have a section commented out that looks similar to:

<IfModule mod_status.c>
        <Location "/server-status">
                SetHandler server-status
                Order deny,allow
                Deny from all
                Allow from 127.0.0.1
        </Location>
        ExtendedStatus On
</IfModule>

In looking at one of my servers that's had a problem with the load getting too high, I can see a similar problem: the values in the 'SS' column (seconds since the most recent request began) should never get that high:

Srv   PID    Acc       M  CPU   SS       ...  Request

0-0   22830  1/9/3640  K  2.36  7        ...  GET /[].css HTTP/1.1
1-0   79114  0/0/858   W  0.00  121462   ...  POST /cgi/[] HTTP/1.1
2-0   22856  0/1/3211  W  0.00  20       ...  POST /cgi/[] HTTP/1.1
3-0   22890  0/0/2697  W  0.00  0        ...  GET /server-status HTTP/1.0
4-0   79105  0/5/525   W  0.34  121463   ...  POST /cgi/[] HTTP/1.1
5-0   22892  1/1/764   K  0.00  6        ...  GET /[].js HTTP/1.1
6-0   22893  1/1/449   K  0.00  5        ...  GET /[].js HTTP/1.1
7-0   22894  1/1/57    K  0.00  5        ...  GET /[].js HTTP/1.1
8-0   22895  1/1/426   K  0.00  4        ...  GET /[].js HTTP/1.1
9-0   -      0/0/40    .  0.00  2        ...  OPTIONS * HTTP/1.0
10-0  22897  0/0/16    _  0.00  4        ...  OPTIONS * HTTP/1.0
11-0  22898  0/0/8     _  0.00  4        ...  OPTIONS * HTTP/1.0

(You might need to scroll down to see that table; the tables above it show overall server statistics and a visualization of what each of the children is currently doing.)
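If you'd rather check this from a script than eyeball the HTML table, mod_status also serves a machine-readable view at `/server-status?auto` whose `Scoreboard:` line marks each keepalive worker with a `K`. A sketch of counting them, assuming the status handler is enabled as in the config above (the sample scoreboard string here is made up so the snippet runs standalone; on a live box you'd pipe in `curl -s 'http://localhost/server-status?auto'` instead):

```shell
# Hypothetical scoreboard: K = keepalive, W = writing, _ = idle, . = open slot
status='Scoreboard: KKKK_WW....KK'
# gsub() returns the number of matches, so this counts the K's.
printf '%s\n' "$status" |
  awk -F': ' '/^Scoreboard:/ {print "workers stuck in keepalive:", gsub(/K/, "K", $2)}'
# -> workers stuck in keepalive: 6
```

A large, steady population of `K` workers with low request throughput points straight at the KeepAliveTimeout issue from the first answer.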

Update: of course, this assumes something is actually going wrong (based on your comment of only 10-15 requests per second). I have some other servers where people mirror files from us; as the files are quite large, and a few folks have been known to open 500 streams over not-so-great bandwidth, it'll eat up all 1024 connections, but that's perfectly normal and doesn't cause a crash.

If you're having problems with runaway CGIs, you might consider using suExec or CGIwrap to limit the execution time, although there will be some overhead for using them.
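If suExec or CGIwrap feels too heavy, Apache's core RLimit directives can also cap what CGI children spawned by the server may consume; a sketch, with limit values picked arbitrarily for illustration:

```apache
# Soft limit, then optional hard limit, applied to processes
# forked by Apache (CGI scripts, piped logs, etc.).
RLimitCPU 30 60        # CPU seconds per process
RLimitMEM 67108864     # bytes of memory per process (64 MB)
RLimitNPROC 10         # processes per user
```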

Joe H.

Do you have enough internet bandwidth to serve the responses? The incoming requests are proportionally very small, so if you max out any leg (LAN, WAN, whatever), your servers pile up trying to write to the network.

Check the send queue via your system's netstat(1) command, e.g. `netstat -nat`, and look at the Send-Q column. If you have lots of outgoing data queued, that's a sign you have a bottleneck somewhere in the network (beyond your physical network card).
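One way to turn that eyeball check into a single number (a sketch; `netstat -nat` column order is assumed to be the common Linux layout of Proto, Recv-Q, Send-Q, local address, foreign address, state):

```shell
# Sum Send-Q across established TCP connections; a persistently large
# total suggests a bottleneck downstream of this host.
netstat -nat |
  awk '$6 == "ESTABLISHED" { total += $3 } END { print "total Send-Q bytes:", total+0 }'
```

Run it a few times during a slowdown: a total that stays high (rather than briefly spiking and draining) is the pattern that points at the network rather than at Apache itself.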