
For the last couple of hours I've been trying to battle with my server to keep it up during some pretty minor load (50 concurrent users).

Spec:

6 CPUs
12GB RAM

During this time, memory usage maxed out at 4GB, so no problems there.

However, Apache was going insane, spinning up 20+ running processes and eating all 6 CPUs (600% CPU usage), bringing the website to a halt.

Now, with exactly the same traffic and concurrent users, CPU usage is down to 40% of the available 600% - no changes were made.

I cannot for the life of me see why Apache thought it necessary to spin up 20+ running processes then, yet now handles the same traffic volume with only 1 or 2.

How can I diagnose what these Apache processes are actually doing? I know I can limit this through MaxClients, but that still bottlenecks the server when it's trying to create 20+.

Nick

2 Answers


This may not be a full answer, but the suggestions I wish to make are more readable as a full post than as a comment.

I would enable /server-status (a handler implemented by mod_status) along with ExtendedStatus On, and then regularly look at the /server-status page to see what Apache is doing, how many requests are being processed and how long they have been running. Possibly even record it using a looping shell script.
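
As a sketch of the recording part (assuming mod_status is loaded and the status page is reachable at http://localhost/server-status - adjust the URL, interval and output file to taste):

    #!/bin/sh
    # Poll mod_status every 30 seconds and append the output to a file.
    # Assumes something like this is already in the Apache config:
    #
    #   ExtendedStatus On
    #   <Location /server-status>
    #       SetHandler server-status
    #       Require local      # Apache 2.4 syntax; 2.2 uses Order/Allow from
    #   </Location>
    #
    # "?auto" returns the machine-readable form; drop it for the full HTML table.
    while true; do
        date                                        >> /tmp/apache-status.log
        curl -s http://localhost/server-status?auto >> /tmp/apache-status.log
        sleep 30
    done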

I would also add %D to your access log format so that you can post-process the logs to see which requests are taking a long time (if any).
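
For example, assuming %D (the request duration in microseconds) is appended as the last field of your log format and the log lives at /var/log/apache2/access.log, something like this will surface the slowest requests:

    # Example format with %D appended as the final field:
    #   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_time
    #   CustomLog /var/log/apache2/access.log combined_time
    #
    # Print the 20 slowest requests: duration first, then the full log line.
    awk '{ print $NF, $0 }' /var/log/apache2/access.log | sort -rn | head -20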

Hopefully this will give you a clue as to what part of your app is taking all the CPU time.

If you are using mod_php/mod_python/mod_perl etc. then it is almost certainly code running under these modules making the CPUs busy; Apache itself will normally only do this with a very high number of static requests.
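
If you want to see what a busy worker is doing at the system-call level, something along these lines works (just a sketch: the process name may be httpd rather than apache2 on your distribution, and strace needs appropriate privileges):

    # Find the Apache worker currently burning the most CPU and attach strace to it.
    PID=$(ps -C apache2 -o pid=,%cpu= | sort -k2 -rn | head -1 | awk '{print $1}')
    strace -f -p "$PID" -e trace=network,read,write -s 80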

Unbeliever

Being structured and methodical in your approach is much better than flapping around wildly.

Personally, I find the scientific method (others call it something different) a wonderful tool to pull out of the system administration kitbag when diagnosing problems.

  1. What is the actual problem you're trying to solve?

For the last couple of hours I've been trying to battle with my server to keep it up during some pretty minor load (50 concurrent users).¹

  2. Now that we know what problem we're actually solving, we have some direction. Let's gather some information to help us figure out a solution.

    • Is the problem time-related? Does it happen regularly or randomly?
    • Check your logs, all of them, not just the particular service's logs, as something else may be causing the problem. Log entries generally have timestamps; these are there to help you correlate events across multiple applications and services - use them. If necessary, increase the log verbosity too.
    • Watch what your system is doing. Use tools like top, vmstat, iostat, sar, ps, tcpdump, Apache's mod_status or even full-blown monitoring systems (a few example invocations are sketched at the end of this answer).

  3. Analyse the information you have gathered. What is actually happening on the system when the issue is evident? What is the state of the system's resources?

  4. Take appropriate action to remediate. Hopefully it's pretty obvious what's going on: you're running out of memory and the OOM killer comes out to play, your swap activity is too high, your run queue is too long, you're I/O bound, etc. If it's not obvious then you're probably not gathering the correct data - you know what to do, go back to 2.

  5. Monitor what the changes introduced at 4 do.

  6. Did the changes fix the problem? Is it better? Is it worse? Is there no difference? Where you go from here depends on what you find. You may need to go back to 2 and gather more pertinent data, to 3 to reanalyse the data you have, or to 4 because you identified a number of potential solutions.

  7. Document your findings and the changes you made.

  8. Go back to bed/home from work/to the pub.

¹ This could be anything though: 'My server is slow', 'My server is using too much memory', ...
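
To expand on the 'watch what your system is doing' point above, a few illustrative invocations to run while the problem is live (tool names, options and paths are only examples; adjust for your system):

    top -c                                    # interactive overview; -c shows full command lines
    vmstat 5                                  # run queue, swap and CPU breakdown every 5 seconds
    iostat -x 5                               # per-device I/O utilisation (sysstat package)
    ps aux --sort=-%cpu | head -25            # the 25 most CPU-hungry processes right now
    curl -s http://localhost/server-status    # what each Apache worker is doing (needs mod_status)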

user9517