Prevent high load average caused by apache

I have a remote server running Apache that hosts several websites. Sometimes the load average rises too high and the web server becomes unresponsive.

I think Apache is causing it, but I can't check, because my SSH session is closed automatically as soon as I log in. The only way I can fix it is by restarting the server (actually, I have to call the provider to restart it manually).

Once it's restarted, I can see in Cacti that the load average was too high (more than 100).

Can anyone explain a way to find and solve the problem? Maybe I need a trigger or something like that to restart Apache when the load average rises, I don't know.

Thanks in advance.

isma

Posted 2013-02-04T16:55:40.737

Reputation: 3

"Load average" is simply a set of three Linux/Unix values giving the average number of processes running or waiting for CPU time over the last 1, 5 and 15 minutes (on Linux, processes in uninterruptible I/O wait are counted too). You can check the load by typing "cat /proc/loadavg" on the command line. It is not shown as a percentage but as a decimal number, 1.00 meaning one CPU fully used.

A load of 1.00 per CPU is ideal, but up to (num of CPUs)*25 the computer stays pretty responsive. A load of 100 is okay if you have 8 CPUs, but if you have only one CPU, the system might get a little slow... – Geeklab – 2014-09-08T11:07:29.410
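
To make the per-CPU reading of the load concrete, here is a minimal sketch that divides the 1-minute load average by the number of online CPUs. The function name load_per_cpu and its two optional arguments are made up for illustration; it assumes a Linux-style /proc/loadavg and an awk on the PATH.

```shell
#!/bin/sh
# load_per_cpu [LOADAVG_FILE] [NCPUS]
# Prints the 1-minute load average divided by the CPU count.
# Both arguments are optional and exist mainly to make testing easy.
load_per_cpu() {
    file=${1:-/proc/loadavg}
    ncpus=${2:-$(getconf _NPROCESSORS_ONLN)}
    # The first field of /proc/loadavg is the 1-minute average.
    awk -v n="$ncpus" '{ printf "%.2f\n", $1 / n }' "$file"
}

# Only meaningful where /proc/loadavg exists (Linux).
if [ -r /proc/loadavg ]; then
    load_per_cpu
fi
```

A result around 1.00 means the CPUs are roughly fully used; values far above that mean processes are queuing.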

What exactly do you mean by load "average"? Averaged over what time span? Are you sure you don't mean memory usage? – terdon – 2013-02-04T16:57:53.127

Averaged over 1, 5 and 15 minutes. – isma – 2013-02-04T17:09:13.207

OK, so this is the output of top? The thing is that a load of 100% should not really be visible to you. I repeat, are you sure this is not an issue of RAM usage? – terdon – 2013-02-04T17:14:20.147

I don't know exactly where the issue is. Cacti can't monitor activity when it happens. – isma – 2013-02-04T17:16:56.297

But the load is not expressed in % (I think); it rises to 150 or 200. – isma – 2013-02-04T17:17:46.327

It is expressed in %, but 100% means 100% of a single CPU. For example, if your machine has 8 cores, it can go up to 800%. – terdon – 2013-02-04T17:32:19.767

Load average is not the same as CPU usage. It shows the relation between processes being served and processes waiting. – isma – 2013-02-05T10:09:19.653

Answers

The first thing you need to do is monitor what is going on. Come back and update your question when you have more details.

Use a small script that queries CPU and memory usage every few seconds and saves that information to a file. Perhaps something like this:

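#!/bin/sh
# Log the top CPU and memory consumers every 30 seconds.
while true
do
    echo "-------$(date)--------"
    printf '\t\t%%MEM\t%%CPU\n'
    # Three most CPU-heavy processes (column 3 is %CPU), heaviest last
    ps ax -o comm,%mem,%cpu | sort -nk3 | tail -n 3
    printf '\t---\t\n'
    # Three most memory-heavy processes (column 2 is %MEM), heaviest last
    ps ax -o comm,%mem,%cpu | sort -nk2 | tail -n 3
    sleep 30
done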

The script will print the usage statistics for the three most CPU heavy processes and then for the 3 most memory heavy processes. It will then wait for 30 seconds (you can change that by giving a different number to sleep) and do it all again. Its output looks like this on my system:

$ ./monitor.sh
-------Mon Feb  4 20:00:51 CET 2013--------
                %MEM %CPU
java             9.1  3.6
Xorg             3.3  4.9
firefox          8.1 12.2
        ---     
Xorg             3.3  4.9
firefox          8.1 12.2
java             9.1  3.6

Save this script as monitor.sh, make it executable, and run it in the background while redirecting its output to a file:

chmod 744 monitor.sh
./monitor.sh > usage.log &

You can monitor the progress by running tail -f usage.log.

Let this run for a while, then check the log the next time your server becomes unresponsive. Be careful though: the script prints 9 lines every 30 seconds. If you let it run too long, you will get a pretty big file. Remember to stop it when you have collected the necessary information.
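
If the log size is a worry, one possible variant is to record a snapshot only when the load is already high. The following is a rough sketch, not a tested production script: the names load_exceeds and snapshot, the THRESHOLD and LOG variables, and the default of 20 are all assumptions for illustration, and it expects a Linux /proc/loadavg.

```shell
#!/bin/sh
# Hypothetical threshold; tune it to the number of CPUs on the server.
THRESHOLD=${THRESHOLD:-20}

# load_exceeds FILE LIMIT
# Succeeds (exit status 0) when the 1-minute load in FILE is above LIMIT.
load_exceeds() {
    awk -v limit="$2" '{ exit !($1 > limit) }' "$1"
}

# snapshot LOADAVG_FILE LOG
# Appends the current load and the ten most CPU-heavy processes to LOG.
snapshot() {
    {
        echo "------- $(date) load: $(cat "$1") -------"
        ps ax -o pid,%cpu,%mem,comm | sort -nrk2 | head -n 10
    } >> "$2"
}

# Meant to be run once a minute, for example from cron.
if [ -r /proc/loadavg ] && load_exceeds /proc/loadavg "$THRESHOLD"; then
    snapshot /proc/loadavg "${LOG:-/tmp/highload.log}"
    # Assumption: a graceful Apache restart could be triggered here;
    # the exact command or service name depends on the distribution.
    # apachectl graceful
fi
```

Because it only writes when the threshold is crossed, the log stays small during normal operation and still captures the minutes leading up to a hang.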

terdon

Posted 2013-02-04T16:55:40.737

Reputation: 45 216

Just install sysstat, configure it to record activity and try to make sense of what it reports. – vonbrand – 2013-02-05T03:02:56.120

That looks good. As I said, I can tell "when" it became unresponsive, but can't check why.

The problem is that it becomes unresponsive once a week (or every 2 weeks), so maybe I can't keep this script running that long (I'll try to do some log rotation or something like that to avoid the big files, and reduce the check period).

Thanks for your answer! I'll try it.

Does anyone know if there's any monitoring system (Cacti, Nagios,...) where you can check what was running on the server at a past time? – isma – 2013-02-05T10:03:24.080
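
For the log rotation idea, a simple approach is one log file per day plus pruning of old files. This is only a sketch under assumptions: the directory /var/tmp/usage-logs, the 21-day retention, and the helper names log_file_for and prune_old_logs are invented here and are not part of the answer's script.

```shell
#!/bin/sh
LOG_DIR=${LOG_DIR:-/var/tmp/usage-logs}   # hypothetical location
KEEP_DAYS=${KEEP_DAYS:-21}                # hypothetical retention period

# log_file_for YYYY-MM-DD
# Prints the path of the per-day log file for the given date stamp.
log_file_for() {
    echo "$LOG_DIR/usage-$1.log"
}

# Deletes per-day logs last modified more than KEEP_DAYS days ago.
prune_old_logs() {
    find "$LOG_DIR" -name 'usage-*.log' -type f \
        -mtime +"$KEEP_DAYS" -exec rm -f {} +
}

# Possible use inside the monitoring loop, once per iteration:
#   mkdir -p "$LOG_DIR"
#   ps ax -o comm,%mem,%cpu | sort -nk3 | tail -n 3 \
#       >> "$(log_file_for "$(date +%Y-%m-%d)")"
#   prune_old_logs
```

With a hang every week or two, three weeks of retention should leave at least one incident in the kept files, while bounding the disk space used.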