
I am trying to understand the system load on one Linux server:

$ uptime
15:01:45 up 52 days, 19:48, 1 user, load average: 0.63, 1.76, 4.81

Loads are always 1 minute < 5 minutes < 15 minutes.

Distributor ID: RedHatEnterpriseServer
Release:        5.8

I captured uptime every second for about 30 minutes, and it always reported that the 1-minute average was lower than the 5-minute and 15-minute averages. From my understanding of system load values, this should not be possible. It seems as if the higher numbers are reported with some added constant.
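
For reference, such a capture can be done with a simple loop like the following (an illustrative sketch, not necessarily the exact command used):

# log uptime once per second for ~30 minutes (illustrative sketch)
for i in $(seq 1 1800); do uptime; sleep 1; done > loadlog.txt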

How is this possible and what does it mean?

Adewzen
  • Perhaps something that Red Hat patches in its kernels? I don't see this on Debian: `load average: 0.64, 0.56, 0.44` – wurtel Dec 15 '15 at 14:36
  • Update your system first, and then try to reproduce the problem. It is useless to try to solve a problem which may be solved in an update you haven't installed yet. Especially when the system has not been updated in more than two years! – Michael Hampton Dec 15 '15 at 16:06
  • also take a look at sar output and see if that aligns with the numbers you are seeing. – Aaron Dec 15 '15 at 20:02
  • Did the load average ever rise at all during the 30 minutes? If you had a really high load average that was steadily decreasing you would see this. – Michael Nov 04 '16 at 02:13
  • A good answer is given by davidbl below. If you are racing on the highway for 15 minutes, the average speed may be high; then you slow down for the speed-camera zone for about one minute, and the average speed for the last minute will of course be low. It's as simple as that. – Waxhead Feb 18 '17 at 22:19

1 Answer


I think you might have misunderstood how the "load average" works. First of all, it's not system "load", it's system "load average", and there is a big difference, as the numbers are averages (meaning they span multiple samples over time). It's also very important to know the number of CPUs on the system, as this affects how to interpret the numbers (read: cores, in this day and age).

Also note that you can't read these numbers as "CPU usage" the way you know it from Windows etc. They are averages, and they are based on both process waiting times and CPU usage.

As you wrote yourself, the load average numbers can be described as follows.

(Please note I don't use > or < here, as I think it can be misleading.)

  • During the last 1 minute
  • During the last 5 minutes
  • During the last 15 minutes
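
If it helps, the same three values can be read straight from /proc/loadavg (a quick sketch; the first three fields are the 1-, 5- and 15-minute averages, the remaining fields are running/total processes and the last PID):

cat /proc/loadavg    # first three fields: 1-, 5- and 15-minute load averages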

Let's do a test on my small dual-CPU (core) system.

I ran the command "stress -c 1" to max out one CPU (core) and let it run for 5, 10 and 30 minutes (the actual CPU usage time might have been slightly less, hence the small deviations in the numbers).

This is how my load averages looked:

 5 MIN - load average: 1.00, 0.71, 0.37
10 MIN - load average: 1.02, 0.94, 0.59
30 MIN - load average: 1.01, 1.03, 0.98
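
If you want to reproduce this yourself, something along these lines should work (just a sketch; "stress" comes from the stress package, and the timeout flag limits how long the worker runs):

stress -c 1 --timeout 300 &      # one CPU-bound worker for 5 minutes
watch -n 10 cat /proc/loadavg    # watch the three averages climb at different rates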

So what do the numbers mean? Given that this is a dual-CPU (core) system, take the first number, 1.00: it tells us that the system was used at 50%, since we know it has two CPUs. If the number had been 2.00, we would have been using the system at 100%. Anything above the number of CPUs tells you the overload, i.e. processes waiting for a CPU.

  • Over the last 1 minute: the computer was overloaded by 0% on average, with one CPU fully used on average. In normal CPU-load terms, we were using the system at 50%.

And so on. Let's do the same run, but with both CPUs under load AND one extra worker trying to take resources. In this scenario I'm trying to use more than my system can handle.

After just 3 minutes, my load is already screaming at me! I'm not going to let this run any longer, as it's a small router (used to make testing the loads easier) and it's getting hot :)

 3 MIN - load average: 2.48, 0.99, 0.74

Now let's take the 1-minute average of 2.48. What does it tell us? Well, we are asking for 248% of the system. We know we can use two CPUs (200%), so the system is overloaded by 48%, meaning that on average 0.48 processes were waiting for CPU time while the two CPUs were fully busy. But if this had been a 4-CPU (core) system, the numbers would be quite OK, as we would only be using about 62% of the system.
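
If you want to do that arithmetic on the box itself, here is a rough sketch (assuming nproc is available) that divides the 1-minute average by the number of CPUs:

# rough per-CPU load: 1-minute average divided by the CPU count (sketch)
awk -v cpus="$(nproc)" '{ printf "load per CPU: %.2f\n", $1 / cpus }' /proc/loadavg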

I hope this all makes sense. There can also be small variations between distributions in how their kernels account for load averages, but not in the way you think: it comes down to which process states the kernel counts, such as running, runnable (waiting for a CPU), or uninterruptible I/O wait. So an NFS filesystem, for example, can make processes wait on I/O and add to the load. I don't think Red Hat does anything special.
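
If you suspect I/O wait is inflating the load (NFS is a classic cause), a rough way to check is to list processes currently stuck in the uninterruptible "D" state, since those count towards the Linux load average (a sketch):

# processes in uninterruptible sleep ("D" state) add to the Linux load average
ps -eo stat,pid,comm | awk '$1 ~ /^D/'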

EDIT: If you want to look at overall CPU usage on the system, you might want to use the command "top". top also shows the load averages.
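
For a quick non-interactive snapshot, top's batch mode prints the load averages and the CPU usage line in its header (a sketch):

top -b -n 1 | head -n 5    # batch mode, one iteration; header shows load average and CPU usage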

davidbl