40

We are running into a strange behavior where we see high CPU utilization but quite low load average.

The behavior is best illustrated by the following graphs from our monitoring system.

(Chart: CPU usage and load)

At about 11:57 the CPU utilization goes from 25% to 75%. The load average is not significantly changed.

We run servers with 12 cores with 2 hyper threads each. The OS sees this as 24 CPUs.

The CPU utilization data is collected by running /usr/bin/mpstat 60 1 each minute. The chart above shows the %usr column of the all row. I am certain this is the average across all CPUs, not the "stacked" utilization. While the chart shows 75% utilization, top shows one process using about 2000% "stacked" CPU.

The load average figure is taken from /proc/loadavg each minute.
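
For reference, here is a minimal Python sketch of how a /proc/loadavg line can be parsed and divided by the CPU count (the normalization asked about in the comments below). The sample line and the function names are made up for illustration; on a real machine you would pass in the contents of /proc/loadavg instead.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dict."""
    fields = text.split()
    # The fourth field looks like "runnable/total" scheduling entities.
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable_now": int(runnable),
        "total_tasks": int(total),
        "last_pid": int(fields[4]),
    }


def normalized(load, ncpus):
    """Load per logical CPU, e.g. loadavg / 24 on the servers described here."""
    return load / ncpus


SAMPLE = "3.12 2.87 2.65 4/1289 30412"  # hypothetical values, not our data
info = parse_loadavg(SAMPLE)
print(info["load1"], round(normalized(info["load1"], 24), 3))
```

The chart in this question uses the raw system-wide values, i.e. the first field without any division.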

uname -a gives:

Linux ab04 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Linux dist is Red Hat Enterprise Linux Server release 6.3 (Santiago)

We run a couple of Java web applications under fairly heavy load on the machines, think 100 requests/s per machine.

If I interpret the CPU utilization data correctly, when we have 75% CPU utilization it means that our CPUs are executing a process 75% of the time, on average. However, if our CPUs are busy 75% of the time, shouldn't we see higher load average? How could the CPUs be 75% busy while we only have 2-4 jobs in the run queue?
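
To make the mismatch concrete, this is the back-of-the-envelope arithmetic as a small Python sketch. The function names are purely illustrative, and it assumes that a busy CPU corresponds to one running task for as long as it is busy.

```python
def busy_cpus(avg_util_pct, ncpus):
    """Average number of CPUs executing something at any instant."""
    return avg_util_pct / 100.0 * ncpus


def stacked_to_average(stacked_pct, ncpus):
    """Convert top-style "stacked" CPU% into an average per CPU."""
    return stacked_pct / ncpus


NCPUS = 24  # 12 cores x 2 hyper threads

print(busy_cpus(75, NCPUS))                        # 18.0
print(round(stacked_to_average(2000, NCPUS), 1))   # 83.3
```

So at 75% average utilization, about 18 CPUs are busy at any instant, and that is roughly the load average I would naively expect, compared with the 2-4 we observe.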

Are we interpreting our data correctly? What can cause this behavior?

K Erlandsson
  • Is the monitoring system showing normalized CPU load (load / #CPUs)? Regular Linux CPU load is hard to compare across systems with different core/cpu counts so some tools use a normalized CPU load instead. – Brian Feb 12 '15 at 12:08
  • Do you mean dividing each data point by the number of CPUs? I.e. loadavg/24 in our case? I can easily create such a chart from the data if that helps. – K Erlandsson Feb 12 '15 at 12:14
  • I was suggesting your chart may already be showing that. – Brian Feb 12 '15 at 12:51
  • Ah, sorry for misunderstanding you. It would have been a nice explanation, but unfortunately it is the system-wide load average that is shown. I just triple checked. – K Erlandsson Feb 12 '15 at 12:54

8 Answers

76

On Linux at least, the load average and CPU utilization are actually two different things. Load average measures how many tasks are running or waiting in a kernel run queue (waiting not just for CPU time but also for disk activity), averaged over a period of time. CPU utilization is a measure of how busy the CPU is right now. The most load that a single CPU thread pegged at 100% for one minute can "contribute" to the 1 minute load average is 1. A 4 core CPU with hyperthreading (8 virtual cores) all at 100% for 1 minute would contribute 8 to the 1 minute load average.
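
A rough sketch of how that averaging behaves, assuming the usual description of the Linux load average as an exponentially damped moving average that is updated about every five seconds. This is a floating-point toy model with made-up function names, not the kernel's fixed-point code.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between load samples


def update(load, active, window=60.0):
    """Fold one sample of the active task count into the running average."""
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active * (1.0 - decay)


def simulate(active, seconds, window=60.0, start=0.0):
    """Load average after `seconds` with a constant number of active tasks."""
    load = start
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = update(load, active, window)
    return load


print(round(simulate(1, 60), 2))    # 0.63
print(round(simulate(1, 300), 2))   # 0.99
```

In this model one task spinning on a CPU pushes the 1 minute figure towards 1 but never past it, and it takes a few minutes to get close, which is the "hasn't caught up yet" effect.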

Often these two numbers follow patterns that correlate with each other, but you can't think of them as the same. You can have a high load with nearly 0% CPU utilization (such as when you have a lot of IO data stuck in a wait state) and you can have a load of 1 and 100% CPU, when you have a single threaded process running full tilt. Also, for short periods of time you can see the CPU at close to 100% while the load is still below 1, because the average metrics haven't "caught up" yet.

I've seen a server with a load of over 15,000 (yes, really, that's not a typo) and a CPU % of close to 0%. It happened because a Samba share was having issues and lots and lots of clients started getting stuck in an IO wait state. Chances are, if you are regularly seeing a high load number with no corresponding CPU activity, you are having a storage problem of some kind. On virtual machines this can also mean that there are other VMs heavily competing for storage resources on the same VM host.

High load is also not necessarily a bad thing. Most of the time it just means the system is being utilized to its fullest capacity, or maybe is beyond its capability to keep up (if the load number is higher than the number of processor cores). At a place where I used to be a sysadmin, they had someone who watched the load average on their primary system more closely than Nagios did. When the load was high, they would call me 24/7 faster than you could say SMTP. Most of the time nothing was actually wrong, but they associated the load number with something being wrong and watched it like a hawk. After checking, my response was usually that the system was just doing its job. Of course this was the same place where the load got up over 15,000 (not the same server though), so sometimes it does mean something is wrong. You have to consider the purpose of your system. If it's a workhorse, then expect the load to be naturally high.

deltaray
  • How do you mean that I can have a load of 1 and 100% CPU with a single threaded process? What kind of threads are you talking about? If we consider our Java processes, they have tons of threads, but I was under the assumption that the threads were treated as processes from the perspective of the OS (they have separate PIDs on Linux after all). Could it be so that a single multi threaded java process is only counted as one task from a load average perspective? – K Erlandsson Feb 13 '15 at 09:17
  • I just did a test on my own: the threads in a Java process contribute to the load average as if they were separate processes (i.e. a Java class that runs 10 threads in a busy-wait loop gives me a load close to 10). I would appreciate a clarification about the threaded process you mentioned above. Thank you! – K Erlandsson Feb 13 '15 at 09:26
  • I mean if you have a non-multithreading process (i.e. one that just uses a single CPU at a time). For instance, if you just write a simple C program that runs a busy loop, it's just a single thread running and it uses only 1 CPU at a time. – deltaray Feb 13 '15 at 20:36
  • All information I have found says that threads count as separate processes when seen from the kernel and when calculating load. Hence I fail to see how I could have a multi threaded process on full tilt resulting in 1 load and 100% CPU on a multi-CPU system. Could you please help me understand what you mean? – K Erlandsson Feb 14 '15 at 13:13
  • For anyone looking for more detail: ["Linux Load Averages: Solving the Mystery" by Brendan Gregg](http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html) had all the answers I ever needed. – Nickolay Sep 12 '18 at 12:50
27

Load is a very deceptive number. Take it with a grain of salt.

If you spawn many tasks in very quick succession which complete very quickly, the number of processes in the run queue is too small to register the load for them (the kernel counts load once every five seconds).

Consider this example: on my host, which has 8 logical cores, this Python script registers large CPU usage in top (about 85%), yet hardly any load.

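import os, sys

while True:
  # Fork one short-lived child per logical core (8 on this host).
  for j in range(8):
    pid = os.fork()
    if pid == 0:
      # Child (fork returned 0): count to 10000, then exit.
      n = 0
      for i in range(10000):
        n += 1
      sys.exit(0)
  # Parent: reap all 8 children before starting the next batch.
  for j in range(8):
    os.wait()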
Another implementation: this one avoids waiting in groups of 8 (which would skew the test). Here the parent always tries to keep the number of children equal to the number of active CPUs, so it will be much busier than the first method and hopefully more accurate.

/* Compile with flags -O0 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <err.h>
#include <errno.h>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERATIONS 50000

int maxchild = 0;
volatile int numspawned = 0;

void childhandle(
    int signal)
{
  int stat;
  /* Handle all exited children, until none are left to handle */
  while (waitpid(-1, &stat, WNOHANG) > 0) {
    numspawned--;
  }
}

/* Stupid task for our children to do */
void do_task(
    void)
{
  int i, j = 0;  /* initialize j: incrementing an uninitialized int is undefined behavior */
  for (i=0; i < ITERATIONS; i++)
    j++;
  exit(0);
}

int main() {
  pid_t pid;

  struct sigaction act;
  sigset_t sigs, old;

  maxchild = sysconf(_SC_NPROCESSORS_ONLN);

  /* Setup child handler */
  memset(&act, 0, sizeof(act));
  act.sa_handler = childhandle;
  if (sigaction(SIGCHLD, &act, NULL) < 0)
    err(EXIT_FAILURE, "sigaction");

  /* Defer the sigchild signal */
  sigemptyset(&sigs);
  sigaddset(&sigs, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
    err(EXIT_FAILURE, "sigprocmask");

  /* Create processes, where our maxchild value is not met */
  while (1) {
    while (numspawned < maxchild) {
      pid = fork();
      if (pid < 0)
        err(EXIT_FAILURE, "fork");

      else if (pid == 0) /* child process */
        do_task();
      else               /* parent */
        numspawned++;
    }
    /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
    if (sigsuspend(&old) < 0 && errno != EINTR)
      err(EXIT_FAILURE, "sigsuspend");
  }
}

The reason for this behaviour is that the algorithm spends more time creating child processes than it does running the actual task (counting to 10000). Tasks not yet created cannot count towards the 'runnable' state, yet they take up %sys CPU time as they are spawned.

So, in your case the answer could really be that whatever work is being done spawns large numbers of tasks (threads or processes) in quick succession.

Matthew Ife
  • Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded, we do only see a very slight increase in system time). Could many small tasks be the explanation anyways? If the load average is sampled every 5 seconds, is the CPU utilization data as given by mpstat more frequently sampled? – K Erlandsson Feb 12 '15 at 13:23
  • I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%. – Matthew Ife Feb 12 '15 at 13:30
  • Good examples ! – Xavier Lucas Feb 12 '15 at 18:06
5

If the load average doesn't increase much, it just means that your hardware specs and the nature of the tasks to be processed result in good overall throughput, so tasks don't pile up in the task queue for long.

If there were contention, for instance because the average task is too complex or takes too many CPU cycles to process, then yes, load average would increase.

UPDATE :

It may not be clear in my original answer, so I'm clarifying now :

The load average is calculated from the following count: loadavg = tasks running + tasks waiting (for cores) + tasks blocked.

You can definitely have good throughput and get close to a load average of 24 without any penalty on task processing time. On the other hand, you can also have 2-4 periodic tasks not completing quickly enough; then you will see the number of tasks waiting (for CPU cycles) grow, and you will eventually reach a high load average. Another thing that can happen is that tasks run outstanding synchronous I/O operations and block a core, which lowers the throughput and makes the queue of waiting tasks grow (in that case you may see the iowait metric change).
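
As a small illustration of that count, here is a Python sketch that reads the two relevant counters from /proc/stat-style text: procs_running (tasks in a runnable state, whether on a core or waiting for one) and procs_blocked (tasks blocked waiting for I/O). The function name and the sample values are hypothetical; on a real system you would pass in the contents of /proc/stat.

```python
def instantaneous_load(proc_stat_text):
    """Return running + waiting + blocked tasks at one instant."""
    counts = {}
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("procs_running", "procs_blocked"):
            counts[parts[0]] = int(parts[1])
    # procs_running already covers both running and waiting-for-a-core tasks.
    return counts["procs_running"] + counts["procs_blocked"]


# Hypothetical excerpt; a real /proc/stat has many more lines.
SAMPLE = """cpu  1000 0 200 8000 100 0 0 0 0 0
ctxt 123456
processes 4242
procs_running 3
procs_blocked 2
"""

print(instantaneous_load(SAMPLE))  # 5
```

The load average is an average of snapshots like this one, so a single reading can differ a lot from the reported 1 minute value.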

Xavier Lucas
  • It is my understanding that load average also includes the tasks currently executing. That would mean we definitely can have an increase in load average without actual contention for the CPUs. Or am I mistaken/misunderstanding you? – K Erlandsson Feb 12 '15 at 13:24
  • @KristofferE You are completely right. The actual formula is loadavg = tasks running + tasks waiting (for available cores) + tasks blocked. This means you can have a load average of 24 with no task waiting or blocked, and thus have just a "full usage" of your hardware capacity without any contention. As you seemed confused about load average vs number of processes running vs CPU usage, I mainly focused my answer on explaining how a load average can still grow with so few running processes overall. It may indeed not be that clear after re-reading it. – Xavier Lucas Feb 12 '15 at 14:26
3

While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. We have a multi threaded Java application that uses thread pooling, so no work is done creating the actual tasks.

However, the actual work the threads do is short lived and includes IO waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short lived tasks can be missed.

I made a Java program that reproduces the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers. While doing this, the load average is about 1.3. The key here is the sleep() inside the thread; without it the load calculation is correct.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MultiThreadLoad {

    private ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

    public void load() {
        while (true) {
            e.execute(new Runnable() {

                @Override
                public void run() {
                    sleep100Ms();
                    // Busy loop: burn some CPU after the sleep.
                    for (long i = 0; i < 5000000L; i++)
                        ;
                }

                private void sleep100Ms() {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
    }

    public static void main(String[] args) {
        new MultiThreadLoad().load();
    }

}

To summarize, the theory is that the threads in our applications idle a lot and then perform short-lived work, so the tasks are not correctly sampled by the load average calculation.
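
A toy Python model of that theory. It is not a measurement: the burst length, the thread count and, above all, the assumption that every thread shares the same phase so that the 5 second samples always land in the sleep part are picked for illustration. Real behaviour depends on how wakeups happen to line up with the kernel's sampling.

```python
SLEEP_MS = 100           # idle part of each cycle, as in the Java class above
BUSY_MS = 25             # assumed length of the CPU burst (illustrative)
PERIOD_MS = SLEEP_MS + BUSY_MS
THREADS = 10             # assumed; all threads run in lockstep in this model
SAMPLE_EVERY_MS = 5000   # load is sampled once every five seconds
DURATION_MS = 60000


def runnable_at(t_ms):
    """Threads on a CPU at time t: each cycle sleeps first, then computes."""
    return THREADS if (t_ms % PERIOD_MS) >= SLEEP_MS else 0


def true_average():
    """Runnable threads averaged over every millisecond."""
    return sum(runnable_at(t) for t in range(DURATION_MS)) / DURATION_MS


def sampled_average():
    """Runnable threads averaged over the 5 second sample points only."""
    samples = [runnable_at(t) for t in range(0, DURATION_MS, SAMPLE_EVERY_MS)]
    return sum(samples) / len(samples)


print(true_average())      # 2.0
print(sampled_average())   # 0.0
```

In this worst case the CPUs do two threads' worth of work on average, yet every sample sees an empty run queue.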

K Erlandsson
2

Load average includes tasks that are blocked on disk IO, so you can easily have zero cpu utilization and a load average of 10 just by having 10 tasks all trying to read from a very slow disk. Thus it is common for a busy server to start thrashing the disk and all of the seeking causes lots of blocked tasks, driving up the load average, while cpu usage drops, since all of the tasks are blocked on the disk.
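
One way to check for this is to count tasks in uninterruptible sleep (state D), which is the state of a task blocked on disk IO. A minimal Python sketch, assuming you feed it lines in the /proc/<pid>/stat format; the function names and the sample lines are hypothetical and truncated to the first few fields.

```python
def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def count_blocked(stat_lines):
    """Count tasks in uninterruptible sleep (state D)."""
    return sum(1 for line in stat_lines if task_state(line) == "D")


SAMPLE = [
    "4321 (java) S 1 4321 4321 0",
    "4350 (dd) D 4300 4350 4300 0",
    "4400 (my (odd) proc) D 1 4400 4400 0",
    "4410 (bash) R 4300 4410 4300 0",
]

print(count_blocked(SAMPLE))  # 2
```

If that count stays high while CPU usage is low, the load is coming from IO rather than from the CPUs.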

psusi
0

First of all, the short answer to the question: evidently, from 12:00 to 12:05 the processes handled by the CPUs took longer to process than before.

From 11:00 to 11:55, every process took (for example) 25 ms of CPU time.

From 12:00 to 12:05, every process took 75 ms.

That's why the load average didn't change but the CPU usage changed a lot.

The long answer: CPU usage and load average describe the state of two very different creatures.

CPU usage describes the health of the CPU

Load average has nothing in common with the CPU.

So it's quite inappropriate to use the load average to find out how busy or idle a CPU is.

It's like trying to find out how much money a person makes from the weather forecast.

Load average describes processes in the Linux OS, not the CPU state

CPU usage describes how much time the CPU spent doing something rather than nothing during some period of time, let's say 1 second for simplicity.

If CPU usage = 85%, it means the CPU was busy for 850 ms and idle for 150 ms. That's it.
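
As a sketch of where such a percentage comes from, the busy share can be computed from two readings of the aggregate cpu line in /proc/stat. The two sample lines are made up to give 85%, the function names are illustrative, and counting iowait as idle time is an assumption of this sketch.

```python
def cpu_times(cpu_line):
    """Return (total, idle) jiffies from a /proc/stat "cpu" line."""
    # Fields: user nice system idle iowait irq softirq steal [guest ...]
    # Only the first eight are summed; guest time is already part of user.
    values = [int(v) for v in cpu_line.split()[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    return sum(values), idle


def utilization(before, after):
    """Percentage of time the CPUs were busy between two readings."""
    total0, idle0 = cpu_times(before)
    total1, idle1 = cpu_times(after)
    total = total1 - total0
    return 100.0 * (total - (idle1 - idle0)) / total


BEFORE = "cpu 1000 0 200 8000 100 0 0 0 0 0"   # hypothetical first reading
AFTER = "cpu 1850 0 200 8150 100 0 0 0 0 0"    # hypothetical second reading

print(utilization(BEFORE, AFTER))  # 85.0
```

Of the 1000 ticks that elapsed between the two readings, 850 were spent busy and 150 idle.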

CPU usage is quite similar to the HDD %busy time characteristic.

Load average = 125 for 1 second means that 125 processes were being processed by the CPU, or waiting to be processed, or waiting for the disk system.

That's complicated, so to get the point it is easier to think that 125 processes were processed by the CPU. The point is that we don't know how long each process was running on the CPU; we just know they were running for some unknown time.

So in my opinion, load average causes more confusion and harm than good when we try to understand performance.

When we look at the initial graph, we can see that there is no correlation between CPU usage and load average over the whole period. It's like trying to find a correlation between the weather forecast and the color of your teacup.

Stuggi
Alex
0

Load average is the average number of processes in the CPU queue. It is specific to each system: you cannot say that one LA is generically high on all systems and another is low. You have 12 cores, so for LA to increase significantly the number of processes must be really high.

Another question is what is meant by the "CPU Usage" graph. If it's taken from SNMP, as it should be, and your SNMP implementation is net-snmp, then it just stacks the CPU load from each of your 12 CPUs. So for net-snmp the total amount of CPU load is 1200%.

If my assumptions are correct, then the CPU usage didn't increase significantly. Thus, LA didn't increase significantly.

drookie
  • The cpu usage is taken from mpstat, the `all` row. I am fairly certain it is an average across all CPUs, it is not stacked. For example, when the problem occurs, top shows 2000% CPU usage for one process. That is stacked usage. – K Erlandsson Feb 12 '15 at 12:31
0

The scenario here is not particularly unexpected, although it is a little unusual. What Xavier touches on, but does not develop much, is that although Linux (by default) and most flavours of Unix implement pre-emptive multi-tasking, on a healthy machine tasks will rarely be pre-empted. Each task is allotted a time slice for occupying the CPU; it is only pre-empted if it exceeds this time and there are other tasks waiting to run (note that load reports the average number of processes both on the CPU and waiting to run). Most of the time, a process will yield rather than being interrupted.

(In general you only need to worry about load when it gets close to the number of CPUs, i.e. when the scheduler starts pre-empting tasks.)

if our CPUs are busy 75% of the time, shouldn't we see higher load average?

It's all about the pattern of activity: clearly, increased utilization of the CPU by some tasks (most likely a small minority) was not having an adverse effect on the processing of other tasks. If you could isolate the transactions being processed, I would expect you would see a new group emerging during the slowdown, while the extant task set was not affected.

update

One common scenario where high CPU can occur without a big increase in load is where a task triggers one (or a sequence) of other tasks, e.g. on receipt of a network request, the handler routes the request to a separate thread, and the separate thread then makes some asynchronous calls to other processes. The sampling of the run queue causes the load to be reported lower than it really is, but it does not rise linearly with CPU usage: the chain of tasks triggered would not have been runnable without the initial event, and because they occur (more or less) sequentially the run queue is not inflated.

symcbean
  • The OP originally provided indications that the aggregate CPU% was "2000%" suggesting there are many tasks using up CPU, rather than just 1 busy process. If it was a consistent 2000% for a minute you'd normally anticipate the load be 20-ish. – Matthew Ife Feb 12 '15 at 14:10
  • ...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage not the average. But that doesn't change the answer - it's about the pattern of activity. – symcbean Feb 12 '15 at 14:40
  • I'm 100% positive that the CPU util we see in the chart is the "average per CPU". Mpstat is run without ALL, but that only leaves out the per-CPU info, the `all` row still shows the average per CPU. I will clarify the question. – K Erlandsson Feb 12 '15 at 14:43
  • Could you please elaborate on your last section a bit? I fail to grasp what you mean, while the part of my question you cited is the part I have most trouble understanding. – K Erlandsson Feb 12 '15 at 14:44