We are running into strange behavior where we see high CPU utilization but a quite low load average.
The behavior is best illustrated by the following graphs from our monitoring system.
At about 11:57 the CPU utilization goes from 25% to 75%. The load average does not change significantly.
We run servers with 12 cores with 2 hyper threads each. The OS sees this as 24 CPUs.
The CPU utilization data is collected by running /usr/bin/mpstat 60 1 each minute. The data for the "all" row and the %usr column is shown in the chart above. I am certain this shows the average per-CPU data, not the "stacked" utilization. While we see 75% utilization in the chart, top shows a process using about 2000% "stacked" CPU.
The load average figure is taken from /proc/loadavg each minute.
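For reference, here is a minimal sketch of how a line from /proc/loadavg can be broken into its fields. The function name parse_loadavg and the sample line are made up for illustration; they are not taken from our monitoring system. The fourth field is the number of currently runnable scheduling entities over the total number of them.

```python
def parse_loadavg(line):
    """Split one /proc/loadavg line into named fields."""
    fields = line.split()
    # Fourth field looks like "4/1312": runnable entities / total entities.
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),    # 1-minute load average
        "load5": float(fields[1]),    # 5-minute load average
        "load15": float(fields[2]),   # 15-minute load average
        "runnable": int(runnable),
        "total_tasks": int(total),
        "last_pid": int(fields[4]),   # most recently created PID
    }


# Hypothetical sample line, not real data from our servers.
sample = "3.12 2.87 2.54 4/1312 28741"
print(parse_loadavg(sample))

# On a Linux machine the real values could be read with:
#   with open("/proc/loadavg") as f:
#       print(parse_loadavg(f.read()))
```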
uname -a gives:
Linux ab04 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
The Linux distribution is Red Hat Enterprise Linux Server release 6.3 (Santiago).
We run a couple of Java web applications under fairly heavy load on the machines, roughly 100 requests/s per machine.
If I interpret the CPU utilization data correctly, 75% CPU utilization means that our CPUs are executing a process 75% of the time, on average. However, if our CPUs are busy 75% of the time, shouldn't we see a higher load average? How could the CPUs be 75% busy while we only have 2-4 jobs in the run queue?
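To make the mismatch concrete, here is the arithmetic as a small sketch. It assumes that an average utilization of X% across N CPUs corresponds to roughly X/100 * N CPUs being busy at any moment, and that the figure top shows for a process is a sum over CPUs. The helper names busy_cpus and top_pct_to_average are hypothetical.

```python
def busy_cpus(utilization_pct, n_cpus):
    """Average number of CPUs busy, given average per-CPU utilization."""
    return utilization_pct / 100.0 * n_cpus


def top_pct_to_average(top_pct, n_cpus):
    """Convert a 'stacked' top percentage to an average per-CPU percentage."""
    return top_pct / n_cpus


N_CPUS = 24  # 12 cores with 2 hyper-threads each

print(busy_cpus(25, N_CPUS))                        # 6.0 CPUs busy before 11:57
print(busy_cpus(75, N_CPUS))                        # 18.0 CPUs busy after 11:57
print(round(top_pct_to_average(2000, N_CPUS), 1))   # 83.3 percent per CPU
```

Under these assumptions, about 18 CPUs are busy on average after 11:57, which is far more than the 2-4 jobs the load average suggests.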
Are we interpreting our data correctly? What can cause this behavior?