12

We have a server with unusually high load and CPU utilization, but we can't figure out why. When we run top, every process shows very low CPU usage.

http://cl.ly/2d1g0K3q261r0R0K3e35

Is there a better way to look for what is causing this?

Ben

7 Answers

7

Load average measures how much work the system has had queued, averaged over the last 1, 5 and 15 minutes.

The most common misconception is that load average reflects CPU usage alone.
On Linux, however, load also counts processes waiting on I/O, which I think is your issue.
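
One way to see where the load is coming from is to tally process states. This is a minimal sketch, assuming Linux with procps `ps`; the function name `count_load_states` is made up for illustration. Tasks in state R are runnable, and tasks in state D are in uninterruptible sleep, which usually means they are waiting on I/O.

```shell
#!/bin/sh
# Tally process states from lines whose first field is the state letter.
# Reads from stdin so it can be fed from ps or from a saved listing.
count_load_states() {
    awk '
        { s = substr($1, 1, 1) }   # keep only the state letter, drop flags like "s" or "<"
        s == "R" { r++ }           # runnable
        s == "D" { d++ }           # uninterruptible sleep (usually I/O)
        END { printf "runnable=%d uninterruptible=%d\n", r, d }
    '
}

# Typical use (assumption: procps ps; -L lists one line per thread):
#   ps -eLo state= | count_load_states
```

If the uninterruptible count is consistently high while the runnable count is low, the load is probably I/O-bound rather than CPU-bound.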

Based on the image I'm guessing you ran out of memory and started swapping data to disk.

A simple free -m will tell you how much RAM and swap are in use.
The interesting figure is the free column on the -/+ buffers/cache row.
If it's close to zero, you've run out of RAM and should act accordingly.
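
Newer versions of free replace the -/+ buffers/cache row with an available column, so the same check can also be done straight from /proc/meminfo. The sketch below assumes a kernel that exposes `MemAvailable` (Linux 3.14 or later); `mem_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Print available memory and used swap in MiB from a meminfo-format file.
# Assumption: the file has MemAvailable, SwapTotal and SwapFree lines in kB.
mem_summary() {
    awk '
        $1 == "MemAvailable:" { avail  = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree  = $2 }
        END {
            printf "available_mib=%d swap_used_mib=%d\n",
                   avail / 1024, (stotal - sfree) / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Typical use:
#   mem_summary
```

A small available figure together with growing swap usage would support the swapping theory.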

Mark
4

I noticed that the load average is quite high (68, wow). Could there be many processes that each use only a little CPU but together consume all the CPU time? Or perhaps those processes start and finish so quickly that top never captures them. You could check whether atop can see them.
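
A rough way to test the short-lived-process theory without atop is to watch how fast the kernel's fork counter grows. This is only a sketch, assuming Linux, where /proc/stat carries a cumulative `processes` line; `fork_count` and `fork_rate` are illustrative names.

```shell
#!/bin/sh
# Read the cumulative fork count ("processes" line) from a stat-format file.
fork_count() {
    awk '$1 == "processes" { print $2 }' "${1:-/proc/stat}"
}

# Print the average number of forks per second over an interval (default 5 s).
fork_rate() {
    interval=${1:-5}
    before=$(fork_count)
    sleep "$interval"
    after=$(fork_count)
    echo "forks_per_sec=$(( (after - before) / interval ))"
}

# Typical use:
#   fork_rate 10
```

A rate of hundreds of forks per second on a box where top looks idle would point at short-lived processes.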

Raymond Tau
4

Try

iotop

I/O was the cause for me most of the time.
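
iotop shows I/O per process; to confirm system-wide that the CPUs are spending their time waiting on I/O, the iowait share can be computed from two samples of the aggregate `cpu` line in /proc/stat. This is a minimal sketch, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); `iowait_percent` is a hypothetical helper.

```shell
#!/bin/sh
# Compute the iowait share of CPU time between two "cpu ..." lines.
# $1 = earlier sample, $2 = later sample.
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) a[i] = $i }
        NR == 2 {
            # Sum user..steal only (fields 2-9); guest time is already in user/nice.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i - a[i]
            if (total > 0) printf "iowait=%.1f%%\n", 100 * ($6 - a[6]) / total
        }
    '
}

# Typical use: sample the aggregate line twice, a few seconds apart.
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$a" "$b"
```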

splattne
MarcoHager
4

I think this bug applies to your case. From what I see in the output, you have enough memory (note the 14 GB or so that is cached) and no I/O issues, but you do have Xen-related processes running. This makes me think it is a bug.

grs
2

Try using:

top -o cpu

The -o flag will force top to order the processes by CPU usage in descending order.

Bandit
  • When I ran `top -o cpu` I got "top: unknown argument 'o'" – Ben May 22 '11 at 04:54
  • OK, try running top and pressing `o` while it is running. It should ask you for a primary key. Type `cpu` and hit Enter. – Bandit May 22 '11 at 06:10
  • Even when sorted by CPU, nothing at the top of the list is above 1%, and only a handful (2-3) show any usage at a given time. The rest are at 0%. – Ben May 22 '11 at 06:40
  • On CentOS 7.2 at least, the correct command for this is `top -o %CPU` – siliconrockstar Aug 15 '16 at 15:02
2

It could be locked files on NFS, or anything else holding a lock on a file that another process needs access to.

It could also be a misconfigured service with too many active threads.

John
1

It looks like the CPU usage is coming from a thread, which top does not seem to account for. I recently saw this on a MySQL server: INSERT statements were running, but I could not see the new rows with SELECT because a mysqld thread was updating the table index. top showed 100% user load on one core, yet every process, including mysqld, was at 0.0% CPU. Hours later, the same SELECT returned the expected result set.
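
To look at threads rather than processes, `top -H` switches top to a per-thread view, and `ps -L` lists one line per thread. The filter below is a minimal sketch for picking out busy threads from such a listing; it assumes procps `ps`, and `busy_threads` is an illustrative name.

```shell
#!/bin/sh
# Print threads whose CPU share is at or above a threshold (default 50%).
# Expects "PCPU PID TID COMMAND" lines on stdin.
busy_threads() {
    awk -v min="${1:-50}" '$1 + 0 >= min + 0 { print }'
}

# Typical use (assumption: procps ps; its %CPU is averaged over the
# thread's lifetime, not an instantaneous reading):
#   ps -eLo pcpu,pid,tid,comm --no-headers | busy_threads 50
```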

See also

Getting a per thread cpu stats

'htop' process and threads cpu usage?