How do I find out which process is causing high CPU usage?

I have a strange issue on my Solaris virtual machine: after it has been up for 1-2 hours, CPU usage climbs to 100% for 5 seconds, drops back to normal for another 5 seconds, and keeps repeating like that until I reboot. This makes my Solaris virtual machine totally unusable.

I want to find out what is happening during these recurring 5-second bursts of 100% CPU usage, but the system is completely unresponsive while they last (not even mouse or keyboard input is handled), so I cannot see the process name with top or prstat.

So I want to find out:

  • the process id which caused the 100% CPU usage
  • what the process was doing during the 100% CPU usage

Please offer your suggestions, thank you!

Howard

Posted 2012-08-25T06:11:23.573

Reputation: 1 646

Answers

You could try running top in batch mode:

top -b -n100 > top.log

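where -n100 tells top to exit after 100 iterations (this is the Linux procps syntax; other top implementations may use different options).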
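A 100-iteration log gets long, so here is a rough sketch for skimming it afterwards. It assumes the Linux procps output format: each iteration starts with a "top - HH:MM:SS up ..." line, and process rows follow a "PID ... COMMAND" header, sorted by %CPU by default. The function name busiest_from_top_log is hypothetical. It prints each iteration's timestamp next to its first (busiest) process row.

```shell
#!/bin/bash
# Hypothetical helper: summarize a "top -b" log in the Linux procps format.
# For each iteration, print the time from the "top - HH:MM:SS up ..." line
# followed by the first process row after the PID/COMMAND header, which is
# the busiest process when top uses its default %CPU sort order.
busiest_from_top_log() {
    awk '
        /^top - / { stamp = $3; want = 0; next }     # a new iteration starts
        $1 == "PID" { want = 1; next }                # process rows follow this header
        want && NF > 0 { print stamp, $0; want = 0 }  # keep only the first row
    ' "$1"
}
```

Usage: busiest_from_top_log top.log. A line whose time falls in a freeze window and shows the same command repeatedly is a good first suspect.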

Another option is to run ps with appropriate arguments (these are the Linux options; check the Solaris ps man page for the equivalents). For convenience, the command is wrapped in a Bash script:

   #!/bin/bash
   # Append a timestamped process snapshot to process.log once per second
   while true ; do
      date >> process.log
      ps -eo pcpu,pmem,pid,ppid,args >> process.log
      sleep 1
   done

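You may also need to raise the priority of the logging script so that it still gets scheduled during the spikes. The nice command cannot put a process in the real-time class, though; it only adjusts its time-sharing priority (negative values require root). On Solaris, the real-time class is set with priocntl, e.g. priocntl -e -c RT followed by the command to run.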
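Once the machine is responsive again, process.log can be scanned for the heaviest entry in each snapshot. The sketch below assumes each snapshot starts with the ps header line (first column "%CPU") and treats any other non-numeric line, such as a timestamp if you log one with date, as the label of the snapshot after it. The function name busiest_from_ps_log is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the highest-%CPU row of each snapshot in a log
# written by "ps -eo pcpu,pmem,pid,ppid,args" in a loop. A line whose first
# field is %CPU (the ps header) starts a new snapshot; any other non-numeric
# line, such as a timestamp, is used as the label of the snapshot after it.
busiest_from_ps_log() {
    awk '
        function flush(    sep) {
            if (best != "") {
                sep = (label != "") ? " | " : ""
                print label sep best
            }
            best = ""; max = -1
        }
        BEGIN { max = -1 }
        $1 == "%CPU" { flush(); next }              # ps header: a new snapshot starts
        $1 ~ /^[0-9.]+$/ {                          # a process row
            if ($1 + 0 > max) { max = $1 + 0; best = $0 }
            next
        }
        NF > 0 { flush(); label = $0 }              # any other text labels the next snapshot
        END { flush() }
    ' "$1"
}
```

Usage: busiest_from_ps_log process.log. On Solaris, /usr/bin/awk may be the old awk without user-defined functions; nawk or /usr/xpg4/bin/awk should work instead.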

jpe

Posted 2012-08-25T06:11:23.573

Reputation: 301

Especially since you are running in a virtualized environment, you shouldn't assume right away that the culprit is a process.

It could also be a hypervisor issue or a kernel-related one.

I would use dtrace to figure out what the kernel is doing during these high CPU usage periods:

The DTraceToolkit's cputimes and modcalls.d scripts would be a good start.
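For a quick look before reaching for the toolkit, here is a hand-written sketch (not one of the DTraceToolkit scripts) using the profile provider. It samples each CPU about 997 times per second, counts which process was on-CPU, and, when the sample landed in the kernel (arg0 != 0), records the kernel stack; every 5 seconds it prints a summary and resets. Because DTrace aggregates in the kernel and only prints at each tick, the samples taken during a freeze should most likely still show up once the console recovers. The function name sample_cpu is hypothetical; run it as root inside the Solaris guest and redirect its output to a file.

```shell
#!/bin/bash
# Hypothetical wrapper around a DTrace one-off (Solaris, run as root).
# Every 5 seconds: print the wall-clock time, per-process on-CPU sample
# counts, and the 5 most frequently sampled kernel stacks, then reset.
sample_cpu() {
    dtrace -qn '
        profile-997
        {
            @procs[pid, execname] = count();   /* which process was running */
        }

        profile-997
        /arg0 != 0/
        {
            @kstacks[stack()] = count();       /* arg0 != 0: the CPU was in the kernel */
        }

        tick-5s
        {
            printf("\n%Y\n", walltimestamp);
            trunc(@kstacks, 5);                /* keep only the 5 hottest stacks */
            printa("%6d %-20s %@8d\n", @procs);
            printa(@kstacks);
            trunc(@procs);
            trunc(@kstacks);
        }
    '
}
```

Usage: sample_cpu > /var/tmp/cpu.log, then look at the intervals that cover a freeze. Note that idle CPUs also execute kernel code, so the idle loop will show up in the stacks outside the bursts.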

If your host OS is also Solaris, DTrace can also help identify where the CPU load originates.

jlliagre

Posted 2012-08-25T06:11:23.573

Reputation: 12 469

Thank you. I also suspect that the Solaris 11 kernel is not playing nicely with VirtualBox, or that VirtualBox itself is causing the issue, because right before it happens, Solaris reports a burst of 4 Gbit/s of network traffic... – Howard – 2012-08-25T22:05:16.863