20

I have a box on Linode that's behaving strangely. Every now and then CPU and disk I/O shoot to 100%, the server becomes unresponsive, and it has to be rebooted. I'd like to investigate what's going on, but I don't know how to find out which process is responsible for all that CPU and I/O. I'm running Gentoo with a 2.6.18 kernel.

Jeff Atwood
agentofuser

7 Answers

24

You could try to do something like this:

while true; do ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10 >> logfile.txt; printf "\n" >> logfile.txt; sleep 3; done

That appends the top ten processes by CPU usage to logfile.txt every 3 seconds. You can change the number of processes shown by changing the 10 in "head -10" to a different number, and how often it samples by changing the 3 in "sleep 3" or taking out the "sleep 3" part entirely.
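A variant of the loop above that timestamps each snapshot may make the log easier to line up with the time of a spike. This is only a minimal sketch, assuming a procps-style ps; the snapshot function and the LOGFILE and TOPN variables are names made up for illustration.

```shell
#!/bin/sh
# Illustrative names (not from the answer above): LOGFILE, TOPN, snapshot.
LOGFILE="${LOGFILE:-logfile.txt}"
TOPN="${TOPN:-10}"

# Append one timestamped snapshot of the top CPU consumers to LOGFILE.
snapshot() {
    {
        date '+%Y-%m-%d %H:%M:%S'
        # Numeric reverse sort on the %CPU column; the header line
        # counts as 0 and so sinks toward the bottom.
        ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -n "$TOPN"
        printf '\n'
    } >> "$LOGFILE"
}

snapshot
```

To keep logging, call the function in a loop, for example: while true; do snapshot; sleep 3; done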

Jeff
shawn
  • 5
    Make sure you have some sort of sleep in there, otherwise there is a good chance that your shell process will always be in the top 10. :) – jedberg Oct 10 '09 at 17:21
  • 1
    I think `sort -nr` would do better to sort numerically (at least on my ubuntu/debian boxen) – sehe May 06 '13 at 17:02
  • 3
    BTW, you should give this process highest priority, for it to remain useful during peaks of high load (which is the point of its life, after all). – spacediver Mar 17 '14 at 15:51
  • 2
    Just to note that `ps -eo pcpu` prints the process's *lifetime* CPU use, not the average over the past X seconds. So this is not useful for tracking the use of a process over time, or for finding cases where a long-lived process usually has 0% CPU use but occasionally consumes all available CPU. – Tom Sep 02 '20 at 07:52
  • thx for pointing out @Tom but what would be the correct usage for finding such a process? – Bohne Mar 03 '22 at 07:59
  • AFAICT @Bohne there is no way of getting this from `ps`. I'll add another answer with a suggestion. – Tom Mar 03 '22 at 09:49
14

Check out atop. It writes a binary log of pretty much everything you could possibly want, and you can then use a top-like interface to step through the time slices of the day (the default is to take a sample every 5 minutes). http://www.atcomputing.nl/Tools/atop/
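As a rough sketch of that workflow, atop can be run as a recorder and its raw file replayed later. The file path and the 60-second interval below are assumptions chosen for illustration, and the function names are made up.

```shell
#!/bin/sh
# Assumed values for illustration only.
RAWFILE="${RAWFILE:-/var/tmp/atop.raw}"
INTERVAL="${INTERVAL:-60}"

# Record: write a raw sample to RAWFILE every INTERVAL seconds until stopped.
atop_record() {
    atop -w "$RAWFILE" "$INTERVAL"
}

# Replay: open the raw file in the interactive viewer.
# Inside the viewer, 't' steps forward one sample and 'T' steps back.
atop_replay() {
    atop -r "$RAWFILE"
}
```

For example, start atop_record in the background at boot, and run atop_replay after an incident to step to the samples just before the machine hung.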

ScottZ
6

I think munin is one of the good monitoring tools that will help you get information about your box's activity. There are also command line tools such as sar, iostat, ps, and top for this purpose.
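As a sketch of how two of those command line tools might be used here, assuming the sysstat package is installed: sar can record samples to a file for later reading, and iostat shows per-device I/O. The file path, intervals, and function names are illustrative assumptions.

```shell
#!/bin/sh
# Assumed output file for illustration only.
SAFILE="${SAFILE:-/var/tmp/sar.data}"

# Record system activity: one sample every 5 seconds, 720 samples (1 hour).
sar_record() {
    sar -o "$SAFILE" 5 720 >/dev/null 2>&1
}

# Read CPU utilisation back from the recorded file.
sar_cpu_report() {
    sar -u -f "$SAFILE"
}

# Live extended per-device I/O statistics, refreshed every 5 seconds.
io_watch() {
    iostat -dx 5
}
```

Because sar_record writes to a file, the data should still be readable with sar_cpu_report after a reboot, up to the last sample written before the hang.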

Ali Mezgani
5

The other answers have only shown you how you can look at what's currently going on, which doesn't help if the system has been rebooted.

If you want this information recorded for posterity (or billing, or whatever other use you might also have), what you want is process accounting.

Here's a HOWTO I found, but I'll be honest -- it's been a decade since I've used process accounting.

http://tldp.org/HOWTO/Process-Accounting/
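For orientation, a minimal sketch of what using process accounting might look like with the GNU acct tools (accton, lastcomm, sa). The accounting file path is an assumption, since distributions place it differently, and the function names are made up. Note that accounting records are written when a process exits, so a process still running at the moment of a hard reboot may not appear.

```shell
#!/bin/sh
# Assumed accounting file; check where your distribution keeps it.
ACCT_FILE="${ACCT_FILE:-/var/log/account/pacct}"

# Turn accounting on (needs root). The file must exist first.
acct_enable() {
    touch "$ACCT_FILE" && accton "$ACCT_FILE"
}

# List commands recorded in the accounting file.
acct_recent() {
    lastcomm -f "$ACCT_FILE" | head -n 20
}

# Summarise recorded resource use per command name.
acct_summary() {
    sa "$ACCT_FILE"
}
```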

Rob F
3

The accepted answer sorts by process lifetime CPU use, not short-term CPU use.

You can get the top ten processes by CPU use over an arbitrary interval using top in batch mode:

$ top -b -n 1 -d 3 -o +%CPU | sed -e '1,/PID/d' | head -10

Here -d 3 specifies the interval. According to the man page on Ubuntu 21.10, intervals in multiples of 0.1 seconds are supported. It doesn't complain if you specify something more precise, but whether it actually makes a measurement over a shorter time I don't know.

The sed -e '1,/PID/d' just cuts off the summary information and header - but you might like to leave it in and increase the head limit accordingly.
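With a single frame, top has no earlier sample from the same run to compare against, so a variant that takes two frames and keeps only the second may reflect the -d interval more reliably. This is a sketch, assuming a procps-ng top that supports -o; top_interval is a made-up helper name.

```shell
#!/bin/sh
# top_interval SECONDS COUNT: print the COUNT busiest processes as measured
# between two frames taken SECONDS apart. (Hypothetical helper name.)
top_interval() {
    interval="${1:-3}"
    count="${2:-10}"
    # -n 2 makes top print two frames; awk skips each "PID" header line and
    # prints only the process rows that follow the second one.
    top -b -n 2 -d "$interval" -o +%CPU |
        awk '$1 == "PID" { frame++; next } frame == 2' |
        head -n "$count"
}

top_interval 1 5
```

Appending the output to a file from a loop or a cron job would give a history that can be read back after a reboot.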

Tom
  • `-d` doesn't do anything with `-n 1`. It's the interval between updates and with only one update there can be no interval. Also `-n` doesn't work at all when outputting to a pipe. – CR. Jul 23 '22 at 07:18
  • This Worked For Me on some system I tested it on. May have been a busybox system. YMMV. – Tom Jul 25 '22 at 13:27
2

A more user-friendly approach to shawn's solution for near real-time monitoring:

while true; do clear; ps -eo pcpu,pmem,pid,user,args --sort=-pcpu c|head -20; sleep 1; done

This gives a static view of the top 20 processes, refreshed every second. The "c" option makes ps print the executable name rather than the full command line; omit it if you need the whole command. A memory usage (%MEM) column is also included.

manolis
1

Doesn't Gentoo have the "top" command as well?

machine:~/# top

should give you running stats showing which programs cause the most load.

Emthigious
  • Yeah I know, but I'd like to have that logged so I can see the history later. When the CPU spikes, the machine becomes unresponsive, so I can't log in and run `top` to see who's the culprit. I want to check back later and see which process did it. – agentofuser Oct 10 '09 at 15:31