23

I'm running Linux and have a few processes (game servers) that occasionally crash and end up using 100% CPU.

I'm looking for a program or script that checks the CPU usage of a list of processes by name and kills any that stay at 100% for more than X time, say 30 seconds. I tried ps-watcher but couldn't work out how to accomplish this with it.

Simply killing a process when it reaches 100% usage won't work, as the servers hit that level for brief periods during normal operation.

I've also found this script, which seems to do what I want; however, it is limited to one process: link

Any help is greatly appreciated!

mwfearnley
user30153
  • Can you please post the link to the script again? This one, http://pastebin.com/m1c814cb4, no longer seems to be valid. –  Jan 11 '12 at 15:57
  • Would I be right in guessing that you're running Minecraft servers? ;) – PhonicUK Aug 29 '12 at 12:39
  • @Chris S You are dull. This is a very interesting question. Can you provide a source for your claims "because they attract low quality, opinionated and spam answers, and the answers become obsolete quickly."? And can you give some examples for how the existing answers to this question live up to this? I am not holding my breath. – d-b Nov 17 '18 at 21:41

3 Answers

22

Try monit.

You could use a configuration like this to accomplish your task:

check process gameserver with pidfile /var/run/gameserver.pid
  start program = "/etc/init.d/gameserver start" with timeout 60 seconds
  stop program  = "/etc/init.d/gameserver stop"
  if cpu > 80% for 2 cycles then alert
  if cpu > 95% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if failed port 12345 type tcp with timeout 15 seconds
    then restart
  if 3 restarts within 5 cycles then timeout

Details about this configuration can be found in monit's documentation.

joschi
  • Thank you for the reply! Is there any way to monitor the process without having to start it with monit? I have a ton of servers running on the machine that are managed through a web interface, so having to launch them with monit isn't ideal. – user30153 Dec 28 '09 at 01:10
  • Sure, the `start program` and `stop program` lines are just for the case when `monit` needs to restart your process. You can still start it with your normal init script. `monit` can also check if the program is already running (e.g. by its PID file or process name). – joschi Dec 28 '09 at 02:00
  • Fantastic, I think I've got it figured out. The only problem is its dependence on a PID file: I'm going to have to generate one for over 200 processes and create rules for each one, I suppose. Thanks for the help! – user30153 Dec 28 '09 at 08:37
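
With a couple of hundred servers, writing each rule by hand gets tedious. As a rough sketch, assuming a monit version that can select a process by name with `matching` instead of a PID file, a small shell function could print one stanza per server. The function name `emit_monit_check` and the server names are illustrative assumptions, and the rule only alerts, because restarting would still need `start program` and `stop program` lines suited to your setup.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal monit stanza for one process name.
# Assumes monit accepts 'matching "<regex>"' in place of a PID file.
emit_monit_check() {
  local name="$1"
  printf 'check process %s matching "%s"\n' "$name" "$name"
  printf '  if cpu > 95%% for 5 cycles then alert\n'
}

# Usage (server names are made up):
#   for s in gameserver1 gameserver2; do emit_monit_check "$s"; done > gameservers.conf
```

The generated file would then need to be included from your monit configuration; where that lives depends on the distribution.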
4

This is what I was looking for, and I have been using it for some time now (slightly altered). Lately I introduced a bug into my work, but I still need to keep the app (a game server) running.
I had commented out the part where the top-most PID is killed, as it was killing the wrong PID.
Here's my latest draft of your script. So far, it finds the top-most overloading process and kills it, and it also emails me the details whenever it does anything:

#!/bin/bash

## Note: will kill the top-most process if the $CPU_LOAD is greater than the $CPU_THRESHOLD.
echo
echo checking for run-away process ...

CPU_LOAD=$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")
CPU_THRESHOLD=300
PROCESS=$(ps aux r)
TOPPROCESS=$(ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 1)
# TOPPROCESS holds "PID %CPU COMMAND"; keep only the PID so kill gets a single argument.
TOPPID=$(echo "$TOPPROCESS" | awk '{print $1}')

if [ "$CPU_LOAD" -gt "$CPU_THRESHOLD" ] ; then
  # kill -9 $(ps -eo pid | sort -k 1 -r | grep -v PID | head -n 1) #original
  # kill -9 $(ps -eo pcpu | sort -k 1 -r | grep -v %CPU | head -n 1)
  kill -9 "$TOPPID"
  echo system overloading!
  echo Top-most process killed $TOPPROCESS
  echo load average is at $CPU_LOAD
  echo 
  echo Active processes...
  ps aux r

  # send an email using mail
  SUBJECT="Runaway Process Report at Marysol"
  # Email To ?
  EMAIL="myemail@somewhere.org"
  # Email text/message
  EMAILMESSAGE="/tmp/emailmessage.txt"
  echo "System overloading, possible runaway process."> $EMAILMESSAGE
  echo "Top-most process killed $TOPPROCESS" >>$EMAILMESSAGE
  echo "Load average was at $CPU_LOAD" >>$EMAILMESSAGE
  echo "Active processes..." >>$EMAILMESSAGE
  echo "$PROCESS" >>$EMAILMESSAGE
  mail -s "$SUBJECT" "$EMAIL" < $EMAILMESSAGE

else
 echo
 echo no run-aways. 
 echo load average is at $CPU_LOAD
 echo 
 echo Active processes...
 ps aux r
fi
exit 0


This little script has been extremely useful. If you don't want it to kill any process, the email alone will help keep you informed.

HopelessN00b
  • Thanks for your answer! I would just like to point out that your sorting in `TOPPROCESS` is off. It won't sort by actual value, instead it will order the entries alphanumerically (e.g. 6% will have precedence over 12%). A better alternative might be the following command: `top -b -n 1 | sed 1,6d | sed -n 2p` – Glutanimate Sep 11 '13 at 01:08
  • If the CPU is at 90%, what is the CPU_LOAD? And how do you calculate the threshold? Thanks – Ofir Attia Sep 17 '14 at 19:14
  • This won't catch situations where one process is maxed out on a multi-core server. – UpTheCreek Apr 18 '17 at 11:28
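
To illustrate the sorting point raised in the comments, here is a minimal sketch of selecting the busiest process with a numeric sort. The function name `top_by_cpu` is made up for this example, and it assumes input rows of the form "PID %CPU COMMAND", such as procps `ps` prints.

```shell
#!/bin/bash
# Print the row with the highest %CPU from "PID %CPU COMMAND" rows on stdin.
# sort -n compares column 2 as a number, so 12.0 ranks above 6.0.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
top_by_cpu() {
  LC_ALL=C sort -k2,2 -nr | head -n 1
}

# Usage: the trailing "=" suppresses the header, so no grep -v PID is needed.
#   ps -eo pid=,pcpu=,comm= | top_by_cpu
```

Keep in mind that on Linux the %CPU value reported by `ps` is averaged over the lifetime of the process, so a server that has only just started spinning may not rank first.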
0

Below is a sample Bash script that may give you some hints for your own needs.

#!/bin/bash

CPU_LOAD=$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")
CPU_THRESHOLD=700

if [ $CPU_LOAD -gt $CPU_THRESHOLD ] ; then
  kill -9 $(ps -eo pid | sort -k 1 -r | grep -v PID | head -n 1)
fi

exit 0

Note that the value of $CPU_THRESHOLD should depend on the number of CPU cores in your system. A detailed explanation of this topic can be found at http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages .
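
Because the script strips the decimal point from the load average, a threshold of 700 corresponds to a load of 7.00. As a hedged alternative, the comparison can be made against the core count directly. The sketch below assumes Linux (`/proc/loadavg`, `nproc`); the function name `load_exceeds` and the per-core factor of 1.5 are illustrative choices, not recommendations.

```shell
#!/bin/bash
# Succeed (exit 0) when a load average exceeds cores * per-core limit.
#   $1 = load average (e.g. "3.27")
#   $2 = number of CPU cores
#   $3 = allowed load per core
load_exceeds() {
  awk -v load="$1" -v cores="$2" -v per_core="$3" \
    'BEGIN { exit !(load > cores * per_core) }'
}

# Usage on Linux: the first field of /proc/loadavg is the 1-minute average.
#   read -r load1 _ < /proc/loadavg
#   if load_exceeds "$load1" "$(nproc)" 1.5; then echo "system overloaded"; fi
```

Reading `/proc/loadavg` also avoids parsing `uptime`, whose comma-separated layout changes with how long the machine has been up.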

You can call the script either from /etc/inittab or from a cron job, at whatever interval you prefer. Note also that once $CPU_LOAD exceeds $CPU_THRESHOLD, the example kills whichever PID `ps -eo pid | sort -k 1 -r` lists first; that pipeline sorts by PID, not by CPU usage, so adapt it to select the process you actually want to stop.
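
The load average is system-wide, so it cannot tell which process is busy or for how long. For the requirement in the question (kill a named process only after it has stayed near 100% for a sustained period), one possible approach is to count consecutive high samples per PID. The following is a sketch under stated assumptions: it needs bash 4 or later, the names `update_strikes`, `STRIKES`, and `RUNAWAY` are invented for this example, and the source of the "PID CPU" samples is left to the caller.

```shell
#!/bin/bash
# Sketch only: requires bash 4+ for associative arrays.
declare -A STRIKES=()   # PID -> consecutive samples at or above the limit
RUNAWAY=()              # PIDs that reached the required count on the last call

# Read "PID CPU" rows on stdin. $1 = CPU limit in percent, $2 = samples needed.
# Results go into RUNAWAY (not stdout) so the state survives between calls.
update_strikes() {
  local limit="$1" needed="$2" pid cpu whole
  local -A seen=()
  RUNAWAY=()
  while read -r pid cpu _; do
    [ -n "$pid" ] || continue
    seen[$pid]=1
    whole="${cpu%.*}"                     # integer part, e.g. 99.7 -> 99
    if [ "${whole:-0}" -ge "$limit" ]; then
      STRIKES[$pid]=$(( ${STRIKES[$pid]:-0} + 1 ))
      [ "${STRIKES[$pid]}" -ge "$needed" ] && RUNAWAY+=("$pid")
    else
      unset "STRIKES[$pid]"               # dropped below the limit: reset
    fi
  done
  # Forget PIDs that no longer appear in the sample (process has exited).
  for pid in "${!STRIKES[@]}"; do
    [ -n "${seen[$pid]:-}" ] || unset "STRIKES[$pid]"
  done
}

# Usage ("gameserver" is a placeholder name): sample every 10 seconds and
# kill after 3 consecutive high samples, i.e. roughly 30 seconds.
#   while sleep 10; do
#     update_strikes 95 3 < <(ps -C gameserver -o pid=,pcpu=)
#     [ "${#RUNAWAY[@]}" -gt 0 ] && kill "${RUNAWAY[@]}"
#   done
```

The function must be fed through redirection or process substitution rather than a pipe, since a pipe would run it in a subshell and lose the counters. Also note that `ps` reports %CPU averaged over the process lifetime on Linux, so a sampler that measures recent usage, such as `top` in batch mode, may be a better input for long-running servers.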

bintut