42

I'm looking for a way to kill all processes with a given name that have been running for more than a given amount of time. I spawn many instances of this particular executable, and sometimes one goes into a bad state and runs forever, taking up a lot of CPU.

I'm already using monit, but I don't know how to check for a process without a pid file. The rule would be something like this:

kill all processes named xxxx that have a running time greater than 2 minutes

How would you express this in monit?

Parand

5 Answers

86

In monit, you can use a matching string for processes that do not have a PID file. Using the example of a process named "myprocessname":

check process myprocessname
        matching "myprocessname"
        start program = "/etc/init.d/myprocessname start"
        stop program = "/usr/bin/killall myprocessname"
        if cpu usage > 95% for 10 cycles then restart

One option is to restart or kill the process if its CPU usage stays above a certain level for 10 monitoring cycles (of 30 seconds each). Alternatively, you could use monit's timestamp testing on a file related to the process.

ewwhite
5

There is no ready-to-use tool with that functionality. Let's assume you want to kill php-cgi processes that have run for longer than a minute. Do this:

pgrep php-cgi | xargs ps -o pid,time | perl -ne 'print "$1 " if /^\s*([0-9]+) ([0-9]+:[0-9]+:[0-9]+)/ && $2 gt "00:01:00"' | xargs kill

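pgrep selects the processes by name, and ps -o pid,time prints the time for each PID. The perl one-liner then parses each line, extracts the time, and prints the PID if the time is greater than the given threshold. The resulting PIDs are passed to kill. Note that the time column of ps reports cumulative CPU time, not wall-clock time since the process started (ps exposes that as etime, in a slightly different format).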
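
A variation on the pipeline above, as a rough sketch only: it filters on elapsed wall-clock seconds instead of comparing time strings. The helper name pids_older_than and the 120-second limit are illustrative, and the commented wiring assumes a procps-ng ps that supports the etimes keyword and a GNU xargs that supports -r.

```shell
#!/bin/sh
# Read "PID ELAPSED_SECONDS" pairs on stdin and print the PIDs whose
# elapsed time is strictly greater than the limit in $1 (seconds).
# pids_older_than is a hypothetical helper name, not part of any tool.
pids_older_than() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Possible wiring (assumes procps-ng "etimes" and GNU "xargs -r"):
#   pids=$(pgrep -d, php-cgi) &&
#       ps -o pid=,etimes= -p "$pids" | pids_older_than 120 | xargs -r kill
```

Comparing numbers of seconds avoids the string comparison, which only holds while every time value has the same width.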

datacompboy
  • A process that has been running for a **very** long time gets a day-prefixed runtime (62-13:53:05), so the regexp that parses the running time should be ([-0-9]+:[0-9]+:[0-9]+); note the hyphen at the beginning of the expression. – andrej Jun 26 '14 at 11:43
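
One way to sidestep the day-prefix problem described in this comment is to convert the time string to seconds before comparing. The following is a minimal sketch, assuming the ps time formats [DD-]hh:mm:ss and [[DD-]hh:]mm:ss; the function name etime_to_seconds is made up for illustration.

```shell
#!/bin/sh
# Convert a ps-style time string ([[DD-]hh:]mm:ss) to a number of seconds.
# etime_to_seconds is a hypothetical helper name, not a standard command.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF                          # 2 fields: mm:ss, 3: hh:mm:ss, 4: DD-hh:mm:ss
        s = $n + $(n - 1) * 60          # seconds plus minutes
        if (n >= 3) s += $(n - 2) * 3600    # hours, when present
        if (n >= 4) s += $(n - 3) * 86400   # days, when present
        print s
    }'
}
```

With this, a threshold check becomes a plain integer comparison, for example `[ "$(etime_to_seconds "$t")" -gt 60 ]`, where `$t` holds one time value taken from ps output.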
3

I solved this exact issue with ps-watcher and wrote about it on linux.com a few years back. ps-watcher does allow you to monitor processes and kill them based on accumulated run time. Here's the relevant ps-watcher configuration, assuming your process is named 'foo':

[foo]
  occurs = every
  trigger = elapsed2secs('$time') > 1*HOURS && $ppid != 1
  action = <<EOT
  echo "$command accumulated too much CPU time" | /bin/mail user\@host
  kill -TERM $pid
EOT

[foo?]
   occurs = none
   action = /usr/local/etc/foo restart

The key is the line

trigger = elapsed2secs('$time') > 1*HOURS && $ppid != 1

which says: 'if my accumulated process time is > 1 hour AND my parent process is not init (PID 1), kill me.'

So, I realize that answer doesn't use monit, but it does work. ps-watcher is lightweight and simple to set up, so there's no harm running it in addition to your monit setup.

Phil Hollenback
3

Monit can do this as of version 5.4:

if uptime > 3 days then restart

See: the project CHANGES file

gertas
0

You could work this into monit as an exec statement.

    if [[ "$(uname)" = "Linux" ]];then killall --older-than 2m someprocessname;fi
Jodie C