OK, so I'm familiar with several methods of implementing a "watchdog" script. The problem is that none of them check for a "hung" or unresponsive process; they only check whether the process is still present.

Perhaps I'm showing my lack of programming knowledge, but I'm under the impression that a process can sometimes appear to the system to be running when it has in fact crashed or hung.

Is there a way to detect this condition and restart the process in question (e.g. pkill blah && blah)?

Some examples of what I'm NOT looking for:

U880D
bumbling fool

1 Answer

Look into the Monit or M/Monit utility.

Its uses are covered well here on Server Fault, as well as in the documentation examples.

You can easily check a process by PID or by presence, but additional parameters such as CPU utilization or RAM consumption can also trigger a variety of actions.
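
To illustrate the idea of triggering on a resource metric rather than on mere presence, here is a minimal Python sketch. It is not Monit's implementation or configuration syntax; the class name `ThresholdTrigger` and its parameters are hypothetical. It assumes you already collect a CPU or memory reading on each check, and it only fires once the reading has stayed above the limit for several consecutive checks, so a brief spike does not cause a restart.

```python
from collections import deque


class ThresholdTrigger:
    """Hypothetical helper: fire only after a metric stays above a limit
    for a given number of consecutive checks."""

    def __init__(self, limit, cycles):
        self.limit = limit
        # Keep only the most recent `cycles` samples.
        self.recent = deque(maxlen=cycles)

    def observe(self, value):
        """Record one sample; return True when each of the last
        `cycles` samples exceeded the limit."""
        self.recent.append(value)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(v > self.limit for v in self.recent)
```

A watchdog loop could feed each CPU percentage into `observe()` and restart the process only when it returns True.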

ewwhite
    This is probably the best "universal" way of doing it; I've rarely found something that "hung" without taking up as many CPU cycles as possible. The only "right" way of doing it is to check whether the process is doing the work it's supposed to be doing, which has to be tailored to the individual process. – DerfK Feb 08 '13 at 23:43
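
One common way to check that a process is "doing the work it's supposed to be doing" is a heartbeat file: the monitored process touches a file each time it finishes a unit of work, and the watchdog treats a stale or missing file as a hang. The sketch below assumes you can modify the monitored process to call `touch_heartbeat()`; both function names, the file path and the timeout are hypothetical and would need tailoring to the individual process.

```python
import os
import time


def touch_heartbeat(heartbeat_path):
    """Called by the monitored process after each completed unit of work."""
    # Open in append mode so the file is created if it does not exist,
    # then set its modification time to now.
    with open(heartbeat_path, "a"):
        os.utime(heartbeat_path, None)


def is_hung(heartbeat_path, max_age_seconds, now=None):
    """Return True if the heartbeat is missing or older than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        last_beat = os.path.getmtime(heartbeat_path)
    except OSError:
        # No heartbeat file at all is treated as unhealthy.
        return True
    return (now - last_beat) > max_age_seconds
```

The watchdog would call `is_hung()` periodically and, when it returns True, run whatever restart command suits the process (for example the `pkill blah && blah` from the question).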