Automatically restart a Unix job if it goes down?



I have a job that I would like to "daemonize" on Unix: I want it to come up when the computer boots, and I want it to restart if it goes down.

A simple way to do this is to set up a cron job that runs every 10 or 20 minutes. The cron job should restart the application if it's not already running.

How do I write this last part of the script: "If the job is not currently running, then start the job"?


Posted 2009-11-04T00:56:21.103

Reputation: 361

Only by using a program that is guaranteed to be running all the time (such as init or upstart) can you be sure that your program will (almost) always be alive. But I see from some of your comments below, you don't have root access. Just be aware that any periodic checking solution (pid file / cron) is only as good as the checking interval. – DaveParillo – 2009-11-04T04:48:16.940



This approach is fast and cheap and not bulletproof:

#!/usr/bin/perl -w
# Start mzscheme unless it already shows up in this user's process list.
$l = `ps x`;
if (not $l =~ /mzscheme/) {
        system('~/utils/src/plt/bin/mzscheme &');
}

I put that script in a cron file.


Posted 2009-11-04T00:56:21.103

Reputation: 361

This is fast and lovely – smonff – 2014-05-01T13:18:49.553


If your program runs in the foreground, use Gerrit Pape's runit. Advantages:

  • It's pretty much bulletproof (based on Dan Bernstein's daemontools).
  • It runs on a wide variety of platforms (portable).
  • It is packaged on Ubuntu and Debian (along w/ above..).
  • It is relatively easy to configure (run script, log script, some symlinks).
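
For what it's worth, the "run script, log script" part can be sketched in a few lines of shell. This is only an illustration: the function name make_service and the example paths are made up for this sketch, and the logging half assumes runit's svlogd is installed.

```shell
#!/bin/sh
# make_service SERVICE_DIR COMMAND
# Create a runit service directory containing a run script and a log script.
# SERVICE_DIR and COMMAND are placeholders chosen by the caller.
make_service() {
    mkdir -p "$1/log/main"

    # run: runsv executes this file; the program must stay in the foreground.
    cat > "$1/run" <<EOF
#!/bin/sh
exec 2>&1
exec $2
EOF

    # log/run: receives the service's output and writes it under log/main.
    cat > "$1/log/run" <<EOF
#!/bin/sh
exec svlogd -tt ./main
EOF

    chmod +x "$1/run" "$1/log/run"
}
```

Calling make_service "$HOME/sv/myjob" "$HOME/bin/myjob" produces the directory. The "some symlinks" step is then to link it into a directory watched by runsvdir, for example one you start yourself with runsvdir "$HOME/service".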


Posted 2009-11-04T00:56:21.103

Reputation: 20 109

djb is awesome, actually. If I'm not mistaken, he successfully sued the United States in favor of first amendment rights. – user13798 – 2009-11-04T21:28:23.173

runit was written by Gerrit Pape, not by Dan Bernstein. – JdeBP – 2014-04-02T12:41:55.067

amended to clarify author of runit – jtimberman – 2014-05-05T14:50:22.827


I use Monit for this purpose; it's free and open source. It does what you need and much more.

What Monit can do

Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too much resources. You can use Monit to monitor files, directories and filesystems for changes, such as timestamp changes, checksum changes or size changes. You can also monitor remote hosts; Monit can ping a remote host and can check TCP/IP port connections and server protocols. Monit is controlled via an easy to use control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alert

I also like their design philosophy:

It is important for a system monitoring tool to just work - all the time and you should be able to trust it to do so. A system monitoring tool need to be non-intrusive and you should be able to forget about it once it's installed. That is, until sshd dies on your co-located server, 50 miles away. When this happens, it is good to know that you have installed this extra layer of security and protection - just wait a few seconds and Monit will restart the sshd daemon. It is also helpful to get an alert mail before the server disks are full or if your http server suddenly is slashdotted.

Monit is designed as an autonomous system and does not depend on plugins nor any special libraries to run. Instead it works right out of the box and can utilize existing infrastructure already on your system. For instance, Monit will easily integrate with init and can use existing runlevel rc-scripts to manage services. There are also flexibility for those special cases when you need a certain setup for a service.

Monit compiles and run on most flavors of UNIX. It is a small program and weights in at just over 300kB. There is support for compiling with glibc replacements such as uClibc if you need it to be even smaller.

Since you do not have root access, a script like this may work for your requirement of:

"If the job is not currently running, then start the job"

if [ $(ps ax | grep -v grep | grep "/usr/local/apache2/bin/httpd" | wc -l) -eq 0 ]
then
        # No matching process was found, so start the service.
        echo "httpd Service not running"
        apachectl start
fi

I wrote the code above and tested it with cron and the Apache httpd daemon. It simply searches the current process list for your string. If 0 lines are found, the service isn't running, so the script starts it. Make sure to include grep -v grep so that the search itself is excluded from the process output. Use the entire path to the binary to ensure that the match really is the service. If you only search for httpd, for example, then having httpd.conf open in vim will make the script think the httpd service is running when it really isn't. Of course, your method of starting the service will differ.

John T

Posted 2009-11-04T00:56:21.103

Reputation: 149 037

I don't know if I can use monit, because I don't have root access on my system. So I cannot get the monit daemon to automatically load at boot. – user13798 – 2009-11-04T02:56:26.313

Ah I see, added some shell scripting which may help. – John T – 2009-11-04T03:38:22.460


There are also tools designed specifically to work as a watchdog. They can even run, as services, scripts that don't create PID files. One example is supervisor.

Mr. Girgitt

Posted 2009-11-04T00:56:21.103

Reputation: 131


You can use systemd. Most modern Linux distributions already use it.

Use Type=simple

Type=simple (default): systemd considers the service to be started up immediately. The process must not fork. Do not use this type if other services need to be ordered on this service, unless it is socket activated.


And Restart=always

Please don't do the forking magic yourself; other tools already do this, and they do it better than you or I can.
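
As a rough sketch of what those two settings look like in practice, the shell function below writes a minimal unit file. The unit name myjob.service, the function name write_user_unit, and the RestartSec=5 delay are my own choices for illustration. Since the asker has no root access, the sketch assumes a per-user unit, which only works if the distribution provides systemd user sessions.

```shell
#!/bin/sh
# write_user_unit UNIT_DIR COMMAND
# Write a minimal service unit named myjob.service (hypothetical name).
# COMMAND should be an absolute path to a program that does not fork.
write_user_unit() {
    mkdir -p "$1"
    cat > "$1/myjob.service" <<EOF
[Unit]
Description=Keep my job running (example)

[Service]
Type=simple
ExecStart=$2
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF
}
```

After running write_user_unit "$HOME/.config/systemd/user" "$HOME/bin/myjob", the service would typically be activated with systemctl --user daemon-reload followed by systemctl --user enable --now myjob.service. Starting it at boot with nobody logged in also needs lingering enabled for the account (loginctl enable-linger), which may require an administrator on some systems.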


Posted 2009-11-04T00:56:21.103

Reputation: 347


You can use a file lock. The site explains how to implement it in Python, but it should be pretty simple to figure out in other languages.
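
In shell, one way to get the same effect is the flock utility from util-linux (Linux-specific, so treat this as a sketch under that assumption). The lock is held for as long as the job runs and is released by the kernel when the process dies, so a cron job can simply try to take the lock and start the job only if that succeeds. The function name run_once and the lock file location are hypothetical.

```shell
#!/bin/sh
# run_once LOCKFILE COMMAND [ARGS...]
# Run COMMAND while holding an exclusive lock on LOCKFILE.
# With -n, flock fails immediately instead of waiting when another
# process already holds the lock, so COMMAND is not started twice.
run_once() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}
```

From cron, calling run_once "$HOME/run/myjob.lock" "$HOME/bin/myjob" every few minutes then starts the job only when no other instance holds the lock.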

Jeffrey Aylesworth

Posted 2009-11-04T00:56:21.103

Reputation: 2 170


Another idea (similar to Jeffrey Aylesworth's file lock suggestion, though more geared to the Unix shell-scripting world) would be to have your cron job check a PID-file (see related questions on SO). If your daemonized application doesn't create a PID file on its own, you can wrap it in a shell script to do so.

The basic idea is this:

  1. Start your application from a script that creates a PID-file (somewhere like /home/username/run/) containing its PID.
  2. In your cron job, check that the PID-file exists.
    1. If it exists, check that the PID is still running the application.
    2. If it is not running, or the PID-file doesn't exist, the application has died. Restart it.

If you only ever want to have the application Foo running once, you could even do all this in the startup script, and just execute that as the cron job.
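
The steps above can be sketched as a small POSIX shell script. It is a minimal illustration, not a hardened implementation: the function names is_alive and ensure_running are invented here, and kill -0 only confirms that some process with that PID exists, so it does not fully cover step 2.1 if the PID has been reused by an unrelated program.

```shell
#!/bin/sh
# is_alive PIDFILE
# Succeed if PIDFILE exists and names a process that still exists.
is_alive() {
    [ -s "$1" ] || return 1          # PID file missing or empty
    pid=$(cat "$1") || return 1
    kill -0 "$pid" 2>/dev/null       # signal 0 only tests for existence
}

# ensure_running PIDFILE COMMAND [ARGS...]
# Start COMMAND in the background and record its PID, unless the
# process named in PIDFILE is still alive.
ensure_running() {
    pidfile=$1
    shift
    if is_alive "$pidfile"; then
        return 0
    fi
    mkdir -p "$(dirname "$pidfile")"
    nohup "$@" >/dev/null 2>&1 &     # output is discarded in this sketch
    echo $! > "$pidfile"
}
```

A cron job would source this file and call something like ensure_running "$HOME/run/foo.pid" "$HOME/bin/foo". Redirect the output to a log file instead of /dev/null if you need it.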

quack quixote

Posted 2009-11-04T00:56:21.103

Reputation: 37 382