26

I've used Daemontools to provide a simple and reliable way to supervise Unix services on my servers. It works well, but it requires a different way of thinking (the DJB way), and it draws some common complaints:

  • TAI64N-based timestamps
  • Doesn't store scripts under /etc/init.d (or (/usr/local)/etc/rc.d)
  • Doesn't always work with scripts like apachectl. Some scripts need to be rewritten.
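
On the TAI64N point, a minimal sketch of decoding a label by hand may help. It assumes the documented layout: an optional `@`, 16 hex digits of seconds offset by 2^62, then 8 hex digits of nanoseconds. The function name `parse_tai64n` is made up for illustration, and the result is in TAI seconds, not UTC; daemontools' own `tai64nlocal` is the usual way to get local time.

```python
def parse_tai64n(label):
    """Split a TAI64N label into (TAI seconds since 1970, nanoseconds)."""
    hex_digits = label.lstrip("@")
    if len(hex_digits) != 24:
        raise ValueError("expected 24 hex digits after the optional '@'")
    # The first 16 hex digits hold the seconds, offset by 2**62.
    seconds = int(hex_digits[:16], 16) - 2**62
    # The last 8 hex digits hold the nanoseconds within that second.
    nanoseconds = int(hex_digits[16:], 16)
    return seconds, nanoseconds


print(parse_tai64n("@4000000037c219bf2ef02e94"))
# prints (935467455, 787492500)
```

Turning the TAI seconds into UTC additionally needs the leap-second offset, which this sketch deliberately leaves out.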

I remember that some similar "supervisor/watchdog" daemons were in the works about two years ago, but some were still a little rough around the edges.

If you have switched from Daemontools to something else, what did you choose and did it work well for you? Does RedHat or Ubuntu come with any process supervisor utilities by default?

Stefan Lasiewski

9 Answers

16

Hrm, if you're using Ubuntu, its new init system, Upstart, includes a level of process supervision. It can handle the standard starting and stopping of services, à la SysV init scripts, and it can also monitor running applications and respawn them if they die.

You can also implement a poor man's process restarter via inittab, depending on what your needs are.

If you're primarily looking for something that keeps an eye on a process, makes sure it's always running, and restarts it when it isn't, I've had great luck with restartd. Unfortunately, the only source for it that I know of is the Debian package. However, it's a very small and simple application, basically just a single .c and .h file with a Makefile. Compiling it from the Debian source tarball on Red Hat is trivial (I even made an RPM of it at my previous job).
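
To make the "restart it when it dies" idea concrete, here is a rough Python sketch of a respawn loop. It is not how restartd or Upstart are implemented; the function name `supervise` and the `max_restarts` cap are assumptions added so the example terminates.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd in the foreground and restart it whenever it exits.

    Returns how many times the command was started. max_restarts is a
    hypothetical cap so the sketch ends; a real supervisor loops forever.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)  # blocks until the child exits
        print(f"{cmd[0]} exited with status {exit_code}", file=sys.stderr)
        if starts > max_restarts:
            return starts
        time.sleep(delay)  # short pause so a crash loop does not spin the CPU


if __name__ == "__main__":
    # Supervise whatever command line was passed to this script, if any.
    if len(sys.argv) > 1:
        supervise(sys.argv[1:])
```

The service has to stay in the foreground for this to work, which is the same constraint daemontools imposes and the reason scripts like apachectl often need rewriting.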

A final option I've heard of, but not used, is Supervisor. It looks like a promising tool, but restartd has worked well enough for me in the past, for what I needed, that I haven't yet bothered to play with it.

Christopher Cashell
14

+1 for runit. It has more features and is more flexible than daemontools, and it is compatible with existing daemontools arguments and options. Pretty neat.

But as you mentioned, a lot of tools come with their own control binaries: apache2ctl, ejabberdctl, poundctl, collectd, etc. Although hacks exist, sometimes it's just better to stick to the supplied tools, especially when you are not sure of the cleanest possible implementation. I usually compromise: most of the services run under runit's supervision, and the others are left to run the trivial way.

Mohit Chawla
  • 1
    +1 It's worth mentioning that the [`runsv`](http://smarden.org/runit/runsv.8.html) command from `runit` supports custom controls, so that a restart could be implemented in terms of a daemon's native control binaries. – pilcrow Sep 26 '12 at 01:37
6

Well, there's runit. I can't tell you how it differs from or resembles daemontools, but judging by the Bernstein-esque website, I'd say there is a definite Bernstein influence.

Steven Monday
  • 2
    My vote is for runit, given that you can drop it into a SysVInit arrangement and have it take over /etc/init.d/ fairly transparently. – Avery Payne Aug 20 '14 at 19:14
4

Fedora seems poised to switch to systemd: http://0pointer.de/blog/projects/systemd.html

Mark Wagner
4

As an alternative to the already mentioned daemonize and daemontools, there is the daemon command of the libslack package.

daemon is quite configurable and takes care of all the tedious daemon chores such as automatic restart, logging, and pidfile handling.

nazu
3

There's supervisord

ptman
3

There's also libslack's daemon tool that is written in C and available for various (Unix) platforms.

It is quite configurable and takes care of all the tedious daemon chores such as automatic restart, logging, and pidfile handling.

chad
2

Ubuntu comes with Upstart -- I don't know much about it but I know it does have "supervisor" capabilities. Apple's launchd is another option (that Wikipedia article has a nice "see also" section that lists a bunch of others too, including Upstart & RunIt).

All of them have their good points and their own special brand of übersuck. Whenever someone asks me about "process supervisor"/"watchdog" programs, I always ask the same question: Why do you need one?

voretaq7
-2

There are no popular/community-consensus tools for this because everyone who goes down this road realizes it's a dead end. If your long-running processes fail too often for simple monitoring to be good enough, then stop using them and move your code inside something that will be more event-driven.

edit: as Chris points out below, sometimes you're completely cornered, in which case your fallback position is a */1 cron job that looks for the process/pidfile, runs a start/restart if it's missing, and emails the results to the responsible developer/product manager.
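
A rough sketch of the check such a cron job could run, in Python. The helper names and the pidfile path in the usage note are hypothetical; the actual restart command and the email are left to the surrounding script.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def needs_restart(pidfile):
    """Return True if the pidfile is missing, unreadable, or stale."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True
    if pid <= 0:
        return True  # not a real PID; never pass 0 or negatives to os.kill
    return not pid_is_running(pid)
```

A cron wrapper would call something like `needs_restart("/var/run/myapp.pid")` and run the service's own start command when it returns True. Note that a recycled PID can fool this check, which is one reason dedicated supervisors track the child process directly.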

cagenut
  • 3
    Easier said than done. ;-) Sometimes you have applications that you are forced to run, regardless of how unstable or crappy they might be, and anything you can do to keep them running will help reduce the 3am phone calls. Not ideal, by any means, but sometimes it's as good as it gets. – Christopher Cashell Oct 19 '10 at 16:47
  • 1
    This answer misses two features of process supervisors: the ability to manage groups of processes as a single unit and the ability to manage dependencies. For example, your web site may involve a web server, database server, and several web applications running as external processes. These processes may have dependencies -- e.g., the database needs to be up before the web application. A good process supervisor will let you start and stop this group of processes with a single command, and will make sure that things start up in the correct order. – larsks Mar 09 '12 at 01:40
  • 1
    In an ideal world, everything would just work perfectly. Unfortunately this is just not an ideal world. – hookenz Dec 17 '12 at 03:46
  • The problem is not failing too often. The problem is failing once a week and not being restarted *immediately*. This is not a real answer. – dan3 Nov 18 '13 at 06:20
  • @ChristopherCashell is on the right track. Supervision *inside* an app is usually over-engineering (and it also happens to not be UNIX Philosophy.) Software can be assumed to always be imperfect, no matter how much proactive effort is poured in to fix every crash. Supervision is a distinct, external layer... an insurance policy. It's better to keep production services going no matter what, even if they're "not supposed to crash," because the reality is sh%t happens. I'd rather a service restart, log the exception and fix it in the morning. (Service flapping is another case to consider.) –  Sep 12 '15 at 03:44
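
The dependency ordering raised in the comments on this answer (database up before the web application) can be sketched with Python's standard `graphlib` module (Python 3.9+). The service names and the helpers `start_order` and `stop_order` are invented for illustration and do not reflect any particular supervisor's configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical service group: each service maps to the services it needs first.
SITE = {
    "database": set(),
    "webserver": set(),
    "webapp": {"database", "webserver"},
}


def start_order(deps):
    """Return services ordered so every dependency starts before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stop in the reverse order, so dependents go down before what they rely on."""
    return list(reversed(start_order(deps)))


print(start_order(SITE))
print(stop_order(SITE))
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which is roughly the failure a supervisor would need to report as well.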