1

I have a Windows-only daemon running on a Linux box with Wine and Xvfb. Because this setup is quite experimental, the daemon crashes periodically, and I'd like some mechanism that restarts it automatically. Currently I have a systemd unit definition with the Restart=always setting.

However, I have noticed that the daemon sometimes crashes without exiting its process. This is the equivalent of displaying a dialog box with the question "The daemon has crashed, do you want to send an error report?". So the process is still running, but the daemon has stopped working.

The only outward sign of this that I can observe on my Linux box is two new files, which appear at a fixed location but with variable filenames (they contain a timestamp). I think they are some kind of memory dump or stack trace originally meant to be sent with the error report.

So now I am looking for a way to have systemd catch this situation, like

  1. On unit start, look at the crash dump target directory and make a snapshot of the directory contents
  2. Start the daemon
  3. Periodically look at the directory and if there are new files, which aren't in the snapshot, based on some regular expression, restart the daemon and refresh the snapshot.

I thought about a wrapper, written in bash or something, but there are two problems: first, I would not know how to implement this behaviour, and second, it would make systemd largely redundant, since the script would do all the crash handling and systemd would only execute the script.

I also thought about just periodically restarting the daemon with systemd's built-in features, but this would be quite inefficient (given that a Windows daemon in a Wine wrapper isn't efficient in the first place), since it would sometimes restart the daemon when it's not necessary, or it would take some time after a crash until the periodic restart kicks in.

What would be the best solution to solve this problem?

And just for the record: the daemon I am talking about is the Uploader for Google Photos. For some reason, Google doesn't release it for Linux.

simonszu

2 Answers

6

Okay, I discovered the power of systemd.path.

I created a second service unit with ExecStart=systemctl restart daemon.unit and Type=oneshot. After that, I created a third unit, a path unit with PathModified=<crashdump output directory> and Unit=daemon-restart.unit.

It works now. I only have to make sure that no other process is writing to the output directory, but this is solvable with multiple different wineprefixes.
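
For readers who want to reproduce this, the two extra units might look roughly like the sketch below. The unit names (uploader.service, uploader-restart.service, uploader-crash.path), the dump directory and the systemctl location are assumptions made for illustration; they are not the names from my actual setup.

```ini
# uploader-restart.service (hypothetical name)
[Unit]
Description=Restart the Wine daemon after a crash dump appears

[Service]
Type=oneshot
# Assumes systemctl lives in /usr/bin and the daemon unit is called uploader.service
ExecStart=/usr/bin/systemctl restart uploader.service
```

```ini
# uploader-crash.path (hypothetical name)
[Unit]
Description=Watch the crash dump directory of the Wine daemon

[Path]
# Placeholder path: use the directory where the dump files actually appear
PathModified=/srv/uploader/crashdumps
Unit=uploader-restart.service

[Install]
WantedBy=multi-user.target
```

The path unit has to be enabled and started itself, for example with systemctl enable --now uploader-crash.path. Since a crash writes two files, the restart may be triggered more than once per crash.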

simonszu
1

I think your issue is that your programme may be crashing, but wine is not, so systemd sees nothing wrong (PID is still around).

First off, you may find some help in the answers to this question: Start SystemD service conditionally?

I think you may need to detail out your needs just a little bit more (and/or, consider adjusting them to simplify the setup).

Basically, I think the solution will boil down to clever use of ConditionPathExistsGlob=, possibly in an auxiliary unit.

A hacky solution might involve a timer unit that triggers a service with such a PathExistsGlob condition, which in turn restarts your main service. I would tend to want that auxiliary service to also deal with the cleanup of files/dumps, rather than make the main unit do so, but that is almost certainly a matter of taste.

So, I would not touch what you have, but instead, add something like (NB: This is a guess, and not tested):

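[Unit]
Description=Detect and recover issues with Uploader
After=uploader.service
ConditionPathExistsGlob=/srv/uploader/crash*.dump

[Service]
Type=oneshot
# Placeholder: replace with the absolute path of a script that moves or deletes the dumps
ExecStart=cleanup_script
# Restart the main service explicitly (assumes systemctl lives in /usr/bin)
ExecStart=/usr/bin/systemctl restart uploader.service

The basic logic being:

  • you run this on a timer, say every 5 minutes (or whatever makes sense for your needs)
  • if the crash files are not there, the condition is not met, this service is skipped, and the main Uploader service keeps on keeping on
  • if the crash files are there, run some custom script to do the right thing with them (it has to move or delete them, otherwise the condition matches again on the next timer run), and then restart the main Uploader service

Note that the restart has to be explicit. PartOf= only propagates stops and restarts from uploader.service to this unit, not the other way round, and Restart=on-success is not allowed for Type=oneshot services, so neither can be used to bounce the main service from here.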
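
The timer itself is not shown above. A minimal sketch, assuming the recovery unit is saved as uploader-recover.service (a name made up for this example, as is the timer's), could be:

```ini
# uploader-recover.timer (hypothetical name)
[Unit]
Description=Periodically check for Uploader crash dumps

[Timer]
# Fire every 5 minutes on the wall clock (minutes 0, 5, 10, ...)
OnCalendar=*:0/5
# Optional here: a timer activates the service of the same name by default
Unit=uploader-recover.service

[Install]
WantedBy=timers.target
```

It would be activated with systemctl enable --now uploader-recover.timer. A calendar-based schedule is used in this sketch so that the next run does not depend on whether the service was actually started or skipped last time.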

I am not saying this is a great solution, but it may be a solution.

iwaseatenbyagrue