runit does not kill process on sv stop or sv reload

Question

I am running a headless Selenium process alongside a Jenkins server on an Amazon Linux (AMI) box, all managed by runit.

The problem is that issuing "sv stop selenium" or "sv reload selenium" does not terminate or kill the old instance and its child processes. It merely detaches them from the runsv process, so they keep running without runit knowing about them, and the subsequent attempt to restart the service fails.

I think my question is related to this one: "How to write runit custom stop script".

That would mean I should probably write a custom d control script to clean up manually.

I followed this idea: https://stackoverflow.com/questions/392022/best-way-to-kill-all-child-processes

However, cat'ing the pid from /etc/sv/selenium/supervise/pid and passing it to the loop didn't make any difference.

Any advice?

sv run script:

#!/bin/sh

exec 2>&1
exec chpst -u jenkins -U jenkins /usr/bin/xvfb-run \
--server-args="-screen 0 1024x768x32" \
/usr/bin/java -jar /usr/local/bin/selenium-server-standalone-2.42.1.jar \
-ensureCleanSession \
-browserSessionReuse

Because your child process wandered off to a new process group - so you can't get there from here. The answer from @András Korn is correct. — Avery Payne, Dec 17 '14 at 01:09

András Korn · Accepted Answer · 2021-02-28T20:51:57.120


If you add -P to the chpst command line, chpst will run your service in a new process group. Then, in your custom 'd' control script, you can read the pid and run kill -TERM -pid (note the minus sign before the pid) to send the TERM signal to the entire process group.
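Applied to the question's run script, that would look roughly like this (a sketch, not tested against this setup; only -P is new, everything else is copied from the question). Because chpst execs xvfb-run, the pid that runsv writes to supervise/pid should then also be the ID of the new process group, which Xvfb and java inherit unless they start a group of their own.

```sh
#!/bin/sh
# /etc/sv/selenium/run: same as in the question, plus chpst -P, which
# starts the service in a new process group before exec'ing xvfb-run.
exec 2>&1
exec chpst -P -u jenkins -U jenkins /usr/bin/xvfb-run \
  --server-args="-screen 0 1024x768x32" \
  /usr/bin/java -jar /usr/local/bin/selenium-server-standalone-2.42.1.jar \
  -ensureCleanSession \
  -browserSessionReuse
```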

This should work as long as no child process creates its own process group.

However, it might be cleaner to start Xvfb and Java separately (split them into two runit services).
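One possible split, sketched under assumptions: the service name xvfb, the path /usr/bin/Xvfb and the display number :99 are hypothetical choices, the screen geometry is copied from the question, and `sv start /etc/sv/xvfb || exit 1` is the usual runit idiom for making one service wait until another is up. Each service then has its own supervise/pid and can be stopped on its own.

```sh
#!/bin/sh
# /etc/sv/xvfb/run (hypothetical service): run the virtual X server directly.
exec 2>&1
exec chpst -u jenkins -U jenkins /usr/bin/Xvfb :99 -screen 0 1024x768x32
```

```sh
#!/bin/sh
# /etc/sv/selenium/run: wait for the (hypothetical) xvfb service, then start
# Selenium on the display it provides. chpst keeps the exported DISPLAY.
exec 2>&1
sv start /etc/sv/xvfb || exit 1
export DISPLAY=:99
exec chpst -u jenkins -U jenkins /usr/bin/java \
  -jar /usr/local/bin/selenium-server-standalone-2.42.1.jar \
  -ensureCleanSession \
  -browserSessionReuse
```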

Edit: apparently the runsv man page is misleading; runsv actually runs the control/d script only after it has already signaled its child. You should use a control/t script to clean up instead. Thanks to @Keith for pointing this out.
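A minimal control/t sketch, assuming the run script uses chpst -P so that the pid in ./supervise/pid is also the process-group ID (the function name kill_group is hypothetical). runsv starts control scripts from the service directory, so the relative path should resolve. The script exits 0 only after it has signaled the group, which per the man page excerpt quoted in the comments tells runsv not to send its own TERM; if the pid file is empty or no such group exists, it exits non-zero so runsv falls back to signaling the main process itself.

```sh
#!/bin/sh
# /etc/sv/selenium/control/t (sketch, assumes `chpst -P` in the run script).

kill_group() {
    pid=$(cat "$1" 2>/dev/null)
    # supervise/pid is empty when the service is down: let runsv handle it.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # A negative pid addresses the whole process group, not just the leader.
    # If no group with that ID exists, kill fails and we return non-zero.
    kill -TERM "-$pid" 2>/dev/null || return 1
    # Wake any stopped members so they can act on the TERM.
    kill -CONT "-$pid" 2>/dev/null
    return 0
}

# The script's exit status is the status of this last command.
kill_group ./supervise/pid
```

Make it executable (chmod +x /etc/sv/selenium/control/t); "sv stop selenium" should then run it before runsv would send TERM.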

edited Feb 28 '21 at 20:51

answered Sep 02 '14 at 22:02

András Korn


Note that you would want to use the 't' script, not the 'd' script. With the d script, the process will already have been killed and the process group is no longer retrievable in the script. – Keith Feb 24 '21 at 05:25
That's not what the man page says: "For each control character c sent to the control pipe, runsv first checks if service/control/c exists and is executable. If so, it starts service/control/c and waits for it to terminate, before interpreting the command. If the program exits with return code 0, runsv refrains from sending the service the corresponding signal." – András Korn Feb 25 '21 at 10:00

Indeed, András, that is exactly what the man page says. While debugging what was really going on, which took quite some time, I wrote control scripts that logged information about various states, and I found that what I stated above is what actually happens, which conflicts with the theory in the man page. Hence my comment here, in the hope that it saves somebody else the same work. It would interest readers if you have actual results showing otherwise, perhaps with a different version or on an operating system that behaves differently. Or are you just blindly quoting the man page? – Keith Feb 28 '21 at 20:01
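A logging control script in the spirit of the one described above might look like this. It is a hypothetical sketch, not Keith's actual script: install it as control/t and symlink control/d to it, and each invocation records which control character triggered it, the contents of supervise/stat and supervise/pid, and the kernel's state for that pid. Reading /proc/PID/status is Linux-specific; "Z (zombie)" there means the process has exited but runsv has not reaped it yet. Where stdout ends up depends on your setup (the service's log, if it has one), so redirect to a file if nothing shows up.

```sh
#!/bin/sh
# Hypothetical diagnostic control script (install as control/t, symlink
# control/d to it). Runs from the service directory, like any control script.

log_state() {
    # "$1" is how the script was invoked, e.g. control/t or control/d.
    name=$(basename "$1")
    pid=$(cat ./supervise/pid 2>/dev/null)
    stat=$(cat ./supervise/stat 2>/dev/null)
    state=gone
    if [ -n "$pid" ] && [ -r "/proc/$pid/status" ]; then
        # Linux-specific, e.g. "S (sleeping)" or "Z (zombie)".
        state=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
    fi
    echo "$(date '+%T') control/$name: stat='$stat' pid='$pid' state='$state'"
}

log_state "$0"
# Exit non-zero so runsv still sends its own signal afterwards.
false
```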

Maybe taking a look at the source code might shed some light. My knowledge of forking and process management is not good enough to say, but I see differences for sure between 't' and 'd'. https://github.com/vulk-archive/runit/blob/master/src/runsv.c#L246 https://github.com/vulk-archive/runit/blob/master/src/runsv.c#L324 https://github.com/vulk-archive/runit/blob/master/src/runsv.c#L340 – Keith Feb 28 '21 at 20:12
Actually, check stopservice() at runsv.c:246. Seems pretty clear. I tried to paste code in here but it's ugly, as are the links above. :-/ – Keith Feb 28 '21 at 20:19
OK, looking at the code it seems you're right and the documentation is wrong, or at least misleading; I'll report this as a bug. Thanks for pointing it out. From my reading of the code, what happens on `d` is: 1. the service "want" status is updated to `want_down`; 2. if the service is in the "running" state and a child is running, the `control/t` script is run (if it exists etc.); 3. unless it exits successfully, the child is sent a TERM signal, and either way it is then sent a CONT signal; 4. the `control/d` script is *also* run. At this point the process group may still exist, though, but you're right, it's better to use t. – András Korn Feb 28 '21 at 20:49
@Keith, I did file a bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983726 and there is now also a discussion thread about what the correct behaviour would be, at https://www.mail-archive.com/supervision@list.skarnet.org/msg02832.html -- in case you want to weigh in. – András Korn Feb 21 '22 at 20:05
How do you read the pid(s) in the `control/t` script? – Animesh Sahu Mar 30 '22 at 07:48

The pid is in `./supervise/pid`. – András Korn Mar 30 '22 at 13:36
