I have a script which starts a daemon process and then sleeps for 20 seconds. If I run the script on SLES11 SP1 or RHEL6 then after the script exits the process is still running.

If I run the script on SLES11 SP3 or RHEL6.3 then after the script exits the process is no longer running. The process continues to run for the entire 20-second sleep and is killed when the script exits.

The script is run via expect, so the script's entire shell exits with the process. Obviously, if it weren't a daemon being started I wouldn't be surprised. Also, I suspect the problem isn't the OS version so much as a difference in the way we've set up the newer servers (I have no idea what those differences are, though; the older servers were set up years ago).

During the 20 seconds the process runs if I do a ps I get the following:

root      4699     1  0 15:14 pts/2    00:00:00 sudo -u openmq /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -D
openmq    4701  4699  0 15:14 pts/2    00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssl
openmq    9095  9063 54 16:21 pts/2    00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

The fact that the parent process of 4699 is 1 seems to suggest to me that the process has been correctly daemonized. However, after the expect script exits both 4699 and 4701 are killed. What could be causing this?

UPDATE

I've printed the same output on the servers that work. During the 20 second sleep I get:

openmq   18652     1  0 15:44 pts/1    00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq   18686 18652  8 15:44 pts/1    00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

After the 20 second sleep I get:

openmq   18652     1  0 15:44 ?        00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq   18686 18652  5 15:44 ?        00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

After the script exits it disconnects the controlling terminal. I wonder why it doesn't do that on the newer servers.

UPDATE

Here is the section of the script that actually launches OpenMQ. The -bgnd flag is what is supposed to daemonize it.

sudo -u openmq $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &

UPDATE

Some truly bizarre behavior I discovered by accident. If I change the command to:

sudo -u openmq sldkhglksj; $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &

Then I get sldkhglksj: command not found of course but...the openmq process is not killed. If I take that one change out, it is killed.

UPDATE

In retrospect, that "magical" command simply stops sudo from being applied to the actual openmq startup (the ; ends the sudo command, so the startup runs without it), which leads me to believe that sudo is somehow involved.
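
To illustrate why the stray ; matters, here is a minimal sketch. fake_sudo is a hypothetical stand-in for sudo (it only prints what it was asked to run), and the command names are placeholders, not the real startup line:

```shell
#!/bin/sh
# Hypothetical stand-in for sudo: prints a marker plus its arguments.
fake_sudo() { echo "wrapped: $*"; }

# The ';' ends the first command, so only "first-command" is handed to
# fake_sudo. "second-command" is run by the calling shell directly.
out=$(fake_sudo first-command; echo second-command)
printf '%s\n' "$out"
```

Only the first line of the output carries the "wrapped:" marker, which matches the observation that the broker was no longer started through sudo.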

Pace
  • Can you share the script? – ewwhite Dec 11 '12 at 22:52
  • Is there anything in the log file? Have you tried launching the process manually w/o dumping STDOUT and STDERR to /dev/null to see if there is any output? Do you have SELinux enabled? – Zypher Dec 11 '12 at 23:05
  • I changed it to redirect to a logfile, no output. I get command not found when I try to run sestatus so I assume that means SELinux is not enabled. – Pace Dec 11 '12 at 23:18
  • The actual daemon log file (specified as part of the $BROKER_OPTIONS) prints a shutdown message after the 20 seconds has elapsed. This is the same shutdown message I would get if I were to kill the process. – Pace Dec 11 '12 at 23:20
  • Does this help? http://serverfault.com/questions/117152/do-background-processes-get-a-sighup-when-logging-off – Massimo Dec 11 '12 at 23:29
  • @Massimo Nope, the process is not getting launched in a bash shell. Even if it were the huponexit is set to false on both machines. – Pace Dec 11 '12 at 23:53

4 Answers


You might be running into this issue that is documented here: https://access.redhat.com/knowledge/solutions/180243.

It states that sudo's behavior for actions similar to the one you describe has changed in the version that ships with RHEL/CentOS 6.3 (sudo-1.7.4p5-11.el6.x86_64). I am pointing this out because you see different behavior between RHEL 6 and 6.3 and because sudo is involved.

Some options to try (I don't have a 100% answer, just throwing out ideas):

  • If you have root-level access, which it looks like you do, try running the script without sudo, using something like su -c '/opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -D' - openmq. See http://www.linfo.org/su.html for more info
  • Install an older version of sudo to work around this (hacky, I know, but you could build/install it in a temporary location to test it out)
  • Look into the huponexit shopt described in the question that Massimo links to; that sounds promising if this isn't the sudo issue I mentioned above
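
As a rough sketch of the first option, the helper below only prints the su line it would use, so it can be checked before anything is run as root. build_su_launch is a hypothetical name, and the shortened option list is an assumption for illustration; the real script would pass its own $BROKER_OPTIONS and $ARGS:

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) an su-based launch line.
# $1 is the target user; the remaining arguments are the command to run.
build_su_launch() {
    user=$1
    shift
    printf "su -c '%s > /dev/null 2>&1' - %s\n" "$*" "$user"
}

cmd_line=$(build_su_launch openmq /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -port 7676)
printf '%s\n' "$cmd_line"
```

Printing first keeps the quoting visible: the redirection sits inside the quoted command, so it is applied by the shell that su starts for openmq.
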
Tony Cesaro
  • That was it. An older version of sudo worked. I tried upgrading to the latest version of sudo, no luck. The su command also worked and is likely the approach we will take. – Pace Dec 12 '12 at 14:49

You can prefix the command you wish to daemonize within the script:

  nohup command-that-you-want-to-daemonize &

Then when the outside script completes the program will continue to run.
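
A slightly fuller sketch of the same pattern, with output redirected so that nohup does not write a nohup.out file. The sh -c child is a placeholder for the real command, and the marker file exists only to show that the child ran:

```shell
#!/bin/sh
# Temporary file used only to confirm that the background child ran.
marker=$(mktemp)

# nohup makes the child ignore SIGHUP; redirecting stdout and stderr
# keeps nohup from creating nohup.out. The child is a placeholder.
nohup sh -c 'echo started > "$1"' sh "$marker" > /dev/null 2>&1 &

# Wait here only so the sketch can show the result; a launcher script
# would normally carry on without waiting.
wait "$!"
result=$(cat "$marker")
rm -f "$marker"
echo "$result"
```

Note that nohup only protects against SIGHUP, so it will not help if the process is being stopped some other way.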

mdpc
  • The process we call is supposed to start itself up as a daemon, it should not need to be called with nohup. Reinforcing this fact is that our old servers call the process in the exact same way and don't suffer from this issue. Similar to calling service ntpd start. – Pace Dec 11 '12 at 22:45
  • Does my suggestion solve the specific problem? – mdpc Dec 11 '12 at 22:50
  • Nope. I changed the script to use nohup before the line I posted above and that had no effect. – Pace Dec 11 '12 at 23:11
  • By before, I mean, at the beginning of the line, not as a separate line. – Pace Dec 11 '12 at 23:20

Try adding a disown on a line of its own after you background the process. This should prevent your shell from sending signals to any child processes as it exits.

chutz

Try adding </dev/null to the startup command as well.

I'm not sure exactly how the -bgnd flag is supposed to background your process, but processes can die if their standard input goes away, which is exactly what happens when you lose the ssh connection. You are already throwing all output away to the bit bucket; you may want to make sure there is no input either.
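
A minimal sketch of what the extra redirection changes. check_stdin is a hypothetical stand-in for the daemon that only reports whether its standard input is a terminal:

```shell
#!/bin/sh
# Hypothetical stand-in for the daemon: reports what stdin is attached to.
check_stdin() {
    if [ -t 0 ]; then
        echo "tty"
    else
        echo "not-a-tty"
    fi
}

# With < /dev/null the process no longer holds the terminal as stdin.
result=$(check_stdin < /dev/null)
echo "$result"

# Applied to the launch line from the question, this would look like:
#   sudo -u openmq $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS < /dev/null > /dev/null 2>&1 &
```

This detaches standard input only; the process can still have the terminal as its controlling terminal, so treat it as one thing to rule out rather than a guaranteed fix.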

I cannot help explain the change in behavior, but my suggestion is to just live with it.

chutz