I have a script that starts a daemon process and then sleeps for 20 seconds. If I run the script on SLES11 SP1 or RHEL6, the process is still running after the script exits.
If I run the script on SLES11 SP3 or RHEL6.3, the process is no longer running after the script exits. It keeps running for the entire 20-second sleep and is killed when the script exits.
The script is run via expect, so the script's entire shell exits with the process. Obviously, if it weren't a daemon I was starting, I wouldn't be surprised. I also suspect the problem is less the OS version than a difference in the way we've set up the newer servers (I have no idea what those differences are, though; the older servers were set up years ago).
If I run ps during the 20 seconds the process is running, I get the following:
root 4699 1 0 15:14 pts/2 00:00:00 sudo -u openmq /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -D
openmq 4701 4699 0 15:14 pts/2 00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssl
openmq 9095 9063 54 16:21 pts/2 00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope
The fact that the parent of 4699 is PID 1 suggests to me that the process has been correctly daemonized. However, after the expect script exits, both 4699 and 4701 are killed. What could be causing this?
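A PPID of 1 only says that a process's original parent has gone away. Being reparented to init does not change a process's session or controlling terminal, so such a process may still be affected when that terminal hangs up; both 4699 and 4701 above still show pts/2. The sketch below (assuming procps ps; show_proc_info is a hypothetical helper name) prints the columns that matter for that question.

```shell
#!/bin/sh
# Hypothetical helper: show parent PID, process group, session ID and
# controlling terminal for a comma-separated PID list, e.g. 4699,4701.
# A TTY of "?" means the process has no controlling terminal.
show_proc_info() {
    ps -o pid,ppid,pgid,sid,tty,args -p "$1"
}

# Example: inspect the current shell.
show_proc_info "$$"
```

Run against the broker PIDs during the sleep (for example show_proc_info 4699,4701), this shows whether they still share the script's session and terminal.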
UPDATE
Here is the same output from the servers where it works. During the 20-second sleep I get:
openmq 18652 1 0 15:44 pts/1 00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq 18686 18652 8 15:44 pts/1 00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope
After the 20-second sleep I get:
openmq 18652 1 0 15:44 ? 00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq 18686 18652 5 15:44 ? 00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope
On these servers, once the script exits, the processes are detached from the controlling terminal (the TTY column changes from pts/1 to ?). I wonder why that doesn't happen on the newer servers.
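For comparison, the "?" in the TTY column is how ps reports a process with no controlling terminal. A quick way to produce one deliberately is setsid (from util-linux, assumed to be installed), which starts a command in a new session. This is only an illustration of what "no controlling terminal" looks like, not a claim about how the older servers end up in that state.

```shell
#!/bin/sh
# Run a throwaway command in a new session via setsid. Inside it, the
# session ID equals its own PID and ps reports no controlling terminal ("?").
detached_info=$(setsid sh -c 'ps -o pid=,sid=,tty= -p $$')
echo "$detached_info"
```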
UPDATE
Here is the section of the script that actually launches OpenMQ. The -bgnd flag is what is supposed to daemonize it.
sudo -u openmq $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &
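The trailing & in that line backgrounds sudo itself, which is why sudo (4699) shows up as the parent of the imqbrokerd shell in the ps output. In a non-interactive script, job control is normally off, so a job started with & typically stays in the script's own process group and session rather than getting a new one. The sketch below checks that with a harmless sleep standing in for the sudo command (an assumption made so it runs anywhere).

```shell
#!/bin/sh
# Background a stand-in job the same way the launch line backgrounds sudo.
sleep 30 > /dev/null 2>&1 &
job_pid=$!

# Compare the process group of the script with that of the background job.
script_pgid=$(ps -o pgid= -p $$ | tr -d ' ')
job_pgid=$(ps -o pgid= -p "$job_pid" | tr -d ' ')
echo "script pgid=$script_pgid job pgid=$job_pgid"

# Clean up the stand-in job.
kill "$job_pid"
```

If the two values match, anything delivered to the script's process group would also reach the backgrounded job.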
UPDATE
I discovered some truly bizarre behavior by accident. If I change the command to:
sudo -u openmq sldkhglksj; $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &
Then, of course, I get "sldkhglksj: command not found", but the openmq process is not killed. If I revert that one change, it is killed.
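Shell parsing accounts for the difference between the two command lines: ";" ends the first command, so sudo -u openmq receives only the bogus word, and the $IMQ_HOME/bin/$EXECUTABLE ... & part runs as a separate background command without sudo, as whichever user runs the script. The sketch below replaces sudo with a hypothetical fake_sudo function so the parse can be seen without switching users.

```shell
#!/bin/sh
# fake_sudo is a hypothetical stand-in for "sudo -u openmq": it only
# reports the arguments it was given.
fake_sudo() {
    echo "fake_sudo got: $*"
}

# Same shape as the modified launch line: ';' ends the first command, so
# fake_sudo never sees the second one, which runs on its own in the background.
fake_sudo -u openmq sldkhglksj; echo "second command ran without fake_sudo" &
wait
```

This prints "fake_sudo got: -u openmq sldkhglksj" followed by "second command ran without fake_sudo".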
UPDATE
In retrospect, it appears that this magic command means sudo no longer runs the actual OpenMQ startup (sudo runs only the bogus command), which leads me to believe that sudo is somehow involved.
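The ps output points the same way: on the newer servers sudo (4699) is still running as the parent of the imqbrokerd shell (4701), while on the older servers no sudo process appears and the imqbrokerd shell's parent is already PID 1. A small helper such as the hypothetical ancestry function below (a sketch assuming procps ps) walks a process's PPID chain, which makes it easy to compare the two setups for the broker's PID.

```shell
#!/bin/sh
# Print a process and each of its ancestors (PID, PPID, TTY, command line),
# stopping at PID 1 or when the chain can no longer be followed.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        line=$(ps -o pid=,ppid=,tty=,args= -p "$pid") || break
        echo "$line"
        [ "$pid" -eq 1 ] && break
        # Move up to the parent (second column of the ps line).
        pid=$(echo "$line" | awk '{print $2}')
    done
}

# Example: show the ancestry of the current shell.
ancestry "$$"
```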