I am the administrator of a cluster running CentOS that uses SLURM to submit jobs from a login node to compute nodes. Recently, a user complained about some unexpected behaviour with their jobs. If a user starts a job with `srun` and then logs out, the job keeps running as expected. However, when the user is disconnected by an SSH timeout, the job is killed. I've replicated this behaviour by sending SIGHUP to the shell that is running the job with `kill -1 ShellJobID`, and the job is killed. The SLURM logs indicate that the job actually received a `SIGKILL` and not a `SIGHUP`, based on the line `WTERMSIG 9`. Additionally, if I run `kill -1 ActiveSrunJob`, the job exits with `WTERMSIG 9`.

What is it about logging out with `exit` that prevents the SLURM job from being cancelled? I was under the impression, and my research seems to back this up, that `SIGHUP` is propagated to a shell's children on logout. Am I missing something, or am I completely off base?
Asked by TheOneHyer
See [here](https://serverfault.com/questions/117152/do-background-processes-get-a-sighup-when-logging-off) and [here](https://serverfault.com/questions/115999/if-i-launch-a-background-process-and-then-log-out-will-it-continue-to-run). – Massimo Dec 11 '17 at 22:24
I had come across the first link before, but not the second. Thanks for your quick reply. Using both your links, the Bash manual, and [this page](https://unix.stackexchange.com/questions/318369/what-happens-to-background-jobs-after-exiting-the-shell), I was able to fully understand that this is the expected behaviour. I think the explanation given in the link I put above helped me out the most. Thanks for the help and for pointing me in the right direction, Massimo! – TheOneHyer Dec 11 '17 at 22:42
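For context, the Bash manual states that an interactive shell which receives SIGHUP resends it to all of its jobs before exiting, whereas an interactive login shell that exits normally (e.g. via `exit`) only sends SIGHUP to its jobs if the `huponexit` option is set, which it is not by default. An SSH disconnect typically delivers SIGHUP to the login shell, so the signal reaches `srun`; a plain `exit` does not. The sketch below is a minimal, hypothetical Python illustration of the underlying POSIX behaviour only, not of SLURM: a child left with the default SIGHUP disposition is terminated by the signal, while one started with SIGHUP ignored (which is what `nohup` arranges) keeps running. It assumes a Unix-like system with a `sleep` binary on the `PATH`; the function names are made up for illustration, and what `srun` itself does after receiving SIGHUP (here, the step ending with signal 9) is SLURM-specific and not modeled.

```python
import signal
import subprocess
from typing import Optional


def start_sleeper(ignore_hup: bool) -> subprocess.Popen:
    """Start `sleep 30` with SIGHUP either ignored (like nohup) or at its default."""
    disposition = signal.SIG_IGN if ignore_hup else signal.SIG_DFL

    def set_hup_disposition() -> None:
        # Runs in the child between fork() and exec(). An ignored signal stays
        # ignored across exec(), which is how `nohup` protects its command.
        signal.signal(signal.SIGHUP, disposition)

    # Popen returns only after exec() has succeeded, so the disposition is
    # already in place by the time the caller sends a signal.
    return subprocess.Popen(["sleep", "30"], preexec_fn=set_hup_disposition)


def outcome_after_hangup(ignore_hup: bool) -> Optional[int]:
    """Send SIGHUP to a fresh child; return its exit status, or None if it survived."""
    child = start_sleeper(ignore_hup)
    child.send_signal(signal.SIGHUP)
    try:
        # A negative return code -N means "terminated by signal N".
        return child.wait(timeout=2)
    except subprocess.TimeoutExpired:
        # Still running after the hangup: clean up and report survival.
        child.kill()
        child.wait()
        return None


if __name__ == "__main__":
    print("default disposition:", outcome_after_hangup(ignore_hup=False))
    print("SIGHUP ignored:     ", outcome_after_hangup(ignore_hup=True))
```

On Linux, the first call should report `-1` (terminated by SIGHUP, signal 1) and the second `None` (the child ignored the hangup and had to be killed explicitly).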