Can a process started in a tmux session fall asleep? If so, what are the possible causes, and how can I prevent it?
What prompted the question: yesterday I started a process on a server (training neural networks; it prints the current training epoch to stdout). I had split the window, and in the pane running the process I had activated scroll mode before detaching from the session.
Today I came back, and it had made no progress at all.
More specifically, the epoch was the same. After I quit scroll mode, it happily continued.
The log reads something like this:
...
Epoch 40: 1h few mins
Epoch 41: 12h few mins
Epoch 42: 12h few more mins
...
Epoch 73: 13h
In other words, getting from epoch 0 to epoch 40 definitely took less than two hours; getting from epoch 40 to 41 took around 11 hours (!); and from epoch 41 to 76 the average time per epoch was about 1.7 minutes. The epochs run in a loop, so there is no reason for one of them to take roughly 400 times longer than the others.
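To make the arithmetic above reproducible, here is a minimal sketch that turns cumulative log times into per-epoch durations and flags outliers. It assumes the log has already been parsed into (epoch, elapsed minutes) pairs; the function name flag_slow_epochs and the 10x-median threshold are hypothetical choices, not part of the original script.

```python
from statistics import median


def flag_slow_epochs(elapsed, factor=10.0):
    """Flag epochs whose duration exceeds `factor` times the median duration.

    `elapsed` is a list of (epoch, cumulative_minutes) pairs sorted by epoch,
    e.g. read off the training log. When the log skips epochs, the time between
    two entries is averaged over the skipped epochs.
    Returns a list of (epoch, minutes_per_epoch) pairs.
    """
    durations = []
    for (e0, t0), (e1, t1) in zip(elapsed, elapsed[1:]):
        durations.append((e1, (t1 - t0) / (e1 - e0)))
    if not durations:
        return []
    # Median rather than mean, so one huge gap does not inflate the baseline.
    typical = median(d for _, d in durations)
    return [(e, d) for e, d in durations if d > factor * typical]
```

For example, with values rounded from the log excerpt (and epoch 0 assumed to start at 0 minutes), flag_slow_epochs([(0, 0), (40, 60), (41, 720), (73, 780)]) returns [(41, 660.0)]: the step from epoch 40 to 41 alone accounts for about 11 hours.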
Additional information: this 'sleeping' does not happen every time I detach while in scroll mode, but it has happened before. Scroll mode might also have nothing to do with it at all.
The program is a Python script with TensorFlow code running on a GPU; I started it with:
python train_script.py 2>&1 | tee train_log.txt
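To pin down exactly when the run stops making progress, the training loop can stamp each epoch line with the wall-clock time. A minimal sketch, assuming a hypothetical log_epoch helper called once per epoch (it is not taken from train_script.py). Note the flush=True: because stdout is piped into tee here, Python block-buffers it instead of line-buffering it, so without a flush the last visible epoch can lag behind the real one. Running the script with python -u has a similar effect.

```python
import time
from datetime import datetime


def log_epoch(epoch, start, stream=None):
    """Print and return one log line: epoch number, wall-clock time, minutes elapsed.

    `start` is a time.monotonic() value taken before training began.
    stream=None makes print() use the current sys.stdout.
    """
    elapsed_min = (time.monotonic() - start) / 60
    line = f"Epoch {epoch}: {datetime.now():%Y-%m-%d %H:%M:%S} ({elapsed_min:.1f} min elapsed)"
    # flush=True: stdout is a pipe when run as `... | tee train_log.txt`,
    # and Python block-buffers pipes, so flush to make each line appear at once.
    print(line, file=stream, flush=True)
    return line


if __name__ == "__main__":
    # Stand-in for the real training loop; time.sleep() is a hypothetical
    # placeholder for one epoch of work.
    start = time.monotonic()
    for epoch in range(3):
        time.sleep(0.1)
        log_epoch(epoch, start)
```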
For tmux, I use the standard key mapping: tmux attach to re-attach, Ctrl-b + d to detach, Ctrl-b + Up (number block) to start scrolling, and q to quit scroll mode.
OK, so this does not usually happen. But your example does not rule out that it can happen, or that tmux somehow plays a role. Maybe the interaction with the GPU, or with the Python interpreter, causes it? Also, this 'sleeping' does not happen every time I detach while in scroll mode. – dasWesen – 2018-08-11T12:26:13.443
Plenty of people use GPUs for mining bitcoins without monitors, so I don't think it is a tmux or GPU issue. Do you activate a Python virtual environment before starting tmux? If so, try leaving it and activating the virtual environment inside tmux instead. Also, if you are using Anaconda, some of its versions don't support parallel environments. – Alex – 2018-08-11T12:42:18.183
Tonight I'll write down exactly where the process is, to see whether it's just TensorFlow taking random breaks. I believe epoch 40 was the last thing visible in scroll mode this morning, but I'll try to make sure it really is correlated with tmux in this way. – dasWesen – 2018-08-11T12:45:04.750
No, I don't use a Python virtual environment on the server, but I do use Anaconda. Maybe that's it, but then there's probably nothing I can do except regularly check what the process is doing. Thanks for your suggestions. – dasWesen – 2018-08-11T12:46:32.600
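Instead of checking by hand, a small watcher can report when train_log.txt stops changing. A rough sketch, assuming the log file is written by tee as in the command in the question and that a normal epoch takes a few minutes; the function names and the 30-minute threshold are hypothetical. It only detects a stall and says nothing about its cause. The file's modification time advances only when tee actually receives output, so it works best when the training script flushes stdout after each epoch.

```python
import os
import time


def log_is_stale(path, max_age_s, now=None):
    """True if `path` was last modified more than max_age_s seconds ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age_s


def watch_log(path, max_age_s=30 * 60, poll_s=60):
    """Poll forever; print a warning whenever the log has gone quiet.

    Hypothetical defaults: a normal epoch takes a couple of minutes here,
    so 30 minutes without new output suggests the run is stuck.
    """
    while True:
        if os.path.exists(path) and log_is_stale(path, max_age_s):
            print(f"{path}: no new output for over {max_age_s // 60} min", flush=True)
        time.sleep(poll_s)
```

It could run in a separate tmux pane, for example by calling watch_log("train_log.txt").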
So it's probably Anaconda; read this thread: https://github.com/openai/universe-starter-agent/issues/9. Every time I have investigated more or less the same issue, it turned out that it was definitely not tmux's fault. – Alex – 2018-08-11T12:52:48.480
One more clue: https://unix.stackexchange.com/questions/366553/tmux-is-causing-anaconda-to-use-a-different-python-source. Also, to rule out tmux as the culprit, you can try running TensorFlow with nohup instead of tmux, but I'm pretty sure it is the Python environment that is messing things up. – Alex – 2018-08-11T13:00:21.167
As I said, there is no Python environment involved: no conda env, no Python virtualenv. Python versions and paths are exactly the same inside and outside the tmux session. But I will try nohup! – dasWesen – 2018-08-11T15:40:46.017
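The usual shell form of the nohup test is nohup python -u train_script.py > train_log.txt 2>&1 &. For completeness, a roughly comparable launcher written in Python is sketched below; it is a stand-in under stated assumptions, not what either commenter ran. It uses subprocess.Popen with start_new_session=True, which calls setsid() in the child so the process gets its own session and no controlling terminal. That is a different mechanism from nohup, which keeps the process in the current session but makes it ignore SIGHUP. The helper name launch_detached is hypothetical, and start_new_session is POSIX-only.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start `cmd` in a new session with stdout and stderr appended to log_path.

    start_new_session=True makes the child call setsid(), so it has no
    controlling terminal and is not hung up when the terminal or tmux
    session goes away. Similar in effect to nohup, but not identical.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc
```

Usage would be something like launch_detached([sys.executable, "-u", "train_script.py"], "train_log.txt") after import sys; the -u flag keeps the log from lagging behind because of output buffering.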