Can a process that was started in a tmux session fall asleep?

0

Can a process that was started in a tmux session fall asleep? If yes, what is the cause(s), how to prevent it?

Example reason for the question: I started a process on a server yesterday (training neural networks, it prints the current training epoch to stdout). I had a split window, and in the one with the process running, I had activated scroll mode before detaching from the session.

Today I came back, and it had made no progress at all.

More specifically, the epoch was the same. After I quit scroll mode, it happily continued.

The log reads something like

...
Epoch 40: 1h few mins
Epoch 41: 12h few mins
Epoch 42: 12h few more mins
...
Epoch 73: 13h

Meaning, the time it took to get from epoch 0 to 40 was definitely less than two hours; from epoch 40 to 41 it took around 11 hours (!); from epoch 41 to 73 the average time per epoch was around 1.7 minutes. The epochs run in a loop, and there is no reason why one should take around 400 times longer than the others.


Additional information: This 'sleeping' doesn't happen every time I detach while being in scroll mode. But it already happened before. The scroll mode might not have anything to do with it at all.

The program is a python script, including tensorflow code running on a GPU; the command to run it was :

python train_script.py 2>&1 | tee train_log.txt

For tmux I use the standard key mapping: tmux attach to re-attach, ctrl-b + d to detach, ctrl-b + up (number block) to start scrolling, and q to quit scroll mode.

dasWesen

Posted 2018-08-11T11:31:18.190

Reputation: 101

Answers

0

Can a process that was started in a tmux session fall asleep?

Basically, all tmux does is attach its own file descriptors in place of STDIN/STDOUT/STDERR for a process running inside tmux, which allows the process to keep working while detached from the console.

Below is a simple script you can run using the same workflow (attaching to/detaching from the tmux session) you described:

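#!/bin/sh

# Print a timestamp once per second, 1000 times, appending each one to log.txt.
c=1000

while [ "$c" -ne 0 ]; do
  date '+%Y-%m-%dT%H:%M:%S' | tee -a log.txt
  sleep 1
  c=$((c - 1))  # count down so the loop eventually terminates
done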

Even if you switch to scroll mode and then detach from the tmux session, it will still continue running; you can check the log.txt file to confirm. So it isn't an issue with tmux.
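To check log.txt for a stall rather than eyeballing it, a small helper can scan the timestamps for unusually long gaps. This is only a sketch: `find_gaps` and `threshold_seconds` are hypothetical names, and it assumes every non-empty line holds exactly one timestamp in the format produced by the `date` command above.

```python
from datetime import datetime

# Same format as the date command in the test script: 2018-08-11T11:31:18
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def find_gaps(lines, threshold_seconds=5):
    """Return (previous, current, gap_seconds) for every gap above the threshold.

    `lines` is any iterable of strings, for example an open file object.
    Blank lines are skipped; any other line must be a bare timestamp.
    """
    gaps = []
    previous = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        current = datetime.strptime(text, STAMP_FORMAT)
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps
```

Passing an open file, as in `find_gaps(open("log.txt"))`, returns an empty list if the script never paused for longer than the threshold while you were detached.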

Alex

Posted 2018-08-11T11:31:18.190

Reputation: 5 606

Ok, so this does not usually happen. But your example does not exclude that it can happen, and that tmux somehow has an influence. Maybe the interaction with the GPU, or with the python interpreter, causes it? Also, this 'sleeping' doesn't happen every time I detach while being in scroll mode. – dasWesen – 2018-08-11T12:26:13.443

A lot of people use GPUs for mining bitcoins without monitors, so I don't think it is a tmux or GPU issue. Do you activate a python virtual environment before starting tmux? If yes, try exiting it and activating the python virtual environment inside tmux. Also, if you are using anaconda, some of its versions don't support parallel environments. – Alex – 2018-08-11T12:42:18.183

Tonight, I'll write down where exactly the process is, to see whether it's just tensorflow taking random breaks. I believe epoch 40 was the last thing visible in scroll mode this morning, but I'll try to make sure it really is correlated with tmux in this way. – dasWesen – 2018-08-11T12:45:04.750

No, I don't use a python virtual environment on the server. But anaconda I use. Maybe that's it, but then there's probably nothing I can do, except regularly looking what the process is doing. Thanks for your suggestions. – dasWesen – 2018-08-11T12:46:32.600

So it is probably anaconda; read this thread: https://github.com/openai/universe-starter-agent/issues/9. Every time I have investigated a similar issue, it has turned out not to be tmux's fault. – Alex – 2018-08-11T12:52:48.480

One more clue: https://unix.stackexchange.com/questions/366553/tmux-is-causing-anaconda-to-use-a-different-python-source. Also, to rule out tmux as the culprit, you can try running tensorflow with nohup instead of tmux, but I'm pretty sure it is the python environment that is screwing things up. – Alex – 2018-08-11T13:00:21.167

As said, there is no python environment involved - no conda env, no python virtualenv. Python versions and paths are the exact same both within and outside the tmux session. But will try nohup! – dasWesen – 2018-08-11T15:40:46.017

0

I know I'm late, but I've had the same thing happen to me a few times. The environment is a little different: I'm running a python script on a slurm front end, which submits jobs, moves files, submits more jobs, etc. A single compute job usually takes about an hour.

I started my python script one day in the evening, checked on it a few times and then left tmux in scroll mode, detached and checked on the script in the morning. It seemed to be stuck, so I checked to see if any jobs were currently running, none were. I checked if the expected files were present, which they were not. My script didn't print its "all jobs successful" note, so clearly it was still running, just not doing anything. I left scroll mode, and suddenly the script continued, produced a lot more output and lo and behold, submitted another batch of compute jobs.

Now, this could just be odd timing, and unfortunately I don't have iterating milestones with time stamps to see how long it was stuck. But this is the third time this has happened, so I really doubt it is coincidental timing.
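For next time, timestamped milestones could be written straight to a file, so the record does not depend on what the terminal shows. A minimal sketch, where `format_milestone`, `log_milestone`, and the file name are hypothetical and not part of my actual script:

```python
from datetime import datetime


def format_milestone(message, now=None):
    """Return `message` prefixed with an ISO-style timestamp."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{stamp} {message}"


def log_milestone(path, message):
    """Append one timestamped line to the file at `path` and return that line."""
    line = format_milestone(message)
    # Open, append and close on every call so the line reaches the file
    # promptly rather than sitting in a long-lived buffer.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Calling something like `log_milestone("milestones.log", "batch 3 submitted")` after each step would at least show afterwards how long the script sat between two steps. It would not prevent a stall if the script also prints to the terminal.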

Did you ever figure out why/if your script got stuck? I will exit scroll mode from now on before detaching and see if it makes a difference.


Edit: Apparently, this used to be a known bug in tmux, but there is no note on whether it has been fixed: https://github.com/tmux/tmux/issues/431. The tmux version on the machine I'm working on is quite outdated: tmux 1.8. So, in essence, the workaround would be:

Always exit scroll mode and detach properly from tmux.
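One plausible mechanism, which is my assumption and not something confirmed in this thread: if nothing reads a program's output, the kernel buffer between writer and reader fills up, and the next write simply blocks until someone reads again. The sketch below shows that effect with an ordinary pipe on a Unix system. tmux uses a pseudo-terminal rather than a pipe, so this only illustrates the general idea; `demonstrate_full_pipe` is a hypothetical name.

```python
import os


def demonstrate_full_pipe(chunk_size=4096):
    """Fill a pipe nobody reads, then drain it and write again.

    Returns (bytes_buffered, bytes_drained, writable_again).
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode lets us detect "would block" instead of hanging.
    os.set_blocking(write_fd, False)
    os.set_blocking(read_fd, False)

    chunk = b"x" * chunk_size
    buffered = 0
    try:
        while True:
            buffered += os.write(write_fd, chunk)
    except BlockingIOError:
        # Buffer is full. A normal (blocking) writer would go to sleep here
        # until the reader consumes some data.
        pass

    drained = 0
    try:
        while True:
            data = os.read(read_fd, 65536)
            if not data:
                break
            drained += len(data)
    except BlockingIOError:
        pass  # pipe is empty again

    # With the buffer drained, writing works again immediately.
    writable_again = os.write(write_fd, b"y") == 1

    os.close(read_fd)
    os.close(write_fd)
    return buffered, drained, writable_again
```

If that is what happens, it would match what I saw: the script is not dead, it is just waiting inside a write call, and it continues the moment the output is read again.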

RemusKaos

Posted 2018-08-11T11:31:18.190

Reputation: 11

Hi, it was a while ago but if my memory is correct, unfortunately I never figured out what was causing it. I guess that was also my solution: Trying not to forget to exit scroll mode. -- Thanks for the link. – dasWesen – 2019-08-05T10:37:34.683