I have a cluster that uses Torque to distribute jobs. I want to run a job that executes TensorFlow code, but TensorFlow is not found when the job runs.

I installed TensorFlow for my LDAP user with Anaconda, so I can enter the tensorflow environment on any node and run my code manually. The problem is that the Torque job does not activate that environment when it runs, so I get "ImportError: No module named tensorflow" and my code fails.

How can I tell the nodes to run my Python file in the tensorflow environment?

This is how my torque job file looks

Note: In this version I tried running the command that activates the environment; in other versions I didn't.

Thanks in advance for any help available.

Oha Noch
  • What error do you get in the job error file? I am guessing it might be a path-related issue. You might need to use the full path, export some variables in the job script, or load a module. – Tux_DEV_NULL Oct 05 '17 at 12:21
  • In the error file I get a Python error: "ImportError: No module named tensorflow". So the Python code does run, but it can't find the tensorflow module when I import it, because it isn't running in the tensorflow conda environment. – Oha Noch Oct 06 '17 at 05:16
  • OK. Yes, this usually means the module has not been installed properly or the path has not been set, so check those things. – Tux_DEV_NULL Oct 06 '17 at 06:36

1 Answer

Sorry, I forgot to reply when I got the answer. If anyone sees this in the future: the fix was to prepend the Anaconda bin folder to the PATH variable in the job script, so that the job finds the python binary that Anaconda uses (the one that can import tensorflow) before the system one:

export PATH="<path_to_anaconda_folder>/anaconda3/bin:$PATH"
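For anyone adapting this, here is a minimal sketch of where that export could sit in a Torque job file. It is not the original job file: the job name, the resource request, the `ANACONDA_HOME` variable with its `$HOME/anaconda3` default, and the script name `my_script.py` are all placeholders I made up for illustration. The `#PBS` lines are plain comments to bash, so the file also runs outside Torque, which is handy for checking which python gets picked up.

```shell
#!/bin/bash
#PBS -N tf_job
#PBS -l nodes=1:ppn=1
#PBS -j oe

# ANACONDA_HOME is a placeholder (assumption); point it at your own Anaconda install.
ANACONDA_HOME="${ANACONDA_HOME:-$HOME/anaconda3}"

# Prepend Anaconda's bin directory so its python is found before the system one.
export PATH="$ANACONDA_HOME/bin:$PATH"

# Torque starts the job in the home directory; go back to the directory
# the job was submitted from (fall back to the current one outside Torque).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Log which interpreter the job will actually use, to make path problems visible.
echo "python resolved to: $(command -v python || echo 'not found')"

# Hypothetical script name; uncomment and replace with your own file.
# python my_script.py
```

If the line written by `echo` in the job output still shows the system python, the Anaconda path is probably wrong for that node.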

Thanks to Tux_DEV_NULL for the help!

I also added the following, just in case, to avoid any future CUDA issues (I am using the GPU). I don't actually know whether it is necessary, but maybe it ends up helping someone:

export PATH=$PATH:/usr/local/cuda-8.0/bin
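A slightly more defensive variant of the CUDA line, as a sketch only. It adds the directories only when they exist on the node, and it also extends `LD_LIBRARY_PATH`, because shared libraries are located through that variable rather than through `PATH`. Whether your TensorFlow build needs this is an assumption on my part, and so are the `lib64` subdirectory and the helper name `add_cuda_paths`; the default root is the one from the line above.

```shell
#!/bin/bash
# Append CUDA's bin and lib64 directories to the search paths, but only if they exist.
# Pass a different CUDA root as $1 if yours is not /usr/local/cuda-8.0.
add_cuda_paths() {
    local cuda_root="${1:-/usr/local/cuda-8.0}"
    if [ -d "$cuda_root/bin" ]; then
        export PATH="$PATH:$cuda_root/bin"
    fi
    if [ -d "$cuda_root/lib64" ]; then
        # ${VAR:+...} avoids a stray trailing colon when LD_LIBRARY_PATH is unset.
        export LD_LIBRARY_PATH="$cuda_root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    fi
}

add_cuda_paths
```

On a node without that CUDA directory the function changes nothing, so the same job file can be submitted to CPU-only nodes as well.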
Oha Noch