
As stated, I have just installed Torque on an Ubuntu 16.04 machine. Submitted jobs complete just fine, but the -e and -o flags do not seem to work. No error or log files are created, even though I have

  • given each flag an absolute path in the log directory.

  • created the log file in the directory before submitting the job.

I am certain that the PBS file works because I copied it from a machine which ran the job just fine.

The following is the PBS file mentioned. Apologies in advance for not being able to indent the code block.

#! /bin/bash
#PBS -e /path/to/error.err
#PBS -o /path/to/log.log
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00

cd /path/to/working/directory
execute function.binary

mkdir /backup/folder
cp -r /results/ /backup/folder

echo "Job complete." >> /path/to/log.log

Edit: Thanks to /u/tux_DEV_NULL, I managed to solve it. I added the lines $nospool_dir_list /home/ and $spool_as_final_name true to /var/spool/torque/mom_priv/config, restarted the mom service, and everything worked as expected.

user121392

1 Answer

Is there anything in the Torque server log files?

This looks like an issue with your spool setup. Do you see an undelivered directory in /var/spool/torque/spool? Do you have a mom node/service running?

I think that by default the stdout and stderr files are created in the spool directory as $JOBID.OU and $JOBID.ER and then copied to the working directory, unless you have $nospool_dir_list set up, so check that setting as well.
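To check whether output is getting stuck in the spool, a small helper along these lines could list any leftover .OU and .ER files. This is only a sketch: the function name list_spooled_output is made up for illustration, the default root /var/spool/torque is taken from the paths mentioned in this thread, and reading the spool will usually need root.

```shell
#!/bin/bash
# Hypothetical helper: list job stdout (*.OU) and stderr (*.ER) files
# still sitting anywhere under a Torque spool tree.
# Usage: list_spooled_output [torque-root]
list_spooled_output() {
    # Assumed default; pass a different root if Torque lives elsewhere.
    local root="${1:-/var/spool/torque}"
    # Errors (missing directory, no permission) are silenced so the
    # function simply prints nothing in that case.
    find "$root" -type f \( -name '*.OU' -o -name '*.ER' \) 2>/dev/null | sort
}
```

Running list_spooled_output after a job finishes and seeing files listed would suggest the output was written but never delivered to the -o and -e paths.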

Tux_DEV_NULL
  • Yes, there is an undelivered directory in /var/spool/torque/spool. – user121392 Aug 15 '17 at 12:37
  • Yes, the machine also serves as both the mom and the node. How does $nospool_dir_list work, and where should I check for it, qmgr? Thanks for the reply. – user121392 Aug 15 '17 at 12:39
  • $nospool_dir_list goes in the mom config. If your users are in /home, you can add "$nospool_dir_list /home". Also look at "$spool_as_final_name true". Set that in the mom config file and restart the mom service. That might be an easier solution. There should also be error messages that tell you exactly what is causing the issue. I am guessing some copy or scp is failing. – Tux_DEV_NULL Aug 15 '17 at 12:58
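The mom config change described in the comments could be applied with a small helper like the one below. Treat it as a sketch under assumptions: the function names ensure_mom_setting and apply_spool_settings are invented for illustration, the /home/ prefix only fits if job output lives under /home, and the config path is the one given in the question.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a mom config file unless that
# exact line is already present (so running it twice is harmless).
# Usage: ensure_mom_setting <config-file> <setting-line>
ensure_mom_setting() {
    local config="$1" line="$2"
    touch "$config"
    # -x matches whole lines, -F treats the leading "$" literally.
    grep -qxF -- "$line" "$config" || printf '%s\n' "$line" >> "$config"
}

# Apply the two settings discussed in this thread.
# Usage: apply_spool_settings <config-file>
apply_spool_settings() {
    local config="$1"
    # Assumption: users' output paths are under /home/.
    ensure_mom_setting "$config" '$nospool_dir_list /home/'
    ensure_mom_setting "$config" '$spool_as_final_name true'
}
```

On the machine from the question this would be run as root as apply_spool_settings /var/spool/torque/mom_priv/config, followed by a restart of the mom service. The service name depends on how Torque was packaged, so it is left out here.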