Questions tagged [pbs]

29 questions
1
vote
5 answers

I get the error qsub: Bad UID for job execution when trying to submit a job via PBS

OS Version: CentOS release 4.6 (Final) Kernel \r on an \m 2.6.9-100.ELsmp When I attempt to run a job, it gives me the following error: qsub: Bad UID for job execution. I have created a fresh user account and the same error occurs, yet other users…
1
vote
1 answer

Non-exclusive job scheduling in PBS/Torque

The cluster resource manager Torque typically allocates compute nodes on an exclusive basis. However, when you have a lot of small jobs (like we do) running against multi-core compute nodes, this can result in a lot of wasted resources. Is there…
ajdecon
0
votes
1 answer

How to launch a PBS job with hybrid MPI/OpenMP

I would like to understand how a GROMACS job launched on my SGI cluster with PBS/Torque, using hybrid MPI/OpenMP parallelization, works. The cluster has hyper-threading enabled and each node has 16 physical cores (32 logical). What I expect: use…
0
votes
1 answer

How to append PBS job output to log files instead of overwriting it?

I'm using Torque and I want my PBS log and error output to be appended to the previous log and error files instead of overwriting them.
user121392
0
votes
1 answer

Library not found only when application is executed from PBS file.

I have a compiled file a.out that runs fine when executed directly from my terminal. However, trying to execute that file from my PBS file yields a missing library libmkl_intel_lp64.so. I have already tried exporting the path of the library to…
user121392
0
votes
1 answer

PBS: job added, but it seems that it didn't run at all

First of all, I'm very new to clusters and PBS systems. I was told to prepare a simple script (which I did): #PBS -S /bin/bash #PBS -o host_out #PBS -e host_err #PBS -q batch hostname date exit 0 Then I made it executable and submitted it with the…
mirx
0
votes
1 answer

Prevent normal users from running code on a cluster outside of the PBS system

In our cluster with the PBS batch system (Torque) installed, we want all users to run their jobs through qsub so that CPU resources can be well managed. However, we have found that users in our cluster can still run their programs directly…
0
votes
1 answer

Torque reports error when posting job to client nodes

The system has two machines: one (called macondo02) runs pbs_server and pbs_schedule, the other (called macondo01) runs pbs_mom. I have ensured that the host can clearly identify the existence of the guest: $ pbsnodes -a macondo01 state = free np =…
0
votes
2 answers

Numerous pbs_server errors in /var/log/messages

On our supercomputer's management node, we receive numerous errors such as: pbs_server: LOG_ERROR::is_request, bad attempt to connect from 10.10.0.254:1023 (address not trusted - check entry in server_priv/nodes) And after them nearly every minute…
0
votes
1 answer

Only half of my pbs/Torque jobs are being scheduled

My supercomputing center recently moved from SGE to pbs/Torque. Now, when I schedule my array jobs, only half of the jobs in the array get scheduled. When they finish, the other half get scheduled. This happens despite the fact that they are largely…
vy32
0
votes
0 answers

torque/undelivered - collect finished jobs

Because of an incorrectly configured ssh, pbs_mom couldn't send the finished data (.ER and .OU files) back to the server. After looking at the log, I located the data on the nodes in the /var/spool/torque/undelivered/…
stephanp
-1
votes
1 answer

PBS/Torque priority vs MPI program priority

We have a cluster that performs different tasks. It runs simulations through the Torque scheduler. We also have an interactive simulation, which also needs the full computation power. The interactive simulation is an OpenMPI program, starting…
stephanp
-1
votes
1 answer

How can we configure Torque with multiple nodes for a workstation?

I've got a GPU workstation with a 48-core CPU and 4 NVIDIA GPUs. I want to turn this machine into a small cluster that contains: 4 nodes, 12 cores + 1 CPU/node. I've installed Torque on this machine with the command: ./configure --without-tcl…
-1
votes
1 answer

Torque: How to lock node cores to one application

I am having an issue that I do not know how to solve. Say I have 2 nodes (node1 and node2), each with 24 cores, and software for which I have a license for 32 cores. I want to be able to configure node1 to ONLY accept jobs for that software, and I…
James