Questions tagged [slurm]

Slurm Workload Manager (formerly known as Simple Linux Utility for Resource Management or SLURM), or Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

38 questions
1
vote
1 answer

How to upgrade Slurm?

I've been asked to upgrade our Slurm Workload Manager installation. I have Slurm 2.3.4 on a Debian 7.0 wheezy cluster (1 master + 8 nodes). I didn't install it myself, so I'm a bit confused about how to do this and how to proceed without destroying…
Sasha Grievus
  • 223
  • 2
  • 11
1
vote
0 answers

program on cluster exceeds RSS memory limit

I have been trying to run a Python script on a computer cluster but keep running into an error saying that the RSS memory limit was exceeded. I am using this program to analyse a data set consisting of around 40000 cases. I have tried it on my PC for 1000…
MSB
  • 111
  • 2
1
vote
0 answers

Slurm not filtering sacct results by date

We're using Slurm as a resource manager on our Beowulf cluster, so I installed Slurm on my workstation to test out my scripts before I submit them to the cluster. When I try to list old jobs on my workstation, sacct won't filter them by date. $…
Don Kirkby
  • 1,154
  • 3
  • 10
  • 23
1
vote
0 answers

ssh directly into a specific node on a cluster, without first ssh into login node?

I usually log on to a cluster, start a Slurm interactive job, and am then able to ssh into specific running nodes. My question is: is it generally possible to ssh into a specific node from my local machine without first ssh-ing into a login node? I…
georg
  • 111
  • 2
1
vote
1 answer

Computer cluster admin: how to limit users running program but permit file transferring

I am managing a small computer cluster with Slurm on CentOS 7. I want to discourage users from running programs on the login node. This can be achieved by adding user hard cpu 1 to the file /etc/security/limits.conf. However, I do not want file transferring…
wdg
  • 143
  • 1
  • 5
1
vote
1 answer

slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES

I have a problem running nvidia-docker containers on a Slurm cluster. Inside the container all GPUs are visible, so it effectively ignores the CUDA_VISIBLE_DEVICES environment variable set by Slurm. Outside the container the visible GPUs are correct. Is there a…
1
vote
1 answer

Wrong LDAP user ID is mapped into Slurm account management service

I configured a Slurm head node as follows: sssd to contact OpenLDAP; slurmctld/slurmdbd/slurmd/munged to act as the Slurm controller and compute node. ...where ray.williams is an LDAP user. Its UID can be mapped on the node. SSH login works…
Nicolas De Jay
  • 177
  • 1
  • 9
1
vote
0 answers

View/request instruction sets available on SGE host

How can I view or request hosts that can handle a particular instruction set in SGE? With Slurm, to view available instruction sets on each host I can use sinfo --Node -o '%n %f', and to submit a batch job only to, e.g., hosts with the AVX2…
1
vote
0 answers

Slurmd remains inactive/failed on start

I currently have a cluster of 10 worker nodes managed by Slurm with 1 master node. I previously set up the cluster successfully, after some teething problems. I put all my scripts and instructions on my GitHub…
1
vote
0 answers

Slurm Error: “If using PrologFlag=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray_aries is required.”

I'm using the X11 flag (PrologFlags=x11) in my slurm.conf file and jobs with X11 work perfectly, but I get this error every time I run a Slurm command (e.g. sbatch, srun, sacctmgr): scontrol: error: If using PrologFlag=Contain for…
1
vote
1 answer

Single-node SLURM server: restrict interactive CPU usage

I have SLURM set up on a single node, which is also a 'login node'. I would like to restrict interactive CPU usage, i.e. usage outside the scheduling system. I found the following article, which suggests using cgroups for this:…
Compizfox
  • 375
  • 1
  • 6
  • 17
0
votes
1 answer

slurm salloc and how it get user login

A theoretical question. I understand that the best way to find out is to read the code, but maybe I can cheat and just ask? I notice that after salloc the user can log in to the node. How does it work? Does salloc add user to…
Black S.
  • 35
  • 3
0
votes
1 answer

Trouble Installing slurm on Fedora 29

When I run slurmd, it gives a -bash: slurmd: command not found. I ran sudo yum install slurm to install slurm. I don't know why it isn't working, or if I installed all the required packages for slurm.
user3273814
  • 213
  • 3
  • 8
0
votes
1 answer

What does "CPU Minutes" mean exactly?

I'm trying to report cluster utilization in Slurm but I don't understand the metric CPU Minutes. [root@XXXX]# sreport cluster Utilization Start=2018-12-01…
m4hmud
  • 3
  • 3
0
votes
1 answer

Allow other users to cancel jobs

I have a test cluster with Slurm in which I would like users to be able to cancel other users' jobs. By default, users can cancel only their own jobs. How can I define several administrators? My Slurm configuration…
Bub Espinja
  • 101
  • 3