Slurm: Have two separate queues for GPU and CPU-only jobs

Question

At the moment, we have set up Slurm to manage a small cluster of six nodes with four GPUs each. That has been working great so far, but now we want to utilize the Intel Core i7-5820K CPUs for jobs which only require CPU processing power. Each CPU has six cores and 12 threads, each GPU requires one thread/logical core, so there are 8 threads remaining (per node) which could be used for "CPU-only" jobs.

Current configuration:

cat /etc/slurm-llnl/gres.conf

Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3

cat /etc/slurm-llnl/slurm.conf (excerpt)

SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
GresTypes=gpu
MaxTasksPerNode=4

NodeName=node1 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node2 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node3 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node4 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node5 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node6 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN

PartitionName=gpu Nodes=node[2-6] Default=NO Shared=NO MaxTime=INFINITE State=UP
PartitionName=short Nodes=node1 Default=YES Shared=NO MaxTime=INFINITE State=UP

I guess the first step would be to change CoresPerSocket=4 Procs=8 to CoresPerSocket=6 Procs=12, because that would match the actual hardware.

I already tried to consult the documentation, but I still don't know what to do. Do I need to modify gres.conf? Which File= should I specify for a CPU? I also thought about adding a third partition, maybe called cpuonly. But is that even the right way to accomplish what I am trying to do? I guess I have to add something to the Gres= parameter in the lines starting with NodeName.

score 1 · Accepted Answer · answered May 22 '16 at 21:57

Set up two partitions, one for GPU jobs and one for CPU-only jobs. Set MaxCPUsPerNode on each.
Define the nodes using the CPUs parameter. The MaxCPUsPerNode values of all partitions, added together, should be less than or equal to this value (the available CPUs/cores/threads).
Use SelectTypeParameters=CR_CPU
Use SchedulerType=sched/backfill
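
As a rough sketch, the four points above could translate into a slurm.conf excerpt like the one below. The numbers are assumptions based on the question (12 hardware threads per node, 4 reserved for GPU jobs, 8 left for CPU-only jobs), the partition name cpuonly is the one suggested in the question, and putting all six nodes into both partitions is my assumption, not something your current setup does. Adjust to your cluster.

```
# Sketch only, not a tested configuration.
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

# Assumption: 6 cores x 2 threads = 12 schedulable CPUs per node.
NodeName=node[1-6] CPUs=12 Gres=gpu:4 State=UNKNOWN

# 4 + 8 = 12, so the two limits together do not exceed CPUs=12.
PartitionName=gpu     Nodes=node[1-6] MaxCPUsPerNode=4 Default=NO  MaxTime=INFINITE State=UP
PartitionName=cpuonly Nodes=node[1-6] MaxCPUsPerNode=8 Default=YES MaxTime=INFINITE State=UP
```

With something like this in place, a GPU job might be submitted with sbatch --partition=gpu --gres=gpu:1 job.sh and a CPU-only job with sbatch --partition=cpuonly --cpus-per-task=2 job.sh, where job.sh is a placeholder script name. No change to gres.conf should be needed for the CPU-only partition, since CPUs are not generic resources.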

Micha

In that situation, if multiple collaborators (users) are to share the same minutes quota, would the right approach be to have 2 separate Slurm accounts (one for GPU, one for CPU-only)? And set AllowAccounts in each partition's definition accordingly? – Youssef Eldakar Mar 20 '19 at 09:29
