
At the moment, we have set up Slurm to manage a small cluster of six nodes with four GPUs each. That has been working well so far, but now we want to utilize the Intel Core i7-5820K CPUs for jobs that only require CPU processing power. Each CPU has six cores and 12 threads; each GPU requires one thread/logical core, so there are 8 threads remaining per node that could be used for "CPU-only" jobs.

Current configuration:

cat /etc/slurm-llnl/gres.conf

Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3

cat /etc/slurm-llnl/slurm.conf (excerpt)

SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
GresTypes=gpu
MaxTasksPerNode=4

NodeName=node1 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node2 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node3 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node4 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node5 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN
NodeName=node6 CoresPerSocket=4 Procs=8 Sockets=1 ThreadsPerCore=2 Gres=gpu:4 State=UNKNOWN

PartitionName=gpu Nodes=node[2-6] Default=NO Shared=NO MaxTime=INFINITE State=UP
PartitionName=short Nodes=node1 Default=YES Shared=NO MaxTime=INFINITE State=UP

I guess the first step would be to change CoresPerSocket=4 Procs=8 to CoresPerSocket=6 Procs=12, because that would match the actual hardware.
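For reference, a corrected node line might then look like this (a sketch assuming one socket with six cores and two threads per core, i.e. 12 logical CPUs, matching the i7-5820K):

```
NodeName=node1 Sockets=1 CoresPerSocket=6 ThreadsPerCore=2 Procs=12 Gres=gpu:4 State=UNKNOWN
```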

I already tried to consult the documentation, but I still don't know what to do. Do I need to modify gres.conf? Which File= would I specify for a CPU? I also thought about adding a third partition, maybe called cpuonly. But is that even the right way to accomplish what I am trying to do? I guess I have to add something to the Gres= parameter in the lines starting with NodeName.

Micha

1 Answer

  • Set up two partitions, one for GPU jobs and one for CPU-only jobs, and set MaxCPUsPerNode on each.
  • Define the nodes using the CPUs parameter. The sum of all MaxCPUsPerNode values should be less than or equal to the available CPUs/cores/threads.
  • Use SelectTypeParameters=CR_CPU.
  • Use SchedulerType=sched/backfill.
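As a sketch, the relevant slurm.conf changes could look like the following. The node and partition names come from the question; the exact MaxCPUsPerNode split (4 logical CPUs reserved for GPU jobs, 8 for CPU-only jobs) is an assumption based on the one-thread-per-GPU requirement described above:

```
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

NodeName=node[1-6] Sockets=1 CoresPerSocket=6 ThreadsPerCore=2 CPUs=12 Gres=gpu:4 State=UNKNOWN

PartitionName=gpu Nodes=node[1-6] MaxCPUsPerNode=4 Default=NO MaxTime=INFINITE State=UP
PartitionName=cpuonly Nodes=node[1-6] MaxCPUsPerNode=8 Default=YES MaxTime=INFINITE State=UP
```

With this setup, gres.conf stays unchanged: CPUs are tracked as consumable resources by the cons_res plugin itself, so no Gres entry is needed for them.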
Micha
  • In that situation, if multiple collaborators (users) are to share the same minutes quota, would the right approach be to have 2 separate Slurm accounts (one for GPU, one for CPU-only)? And set AllowAccounts in each partition's definition accordingly? – Youssef Eldakar Mar 20 '19 at 09:29