
I am trying to install SLURM with NFS on a small Ubuntu 18.04 HPC cluster, in a typical fashion, e.g. configuring a controller (slurmctld), clients (slurmd), a shared directory, etc. What I am curious about is: is there a way to set it up such that the controller occupies only a portion of the head node, and the head node's remaining resources are used by the resource partitioning routine, like the other nodes? Is there a way to accomplish this using the SLURM configuration file?

I am essentially asking how to maximise the resources if the controller will only be doing light work.

Thank you, cheers!

rage_man
  • I can't fully understand your question. NFS has nothing to do with SLURM. What are you trying to achieve? Do you want to use your headnode (where you run slurmctld) as a computing node so you can run both slurmctld and slurmd? – Vinícius Ferrão Jul 19 '21 at 23:43
  • The NFS was a possibly irrelevant detail, except for maybe if the shared home drive was impacted by the set up. Yes, this is what I wanted to do, computations on head node instead of wasting its resources, since it is a small cluster and I want to maximize everything. In any case, this is no longer my project, so I have no interest in it anymore. But, if you have tips for the future I would gladly hear them – rage_man Jul 19 '21 at 23:53
  • I will write an answer, but yes you can run slurmd on the headnode, we usually do this on small machines. – Vinícius Ferrão Jul 19 '21 at 23:57

1 Answer


You're trying to use the head node as a compute node as well. This is perfectly normal on small clusters, and even on single workstations that run SLURM purely as a queue system, either as an easier way to enqueue jobs or to share the compute power among a group of users with access to the machine.

To do this, just enable slurmd on the same machine that runs slurmctld. Remember to add the corresponding node and partition entries to /etc/slurm/slurm.conf with the node's compute specifications. As an example, you should have something like this:

ClusterName=Cloyster
ControlMachine=charizard.cluster.example.com
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
TaskPlugin=task/affinity
PropagateResourceLimitsExcept=MEMLOCK
AccountingStorageType=accounting_storage/filetxt
Epilog=/etc/slurm/slurm.epilog.clean
SlurmctldParameters=enable_configless

ReturnToService=2
NodeName=charizard Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN
PartitionName=execution Nodes=charizard Default=YES MaxTime=720:00:00 State=UP Oversubscribe=EXCLUSIVE

Observe that NodeName has the hostname of the control machine.
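
If you are unsure of the correct Sockets, CoresPerSocket and ThreadsPerCore values, slurmd can print the line it detects, and both daemons are ordinary services on the head node. A minimal sketch, assuming the systemd unit names shipped with the stock Ubuntu SLURM packages (adjust if you built from source):

# Print the hardware line slurmd detects on this host; paste it into slurm.conf
slurmd -C

# On the head node, enable and start both the controller and the compute daemon
sudo systemctl enable --now munge slurmctld slurmd

# Check that the node registered and the partition is up
sinfo
scontrol show node charizard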

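Since the original question is about leaving headroom for the controller while it does only light work, one option worth mentioning (not part of the answer above, just a hedged suggestion) is SLURM's specialized-resource parameters on the node definition. CoreSpecCount and MemSpecLimit are slurm.conf node parameters that keep cores and memory out of job allocations; note that strict enforcement of the memory limit additionally needs the cgroup task plugin, which this config does not enable. The figures below, including RealMemory, are placeholders to replace with your own hardware values:

# Hypothetical variant of the node line: reserve 2 cores and 8 GB of RAM for the
# OS, slurmctld and NFS serving, so jobs can never take the whole head node.
# RealMemory, CoreSpecCount and MemSpecLimit values here are placeholders.
NodeName=charizard Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=515000 CoreSpecCount=2 MemSpecLimit=8192 State=UNKNOWN
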
Vinícius Ferrão
  • Assigning a slurmd compute daemon to the head node alongside the controller looks like exactly what would be needed. Thanks! – rage_man Jul 20 '21 at 00:35