5

I am new to SLURM. I am looking for a convenient way to see how much memory is available on a node or nodelist for my srun allocation.

I have already played around with sinfo, scontrol, and sstat, but none of them gives me the information I need in one convenient overview.

I had the idea of writing a shell script to fetch all fields of all jobs from scontrol and sum them up, but there must be an easier way. It would be great if anyone has a hint or an idea!

PlagTag

1 Answer

7

The 7th column of the output of sinfo -N -l will tell you how much memory is installed in each compute node.

$ sinfo -N -l
Wed Nov  6 16:31:45 2013
NODELIST                NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT FEATURES REASON              
node001                    1      Def*        idle    8    2:4:1  24150   920644    100 Xeon,X55 none  

The command scontrol -o show nodes will tell you how much memory is already allocated on each node. Look for the AllocMem entry (requires Slurm 2.6.0 or later).

$ scontrol -o show nodes | awk '{ print $1, $13, $14}'
NodeName=node001 RealMemory=24150 AllocMem=0
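
The column positions ($13, $14) depend on the Slurm version and on whether any field value contains spaces, so matching the fields by name may be more robust. The following is only a sketch: the function name node_free_mem and the output layout are made up for illustration, and it assumes each node is printed on one line with NodeName, RealMemory and AllocMem keys, as in the output above.

```shell
# Print total, allocated and remaining memory (MB) per node.
# Fields are matched by key name instead of by column position.
# node_free_mem is a hypothetical helper name, not part of Slurm.
node_free_mem() {
    awk '{
        name = real = alloc = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^NodeName=/)   name  = substr($i, 10)
            if ($i ~ /^RealMemory=/) real  = substr($i, 12)
            if ($i ~ /^AllocMem=/)   alloc = substr($i, 10)
        }
        # Skip lines that lack any of the three keys (e.g. Slurm < 2.6.0).
        if (name != "" && real != "" && alloc != "")
            printf "%s total=%d alloc=%d free=%d\n", name, real, alloc, real - alloc
    }'
}

# Usage on a cluster:
#   scontrol -o show nodes | node_free_mem
```

Here "free" means RealMemory minus AllocMem, i.e. memory not yet handed out to jobs, which is not necessarily what the operating system reports as free.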
NNWizard
  • Hi NNWizard, thanks for your reply. I tried the command but found that there is no AllocMem entry. Could this depend on the Slurm version? We currently have 2.3.2. – PlagTag Nov 11 '13 at 08:29
  • Yes, you need at least 2.6.0. Otherwise, I'm afraid you will need to write a script that sums the allocated memory on a node yourself. – NNWizard Nov 11 '13 at 09:13
  • Yes, I think I had better go with an update :-) – PlagTag Nov 11 '13 at 10:47
  • This mostly works, but is fragile due to the lack of quoting in `scontrol -o`. For instance, on our cluster the OS variable contains spaces, so I had to use `scontrol -o show nodes | awk '{print $1, $23, $24}'` – Quantum7 Jul 08 '19 at 11:20