I would like to understand how a GROMACS job launched on my SGI cluster with PBS/Torque, using hybrid MPI/OpenMP parallelization, works.
The cluster has hyper-threading enabled and each node has 16 physical cores (32 logical).

What I expect: to use 4 nodes and all available CPUs on each node (16 physical cores and 32 hardware threads per node, which makes 64 cores and 128 threads in total, if I am not mistaken).

I ran a test job that should meet my expectations. These are the lines I use to request resources in the PBS script:

...
#PBS -l select=4:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
nprocs=$(cat $PBS_NODEFILE|wc -l)
...

And this is the command I use to launch the GROMACS job in the same script:
mpiexec_mpt -n $nprocs mdrun_mpi -v -s test.tpr -deffnm test
I expect, as written in the GROMACS documentation, that it guesses the number of OpenMP threads to use automatically. That should be 2, since there are 2 hardware threads per core, right?

GROMACS then prints this output before the calculation actually starts, and this is where I have trouble understanding what is happening:

Note: 32 CPUs configured, but only 16 of them are online.

Number of logical cores detected (16) does not match the number reported by OpenMP (1).
Consider setting the launch configuration manually!

Running on 4 nodes with total 64 cores, 64 logical cores
  Cores per node:           16
  Logical cores per node:   16
Hardware detected on host r1i0n13 (the node of MPI rank 0):
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256

Reading file test-515-short.tpr, VERSION 5.1.1 (single precision)
Changing nstlist from 10 to 25, rlist from 1.2 to 1.222

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 1

Will use 56 particle-particle and 8 PME only ranks
This is a guess, check the performance at the end of the log file
Using 64 MPI processes
Using 1 OpenMP thread per MPI process

1) Why are only 16 of the 32 CPUs online on each node?
2) Why is the number of logical cores detected 16 and not 32? And why does OpenMP report only 1?
3) I would expect 64 MPI processes and 2 OpenMP threads per MPI process. Isn't that the logical layout?
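
The layout expected in question 3 can be written out explicitly instead of being left to auto-detection. What follows is only a sketch, not a verified fix. It assumes that the `ncpus=16` request and the `OMP_NUM_THREADS=1` setting reported in the log are what limit the run, that this PBS installation accepts `ncpus=32` together with an `ompthreads=2` resource, and that the MPI launcher does not override the thread count. `-ntomp` is the mdrun option for the number of OpenMP threads per rank. The launch line is printed rather than executed, so the arithmetic can be checked without GROMACS.

```shell
# Hypothetical resource request for 16 ranks x 2 threads per node
# (assumption, untested on this cluster):
#PBS -l select=4:ncpus=32:mpiprocs=16:ompthreads=2
#PBS -l place=scatter:excl

nodes=4              # nodes requested with select=4
ranks_per_node=16    # one MPI rank per physical core
threads_per_rank=2   # one OpenMP thread per hardware thread of a core

# Total MPI ranks and total hardware threads the job would occupy.
nprocs=$((nodes * ranks_per_node))
total_threads=$((nprocs * threads_per_rank))

# Set the thread count in the environment and on the mdrun command line,
# so both sources agree.
export OMP_NUM_THREADS=$threads_per_rank

launch="mpiexec_mpt -n $nprocs mdrun_mpi -ntomp $threads_per_rank -v -s test.tpr -deffnm test"

echo "$nprocs ranks x $threads_per_rank threads = $total_threads hardware threads"
echo "$launch"
```

If the log still reports 1 OpenMP thread per rank with these settings, something in the launch environment is resetting `OMP_NUM_THREADS`.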

1 Answer

OK, so I just understood why this happened: previous colleagues installed SGI MPT, which is used to launch GROMACS jobs, but this tool conflicts with OpenMP, so it is either one or the other!
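
A quick way to see what the job environment actually exposes is to print the CPU counts and the OpenMP setting from inside the job. This is a minimal diagnostic sketch. It assumes `getconf` is available on the compute nodes, and that the "configured" and "online" numbers in the GROMACS note correspond to these two system values, which I have not confirmed against the GROMACS source.

```shell
# CPUs the operating system knows about vs. CPUs currently online.
conf=$(getconf _NPROCESSORS_CONF)
onln=$(getconf _NPROCESSORS_ONLN)

echo "configured CPUs: $conf"
echo "online CPUs:     $onln"

# Shows whether something in the launch environment has already fixed
# the OpenMP thread count before mdrun starts.
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-unset}"
```

Running this once directly in the PBS script and once through `mpiexec_mpt` would show whether the launcher is what changes `OMP_NUM_THREADS`.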