
I'm writing because I'm facing an issue I cannot solve: configuring a cluster whose master node (or frontend node) is a virtual machine managing compute nodes connected over an InfiniBand network.

I use Slurm on these nodes; the frontend node is the Slurm controller.

Each compute node has an Ethernet and an InfiniBand interface; the master node (or frontend node) has only an Ethernet interface.

When I launch a job from the frontend VM node, the network traffic between the compute nodes goes through the Ethernet interface, and I haven't found a way to force the use of the InfiniBand interface.
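In case it helps, a quick way to see which interface actually carries the traffic is to watch the interface counters on a compute node while a job runs (eth0 and ib0 are placeholder interface names here):

    # The counters that grow belong to the interface carrying the job traffic.
    watch -n 1 'ip -s link show dev eth0; ip -s link show dev ib0'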

I found out that launching jobs from a compute node instead of the frontend VM solves the problem! Is there a way to force the use of the IB interface? What am I missing here?

Any idea is much appreciated.

Best Regards, Simo

SimoneM

1 Answer


I'm new to HPC work and English is not my native language... but my guess would be to do it via weighted routes:

On each machine, assign the route for the IB net segment a very low cost on the IB interface, and give all other net segments a high cost on the IB interface (and vice versa: give the Ethernet interface a very high weight for the IB segment).
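A rough sketch of what I mean, assuming (purely as placeholders) that the Ethernet segment is 192.168.1.0/24 on eth0 and the IPoIB segment is 10.10.0.0/24 on ib0; I have not tested this, so adjust it to your setup:

    # On each compute node: cheap route to the IB segment over ib0,
    # expensive backup over eth0 (the kernel's connected route on ib0
    # may already cover the cheap case).
    ip route add 10.10.0.0/24 dev ib0 metric 10
    ip route add 10.10.0.0/24 dev eth0 metric 1000

    # And the other way around for the Ethernet segment.
    ip route add 192.168.1.0/24 dev eth0 metric 10
    ip route add 192.168.1.0/24 dev ib0 metric 1000

The route with the lower metric wins as long as its interface is up, so inter-node traffic to the IB addresses should prefer ib0.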

Something like the split access mentioned here:

https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.rpdb.multiple-links.html

The only downside I see is that SSH traffic might be sent via InfiniBand instead of Ethernet, but there must be a workaround for that.
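For example (again only a sketch, untested; interface names, subnets and the table name are made up), SSH could be pinned to the Ethernet path with a fwmark rule:

    # 1. A "mgmt" routing table that only knows the Ethernet segment.
    echo "100 mgmt" >> /etc/iproute2/rt_tables
    ip route add 192.168.1.0/24 dev eth0 table mgmt

    # 2. Mark locally generated packets going to TCP port 22.
    iptables -t mangle -A OUTPUT -p tcp --dport 22 -j MARK --set-mark 0x16

    # 3. Send marked packets through the mgmt table.
    ip rule add fwmark 0x16 table mgmt

Note that this only helps when SSH targets the nodes' Ethernet addresses; connections opened to the IPoIB addresses still resolve via the main table and go out over ib0.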

zRISC