I am working with a 13-computer cluster running Windows Server 2012 R2 with MS HPC Pack 2012 R2. The head node is working properly. The servers are connected to the corporate network via IPv4 on standard adapters. The nodes, however, are also connected to each other via InfiniBand.
A week ago the cluster was moved to a new domain, keeping the same hostnames. The FQDNs changed, of course, and the cluster is behaving correctly in most ways, except that Network Direct is now set to false on all 12 compute nodes.
Post-migration, the head node had all HPC components reinstalled. The compute nodes were left untouched. Since I thought this was the reason that Network Direct was disabled, I also tried reinstalling HPC on one of the nodes. That didn't solve the problem.
Windows Firewall is OFF on all levels on all nodes, including the head node.
InfiniBand adapters:
- InfiniBand adapter on head node: Mellanox ConnectX-3 Pro IPoIB Adapter
- InfiniBand adapter on nodes: HP 10Gb/40Gb 2-port 544+FLR-QSFP IPoIB Adapter
The servers respond to ping requests on the IPs set for the InfiniBand adapters.
Anyone have any ideas on this? Thanks in advance.