
I have two Windows Server 2019 Datacenter Edition servers, each with RDMA-enabled Mellanox ConnectX-4 Lx 25GbE DA/SFP NICs. I've directly connected them with a 1 meter SFP28 copper cable.

How can I get them to see each other?

The cards try to negotiate and wind up with APIPA addresses.

I read that setting static IPs without a default gateway would work as long as they are on the same subnet, but doing so doesn't seem to work (can't ping each other; Test-Cluster fails).

Two-node, switchless setups are officially supported configurations (see the Storage Spaces Direct hardware requirements).

I can't find any examples of Microsoft or anyone else discussing such a setup. Maybe I don't know what to search for, as I've never used this type of NIC before. I would really like to avoid having to buy a switch for them.

How can I get these two NICs to ping each other without a switch?

Louis Waweru
  • What part of Test-Cluster fails exactly? – joeqwerty Jun 21 '19 at 03:15
  • @joeqwerty I put the Mellanox NICs on a private network with static IPs, and the hostnames are assigned to MAC addresses on the onboard 10-Gbit NICs in DNS. They talk to the domain controllers and everything else over the onboard NICs. Any network test to or from the Mellanox NICs will fail, including to each other. I understand why Test-Cluster is failing, and am not hoping to solve that here. I'd actually just like to imagine the onboard NICs don't exist if that makes things simpler. Example here: https://i.stack.imgur.com/Yy34U.png – Louis Waweru Jun 21 '19 at 03:35
  • I would like to help but there is a lot of information missing here. I think that it purely a networking problem and not related to clustering. There is no way you should need a switch. 1) what IPs have you used? 2) what does IPCONFIG /ALL look like at both ends? 3) Does 'Network Connections' windows show the link is up (ie what is the 'Media State')? – Daniel K Jun 24 '19 at 17:14
  • @DanielK Thank you, Daniel. Sorry, I got a bit swamped from all angles and wasn't able to work on this. I will improve the question once things calm down. – Louis Waweru Jun 27 '19 at 04:36
  • Avoid using Storage Spaces Direct (S2D) in production at any cost! It's an extremely immature technology known for numerous data-loss cases. WS2016 got some recent fixes for the 2-node scenario, but WS2019 can't survive a second node reboot/failure without losing pool quorum. So you either roll back to WS2016 and use 3+ nodes, or you have to look at alternatives... – RiGiD5 Aug 02 '19 at 14:08

1 Answer


To enable the directly-attached NICs to communicate with each other, assign each an IP address and subnet mask, and leave the gateway and DNS server fields blank. For example:

Node 1

IP: 192.168.10.1

SubnetMask: 255.255.255.0

Node 2

IP: 192.168.10.2

SubnetMask: 255.255.255.0
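The same addresses can be assigned from an elevated PowerShell prompt instead of the GUI. This is a sketch; the interface alias "Ethernet 3" is an assumption — substitute whatever name `Get-NetAdapter` shows for the Mellanox port on each node:

```powershell
# List adapters to find the alias of the Mellanox NIC (assumed "Ethernet 3" below)
Get-NetAdapter

# On Node 1 — static address, /24 mask, no gateway or DNS
New-NetIPAddress -InterfaceAlias "Ethernet 3" -IPAddress 192.168.10.1 -PrefixLength 24

# On Node 2 — the peer address on the same subnet
New-NetIPAddress -InterfaceAlias "Ethernet 3" -IPAddress 192.168.10.2 -PrefixLength 24

# Then verify from Node 1
Test-NetConnection 192.168.10.2
```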

If the NICs still fall back to APIPA addresses, remove the devices in Device Manager and reinstall the NIC drivers.
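Before reinstalling, you can confirm whether APIPA is still in effect from PowerShell (interface alias again an assumption). A static assignment should show a `PrefixOrigin` of `Manual`; a 169.254.x.x address means the adapter is still self-assigning:

```powershell
# Show the IPv4 address and where it came from
Get-NetIPAddress -InterfaceAlias "Ethernet 3" -AddressFamily IPv4 |
    Select-Object IPAddress, PrefixOrigin
```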

2-node S2D is broken: it can't survive a second node reboot or failure without losing pool quorum and disconnecting the S2D pool. This is a known issue and is expected to be fixed around this fall in one of the updates (no guaranteed time frame). Example thread with the issue.

If you need a 2-node switchless configuration, look at solutions designed for such a setup, such as StarWind VSAN.

batistuta09