edit: I am on CentOS 8.5 and have tried both the Mellanox driver 4.9-4.1.7.0 (legacy) and 5.5-1.0.3.2.
I am not able to get my InfiniBand adapter working.
The output of ibstat states that it is down:
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.31.1014
    Hardware version: 0
    Node GUID: 0xb8cef60300a7fbbc
    System image GUID: 0xb8cef60300a7fbbc
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 0
        Capability mask: 0x2651e848
        Port GUID: 0xb8cef60300a7fbbc
        Link layer: InfiniBand
And mlxlink -d mlx5_0 outputs:
Operational Info
----------------
State : Disable
Physical state : ETH_AN_FSM_ENABLE
Speed : N/A
Width : N/A
FEC : N/A
Loopback Mode : N/A
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed : 0x00000075 (HDR,EDR,FDR,QDR,SDR)
Supported Cable Speed : 0x00000007 (QDR,DDR,SDR)
Troubleshooting Info
--------------------
Status Opcode : 1036
Group Opcode : MNG FW
Recommendation : Connected wrong module type. Change to a different module type.
So I do have troubleshooting info here, I just don't understand it. I am pretty sure the cable is connected. Could it be an incompatibility between the ConnectX-3 adapter (where the opensm service runs) and the ConnectX-6 adapter?
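To make the two speed masks in the mlxlink output easier to compare, here is a minimal Python sketch that decodes both and intersects them. The bit-to-name table is an assumption inferred from the output itself (0x75 and 0x07 both decode to the names mlxlink prints); the FDR10 entry does not appear in the output and is assumed.

```python
# Assumed bit-to-name mapping, inferred from the mlxlink output above.
IB_SPEED_BITS = {
    0x01: "SDR",
    0x02: "DDR",
    0x04: "QDR",
    0x08: "FDR10",  # assumption: this bit is not set in either mask above
    0x10: "FDR",
    0x20: "EDR",
    0x40: "HDR",
}


def decode_speeds(mask):
    """Return the names of all speeds whose bit is set, fastest first."""
    return [
        name
        for bit, name in sorted(IB_SPEED_BITS.items(), reverse=True)
        if mask & bit
    ]


enabled = 0x00000075  # "Enabled Link Speed" reported for the adapter
cable = 0x00000007    # "Supported Cable Speed" reported for the cable
common = enabled & cable

print(decode_speeds(enabled))  # ['HDR', 'EDR', 'FDR', 'QDR', 'SDR']
print(decode_speeds(cable))    # ['QDR', 'DDR', 'SDR']
print(hex(common), decode_speeds(common))  # 0x5 ['QDR', 'SDR']
```

Under that assumed mapping, the only speeds present in both masks are QDR and SDR, so the cable is reported as far slower than what the adapter has enabled.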
edit:
The adapters are connected by a Mellanox SX6012 switch.
The output of ibcheckstate -v is given below. Port 1 is the node with opensm running; the port of the new node with the ConnectX-6 adapter is missing.
# Checking Switch: nodeguid 0x248a070300ccc140
Node check lid 2: OK
Port check lid 2 port 1: OK
Port check lid 2 port 2: OK
Port check lid 2 port 3: OK
Port check lid 2 port 4: OK
Port check lid 2 port 5: OK
# Checking Ca: nodeguid 0x0cc47affff5fb364
Node check lid 4: OK
Port check lid 4 port 1: OK
# Checking Ca: nodeguid 0x0cc47affff5fb8e4
Node check lid 6: OK
Port check lid 6 port 1: OK
# Checking Ca: nodeguid 0x0cc47affff5fb4c4
Node check lid 5: OK
Port check lid 5 port 1: OK
# Checking Ca: nodeguid 0x0cc47affff5fb89c
Node check lid 3: OK
Port check lid 3 port 1: OK
# Checking Ca: nodeguid 0x248a070300f97f50
Node check lid 1: OK
Port check lid 1 port 1: OK
*** WARNING ***: this command is deprecated
## Summary: 6 nodes checked, 0 bad nodes found
## 10 ports checked, 0 ports with bad state found
The same cable has worked with at least a ConnectX-4 adapter.
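To double-check that the ConnectX-6 node really is absent from the ibcheckstate listing, rather than eyeballing the GUIDs, something like the following Python sketch could be used. The function name node_guids and the shortened sample text are illustrative only; the GUIDs are copied from the outputs above.

```python
import re


def node_guids(ibcheckstate_output):
    """Map each node GUID to its node type, taken from the
    '# Checking <type>: nodeguid 0x...' header lines."""
    pattern = re.compile(
        r"^# Checking (\w+): nodeguid (0x[0-9a-fA-F]+)", re.MULTILINE
    )
    return {
        guid.lower(): kind
        for kind, guid in pattern.findall(ibcheckstate_output)
    }


# Shortened excerpt of the ibcheckstate -v output shown above.
sample = """\
# Checking Switch: nodeguid 0x248a070300ccc140
Node check lid 2: OK
# Checking Ca: nodeguid 0x0cc47affff5fb364
Node check lid 4: OK
# Checking Ca: nodeguid 0x248a070300f97f50
Node check lid 1: OK
"""

connectx6_guid = "0xb8cef60300a7fbbc"  # Node GUID from the ibstat output
seen = node_guids(sample)
print(connectx6_guid in seen)  # False
```

With the full output pasted in instead of the excerpt, the result is the same: none of the six node GUIDs listed matches the Node GUID that ibstat reports for mlx5_0.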