I'm trying to configure InfiniBand partitions between two Debian-based Linux hosts running the 4.15 kernel and a Mellanox SX6036 switch. I've set up a "DMZ" partition on the switch using a PKey of 0x0001 and added the Port GUIDs of the active IB connection on both Linux hosts (which happens to be ib1 on both of them).
From what I've read here, and here, I now run echo PKEY_VALUE > /sys/class/net/ib1/create_child on both hosts, and I should get a new interface named ib1.PKEY_VALUE. I can then assign a private IP address to the new interfaces and communicate between the hosts that are members of the MY_PKEY partition. Is that how it's supposed to work?
In the example at the kernel.org link they use 0x8001, which works fine on the Linux end and creates an interface named ib1.8001. The Mellanox switch, however, won't let me set the PKey to that value. I get an error: Invalid Pkey 0x8001. Value must be between 0x1 and 0x7fff. I've tried different PKey values on the switch (like 0x0001), but Linux always creates the interface with a PKey of the form 0x8..., which I cannot use as a PKey on the switch. Have I misunderstood something here?
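For reference, the mismatch between the two ranges can be illustrated with a small sketch. It assumes the standard InfiniBand P_Key layout, in which the top bit of the 16-bit value marks full membership and the lower 15 bits are the partition number; under that reading, the switch's 0x1 to 0x7fff range would cover only the 15-bit part. The name split_pkey is made up for this illustration.

```python
# Assumption: a P_Key is 16 bits wide, the top bit flags full membership,
# and the remaining 15 bits identify the partition.
FULL_MEMBER_BIT = 0x8000
BASE_MASK = 0x7FFF


def split_pkey(pkey):
    """Split a 16-bit P_Key into (15-bit base value, full-membership flag)."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    return pkey & BASE_MASK, bool(pkey & FULL_MEMBER_BIT)


if __name__ == "__main__":
    # Values taken from the question: the switch-side key and the Linux-side key.
    for value in (0x0001, 0x8001):
        base, full = split_pkey(value)
        print(f"{value:#06x}: base={base:#06x} full_member={full}")
```

If that layout applies here, ib1.8001 on the hosts and PKey 0x0001 on the switch would refer to the same partition, with the hosts asking for full membership.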
UPDATE: Hoping some extra info may help. Below is the link info for both host1 and host2, along with the output of ibnodes (identical on both hosts), ibstat (identical aside from GUIDs) and ibdiagnet. When I assign an IP to the ib1.8001 interface, dmesg on host1 shows this: ib1.8001: P_Key 0x8001 is not found
I'm adding a new screenshot of the current partition since I've changed it to include Full membership for all Port-GUIDs.
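One way to follow up on the "P_Key 0x8001 is not found" message is to look at the P_Key table the port actually holds. The sketch below assumes the usual sysfs layout /sys/class/infiniband/<device>/ports/<port>/pkeys/<index>, and it assumes that ib1 corresponds to mlx4_0 port 2 (the active port in the ibstat output). read_pkey_table is a hypothetical helper name.

```python
from pathlib import Path


def read_pkey_table(device="mlx4_0", port=2, root="/sys/class/infiniband"):
    """Return the non-zero P_Key values in a port's sysfs P_Key table.

    The device/port defaults are assumptions based on the ibstat output;
    an empty list is returned if the table directory does not exist.
    """
    pkey_dir = Path(root) / device / "ports" / str(port) / "pkeys"
    if not pkey_dir.is_dir():
        return []
    pkeys = []
    # Each table slot is a file named by its index, holding a hex value.
    for entry in sorted(pkey_dir.iterdir(), key=lambda p: int(p.name)):
        value = int(entry.read_text().strip(), 16)
        if value != 0:  # skip slots that read as 0x0000
            pkeys.append(value)
    return pkeys


if __name__ == "__main__":
    table = read_pkey_table()
    print("P_Keys in table:", [f"{p:#06x}" for p in table])
    print("0x8001 present:", 0x8001 in table)
```

If 0x8001 does not show up in that table on either host, the partition configured on the switch has presumably not reached the host ports yet.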
host1# ip link sho
24: ib1.8001@ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:1f:fe:80:00:00:00:00:00:00:00:02:c9:03:00:10:df:5a brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
host2# ip link sho
16: ib1.8001@ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:1e:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e0:88:02 brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
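The broadcast addresses in the ip link output above can be decoded to see which P_Key each child interface is using. This sketch assumes the IPoIB link-layer address format described in RFC 4391: a 4-byte QPN field followed by a 16-byte multicast GID whose bytes 4 and 5 carry the P_Key. pkey_from_broadcast is a made-up name.

```python
def pkey_from_broadcast(brd):
    """Extract the P_Key from a 20-byte IPoIB broadcast address string.

    Assumed layout (RFC 4391): 4 bytes of QPN field, then a 16-byte
    multicast GID in which bytes 4 and 5 hold the P_Key.
    """
    octets = [int(part, 16) for part in brd.split(":")]
    if len(octets) != 20:
        raise ValueError("expected a 20-byte IPoIB link-layer address")
    gid = octets[4:]  # drop the 4-byte QPN field
    return (gid[4] << 8) | gid[5]


if __name__ == "__main__":
    # Broadcast address copied from the ip link output in the question.
    brd = "00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff"
    print(f"{pkey_from_broadcast(brd):#06x}")  # prints 0x8001
```

Under that assumption, both hosts are trying to join the broadcast group for P_Key 0x8001, which matches the interface name.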
host2# ibnodes
Ca : 0xe41d2d0300e08800 ports 2 "MT25408 ConnectX Mellanox Technologies"
Ca : 0x0002c9030010df58 ports 2 "MT25408 ConnectX Mellanox Technologies"
Switch : 0xf452140300823b60 ports 36 "MF0;msx6036:SX6036/U1" enhanced port 0 lid 1 lmc 0
host2# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.34.5000
    Hardware version: 0
    Node GUID: 0xe41d2d0300e08800
    System image GUID: 0xe41d2d0300e08803
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 6
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251486a
        Port GUID: 0xe41d2d0300e08801
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40 (FDR10)
        Base lid: 2
        LMC: 0
        SM lid: 5
        Capability mask: 0x0251486a
        Port GUID: 0xe41d2d0300e08802
        Link layer: InfiniBand
host2# ibdiagnet
Loading IBDIAGNET from: /usr/lib/x86_64-linux-gnu/ibdiagnet1.5.7
-W- Topology file is not specified.
Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/x86_64-linux-gnu/ibdm1.5.7
-I- Using port 2 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.
-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found
-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found
-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found
-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:2 full:2 limited:0
-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------
----------------------------------------------------------------
-I- Stages Status Report:
    STAGE                          Errors  Warnings
    Bad GUIDs/LIDs Check           0       0
    Link State Active Check        0       0
    General Devices Info Report    0       0
    Performance Counters Report    0       0
    Partitions Check               0       0
    IPoIB Subnets Check            0       1