SOLVED

I am having trouble setting up the InfiniBand software from the openSUSE repository.
The same card works fine with SLES12 + Mellanox OFED stack.

After installing every package matching 'infiniband' from YaST, I can see that the HCA is up, and diagnostic tools such as ibnodes show relevant data:

>ibnodes
Ca      : 0x0002c90300a00360 ports 2 "cnode1 HCA-1"
Ca      : 0x0002c90300ea8fd0 ports 1 "helper1 mlx4_0"
Switch  : 0x0008f1050020096c ports 36 "Voltaire 4036 # spine2" enhanced port 0 lid 17 lmc 0

Here helper1 is the openSUSE machine and cnode1 is a SLES node.

But when it comes to verbs, I get:

>ibv_devinfo
No IB devices found

Consequently, I cannot get MPI working over InfiniBand.

Am I missing some intermediate layer, or do libibverbs components need some additional configuration?

Thanks!

UPD: some more output from zypper and lsmod:

Here are the packages installed from Leap 42.2 repository:

>zypper se verbs

S | Name                    | Summary                                                     | Type
--+-------------------------+-------------------------------------------------------------+--------
i | libibverbs-devel        | Development files for the libibverbs library                | package
  | libibverbs-devel-32bit  | Development files for the libibverbs library                | package
i | libibverbs-devel-static | Static libibverbs library                                   | package
i | libibverbs-runtime      | Tools for the Infiniband Verbs library and manpages         | package
i | libibverbs1             | Infiniband verbs library                                    | package
  | libibverbs1-32bit       | Infiniband verbs library                                    | package
i | libipathverbs-rdmav2    | PathScale InfiniPath HCA Userspace Driver                   | package
i | libusnic_verbs-rdmav2   | Cisco UCS InfiniBand HCA Userspace Driver                   | package
  | texlive-newverbs        | Define new versions of \verb, including short verb versions | package
  | texlive-newverbs-doc    | Documentation for texlive-newverbs                          | package

Loaded modules on openSUSE (the ibv_* diagnostics cannot find the HCA):

>lsmod | grep ib
ib_ucm                 24576  0
ib_ipoib               98304  0
ib_cm                  49152  3 rdma_cm,ib_ucm,ib_ipoib
ib_uverbs              61440  2 ib_ucm,rdma_ucm
ib_umad                24576  0
iscsi_ibft             16384  0
iscsi_boot_sysfs       20480  1 iscsi_ibft
mlx4_ib               167936  0
ib_sa                  40960  5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib
ib_mad                 57344  4 ib_cm,ib_sa,mlx4_ib,ib_umad
ib_core               131072  10 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib
ib_addr                20480  4 rdma_cm,ib_sa,ib_core,rdma_ucm
mlx4_core             323584  1 mlx4_ib
libahci                36864  1 ahci
libata                270336  2 ahci,libahci
scsi_mod              262144  4 sg,libata,sd_mod,sr_mod    
libcrc32c              16384  1 xfs
snd_usbmidi_lib        36864  1 snd_usb_audio
snd_rawmidi            36864  1 snd_usbmidi_lib
snd                    90112  12 snd_hda_codec_realtek,snd_usb_audio,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_rawmidi,snd_hda_codec_generic,snd_usbmidi_lib,snd_hda_codec,snd_hda_intel,snd_seq_device
usbcore               270336  6 snd_usb_audio,uvcvideo,snd_usbmidi_lib,ehci_hcd,ehci_pci,usbhid

Loaded modules on SLES 12 (the ibv_* tools work):

>lsmod |grep ib
ib_ucm                 18489  0
ib_ipoib              144838  0
ib_cm                  46900  3 rdma_cm,ib_ucm,ib_ipoib
ib_uverbs              83349  2 ib_ucm,rdma_ucm
ib_umad                22281  6
mlx5_ib               204339  0
mlx5_core             572759  1 mlx5_ib
inet_lro               13400  3 mlx4_en,mlx5_core,ib_ipoib
iscsi_ibft             12862  0
iscsi_boot_sysfs       16051  1 iscsi_ibft
mlx4_ib               208061  0
ib_sa                  37997  5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib
ib_mad                 60774  4 ib_cm,ib_sa,mlx4_ib,ib_umad
ib_core               159115  12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
ib_addr                19098  3 rdma_cm,ib_core,rdma_ucm
ib_netlink             14070  3 rdma_cm,iw_cm,ib_addr
mlx4_core             374829  2 mlx4_en,mlx4_ib
mlx_compat             14630  18 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_netlink,ib_addr,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
libahci                36105  1 ahci
libata                235807  2 ahci,libahci
scsi_mod              244354  3 sg,libata,sd_mod

The same listings, filtered for verbs only:

OpenSUSE

>lsmod | grep verbs
ib_uverbs              61440  2 ib_ucm,rdma_ucm
ib_core               131072  10 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib

SLES

>lsmod | grep verbs
ib_uverbs              83349  2 ib_ucm,rdma_ucm
ib_core               159115  12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
mlx_compat             14630  18 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_netlink,ib_addr,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
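Comparing two long lsmod listings by eye is error-prone, so here is a minimal sketch of one way to diff them. It assumes each host's `lsmod` output was saved to a file first; the function name `module_diff` and the file names in the usage note are hypothetical.

```shell
# Print module names that appear in only one of two saved `lsmod` outputs.
# Unindented lines: only in the first file.
# Tab-indented lines: only in the second file.
module_diff() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 1; }
    # Keep only the module name (first field); skip the "Module Size Used by" header.
    awk '$1 != "Module" { print $1 }' "$1" | LC_ALL=C sort -u > "$a_sorted"
    awk '$1 != "Module" { print $1 }' "$2" | LC_ALL=C sort -u > "$b_sorted"
    # comm -3 suppresses the column of lines common to both files.
    LC_ALL=C comm -3 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

Usage, with hypothetical file names: `module_diff leap-lsmod.txt sles-lsmod.txt`. In the listings above, mlx4_core, mlx4_ib and ib_uverbs are loaded on both hosts, which suggests the difference that matters here may not be on the kernel-module side.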

UPD2: As Derek Mitchell wrote, I can see the openibd service on SLES + Mellanox OFED:

>service openibd status
openibd.service - openibd - configure Mellanox devices
   Loaded: loaded (/usr/lib/systemd/system/openibd.service; enabled)
   Active: active (exited) since Thu 2017-03-23 17:45:38 MSK; 1 weeks 2 days ago
     Docs: file:/etc/infiniband/openib.conf
 Main PID: 678 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/openibd.service

There is no such service in Leap 42.2; it has an rdma service instead:

>service rdma status
* rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
   Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
   Active: active (exited) since Sat 2017-04-01 19:23:45 MSK; 2min 45s ago
     Docs: file:/etc/rdma/rdma.conf
  Process: 601 ExecStart=/usr/sbin/rdma-init-kernel (code=exited, status=0/SUCCESS)
 Main PID: 601 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 512)
   CGroup: /system.slice/rdma.service

Apr 01 19:23:37 helper1 systemd[1]: Starting Initialize the iWARP/InfiniBand/RDMA stack in the kernel...
Apr 01 19:23:45 helper1 rdma-init-kernel[601]: /sys/class/infiniband /
Apr 01 19:23:45 helper1 rdma-init-kernel[601]: /
Apr 01 19:23:45 helper1 systemd[1]: Started Initialize the iWARP/InfiniBand/RDMA stack in the kernel.

Even so, ibv_devinfo still cannot find the ConnectX-3 card.

UPD3: The problem was that the Leap 42.2 main repository does not include the libmlx4-rdmav2 package, and mlx4 is the driver for ConnectX-3 HCAs.
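For anyone hitting the same symptom, here is a rough sketch of how the mismatch could be spotted: list the devices the kernel exposes under /sys/class/infiniband and check whether a matching userspace provider package is installed. The function names are hypothetical, and the `lib<driver>-rdmav2` naming rule is an assumption that fits only the package names seen in this thread (libmlx4-rdmav2, libmthca-rdmav2); other drivers may use different package names.

```shell
# List RDMA devices the kernel has registered. The sysfs path can be
# overridden with the first argument, which also makes this easy to test.
list_ib_devices() {
    sysfs_dir="${1:-/sys/class/infiniband}"
    [ -d "$sysfs_dir" ] || return 0
    for dev in "$sysfs_dir"/*; do
        [ -e "$dev" ] || continue   # glob matched nothing
        basename "$dev"
    done
}

# Guess the provider package for a device name (ASSUMPTION: lib<driver>-rdmav2).
# Strips the trailing instance suffix: mlx4_0 -> mlx4, mthca0 -> mthca.
provider_package_for() {
    driver=$(printf '%s\n' "$1" | sed -e 's/_\{0,1\}[0-9][0-9]*$//')
    printf 'lib%s-rdmav2\n' "$driver"
}

# For each kernel device, report whether the guessed package is installed.
check_providers() {
    for name in $(list_ib_devices "$@"); do
        pkg=$(provider_package_for "$name")
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$name: $pkg installed"
        else
            echo "$name: $pkg NOT installed (ibv_devinfo may not see this device)"
        fi
    done
}
```

Running `check_providers` on the affected host prints one line per kernel device. A device that shows up in sysfs without its provider package would be consistent with ibnodes working while ibv_devinfo reports no devices.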

After adding the Factory OFED repository

zypper addrepo http://download.opensuse.org/repositories/OFED:Factory/openSUSE_Leap_42.2/OFED:Factory.repo

installing libmlx4-rdmav2, and downgrading all other InfiniBand packages to their Factory versions, I got ibv_devinfo working:

>ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.36.5000
        node_guid:                      0002:c903:00ed:3ed0
        sys_image_guid:                 0002:c903:00ed:3ed3
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 3
                        port_lid:               19
                        port_lmc:               0x00
                        link_layer:             InfiniBand
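The repository steps described above can be sketched as a command sequence. The function below only prints the commands so they can be reviewed before being run as root. The repository alias `OFED_Factory` is an assumption about what the .repo file defines (check with `zypper repos`), and using `zypper dist-upgrade --from` is one possible way to switch the already-installed packages to the Factory versions, not necessarily what was done here.

```shell
# Print (do not execute) the commands for the Factory-repo approach.
# $1: repository alias; defaults to an ASSUMED alias, verify with `zypper repos`.
print_ofed_factory_steps() {
    repo_url="http://download.opensuse.org/repositories/OFED:Factory/openSUSE_Leap_42.2/OFED:Factory.repo"
    repo_alias="${1:-OFED_Factory}"
    cat <<EOF
zypper addrepo $repo_url
zypper refresh
zypper install libmlx4-rdmav2
zypper dist-upgrade --from $repo_alias
EOF
}
```

With `--from`, the dist-upgrade should be limited to packages that the named repository provides, which is why it can serve for the downgrade step; review zypper's proposed changes before confirming.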
catbus
  • Check if you have installed `*verbs*` packages and libraries as well. It also could be that the `*verbs*` modules are not loaded. – Thomas Apr 01 '17 at 08:41
  • Looks like all verbs packages are installed. Added lsmod output for both Leap 42.2 and SLES 12; there is some difference in the modules loaded, but I am not sure which modules are really necessary. – catbus Apr 01 '17 at 09:35
  • Did you build OFED from source, or just install what zypper suggested? Download and build OFED from source. If you built from source and are still having this problem, that would be interesting. – Matt Apr 01 '17 at 22:09
  • I just installed packages, and the standard Leap 42.2 repo had no libmlx4-rdmav2 package, which is needed for ConnectX-3. – catbus Apr 01 '17 at 22:14
  • Then build from source, where you can select exactly which packages are built. OS distributions are notoriously bad at supporting IB; the best bet is to build from source or download the OFED version from your IB solution provider (probably Mellanox). – Matt Apr 04 '17 at 23:35
  • Are you still stuck here? I can provide a solution if you are. – itsmrbeltre Mar 15 '18 at 16:46

1 Answer

I did this for my VOLTAIRE 410-4EX cards (mthca) on Leap 42.1:

zypper install opensm ibutils ibutils-devel infiniband-diags infiniband-diags-devel libibcm1 libibverbs-devel libibverbs-runtime ibacm libmthca-rdmav2 rdma tvflash libibnetdisc5 ibsim qperf

Then:

systemctl enable openibd

systemctl start openibd
iwaseatenbyagrue
  • Derek, thank you for the reply. The zypper install succeeded, but there is no openibd.service or /etc/init.d/openibd script. As I understand it, in openSUSE 42.2 it has been replaced by the rdma service, whose status is active (exited) on my system. – catbus Apr 01 '17 at 16:30
  • Added UPD2 with some more output. – catbus Apr 01 '17 at 16:39
  • Solved by installing libmlx4-rdmav2 from the Factory repo, thanks for pointing at the HCA type! – catbus Apr 01 '17 at 20:38