I'm setting up a Linux cluster with an InfiniBand network, and I'm quite a newbie in the InfiniBand world, so any advice is more than welcome!

We are currently using the Mellanox OFED (MOFED) drivers, but our InfiniBand cards are old and are not recognized by the latest MOFED release. So I'm wondering whether we could use the drivers shipped with the distribution instead (we run CentOS 7).

What difference does it make to use one or the other? Should I expect any performance decrease?

thx

nirnaeth

2 Answers

If you don't use the vendor OFED distribution (in this case Mellanox OFED), you should expect not only a performance penalty but also missing features and a lot of stability issues.

InfiniBand is not as rock solid as Ethernet. Its main goal is to provide a low-latency fabric, not just a high-throughput network as people usually think.

The inbox driver (that's what Mellanox calls the OFED stack shipped with the distribution) is unreliable at best. If you're running cards older than ConnectX-4, you'll have a bad time with IPoIB if you need it: just keeping it enabled will eventually lead to kernel panics. Performance is bad and the network is unreliable.

There are some alternatives. First of all, there's MLNX OFED 4.9, an LTS release that supports older cards like the ConnectX-3. I would stick with it, since it's supported and will be for a long time.

What sets it apart from newer releases is that it keeps support for the following hardware and technology:

  • ConnectX-3 Pro
  • ConnectX-3
  • Connect-IB
  • RDMA experimental verbs library (mlnx_lib)
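To check whether a node's cards fall into this older group, one option is to read what the kernel exposes under `/sys/class/infiniband`. The following is only a minimal sketch: it assumes the usual sysfs layout (a directory per device with `hca_type` and `fw_ver` files and a `device/driver` symlink), and the function names `read_attr` and `list_ib_devices` are made up for illustration. As a rule of thumb, ConnectX-3 and ConnectX-3 Pro cards are bound to the `mlx4_core` driver, while Connect-IB and ConnectX-4 or newer use `mlx5_core`.

```python
import os


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def list_ib_devices(sysfs_root="/sys/class/infiniband"):
    """List RDMA devices found under sysfs_root (hypothetical helper).

    Assumes each device directory may contain 'hca_type', 'fw_ver' and a
    'device/driver' symlink; missing entries are reported as None.
    """
    devices = []
    if not os.path.isdir(sysfs_root):
        return devices
    for name in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, name)
        driver_link = os.path.join(dev_dir, "device", "driver")
        driver = None
        if os.path.exists(driver_link):
            # The symlink target's last path component is the kernel driver name.
            driver = os.path.basename(os.path.realpath(driver_link))
        devices.append({
            "name": name,
            "hca_type": read_attr(os.path.join(dev_dir, "hca_type")),
            "fw_ver": read_attr(os.path.join(dev_dir, "fw_ver")),
            "driver": driver,
        })
    return devices


if __name__ == "__main__":
    for dev in list_ib_devices():
        print(dev["name"], dev["hca_type"], dev["fw_ver"], dev["driver"])
```

If the list comes back empty, either no RDMA driver is loaded or the card was not recognized by the installed stack, which is the situation described in the question.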

Download it from here: https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed

If the LTS version of Mellanox OFED does not suit you, another solution is to move to Oracle Linux, adopt UEK (Unbreakable Enterprise Kernel) and use its RDMA distribution. At least Oracle tests this OFED release, since their Exadata product uses it. There's documentation available here: https://docs.oracle.com/en/operating-systems/uek/6/relnotes6.2/ol_instav.html#uek6_install_rdma

Vinícius Ferrão

The "inbox" drivers have gone through the Linux kernel QA process and through the distro's QA. The MOFED drivers have not.

There are severe bugs in MOFED that prevent our code from running under it, and support for our old hardware has been disabled in MOFED. The same hardware works with the inbox (distro) drivers.
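When comparing behaviour between the two stacks, it helps to first confirm which one a node is actually running. A rough sketch, assuming MOFED installs the `ofed_info` tool on the PATH and that `ofed_info -s` prints a short version string; the function name `detect_rdma_stack` and its return labels are invented for this example. The absence of `ofed_info` only suggests the inbox stack (or no RDMA stack at all), it does not prove it.

```python
import shutil
import subprocess


def detect_rdma_stack(which=shutil.which, run=subprocess.run):
    """Guess whether MOFED is installed (hypothetical helper).

    Returns ("mofed", version) if ofed_info is found on PATH, otherwise
    ("inbox-or-none", None). 'which' and 'run' can be overridden for testing.
    """
    exe = which("ofed_info")
    if exe is None:
        return ("inbox-or-none", None)
    # 'ofed_info -s' prints a one-line version summary.
    result = run([exe, "-s"], capture_output=True, text=True, check=False)
    # Drop surrounding whitespace and a trailing colon, if present.
    version = result.stdout.strip().rstrip(":")
    return ("mofed", version or None)


if __name__ == "__main__":
    print(detect_rdma_stack())
```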

MOFED is experimental software. It could be useful if you can tolerate the system crashing once in a while and you want to use cutting-edge features that have not matured yet.