
I have two HP BL685c G6 blade servers running Ubuntu 15.04.

When I configure each of the four 10Gb NICs separately, I can test with iperf and get ~10Gbit/sec of bandwidth between the servers on each NIC. This works as expected.
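The per-NIC measurements were plain iperf runs of roughly this form (a sketch; which address is bound to which NIC is illustrative only):

```
# On the receiving server: start a listener
iperf -s

# On the sending server: test against the address assigned to the NIC under test
iperf -c 10.10.10.101
```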

Now I am trying to bond all of the 10Gb NICs on each server using bond mode "balance-rr". The results vary, but land somewhere between 2.5Gbit/sec and 5Gbit/sec.
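Assuming the usual ifupdown/ifenslave setup on Ubuntu, the bond is configured along these lines (a sketch only; the slave interface names eth0-eth3 and the addressing are placeholders, while the bond name matches `/proc/net/bonding/bond-net` used below):

```
# /etc/network/interfaces -- sketch; slave names and addressing are placeholders
auto bond-net
iface bond-net inet static
    address 10.10.10.101
    netmask 255.255.255.0
    bond-mode balance-rr
    bond-miimon 100
    bond-slaves eth0 eth1 eth2 eth3
```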

I am using the same configuration to bond 2x1Gb NICs on these same servers, and the bonded 2x1Gb NICs deliver ~2Gbit/sec of bandwidth when testing with iperf. Those two NICs are not connected to a Virtual Connect domain; each is connected to a different Cisco Catalyst Blade Switch 3120.

So, my question is: why does bonding 4x10Gb NICs using balance-rr result in less performance than a single NIC? I would have expected ~40Gbit/sec of bandwidth minus TCP/bonding overhead, which would align with my results when bonding 2x1Gb NICs and getting ~2Gbit/sec in testing.

I have tried different bonding modes, and the others result in roughly ~10Gbit/sec of bandwidth when bonded. Still not ideal, but better than the balance-rr results.

  • Results from `cat /proc/net/bonding/bond-net` show all interfaces are up at 10000Mbps: http://d.pr/n/16hKi – Nicholas Curtis Jun 21 '15 at 00:10
  • Results from `iperf -c 10.10.10.101 -P 4`, testing with multiple parallel clients, show that I am able to surpass the 10Gbit/sec of a single NIC: http://d.pr/i/Tk3a However, I would still expect that with -P 8 each connection would get ~5Gbit/sec and sum to ~40Gbit/sec – Nicholas Curtis Jun 21 '15 at 00:19
  • Did you try running multiple iperf streams at once? – SpacemanSpiff Jun 21 '15 at 01:50
  • Only using the -P flag; I did not start up multiple instances of iperf. However, even without multiple streams or parallel clients, I would expect iperf to get at least the bandwidth of a single NIC in the bond - currently a single iperf client gets about 25% of the bandwidth of one of the four NICs. – Nicholas Curtis Jun 21 '15 at 02:10
  • Why aren't you using LACP? – ewwhite Jun 21 '15 at 04:07
  • And why does this matter? Is it a science experiment, or do you need to design for a particular throughput target? – ewwhite Jun 21 '15 at 04:08
  • I intend to use these two nodes as OpenStack Neutron nodes, and the 4x10Gb interfaces will be bound to an IP used for instance tunneling between compute and Neutron nodes. Each compute node has a single 10Gb NIC, so I want to maximize the bandwidth between the 8 compute nodes and the 2 Neutron nodes in my setup. – Nicholas Curtis Jun 21 '15 at 04:36
  • What does the CPU usage look like? Try installing `htop` and watch the hard-IRQ and soft-IRQ load (a sketch of checking this follows these comments). I've seen a significant impact from bonding even with 2x1Gbps, so I suspect the CPU may be your bottleneck here. – Michal Sokolowski Jun 21 '15 at 09:46
  • You seem to be mixing up bits and bytes in your question. I think most of the places where you said GB you meant Gb, but I am not sure if all of them need to be corrected. – kasperd Jun 21 '15 at 22:23
  • These are G6 systems. They're not even capable of this type of traffic when bonded. HP Blades are probably a poor platform for what you're doing as well; especially with VirtualConnect. – ewwhite Jun 21 '15 at 23:04
  • @ewwhite G6 blades are capable of round-robin load balancing with bonded NICs; I know this because I have 2x1Gb NICs bonded on 16+ G6 blade servers, and each reports ~2Gbit/sec of bandwidth to the other nodes when measuring performance with iperf. The problem is that Virtual Connect does not support the balance-rr bond mode; using bond mode 6 provides full 10Gb performance on each NIC, and allows for 40Gb of bandwidth when connecting to multiple servers. – Nicholas Curtis Jun 22 '15 at 01:03
  • Note that B is traditionally for byte, while Ethernet speed is usually expressed in bits. Please try to be specific in your question. – sch Jun 23 '15 at 18:51
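Following up on the CPU suggestion above, a quick way to watch interrupt load during an iperf run (a sketch; assumes the `sysstat` package is installed for `mpstat`, and the interface names are placeholders):

```
# Per-CPU hard-IRQ (%irq) and soft-IRQ (%soft) usage, refreshed every second
mpstat -P ALL 1

# How the NIC interrupts are spread across cores (interface names are placeholders)
grep eth /proc/interrupts
```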

1 Answer


It appears that the Virtual Connect modules do not support bond mode 0 (balance-rr) in Linux deployments.

From HP Support: http://h20564.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c02957870

Information: Unsupported bonding modes in an HP Virtual Connect environment can produce packet loss and/or performance issues.

Details: HP Virtual Connect supports bonding modes 1, 5, and 6. VC does not support mode 0 (round robin) or mode 7 (switch-assisted load balancing).

Mode 1 (active-backup): Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch.

Mode 5 (adaptive transmit load balancing, balance-tlb): Channel bonding that does not require any special switch support. Outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

Mode 6 (adaptive load balancing, balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation.
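Since modes 1, 5, and 6 are the supported options, switching the existing bond to mode 6 (balance-alb) is a one-line change in the interfaces stanza. A sketch, using the same placeholder slave names as in the question:

```
# /etc/network/interfaces -- sketch; slave names and addressing are placeholders
auto bond-net
iface bond-net inet static
    address 10.10.10.101
    netmask 255.255.255.0
    bond-mode balance-alb   # mode 6: adaptive load balancing, no special switch support required
    bond-miimon 100
    bond-slaves eth0 eth1 eth2 eth3
```

After bringing the bond back up, `cat /proc/net/bonding/bond-net` should report the new mode; as noted in the comments above, mode 6 delivered full 10Gb per NIC here when talking to multiple servers.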