
I'm seeing some confusing behaviour regarding bonded interfaces under Linux and I'd like to throw the situation out there in hopes that someone can clear it up for me.

I have two servers: Server 1 (S1) has 4x 1Gbit Ethernet connections; Server 2 (S2) has 2x 1Gbit Ethernet connections. Both servers are running Ubuntu 12.04, albeit with kernel 3.11.0-15 (from the lts-saucy linux-generic package).

Both servers have all their respective network interfaces bundled into a single bond0 interface with the following configuration (in /etc/network/interfaces):

bond-mode 802.3ad
bond-miimon 100
bond-lacp-rate fast
bond-slaves eth0 eth1 [eth2 eth3]
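
For completeness, a minimal sketch of what the full bond0 stanza looks like, assuming static addressing; the address below is made up, and on S2 only eth0 and eth1 are listed as slaves:

auto bond0
iface bond0 inet static
    # illustrative address, not the real one
    address 192.168.0.10
    netmask 255.255.255.0
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-slaves eth0 eth1 eth2 eth3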

Between the servers are a couple of HP switches which are (I think) correctly configured for LACP on the ports in question.

Now, the link is working - network traffic flows happily to and from both machines. And all respective interfaces are being used, so it's not like the aggregation is completely failing. However, I need as much bandwidth as possible between these two servers, and I'm not getting the ~2Gbit/s that I would expect.

In my testing, I can observe that each server seems to allocate each TCP connection (e.g. iperf, scp, nfs, whatever) to a single slave interface. Essentially everything seems capped at a max of 1 gigabit.

By setting bond-xmit-hash-policy layer3+4, I can use iperf -c S1 -P2 to send on two slave interfaces, but on the server side reception still occurs on only one slave interface, so total throughput remains capped at 1Gbit/s: the client shows ~40-50MB/s on each of two slave interfaces, while the server shows ~100MB/s on a single slave interface. Without setting bond-xmit-hash-policy, sending is also limited to one slave interface.
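
For what it's worth, I've been sanity-checking the bond state and watching which slaves actually carry the traffic roughly like this (a quick sketch; run on each server, substituting the real slave names):

# reports the bonding mode, hash policy and active slaves
cat /proc/net/bonding/bond0
# per-slave byte counters, read before and after a test run
cat /sys/class/net/eth0/statistics/tx_bytes /sys/class/net/eth0/statistics/rx_bytes
cat /sys/class/net/eth1/statistics/tx_bytes /sys/class/net/eth1/statistics/rx_bytes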

I was under the impression that LACP should allow this kind of connection bundling, allowing, for example, a single scp transfer to make use of all available interfaces between the two hosts.

Is my understanding of LACP wrong? Or have I missed some configuration options somewhere? Any suggestions or clues for investigation would be much appreciated!

Zetten

3 Answers


A quick and dirty explanation is that a single line of communication using LACP will not split packets over multiple interfaces. For example, if you have a single TCP connection streaming packets from HostA to HostB, it will not span interfaces to send those packets. I've been looking at LACP a lot lately for a solution we are working on, and this is a common misconception: 'bonding' or 'trunking' multiple network interfaces with LACP does not give a single flow the combined throughput of those interfaces. Some vendors have made proprietary drivers that will route a single flow over multiple interfaces, but from what I've read the LACP standard does not. Here's a link to a decent diagram and explanation I found from HP while searching on similar issues: http://www.hp.com/rnd/library/pdf/59692372.pdf
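
To make that concrete, here's a rough sketch of why a single conversation gets pinned to one interface: the Linux bonding driver's default (layer2) transmit hash is computed only from the source and destination MAC addresses, so every frame of a given host-to-host flow selects the same slave. The values below are made up and reduced to single bytes purely for illustration:

# Rough paraphrase of the default layer2 transmit hash described in the
# Linux bonding documentation; made-up single-byte values for illustration.
src_mac=0x1a        # stands in for the source MAC address
dst_mac=0x2b        # stands in for the destination MAC address
slave_count=2
echo $(( (src_mac ^ dst_mac) % slave_count ))
# The inputs never change for a given pair of hosts, so every frame of
# the conversation lands on the same slave -- hence the 1Gbit ceiling.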

Mike Naylor
  • That all makes sense. I have no idea why I hadn't discovered my misconception sooner; I must have just been skirting around the right search terms and documentation pages. It seems that depending on the network hardware we might be able to change the src-dest hashing mode and luck out on multi-interface throughput, but I think at this stage I'll just be happy with what we have. Thanks for your clarifications and the very useful link. – Zetten Jan 23 '14 at 09:32
  • Glad to help. I've been reading up a lot on this lately, trying to get clarification on the terminology around trunking and bonding, which is used differently by different vendors. I've found that outside of specific standards, such as those defined by the IEEE, vendors tend to use some terms interchangeably... – Mike Naylor Jan 23 '14 at 12:02
  • The document is no longer available at the original URL, but it is still accessible through the Internet Archive: https://web.archive.org/web/20030324105208/http://www.hp.com/rnd/library/pdf/59692372.pdf – smbear Mar 13 '17 at 12:49

bond-xmit-hash-policy layer3+4 sets the load-balancing algorithm from your source server to the switch. It doesn't set the load-balancing algorithm from your switch to the second server. That is almost certainly still layer-2 or layer-3 balanced, which for a single pair of hosts means it isn't balanced at all.
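
A quick way to confirm this (a sketch, assuming the slaves on the receiving server are eth0 and eth1) is to watch the per-slave receive counters there while the transfer runs; if only one counter moves, the switch is hashing everything onto a single link:

# watch per-slave RX byte counters on the receiving server during a test;
# slave names eth0/eth1 are assumptions -- substitute the real ones
watch -n1 'for i in eth0 eth1; do
  echo "$i $(cat /sys/class/net/$i/statistics/rx_bytes)"
done'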

MSalters

Well, first off, when you're using a teaming driver, that will create some overhead and lower the expected max throughput, which is ~940 Mbit/s on a 1Gbit adapter, by ~10%.

I'm not sure what kind of adapter you have, but if you're using in-box drivers, the settings are probably not ideal for max throughput. You could consider adding queues, up to 4, as a single queue on the adapter probably can't reach wire rate (see the sketch below).
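
A minimal sketch of checking and raising the queue count with ethtool, assuming the driver exposes combined channels (the exact channel names vary by driver):

# show how many RX/TX queues the adapter currently exposes
ethtool -l eth0

# raise the combined queue count to 4, if the hardware supports it
ethtool -L eth0 combined 4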

Another consideration is that one thread of iperf probably isn't going to reach top speed. For 1Gbit, 2-6 threads is probably more ideal; you can use a simple bash script to launch multiple streams at the same time, along the lines of the sketch below.
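
A rough sketch, using iperf 2 syntax (S1 stands for the target server; the -P flag does the same thing in a single invocation):

# launch four parallel iperf client streams against S1 in the background
for i in 1 2 3 4; do
    iperf -c S1 -t 30 &
done
wait

# or, equivalently, let iperf spawn the parallel streams itself
iperf -c S1 -t 30 -P 4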

For an Intel NIC, both RSS and hardware RSC can affect throughput; on Broadcom, make sure that TOE is working.
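
A quick way to see which offloads the driver currently has enabled (a generic sketch; the exact feature names differ between Intel and Broadcom drivers):

# list the offload features the driver reports for this interface
ethtool -k eth0

# illustrative only: toggle an offload (here LRO) if it turns out to
# hurt this particular workload
ethtool -K eth0 lro off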

Step one, though, would be to remove the LAGs and just test one port of traffic on each system to see how much throughput it gets; do this with all the ports, then try two. LACP is a fickle beast to get set up right, and I've never tried to set it up on an HP switch, only Force10 (pre-Dell).

Also, why are there a couple switches?

mortenya
  • As the other answer described, the underlying problem was my understanding of LACP, but just to fill out the picture: the linux boxes are using the kernel's bonding driver. Each interface individually can push near-max-gigabit throughput (apparently about 110-117MB/s depending on other traffic) so I was really just looking to increase that bandwidth rather than tune the individual NICs. As to the switches, we have a multi-office site and there are trunking switches with fibre mux/demux and various other bits and bobs in the way. I had both servers on one HP 2920-48G switch for testing though. – Zetten Jan 23 '14 at 09:38
  • iperf has a `--parallel` parameter which controls the number of parallel client streams to run. – 8.8.8.8 Feb 03 '20 at 09:24