9

As I understand it, bonding brings, among other benefits, the ability to increase the network speed between two machines on a LAN.

Bonding [...] means combining several network interfaces (NICs) to a single link, providing either high-availability, load-balancing, maximum throughput, or a combination of these.

Source: Ubuntu documentation, emphasis mine.

I have bonding configured on two servers; both have two 1 Gbps NICs (the bond declaration is sketched after the list below). When testing the speed between those servers using iperf, the report indicates:

  • 930 to 945 Mbits/sec when using the balance-rr bonding mode,
  • 520 to 530 Mbits/sec from machine A to B when using 802.3ad,
  • 930 to 945 Mbits/sec from machine B to A when using 802.3ad.
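
For reference, the bond is declared along these lines on both machines (a simplified Debian/Ubuntu /etc/network/interfaces sketch, not my exact configuration; the address is illustrative and bond-mode is switched between 802.3ad and balance-rr for the tests):

auto bond0
iface bond0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100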

An interesting thing is that when using 802.3ad, ifconfig indicates that on machine A practically all RX is on eth0 (2.5 GB vs. a few KB/MB) and all TX is on eth1, and the reverse on machine B.
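
The same distribution can also be inspected per slave through the bonding driver's status file and the per-interface counters (assuming the bond device is named bond0):

cat /proc/net/bonding/bond0   # mode, LACP aggregator info, per-slave state
ip -s link show eth0          # per-slave RX/TX byte and packet counters
ip -s link show eth1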

When asking iperf to use multiple parallel connections (iperf -c 192.168.1.2 -P 10), the reported sum is very close to the result obtained with a single connection.

The two machines are connected to a Netgear GS728TS which has LACP configured properly (I hope), with two LAGs of two ports each. IEEE 802.3x mode is enabled.

Is iperf well suited for this sort of test? If so, is there something I'm missing?

Arseni Mourzenko
  • iperf should be fine. Is it possible that a piece of equipment in the middle is the bottleneck? –  Aug 29 '14 at 22:32
  • What mode do you have it in? Have you verified that both links are being utilized (even if not fully)? – phemmer Aug 29 '14 at 22:41
  • @FrederikDeweerdt: the equipment in the middle is a switch which should be able to handle a 2 Gbps connection correctly, I suppose. I edited the question to provide more details. – Arseni Mourzenko Aug 29 '14 at 23:27
  • @Patrick: I'm using `802.3ad`. I edited the question to provide more details. – Arseni Mourzenko Aug 29 '14 at 23:27
  • I think 802.3ad uses a hash of the endpoints' addresses to choose which interface to use, so between any two endpoints your throughput won't be any higher than a single NIC's throughput. Round robin may result in higher throughput, if you want to maximize transfer rates between two specific endpoints, but I believe the disadvantage of that is that packets can arrive out of order (not a problem for TCP). Check if your switch has overall limits on bandwidth for physically adjacent ports; sometimes they'll have 2 or 4 ports sharing the same hardware. – Mark Plotnick Aug 30 '14 at 08:31
  • See http://packetpushers.net/the-scaling-limitations-of-etherchannel-or-why-11-does-not-equal-2/ for why 1 + 1 does not equal 2 in terms of bonding – Jonathan Sep 16 '14 at 12:59
  • @Jonathan: This doesn't explain why 1 + 1 equals 0.5. – Arseni Mourzenko Sep 16 '14 at 18:21

3 Answers

5

Bonded interfaces do not grant additional bandwidth to individual network flows. So if you're only running one copy of iperf, you will only be able to use one network interface at a time. If you have two NICs in a LAG, you'll need at least two completely independent copies of iperf running on the computer to see any simultaneous utilization. This applies to actual loads as well: e.g., a Samba client will still only see 1 Gbps of throughput, but two clients could each see 1 Gbps if your LAG has two NICs. This all assumes you have the LAG configured to use both NICs (the 802.3ad mode will do this).
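
Whether several flows can be spread at all also depends on the bond's transmit hash policy: the Linux default (layer2) hashes only on MAC addresses, so every flow between the same two hosts is sent over the same slave. A minimal sketch for checking and relaxing it, assuming a Linux bond named bond0 (this only affects the server's transmit direction; the switch hashes the return traffic with its own algorithm):

cat /sys/class/net/bond0/bonding/xmit_hash_policy    # layer2 is the default
# hash on IP addresses and TCP/UDP ports so that different connections
# between the same pair of hosts can map to different slaves (run as root)
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy

To make this persistent, the equivalent bond-xmit-hash-policy option can be set in the interface configuration.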

Chris S
4

After contacting Netgear support, it appears that:

If you use 2 stations (1 client/1 server), it will actually only use one link (hence the 1Gbps/940mbps), the link used is decided by the LACP hashing algorithm.

To go above the 1Gbps limit, you will need to test with more than 1 client.

Source: Netgear support ticket response

The same ticket response links to a post on Netgear's public forum, where we can read that:

You can only get 2Gbps aggregate when the LACP hashing algorithm puts multiple traffic streams down different paths and it doesn't always. With a small number of clients (2 in your case), odds are good that they both might get hashed to the same link.

For those who don't want to read the entire forum discussion, here are the key points:

  • There should be at least two clients connecting to the server to benefit from LACP. A single client will use one link only, which will limit its speed to 1 Gbps.

  • Two clients should be using different links to benefit from LACP.

  • With only two network adapters on the server, there is a 50% chance that both clients are hashed to the same link, which caps the total speed at 1 Gbps. Three network adapters reduce that chance to 33%, four to 25%.
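
Whether two particular clients actually ended up on the same link can be observed on the server while the transfers run, for example by watching the per-slave counters (a sketch, using the eth0/eth1 names from the question):

watch -d -n 1 'ip -s link show eth0; ip -s link show eth1'

If both clients were hashed to the same link, only one of the two RX counters grows during the test.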

To conclude, there is no way with the Netgear GS728TS to obtain a speed of 1.4 to 1.8 Gbps between two machines.

Arseni Mourzenko
  • I am not familiar with this particular platform, but some Netgear platforms allow you to choose among different hashing algorithms to be used in a LAG group. Depending on the hashing algorithm and how you run your iperf tests, it is possible to obtain speeds above 1 Gbps between two machines. – YLearn Jan 09 '20 at 22:48
1

This Q&A was very helpful for me to understand bonding with LACP, but there is no concrete example of how to verify a throughput of about 1.8 Gb/s. For me it was important to verify this, so I will share how I tested it.

As @ChrisS noted in his answer, it is important to have completely independent copies of iperf running. To achieve this, I connect to the lacp-server from two clients. On the lacp-server I use screen to run independent instances of iperf in two screen windows/sessions. I also ensure independent data streams by using a different port for each connection. The switch providing the LACP bond to the server is a TP-LINK T1600G-52TS. All devices use Debian 10 (Buster). The two test clients are each connected to a port of the switch. First I started iperf in server mode twice on the lacp-server within screen, and then ran the following on the clients at the same time (using ssh):

iperf --time 30 --port 5001 --client lacp-server   # first test client
iperf --time 30 --port 5002 --client lacp-server   # second test client
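
For reference, the two listeners could also be started detached in screen rather than in two interactive windows (session names are arbitrary; this is just one way to do it):

screen -dmS iperf1 iperf -s -p 5001   # first listener in a detached screen session
screen -dmS iperf2 iperf -s -p 5002   # second listener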

Here are the results on the lacp-server for the first connection:

lacp-server ~$ iperf -s -p 5001
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.10.11 port 5001 connected with 192.168.10.69 port 44120
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-30.0 sec  2.99 GBytes   855 Mbits/sec

and for the second connection:

lacp-server ~$ iperf -s -p 5002
------------------------------------------------------------
Server listening on TCP port 5002
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.10.11 port 5002 connected with 192.168.10.80 port 48930
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-30.0 sec  3.17 GBytes   906 Mbits/sec

Together this is a bandwidth of 855 Mb/s + 906 Mb/s = 1761 Mb/s, i.e. about 1.76 Gb/s.

@ArseniMourzenko noted in his answer:

With only two network adapters on the server, there is a 50% chance that both clients are hashed to the same link, which caps the total speed at 1 Gbps. Three network adapters reduce that chance to 33%, four to 25%.

I have repeated the test more than 10 times to check this, but I always get a bandwidth of about 1.8 Gb/s, so I cannot confirm it.
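
One way such a repetition can be scripted from a third machine, starting both clients in parallel and waiting for each round to finish (hostnames are illustrative):

for i in $(seq 10); do
    ssh client1 "iperf --time 30 --port 5001 --client lacp-server" &
    ssh client2 "iperf --time 30 --port 5002 --client lacp-server" &
    wait
done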

The interface statistics show that the usage of the two slaves is balanced:

lacp-server ~$ ip -statistics link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast
    3088       30       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3088       30       0       0       0       0
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether 5e:fb:29:44:e9:cd brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    39231276928 25845127 0       0       0       916
    TX: bytes  packets  errors  dropped carrier collsns
    235146272  3359187  0       0       0       0
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether 5e:fb:29:44:e9:cd brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    36959564721 24351697 0       0       0       60
    TX: bytes  packets  errors  dropped carrier collsns
    267208437  3816988  0       0       0       0
4: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5e:fb:29:44:e9:cd brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    69334437898 50196824 0       4253    0       976
    TX: bytes  packets  errors  dropped carrier collsns
    502354709  7176175  0       0       0       0

With three test clients I get these results:

  • 522 Mb/s + 867 Mb/s + 486 Mb/s = 1.875 Gb/s
  • 541 Mb/s + 863 Mb/s + 571 Mb/s = 1.975 Gb/s
  • 534 Mb/s + 858 Mb/s + 447 Mb/s = 1.839 Gb/s
  • 443 Mb/s + 807 Mb/s + 606 Mb/s = 1.856 Gb/s
  • 483 Mb/s + 805 Mb/s + 512 Mb/s = 1.800 Gb/s


References:
  • Link Aggregation and LACP basics
  • LACP bonding and Linux configuration
  • Linux Ethernet Bonding Driver HOWTO
  • RedHat - Using Channel Bonding

Ingo