7

I have a DL380 server with QLogic Gigabit Ethernet NICs installed. I'm simply trying to create a bond, but I can't seem to get more throughput than a single 1 Gig link. All three cables from the two servers are connected to an S40 switch where I created the LACP LAG; the links come up and the LAG shows as active, but I just can't get more than 1 Gig of throughput. I am testing with iperf3. I've tried all the different bonding modes (round-robin, 802.3ad, everything), but I can't get past 900 Mbps or so. I am missing something but can't figure out what.
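For reference, a bond with the settings shown below can be created roughly like this; this is only a minimal sketch of the equivalent iproute2 commands (the bond itself may have been set up with other tooling such as nmcli, netplan, or ifcfg files):

```
# Create an 802.3ad bond matching the settings shown further down
# (mode 802.3ad, lacp_rate fast, xmit_hash_policy layer3+4).
ip link add bond0 type bond mode 802.3ad lacp_rate fast xmit_hash_policy layer3+4

# Interfaces must be down before they can be enslaved.
ip link set enp3s0f1 down; ip link set enp3s0f1 master bond0
ip link set enp4s0f0 down; ip link set enp4s0f0 master bond0
ip link set enp4s0f1 down; ip link set enp4s0f1 master bond0

ip link set bond0 up
```

`cat /proc/net/bonding/bond0` reports: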

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 9c:8e:99:0b:78:70
Active Aggregator Info:
    Aggregator ID: 4
    Number of ports: 3
    Actor Key: 9
    Partner Key: 418
    Partner Mac Address: 00:01:e8:d5:f4:f3

Slave Interface: enp3s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 9c:8e:99:0b:78:70
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 9c:8e:99:0b:78:70
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 32768
    system mac address: 00:01:e8:d5:f4:f3
    oper key: 418
    port priority: 128
    port number: 12
    port state: 63

Slave Interface: enp4s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 9c:8e:99:0b:78:72
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 9c:8e:99:0b:78:70
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 32768
    system mac address: 00:01:e8:d5:f4:f3
    oper key: 418
    port priority: 128
    port number: 7
    port state: 63

Slave Interface: enp4s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 9c:8e:99:0b:78:74
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 9c:8e:99:0b:78:70
    port key: 9
    port priority: 255
    port number: 3
    port state: 63
details partner lacp pdu:
    system priority: 32768
    system mac address: 00:01:e8:d5:f4:f3
    oper key: 418
    port priority: 128
    port number: 5
    port state: 63

I've tried all sorts of suggestions from Google but can't seem to get it working, and I'm out of ideas. I'd appreciate it if someone could point me in the right direction.

Thanks.

NBhatti

2 Answers

12

@ewwhite is right. I'll just explain one thing. When you test the link between two machines, you use only one NIC: LACP will not split the packets of a single stream/thread across multiple interfaces. For example, a single TCP stream will always send/receive its packets on the same NIC. So you will only see higher speeds when testing against more than one destination. There is a good answer that describes this in more detail.
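A quick way to see this with iperf3 (hostnames here are placeholders, and whether parallel streams actually spread out depends on a layer3+4 hash being used on both the bond and the switch LAG):

```
# A single TCP stream hashes to exactly one slave, so it tops out at the
# speed of one 1 Gbit/s link.
iperf3 -c server1

# Parallel streams use different source ports; with a layer3+4 transmit hash
# on both ends they *may* be spread across slaves, but the distribution is
# per-flow and not guaranteed to be even.
iperf3 -c server1 -P 8

# Traffic to/from several different hosts is the most reliable way to see the
# aggregate bandwidth (each destination needs its own `iperf3 -s`).
iperf3 -c server1 & iperf3 -c server2 & wait
```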

Alexander Tolkachev
  • Yep! This is the right thing. – ewwhite Aug 14 '17 at 20:20
  • 1
    So the aggregation does work, but only under specific conditions. I would say sending data to/from multiple hosts kicks in the automatic load balancing across the slave interfaces, thus increasing the throughput. It's interesting to see how widespread the misconception is that packets will somehow automagically get distributed over the different slaves, but going through the link you mentioned above and the bonding documentation clears things up. Seems like the only option is to play around with `xmit_hash_policy`, I guess. – NBhatti Aug 14 '17 at 20:46
  • @NBhatti yes, you're right. – Alexander Tolkachev Aug 14 '17 at 20:57
8

It seems like bonding and LACP are among the most misunderstood concepts in networking.

But the short explanation is that you'll never achieve more than a single link's throughput for any single source-destination pair. If you need more bandwidth on a single connection, you'll have to move to 10GbE.

ewwhite
  • I understand that. Either of them should work (NIC teaming, bonding, or whatever we want to call it); it should be able to aggregate throughput, overhead aside. Yes, 10GbE is the way forward, but I would have to buy more equipment and upgrade all the servers for that. For now, I need to get this working for a PoC so I can run Ceph for some testing. – NBhatti Aug 14 '17 at 20:17
  • 2
    You can't 'get this working' if 'it doesn't work like that'. Read the [Linux Bonding Howto](https://www.kernel.org/doc/Documentation/networking/bonding.txt). I'd pay close attention to `xmit_hash_policy` (a quick sketch of checking and setting it follows below). –  Aug 14 '17 at 20:20
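A minimal sketch of checking and changing the transmit hash policy, assuming the bond is named `bond0` (the switch's own LAG hashing governs the return direction and has to be configured separately):

```
# Current policy (also visible as "Transmit Hash Policy" in /proc/net/bonding/bond0).
cat /sys/class/net/bond0/bonding/xmit_hash_policy

# Change it via sysfs; layer3+4 hashes on IP addresses and ports, layer2+3 on
# MAC and IP addresses. Some older kernels only accept this while the bond is down.
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy

# If NetworkManager manages the bond, make the change persistent there instead
# (the connection name "bond0" is an assumption):
nmcli connection modify bond0 +bond.options "xmit_hash_policy=layer3+4"
nmcli connection up bond0
```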