
Hey all, this is a repost of a question I asked on the Cisco forums but never got a useful reply to.

I'm trying to convert the FreeBSD servers at work from regular gigabit links to dual-gig lagg links. Our production servers are on a 3560; I have a small test environment on a 3550. I have achieved fail-over, but am having trouble achieving the speed increase. All servers have gigabit Intel (em) cards. The server config is:

BSDServer:

#!/bin/sh

#bring up both interfaces
ifconfig em0 up media 1000baseTX mediaopt full-duplex
ifconfig em1 up media 1000baseTX mediaopt full-duplex

#create the lagg interface
ifconfig lagg0 create

#set lagg0's protocol to lacp, add both cards to the interface,
#and assign it em1's ip/netmask
ifconfig lagg0 laggproto lacp laggport em0 laggport em1 ***.***.***.*** netmask 255.255.255.0
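
(For what it's worth, this is roughly how I check that the lagg actually negotiated LACP on the FreeBSD side, with the same interface names as above:)

# lagg0 should report laggproto lacp, and both laggports should show
# flags like ACTIVE,COLLECTING,DISTRIBUTING once the switch side is up
ifconfig lagg0 | grep -E 'laggproto|laggport'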

The switches are configured as follows:

#clear out old junk
no int Po1
default int range GigabitEthernet 0/15 - 16

# config ports
interface range GigabitEthernet 0/15 - 16
description lagg-test
switchport
duplex full
speed 1000
switchport access vlan 192
spanning-tree portfast
channel-group 1 mode active
channel-protocol lacp
**** switchport trunk encapsulation dot1q ****
no shutdown
exit

interface Port-channel 1
description lagginterface
switchport access vlan 192
exit

port-channel load-balance src-mac
end

Obviously, change the 1000s to 100s and GigabitEthernet to FastEthernet for the 3550's config, as that switch only has 100 Mbit ports.

With this config on the 3550, I get failover and 92 Mbit/sec on both links simultaneously when connecting to 2 hosts (tested with iperf). Success. However, this only works with the "switchport trunk encapsulation dot1q" line.

First, I don't understand why I need this; I thought it was only for connecting switches. Is there some other setting that this turns on which is actually responsible for the speed increase?

Second, this config does not work on the 3560. I get failover, but not the speed increase: speeds drop from roughly 1 Gbit/sec to 500 Mbit/sec when I make 2 simultaneous connections to the server, with or without the encapsulation line. I should mention that both switches are using source-MAC load balancing.

For my tests I am using iperf. The lagg box is set up as the server (iperf -s) and the client computers run iperf -c server-ip-address, so the source MAC (and IP) is different for each connection.
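
Roughly, the test looks like this (server-ip-address is a placeholder and the 30-second duration is arbitrary):

# on the lagg box
iperf -s

# on client 1 and client 2, started at the same time
iperf -c server-ip-address -t 30

# meanwhile, on the server, I watch per-interface byte counters to see
# whether both em0 and em1 are actually carrying traffic
netstat -bnI em0
netstat -bnI em1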

Any ideas/corrections/questions would be helpful, as the gig switches are what I actually need the lagg links on. Ask if you need more information.

Flamewires
  • Looks good. Perhaps there's something else in the configuration on the 3560 switch? – Chris S Jul 07 '10 at 15:20
  • I'm not sure; I don't think so, as I default the ports and remove the logical interface. Is there anything in particular that comes to mind? I can't really post the whole running config for privacy/security reasons. – Flamewires Jul 07 '10 at 16:16
  • I might be wrong, but have you tried **load-balance dst-mac**? – YOU Aug 17 '10 at 02:13
  • Perhaps the obvious question: are you running the other side of the traffic tests from two different machines on the same VLAN as your LACP host? IOW, is the combined bandwidth of your senders > 1G? – James Cape Nov 24 '10 at 13:55
  • @S.Mark `load-balance dst-mac` will add the destination MAC address to the hash computation which won't help since a single flow will always have the same destination MAC. Not only that, in this case the bundle connects directly to a single server, so the destination MAC will be the same for every packet. – eater Jan 06 '11 at 23:58

2 Answers


+1 to James Cape. Most Ethernet bonding schemes only increase speed across multiple connections; a single socket usually won't be spread across more than one interface.

Note the use of "usually," as I'm no link bonding expert.
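
If you want to sanity-check that with iperf, something like the following (a rough sketch; the stream count and duration are arbitrary) generates several TCP flows at once from a single client:

# 8 parallel TCP streams from one client. Whether they spread across the
# bundle depends on what the hash looks at: a src-mac-only hash keeps them
# all on one link, while a hash that includes TCP ports can spread them.
# Each individual stream is still capped at one link's worth of bandwidth.
iperf -c server-ip-address -P 8 -t 30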

Jeff McJunkin
  • I was. I think the issue was resolved by adding src-dst-mac, which the 3550 doesn't have. However, I selected eater's answer because it's very informative, something I only learned after hours of messing around with those switches. – Flamewires Jan 07 '11 at 01:46

802.3ad link aggregation (and many other multi-path techniques) typically splits traffic across the links on a per-flow basis, not per-packet, and for good reason: there will always be a slight difference in the transmission delay on each link. Perhaps one interface has a more efficient driver, or a higher interrupt priority, or the cable is a little shorter so the electrons can get there faster (seriously). Whatever the case may be, it's extremely common for the packets to arrive at the destination in a different order than they were sent.

Lots of out-of-order packets are generally a Bad Thing, because TCP only acknowledges the last packet received in order. Out-of-order packets cause TCP to send duplicate ACKs, which the sender interprets as a sign of congestion, prompting it to slow down (or even retransmit unnecessarily if fast retransmit is triggered).
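
(If you want a rough idea of whether reordering is actually biting you, the TCP counters on a FreeBSD box are one place to look; the exact wording of the lines varies between versions:)

# watch the out-of-order and retransmit counters climb during a transfer
netstat -s -p tcp | grep -iE 'out-of-order|retransmit'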

None of this is a problem, of course, if each conversation is confined to a single link. Nobody cares if the packet from one conversation gets reordered with a packet from another conversation.

So most multi-path implementations select the outgoing physical link by computing a hash over a few header fields. Normally the fields looked at are some combination of:

- src-ip
- src-port (or possibly another identifier if not TCP/UDP)
- dst-ip
- dst-port (or possibly another identifier if not TCP/UDP)
- ip-proto
- vlan-id

So for each packet, the values of those fields get hashed together and the result determines which interface to send the packet out of. On average, if there are lots of different flows, a similar amount of traffic will end up on each link. If you only have one flow, all of those fields will be the same for every packet, so every packet ends up on the same link.

If you have two flows, you're basically taking a 50/50 gamble that the two flows will be on different links—which is exactly what you're seeing. If you get unlucky, you can roll the dice again by changing any of the variables that are considered by the hash function: try a different port, for example. In fact by turning on 802.1q tagging, you introduced a vlan-id into the mix which apparently changed the result of the hash.
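
To make that concrete, here is a toy version of the idea in shell. It is not any vendor's real algorithm (those run in hardware and are undocumented), and the addresses and ports below are made up, but the link-selection principle is the same:

#!/bin/sh
# Toy flow hash: mix a few header fields and pick one of two links.
SRC_IP="10.0.0.1";  DST_IP="10.0.0.2"
SRC_PORT="50000";   DST_PORT="5001"

# md5(1) is in the FreeBSD base system; keep 8 hex digits of the digest
HASH=$(printf '%s' "${SRC_IP}${DST_IP}${SRC_PORT}${DST_PORT}" | md5 | cut -c1-8)
LINK=$(( 0x${HASH} % 2 ))
echo "flow ${SRC_IP}:${SRC_PORT} -> ${DST_IP}:${DST_PORT} uses link ${LINK}"

# Change any one input (the source port, say) and the flow may land on
# the other link -- that's the "roll the dice again" part.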

Also, there is no standard way of performing the hash, which means that if you've connected systems from different vendors (or even different versions of software from the same vendor), each side may perform the hash in a different way, so two particular flows may end up on different links from server to switch, but on the same link from switch to server.

The bottom line is that 802.3ad and other packet-level multi-path techniques are necessarily flow-based and work great if you have many different flows, but are not well suited to a small number of large flows.

eater