
I want to connect a CentOS 6.4 Linux box with two NICs to a Cisco 2960S using LACP (802.3ad) link aggregation. This is mainly for redundancy (and hopefully more bandwidth). We don't use VLAN tagging.

With the config listed below, the link aggregation only works partially. Approximately half of the hosts on the network can ping and SSH into the Linux box, while the other half cannot. The same is true from the Linux box itself: only about half of the hosts can be pinged.

Setting up adapter bonding (or, in Cisco terms, an EtherChannel) shouldn't be that hard. Does anyone know what's wrong here?

On the Linux side, the configuration looks like this:

cat /etc/modprobe.d/bond.conf 
alias bond0 bonding  

cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
NM_CONTROLLED="no"
IPADDR=10.76.161.135
PREFIX=21
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System bond0"
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

And these are the commands I applied to the Cisco 2960S:

sw01>enable     
sw01#config term
sw01(config)#int range Gi0/13 - 14
sw01(config-if-range)#description lacp ch2     
sw01(config-if-range)#channel-protocol lacp
sw01(config-if-range)#channel-group 2 mode active
Creating a port-channel interface Port-channel 2
sw01(config-if-range)#no shutdown
sw01(config-if-range)#exit
sw01(config)#interface Port-channel2
sw01(config-if)#description lacp ch2 for ssensvr03
sw01(config-if)#switchport mode access
sw01(config-if)#no shutdown
sw01(config-if)#exit

sw01>show interface description 
Gi0/13                         up             up       lacp ch2
Gi0/14                         up             up       lacp ch2
Po2                            up             up       lacp ch2 for svr03
sw01>show etherchannel summary
Number of channel-groups in use: 1
Number of aggregators:           1

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
2      Po2(SU)         LACP      Gi0/13(P)   Gi0/14(P)   

sw01>show etherchannel 
Group: 2 
----------
Group state = L2 
Ports: 2   Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol:   LACP
Minimum Links: 0
StackUnderflow

2 Answers


RHEL and CentOS have NetworkManager enabled by default, which causes trouble. Permanently disable it as root to make adapter bonding work properly:

service NetworkManager stop
chkconfig NetworkManager off
chkconfig network on
service network restart

Additionally, remove lacp_rate=1 from BONDING_OPTS:

BONDING_OPTS="mode=4 miimon=100"
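If you prefer to make that edit from the command line, here is a minimal sketch. It assumes GNU sed (as shipped with CentOS) and that BONDING_OPTS sits on a single double-quoted line, as in the question's ifcfg-bond0. drop_lacp_rate is a hypothetical helper name, not something provided by initscripts.

```shell
# Hypothetical helper: remove a lacp_rate=<value> token from the BONDING_OPTS
# line of an ifcfg file. Assumes GNU sed (-i) and a single double-quoted
# BONDING_OPTS line; all other lines are left unchanged.
drop_lacp_rate() {
    cfg=${1:-/etc/sysconfig/network-scripts/ifcfg-bond0}
    # First expression: token preceded by a space (not the first option).
    # Second expression: token right after the opening quote (first option).
    sed -i \
        -e '/^BONDING_OPTS=/s/ lacp_rate=[^" ]*//' \
        -e '/^BONDING_OPTS=/s/"lacp_rate=[^" ]* */"/' \
        "$cfg"
}
```

For example, run drop_lacp_rate /etc/sysconfig/network-scripts/ifcfg-bond0 as root, then service network restart so the bond is brought up again with the new options.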
StackUnderflow
  • You can also just add `NM_CONTROLLED=no` to the ifcfg files of any interfaces you don't want managed by NM. I do this on my laptop, so the network scripts manage the bridge and wired connection, but I can still use Wi-Fi from the graphical interface. – suprjami Jan 09 '14 at 10:10
  • In fact, you should do this, because starting NM in the graphical interface with `nm-applet` will let NM manage any interfaces that don't have `NM_CONTROLLED=no` anyway. – suprjami Jan 09 '14 at 10:11
  • suprjami, you're right. `NM_CONTROLLED=no` is the better solution in most cases. – StackUnderflow Apr 25 '14 at 07:57
  • I have seen cases where the NM option doesn't work right, so it's good to have the info for both. This fixed an issue on my box where bond0 was created but none of the slaves would attach to the bond interface. Also, as a side note, the options in the bond.cfg modprobe file were not applied to the bond; I had to specify all options in BONDING_OPTS for them to take effect. – Kendrick Mar 23 '15 at 15:27
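Following suprjami's suggestion, the ifcfg files can be marked as unmanaged instead of disabling NetworkManager entirely. Note that the question's ifcfg-bond0 already contains NM_CONTROLLED="no", but ifcfg-eth0 and ifcfg-eth1 do not. A rough sketch, assuming GNU sed; set_nm_unmanaged is a hypothetical helper that rewrites an existing NM_CONTROLLED line or appends one if the file has none.

```shell
# Hypothetical helper: set NM_CONTROLLED=no in each ifcfg file given.
# Assumes GNU sed (-i).
set_nm_unmanaged() {
    for cfg in "$@"; do
        if grep -q '^NM_CONTROLLED=' "$cfg"; then
            # Overwrite whatever value is there ("yes", yes, "no", ...).
            sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED=no/' "$cfg"
        else
            # Make sure the file ends with a newline before appending.
            [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
            echo 'NM_CONTROLLED=no' >> "$cfg"
        fi
    done
}
```

For example: cd /etc/sysconfig/network-scripts && set_nm_unmanaged ifcfg-bond0 ifcfg-eth0 ifcfg-eth1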

I wonder if this is because you are setting fast LACPDUs (lacp_rate=1) on the Linux end of the bond while the switch is still using the default slow LACPDU rate, so the bond isn't negotiating properly.

If this is right, the output of either show etherchannel 2 detail or show lacp internal on the switch will probably list the channel group's flags as SA (Slow, Active). In the output of show lacp neighbor you'll probably see F (Fast) for the Linux end.
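The switch commands show the switch's view of the channel. On the Linux side, the bonding driver reports its own state in /proc/net/bonding/bond0, including the local LACP rate and, for each slave, an Aggregator ID. As an assumption worth checking rather than a confirmed diagnosis: if the two slaves report different Aggregator IDs, they did not join a single LACP aggregate, which could fit a symptom where only some hosts are reachable. The sketch below relies on the "LACP rate:", "Slave Interface:" and per-slave "Aggregator ID:" lines; their exact layout varies between kernel versions, and bond_lacp_summary is a hypothetical helper.

```shell
# Hypothetical helper: summarize the Linux side of an 802.3ad bond from the
# bonding driver's status file. Relies on the "LACP rate:", "Slave Interface:"
# and per-slave "Aggregator ID:" lines; exact layout varies by kernel version.
bond_lacp_summary() {
    status=${1:-/proc/net/bonding/bond0}
    awk '
        # Local LACPDU rate requested by the bond ("slow" or "fast").
        /^LACP rate:/ { print "lacp_rate " $3 }
        # Remember which slave section we are in.
        /^Slave Interface:/ { slave = $3 }
        # Aggregator IDs seen before the first slave section (the
        # "Active Aggregator Info" block) are skipped.
        /Aggregator ID:/ && slave != "" {
            print "slave " slave " aggregator " $NF
            ids[$NF] = 1
        }
        END {
            n = 0
            for (id in ids) n++
            if (n == 0)      print "aggregators none"
            else if (n == 1) print "aggregators ok"
            else             print "aggregators split"
        }
    ' "$status"
}
```

Running bond_lacp_summary with no argument reads /proc/net/bonding/bond0. After removing lacp_rate=1 and restarting, you would expect it to report lacp_rate slow and aggregators ok.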

To resolve this, just remove lacp_rate=1 from your BONDING_OPTS and restart.

Everything else is configured correctly, though you don't need the alias bond0 bonding line: the network scripts load and configure the bonding driver when they bring up the interface.

suprjami
  • That was indeed a configuration error. But the problem still persists after this correction. – StackUnderflow Jul 01 '13 at 14:48
  • Why not change to fast LACPDUs on the switch? – 030 Mar 02 '15 at 15:08
  • It all depends on the switch. I have mine set to fast on the switch side. The bond came up even though my Linux box wasn't taking the lacp_rate option, because the module options were being ignored. – Kendrick Mar 23 '15 at 15:29
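Following up on the point about ignored module options, one way to see what the driver actually applied is to read the bonding attributes under /sys/class/net/bond0/bonding/. This sketch assumes that sysfs layout; on an 802.3ad bond, mode typically reads 802.3ad 4 and lacp_rate reads slow 0 or fast 1. bond_effective_opts is a hypothetical helper, and its directory argument exists only so it can be pointed at a copy for testing.

```shell
# Hypothetical helper: print the bonding options the driver is actually
# using, read from sysfs. The optional argument lets you point it at another
# directory (e.g. a test copy); by default it reads bond0's attributes.
bond_effective_opts() {
    dir=${1:-/sys/class/net/bond0/bonding}
    for opt in mode miimon lacp_rate; do
        if [ -r "$dir/$opt" ]; then
            printf '%s: %s\n' "$opt" "$(cat "$dir/$opt")"
        else
            printf '%s: (not readable)\n' "$opt"
        fi
    done
}
```

If lacp_rate still reads fast 1 after you removed it from BONDING_OPTS, the new options were not picked up, and the bond probably needs to be taken down and brought up again.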