Cisco HSRP with spanning-tree slow failover

Question

I'm having a networking issue that I can't wrap my head around, since I'm not a strong networking guy. From our provider we have 2 drops via HSRP that go into our stacked Cisco 2960 switches, so each switch has a drop. Behind the switches we have two Astaro devices that handle all the firewalling and VLAN routing. These then feed back into the Cisco 2960s, and all the VM hosts are on the same 2960s. It looks something like this:

                           ----------------            ----------------
                   |-------| Cisco 1 2960 | <--------> |Astaro 1 / VMs|
                   |       ----------------            ----------------
-----------        |
| Uplink  |--------|
-----------        |
                   |       ----------------            ----------------
                   |-------| Cisco 2 2960 | <--------> |Astaro 2 / VMs|
                           ----------------            ----------------

So at any time one Cisco is the master of the stack and one Astaro is also master.

Say I have the following scenario:

Master Astaro is #1; master switch in the stack is #2.

If I reload switch #2, I get around 2 minutes of downtime as switch #1 takes over and things renegotiate.

Part of my Cisco config looks like this:

spanning-tree mode rapid-pvst 
spanning-tree extend system-id
no spanning-tree vlan 1,100

interface GigabitEthernet1/0/1
 switchport access vlan 100
 switchport mode access
 switchport nonegotiate
 duplex full
!
interface GigabitEthernet1/0/2
 switchport mode trunk
 switchport nonegotiate
!
interface GigabitEthernet1/0/3
 switchport mode access
 switchport nonegotiate
!
interface GigabitEthernet1/0/4
 switchport access vlan 100
 switchport mode access
 switchport nonegotiate
!

Port 1 goes to my provider, and ports 2-4 go from the switch to the Astaro for its management, VLAN, and WAN ports.

I'm at a loss as to why I can't get better than a 2-minute failover when I reboot a switch.

Edit

Below is the config for our "stack":

sw1a>show switch
Switch/Stack Mac Address : 64d8.1431.6a80
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 0cd9.960b.5b00     15     1       Ready
*2       Master 64d8.1431.6a80     10     1       Ready

Port 1 on the switch is our uplink.
Port 2 is the WAN port, which goes back to the Astaro.
Port 3 is the management VLAN port back to the Astaro.
Port 4 is the VLAN port that goes back to the Astaro.

The Astaro is pretty much a Linux appliance that puts a GUI on iptables and the other networking tools Linux offers.

Need some clarifications. Are your Astaro devices running spanning-tree? Why did you disable STP on VLANs 1 and 100? Which of ports 1-3 are doing what? And when you're saying "master switch", are you talking about spanning-tree root? Also, 2960 switches don't "stack". Are they independent switches, or are they actually 3750s? It would also be helpful if you had an image that was a more-detailed diagram including port numbers. How are your switches connected to each other? — Keller G, Apr 10 '13 at 22:13
I disabled it because of something I was reading that said it might fix it. These are 2960s, too. I'm a server guy, not a networking person, so I know the basics of why we use spanning tree and that's about it. I also updated my post. — Mike, Apr 10 '13 at 22:22
Okay, so I was wrong -- looks like 2960s CAN stack with FlexStack, didn't know that. — Keller G, Apr 10 '13 at 22:54

score 2 · Accepted Answer · answered Apr 10 '13 at 23:04

Based on your edits and comments, I don't think that this is spanning-tree delay that you're seeing. The downtime that you're describing (2 minutes) is really too long to be explained by STP, and I kind of doubt that the Linux servers are running STP with the switches. You also basically are doing single-switch spanning-tree, as a switch stack is considered one logical switch.

There are some STP tweaks that are probably a good idea in your situation, though. First of all, you can re-enable Spanning-Tree on your VLANs -- no reason to have it turned off. Mode rapid-pvst is a good idea unless you're trying to run spanning-tree with the Linux boxes. You can also tell the switch that the trunks towards your Linux devices (Gi1/0/2) are not switches.

spanning-tree vlan 1,100
interface GigabitEthernet1/0/2
 spanning-tree portfast trunk
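
To check that the change took, something like the following from privileged EXEC mode should do. Treat it as a sketch: it assumes Gi1/0/2 is the trunk towards the Astaro (as in your posted config) and that VLAN 100 is one of the VLANs you care about, and the output format differs between IOS releases.

! Is STP running again on the VLAN, and which ports are forwarding?
show spanning-tree vlan 100
! Is PortFast (edge) actually in effect on the trunk?
show spanning-tree interface GigabitEthernet1/0/2 portfast
show spanning-tree interface GigabitEthernet1/0/2 detail

The reason this can matter: a rapid-pvst port that is not marked as an edge port and gets no RSTP agreement back from its neighbour falls back to the forward-delay timers, which by default means 2 x 15 seconds before it forwards. A PortFast port goes straight to forwarding when the link comes up. Only use it on ports that face hosts, not other switches.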

That leaves the other redundancy features you've got here, which are the switch stack itself, HSRP, and anything on the Astaros.

My bet is on the failure recovery mechanism on the Astaros. Since you mentioned that one is "master", that implies that only one is active at any one time. What kind of timers are set up on the Astaro devices for failover? Do you have any logs that indicate how long it takes the standby device to go active after the switch fails?

Spanning-tree doesn't seem right because all the STP is being done on one logical switch, and because of the length of the downtime. The switch stack failover (at least on 3750 stacks) should be faster than that too, although you might hook up a console to the secondary switch to see if it's taking a long time to take over as master. HSRP (assuming it's running at the provider and not on your switches) will also fail over a good bit faster than that, and shouldn't be affecting you.
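
If you want to rule the stack and HSRP in or out, here is a rough sketch of what I'd look at. The show commands run on your stack. The HSRP fragment would live on the provider's routers, and its interface name, group number and timer values are placeholders made up for illustration, not anything taken from your setup.

! On the 2960 stack, privileged EXEC: roles, priorities and stack port state
show switch detail
show switch stack-ports

! On the provider's HSRP routers (hypothetical interface and group number)
interface Vlan100
 standby 1 timers 1 3
 standby 1 preempt

HSRP defaults to a 3 second hello and a 10 second hold time, so even untuned it should fail over in roughly 10 seconds; timers like the ones above would bring that down to about 3.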

TL;DR -- I think it's the failover timers on your Linux boxes that are causing the delay. Second place goes to the switch stack taking a long time to have the secondary switch take over as master.

You are a hero among men. I think it was the `spanning-tree portfast trunk` that did it. Now it's around 3-5 seconds, which is fine, and it could be less if I can get my uplink provider to shorten the timers on their side of HSRP. — Mike, Apr 11 '13 at 00:57
