3

I've got a connection in a datacenter where the network provider gives me two Ethernet connections. They're supposedly hooked up to the same VLAN, so that I can wire both up to my switch and only one of them will be active at a time, and either side can then do hardware maintenance (rewiring, switch upgrades, etc.) without causing a service outage.

I've partitioned my switch to have a separate VLAN for this external edge. Let's say that ports 1-3 are on the VLAN, with ports 1 & 2 being my colo-provided internet connections and port 3 being the outside interface of my firewall. This works fine with either port 1 or port 2 connected, but about 2 minutes after both are connected simultaneously, my switch becomes unresponsive, I get about 80% packet loss, and diagnostics show millions of broadcast packets per minute.

I understand STP well enough to know that it should be enabled for this to work, but even with STP turned on, both interfaces still get marked as Forwarding.

Anyone have any ideas on what would cause the packet storm? Is there a better way to set up a redundant connection?

Xerxes
natacado

2 Answers

4

Quick answer: You need to talk to your provider.

In order for STP to prevent the network loop you're getting, all potential nodes in a loop must be running the same STP protocol configured the same way.
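
To illustrate why the loop matters, here is a rough Python sketch of one broadcast frame being flooded between two switches joined by two links. The topology, the names `mine` and `provider`, and the `max_hops` cutoff are all assumptions made for the illustration; real switches have no hop limit at layer 2, which is exactly why the storm never stops on its own.

```python
from collections import deque

# Hypothetical topology: each cable maps (switch, port) -> (switch, port).
LINKS = {
    ("mine", 1): ("provider", 1),
    ("mine", 2): ("provider", 2),
}
# Cables are bidirectional, so add the reverse direction too.
LINKS.update({v: k for k, v in list(LINKS.items())})

# Port 3 on "mine" is the firewall; it has no entry in LINKS.
PORTS = {"mine": [1, 2, 3], "provider": [1, 2]}


def flood(blocked=frozenset(), max_hops=20):
    """Count frames put on inter-switch links by one broadcast
    that enters switch 'mine' on port 3.

    blocked  -- set of (switch, port) pairs that neither send nor accept frames
    max_hops -- artificial cutoff so the simulation terminates
    """
    queue = deque([("mine", 3, 0)])  # (switch, ingress port, hops so far)
    transmissions = 0
    while queue:
        switch, ingress, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for port in PORTS[switch]:
            # Never send a frame back out the port it arrived on,
            # and never send out of a blocked port.
            if port == ingress or (switch, port) in blocked:
                continue
            peer = LINKS.get((switch, port))
            if peer is None:
                continue  # edge port: the frame leaves this model
            transmissions += 1
            if peer not in blocked:  # a blocked port discards on receipt
                queue.append((peer[0], peer[1], hops + 1))
    return transmissions


print(flood())                                        # both uplinks forwarding
print(flood(blocked=frozenset({("mine", 2)})))        # one uplink blocked
```

With both uplinks forwarding, the count grows with whatever `max_hops` you allow, because the two copies of the frame keep circling in opposite directions. With one port blocked, the frame dies out after two transmissions.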

You need to get in touch with your provider, ask him "How is STP configured?", and ensure that your end matches. (Possible spanning tree protocols include STP, MSTP, RSTP, PVST, PVST+, ...)

On the other hand, it's quite possible that he's not running STP on your links since you're probably not sharing VLAN configurations.

If he's willing to do so, configure link aggregation on those uplinks (on both ends!). Then you won't need to worry about STP.

MikeyB
  • Much delayed on the response, but it ended up being a case of needing to use STP instead of RSTP. My switch is a Foundry/Brocade FESX448; running show 802-1w detail showed that I was receiving Config BPDUs instead of RST BPDUs, which according to the documentation means the other end is running the older 802.1d STP. Adding spanning-tree 802-1w force-version 0 to the VLAN config kicked it into an STP/MST-compatible mode, and it works, albeit more slowly than RSTP would. – natacado Aug 24 '09 at 01:18
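
The Config BPDU versus RST BPDU distinction in the comment above can also be read straight from a packet capture. The following is a minimal sketch, assuming you already have the raw BPDU bytes (the part after the LLC header); the function name `classify_bpdu` is made up for this example. It relies only on the first four bytes of the BPDU: a two-byte protocol identifier (always 0), a one-byte protocol version, and a one-byte BPDU type.

```python
def classify_bpdu(payload: bytes) -> str:
    """Classify a spanning tree BPDU from its 4-byte header.

    payload -- raw BPDU bytes, starting at the protocol identifier
    """
    if len(payload) < 4:
        raise ValueError("BPDU header needs at least 4 bytes")

    protocol_id = int.from_bytes(payload[0:2], "big")
    version = payload[2]    # 0 = STP, 2 = RSTP, 3 = MSTP
    bpdu_type = payload[3]  # 0x00 = Config, 0x80 = TCN, 0x02 = RST/MST

    if protocol_id != 0:
        return "not a spanning tree BPDU"
    if bpdu_type == 0x00:
        return "Config BPDU (802.1D STP)"
    if bpdu_type == 0x80:
        return "TCN BPDU (802.1D STP)"
    if bpdu_type == 0x02 and version == 2:
        return "RST BPDU (802.1w RSTP)"
    if bpdu_type == 0x02 and version >= 3:
        return "MST BPDU (802.1s MSTP)"
    return "unknown BPDU"


print(classify_bpdu(bytes([0x00, 0x00, 0x00, 0x00])))
print(classify_bpdu(bytes([0x00, 0x00, 0x02, 0x02])))
```

If the provider's side only ever sends the first kind, that matches the situation described in the comment.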
1

As has already been said, one way to get a properly working redundant connection at layer 2 like this is to run some variant of spanning tree protocol. Most, if not all, variants have a compatibility mode to deal with older switches running the original 802.1d spanning tree, so exactly matching versions are not required for basic operation; however, fail-over speed will not be good if you are running in compatibility mode.

For a colo provider this type of loop can be disastrous. Configuring the switches to detect and prevent this type of loop is absolutely fundamental to running a large switching network. I would seriously consider changing your supplier or at least requesting that they review their best practices.

I generally prefer not to link switches under different administrative domains into a single spanning tree.

One method to break the loop on your side without spanning tree would be to use some form of "private-vlan" configuration on your side. Most of the larger switch vendors have a feature of this type which can be used to prevent subscribers on the same vlan talking to each other directly. This is very useful in the data-centre and metro ethernet area. In your configuration ports 1 & 2 would be marked as private/UNI (or whatever your vendor's terminology is) and port 3 would be promiscuous/NNI (etc).
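
As a rough illustration of why this breaks the loop, here is the forwarding rule sketched in Python. This is a toy model, not any vendor's implementation: the role names `isolated` and `promiscuous` and the port numbering are assumptions taken from the example in the question.

```python
# Hypothetical port roles for the edge VLAN described in the question.
PORT_ROLE = {
    1: "isolated",     # provider uplink A
    2: "isolated",     # provider uplink B
    3: "promiscuous",  # firewall outside interface
}


def may_forward(ingress: int, egress: int) -> bool:
    """Return True if a frame received on `ingress` may leave via `egress`."""
    if ingress == egress:
        return False  # a switch never sends a frame back where it came from
    # The one rule that matters: isolated ports cannot talk to each other.
    both_isolated = (PORT_ROLE[ingress] == "isolated"
                     and PORT_ROLE[egress] == "isolated")
    return not both_isolated


def flood_targets(ingress: int) -> list:
    """Ports a broadcast received on `ingress` would be flooded to."""
    return [port for port in sorted(PORT_ROLE) if may_forward(ingress, port)]


print(flood_targets(1))  # broadcast arriving from the provider on port 1
print(flood_targets(3))  # broadcast arriving from the firewall
```

A broadcast arriving on port 1 only goes to port 3, so it can never be handed back to the provider through port 2, while traffic from the firewall can still use either uplink.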

Some vendors also have an explicit backup port configuration which will allow you to mark a port as inactive until the configured primary link goes down.

If you have more than one switch, make sure that you are in control of your spanning tree. Generally you can just string switches together and, as long as spanning tree is running, they will just work; however, sooner or later something will break, and knowing where your root is and which links should be forwarding or blocked makes troubleshooting a lot easier.

Russell Heilling