
We're lucky: every server we have has multiple NICs/HBAs/CNAs connected to multiple switches, and this approach has kept our platform up on numerous occasions. That said, we ran into a problem last week that I'm not sure how to fix.

We had a switch carrying a good chunk of our traffic crash (the details aren't important, but it was a Cisco 6509 that had a hard CPU crash and didn't come back up automatically). Unfortunately, it left its line cards working (i.e. L1 and L2 up) but lost all of its uplinks. The following servers were connected to it:

  • Windows Server 2003 32-bit EE SP2 with Veritas Storage Foundation
  • Oracle Enterprise Linux 5.3 64-bit
  • VMware ESXi 4.0
  • NetApp 3040 running Data ONTAP 7.3.2

All of these machines failed to detect the crashed switch and kept sending traffic its way, rather than moving their traffic to another switch.

I need help looking at my options for better multipathing. This can't be the first time this has happened, so there must be other ways of doing this (polling the HSRP interfaces, for instance). Can you help?

Thanks in advance.

Chopper3

1 Answer


If the switches between your Cisco 6509 and your servers are also Cisco, you have the option of shutting down a group of ports when one or more other ports go down. You define a set of "upstream" ports and a set of "downstream" ports. If all the upstream ports go down, the switch takes the downstream ports down as well.

It is called link state tracking and it is designed for situations like yours.

You will find a little info on this page.

Antoine Benkemoun
  • Sorry, I can't have been clear enough: we only use 6509s (and Nexuses in the core), and the servers connect directly to the 6509s in this case. The problem with this option is that it assumes the switches lose their actual links rather than just crashing. In this situation they have no ability to action anything at all; they're dead. Thank you for your suggestion though – Chopper3 Sep 20 '10 at 15:14
  • I guess pinging the HSRP interface is a solution then but it doesn't look very good. Another way to do this would be to traceroute some host and expect it to have at least X hops. If it doesn't reach the Xth hop, then you can suppose that interface is no good. – Antoine Benkemoun Sep 20 '10 at 15:25
  • The problem is that I want that 'pinging' ability built into four different platforms - two of which I can't really make changes to (ESXi and Netapp). Cheers. – Chopper3 Sep 20 '10 at 15:49
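
For the hosts where a script can be run (the Linux box, for instance), the "ping the HSRP address through each NIC" idea from the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the interface names and gateway addresses are made up, `ping_via` relies on the Linux iputils `ping` flags `-c`, `-W` and `-I`, and `pick_healthy_interface` is a hypothetical helper, not something any of these platforms ships with. It does nothing for ESXi or the NetApp.

```python
import subprocess
from typing import Callable, Dict, Optional


def ping_via(interface: str, target: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo to `target` out of `interface` (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pick_healthy_interface(
    paths: Dict[str, str],
    probe: Callable[[str, str], bool] = ping_via,
    attempts: int = 3,
) -> Optional[str]:
    """Return the first interface whose upstream gateway answers a probe.

    `paths` maps an interface name to the gateway (e.g. the HSRP virtual IP)
    that should be reachable through it. Link state is deliberately ignored:
    a crashed switch can leave L1/L2 up while forwarding nothing.
    """
    for interface, gateway in paths.items():
        # Allow a few attempts so one dropped packet does not trigger failover.
        if any(probe(interface, gateway) for _ in range(attempts)):
            return interface
    return None  # no path reaches its gateway


if __name__ == "__main__":
    # Hypothetical interface names and HSRP addresses, for illustration only.
    candidates = {"eth0": "10.0.0.1", "eth1": "10.0.1.1"}
    print(pick_healthy_interface(candidates))
```

The probe is passed in as a parameter so the selection logic can be exercised without sending real traffic; what to do with the chosen interface (change the active bond slave, rewrite routes, raise an alert) is left open because it differs per platform.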