
We have 3x ESX hosts and 2x SANs that we wish to move to a redundant 10G networking infrastructure.

We have 4x Dell PowerConnect 8024Fs providing our backbone, configured as follows (only the core switches are relevant to this question):

(switching topology diagram)

So the questions are:

1) Do the interconnects between the 4x 8024Fs need to be LAG'd, or just STP'd?

2) As the NICs on the servers are split across 2 switches, does any special configuration need to be done here or on the switches?

3) If a link or switch fails, will the switches automatically find a new path to the server/SAN?

ewwhite
Myles Gray

3 Answers


For the user-facing network, STP is fine. Yes, you will have a small interruption while a new tree is computed. However, the convergence time is lower than the TCP timeout, so the interruption should be effectively unnoticeable. Only for extremely time-sensitive applications such as VoIP would you run into problems, and even those can be mitigated.

For your iSCSI network, you should use multipathing (MPIO). This can detect a failed path much faster and retry before the storage system gives up.
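
As an illustration of what that looks like on the ESXi side, here is a minimal sketch using esxcli. The adapter name (vmhba33), vmkernel ports (vmk1/vmk2) and device ID are placeholders to adapt to your environment, and each iSCSI vmkernel port is assumed to have a single active uplink as port binding requires:

    # Enable the software iSCSI adapter and confirm its name (vmhba33 assumed here)
    esxcli iscsi software set --enabled=true
    esxcli iscsi adapter list

    # Bind both iSCSI vmkernel ports to the software adapter so each NIC/switch
    # becomes an independent path (MPIO)
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

    # Optionally set Round Robin path selection for a given LUN
    # (replace the naa identifier with the real device ID)
    esxcli storage nmp device set --device=naa.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR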

longneck
  • So connect the switches as above, unstacked, un-LAG'd, and use separate IPs on the interfaces coming from the NICs on the servers and SANs to provide multiple iSCSI targets for the same LUNs, allowing failover from the ESX side? – Myles Gray Jul 01 '13 at 15:10
  • Yes, that's the first step. There is probably more you have to do to get multipath working. I found this http://www.techromeo.com/?p=45 but I can't attest to the quality of the instructions. – longneck Jul 01 '13 at 15:16
  1. You should avoid relying on STP, since when your network topology changes your switches will stop forwarding packets for a few seconds, causing an interruption of your network. To avoid this you would need to stack your 8024Fs and use link aggregation instead.

  2. With a stacked configuration, 2 switches act as 1, so you can use 802.3ad even between the switches and the servers.

  3. That's the purpose of 802.3ad: https://www.kernel.org/doc/Documentation/networking/bonding.txt (see the sketch below).
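
To make the bonding reference concrete, here is a minimal Linux sketch using iproute2; the interface names and address are placeholders, and on an ESXi host the equivalent is vSwitch NIC teaming rather than a Linux bond, but bonding.txt describes the underlying modes:

    # Load the bonding driver and create an 802.3ad (LACP) bond
    # (the switch ports must also be configured for LACP)
    modprobe bonding
    ip link add bond0 type bond mode 802.3ad
    # On two independent (un-stacked) switches, use active-backup instead:
    # ip link add bond0 type bond mode active-backup

    # Enslave both 10G NICs (they must be down before joining the bond)
    ip link set eth0 down
    ip link set eth0 master bond0
    ip link set eth1 down
    ip link set eth1 master bond0

    # Address the bond and bring it up (address is a placeholder)
    ip addr add 192.168.10.10/24 dev bond0
    ip link set bond0 up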

Ladadadada
user2299634
  • However, the problem with stacking the 8024Fs is that there is zero redundancy: if we lose the master we lose the stack until the entire stack reboots and a new master (the remaining switch) is re-elected? – Myles Gray Jul 01 '13 at 14:49
  • Implementation detail. One would assume that this depends on the type of switch. I am sure enterprise models can handle the master failing and failing over to another system. You need to read the documentation and ask questions to find that out. – TomTom Jul 01 '13 at 14:51
  • Sorry, that was meant as a statement rather than a question: the 8024s, on master failure, reboot the entire stack and re-elect a master, meaning zero redundancy. – Myles Gray Jul 01 '13 at 14:53
  • Then you have pinpointed your issue exactly... You have to choose between trusting your hardware or trusting your links/optics/facility management ("oops, we scratched this wire"). – user2299634 Jul 01 '13 at 15:00
  • So the answer is redundancy cannot be provided? – Myles Gray Jul 01 '13 at 15:02
  • For your network, automatic failover can be done (STP does this job), but I wouldn't say "we'll never experience a network interruption" because of the convergence time. Look at Rapid STP to minimize any eventual outage on your interconnects. For switch-to-server links I'd still recommend 802.3ad in active-backup mode (active-active can only be done with links on the same switch, or stacked switches). When a link goes down, failover is done properly. – user2299634 Jul 01 '13 at 15:06
  • That's more like what I wanted to hear, so teaming with 802.3ad would still be preferred over say separate IFs with separate IPs and relying on MPIO failover from ESX? – Myles Gray Jul 01 '13 at 15:09

I would recommend you post this over on the Network Engineering Stack Exchange site. However, I believe the following to be solid answers.

1) LAG is not what most people think it is. It increases the redundancy and capacity of an interconnect, not of any single connection. If you have two 1Gb links, your total throughput capacity is 2Gb/s, but the most any single transfer will use is 1Gb/s; it does, however, allow you to have two simultaneous 1Gb/s transfers running. That being said, I see no reason to LAG your 8024 backbone (storage or network traffic?) switches unless you have unstated requirements. You are already set up for redundancy and MPIO. I would disable STP on the ports between the switches and the SAN and hosts. On Cisco I'd set the ports to "switchport mode access" and "spanning-tree portfast" (a port-config sketch follows point 3 below); I don't know what the PowerConnect equivalent is. If you are concerned about redundancy, ensure that each switch has A & B power supplies and that you have independent A & B power circuits that the corresponding switches are plugged into.

2) For iSCSI, VMware has a white paper and setup guide here: http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-binding.pdf. The document is very straightforward. For VM network traffic, this depends on your needs. As it is, I would not configure any kind of LACP/LAG between the VM hosts and the switches. Per the recommendation in VMware Networking Best Practices.PDF, I would team 2 NICs per vSwitch (set to trunk 802.1q with no spanning-tree, split across the two Ciscos) and use active/active teaming based on originating VM port ID on the VM host, or alternatively put all 4 NICs in one vSwitch in active/active and put 1 port from each NIC card on each switch; a vSwitch sketch appears at the end of this answer.

TL;DR: read the VMware networking best practices and design your LAN switching to your requirements. Nothing special needs to be done with the individual NICs for iSCSI; configure MPIO via the VMware guide. PAY ATTENTION TO STP SETTINGS FOR YOUR VLANS.

3) This depends. For iSCSI traffic, if you have MPIO enabled, then yes. Theoretically you could lose 1 switch upstairs and 1 switch downstairs and keep running, but at degraded capacity. For the VM network traffic, it depends on how you have the VMware vSwitches configured and on your VLAN/STP environment. But properly configured, yes, you could lose 1 Cisco and 1 8024 switch downstairs and continue running at degraded capacity.
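
To illustrate the host/SAN-facing port settings from point 1, here is a minimal sketch in Cisco IOS syntax; the PowerConnect 8024F CLI is similar but should be checked against Dell's documentation, and the interface name and VLAN number are placeholders:

    ! Host- or SAN-facing access port: no trunking, immediate forwarding
    interface TenGigabitEthernet1/0/1
     description ESXi-host/SAN-facing port
     switchport mode access
     switchport access vlan 100
     spanning-tree portfast
     no shutdown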

That being said, your gotchas here will be your vSwitch configuration and ensuring your VLAN and STP settings are correct. Good luck!
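
As a rough illustration of the vSwitch teaming described in point 2, here is a sketch using esxcli on an ESXi host, assuming the VM-traffic uplinks are vmnic2 and vmnic3 (names are placeholders; each uplink should land on a different physical switch):

    # Create a vSwitch for VM traffic and attach two uplinks
    esxcli network vswitch standard add --vswitch-name=vSwitch1
    esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic2
    esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic3

    # Active/active teaming, load-balanced on the originating virtual port ID
    esxcli network vswitch standard policy failover set --vswitch-name=vSwitch1 \
        --active-uplinks=vmnic2,vmnic3 --load-balancing=portid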

Chiron