8

I have a pair of ProCurve 2810-24G switches that I will use with a Dell EqualLogic SAN and VMware ESXi. Since ESXi does MPIO, I am a little uncertain about the configuration of the links between the switches. Is a trunk the right way to go between the switches?

I know that the ports for the SAN and the ESXi hosts should be untagged, so does that mean that I want a tagged VLAN on the trunk ports?

This is more or less the configuration:

trunk 1-4 Trk1 Trunk 
snmp-server community "public" Unrestricted 
vlan 1 
    name "DEFAULT_VLAN" 
    untagged 24,Trk1 
    ip address 10.180.3.1 255.255.255.0 
    no untagged 5-23 
exit 
vlan 801 
    name "Storage" 
    untagged 5-23 
    tagged Trk1 
    jumbo 
exit 
no fault-finder broadcast-storm 
stack commander "sanstack" 
spanning-tree
spanning-tree Trk1 priority 4
spanning-tree force-version RSTP-operation

The EqualLogic PS4000 SAN has two controllers, with two network interfaces each. Dell recommends connecting each controller to both switches. From the VMware documentation, it seems that creating one VMkernel port per pNIC is recommended. With MPIO, this could allow more than 1 Gbps of throughput.
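Roughly what I have in mind on the ESXi 4.1 side, as a sketch (the vSwitch names, vmnic numbers and IP addresses below are placeholders, not my real values):

esxcfg-vswitch -a vSwitch2                               # create a vSwitch for the first iSCSI uplink
esxcfg-vswitch -L vmnic2 vSwitch2                        # attach one pNIC as the only uplink
esxcfg-vswitch -A iSCSI1 vSwitch2                        # port group for the VMkernel port
esxcfg-vmknic -a -i 10.10.10.11 -n 255.255.255.0 iSCSI1  # VMkernel port on the storage subnet
esxcfg-vswitch -a vSwitch3                               # repeat for the second pNIC
esxcfg-vswitch -L vmnic3 vSwitch3
esxcfg-vswitch -A iSCSI2 vSwitch3
esxcfg-vmknic -a -i 10.10.10.12 -n 255.255.255.0 iSCSI2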

[Diagram: crude sketch of the setup - each PS4000 controller connects to both switches, each ESXi host has one storage pNIC to each switch, and the two ProCurves are joined by two inter-switch links]

3molo

3 Answers

12

There has been some debate in the comments on Chopper3's answer that is not well informed, because some aspects of EqualLogic's networking requirements and multipathing behaviour are poorly understood.

First, the VMware side: the current recommendation from VMware (for ESX/ESXi 4.1) and Dell, when using the iSCSI Software Initiator, is that you should have a single physical NIC mapped to each VMkernel port that will be used for iSCSI. The binding process that is now recommended enforces this: it requires that each VMkernel port has only one active physical NIC and no standby NICs. No bonding allowed. You can cheat and go back afterwards and add a failover NIC, but the intention is that MPIO will handle the failover, so this serves no useful purpose (at least when everything is working as intended by VMware).
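As a rough sketch of that binding on ESX/ESXi 4.1 with the software initiator (the vmk and vmhba numbers are examples and will differ on your hosts):

esxcli swiscsi nic add -n vmk1 -d vmhba33    # bind the first iSCSI VMkernel port to the software initiator
esxcli swiscsi nic add -n vmk2 -d vmhba33    # bind the second one
esxcli swiscsi nic list -d vmhba33           # confirm both VMkernel ports are bound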

The default multipathing policy will allow active/active connections to an EqualLogic array using Round Robin.
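If you want to check or change this per device, a sketch with the 4.1 esxcli tooling (the naa identifier is a placeholder for one of your EqualLogic volumes):

esxcli nmp device list                                                      # show the current PSP for each device
esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR  # switch one volume to Round Robin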

Second, the EqualLogic side: EqualLogic arrays have dual controllers that act in an active/standby mode. For the PS4000 these have two Gigabit NICs on each controller. On the active controller both of these NICs are active and can receive IO from the same source. The network configuration guidance recommends that the array's NICs be connected to separate switches. From the server side you have multiple links that should also be distributed to separate switches. Now for the odd part - EqualLogic arrays expect that all initiator ports can see all active ports on the arrays. This is one of the reasons you need a trunk between the two switches. That means that with a host with two VMkernel iSCSI ports and a single PS4000, there are four active paths between the initiator and the target - two are "direct" and two traverse the ISL.

The same rules apply for the standby controller's connections, but those NICs only become active after a controller failover. After a failover in this environment there will still be four active paths.

Third, more advanced multipathing: EqualLogic now have a Multipathing Extension Module that plugs into the VMware Pluggable Storage Architecture and provides intelligent load balancing (using least queue depth, Round Robin or MRU) across VMkernel ports. This will not work if all VMkernel uplink NICs are not able to connect to all active EqualLogic ports. It also ensures that the number of paths actually used remains reasonable - in large EqualLogic environments the number of valid paths between a host and an EqualLogic group can be very high, because all target NICs are active and all source NICs can see all target NICs.
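A quick way to sanity-check that every VMkernel port really can reach every active array port is to count the paths per device; as a sketch, assuming the 4.1 tooling:

esxcfg-mpath -b     # brief listing of every path to every device
esxcfg-mpath -l     # long listing with adapter, target and state per path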

Fourth, larger EqualLogic environments: as you scale up an EqualLogic environment you add additional arrays into a shared group. All active ports on all member arrays in a group must be able to see all other active ports on all other arrays in the same group. This is a further reason why you need fat pipes providing inter-switch connections between all switches in your EqualLogic iSCSI fabric. This scaling also dramatically increases the number of valid active paths between initiators and targets. With an EqualLogic group consisting of three PS6000 arrays (four NICs per active controller vs. two for the PS4000) and an ESX host with two VMkernel ports, there will be 24 possible active paths for the MPIO stack to choose from (12 active target ports x 2 initiator ports).

Fifth, bonding/link aggregation and inter-switch links in an EqualLogic environment: all of the inter-array and initiator<->array connections are single point-to-point Gigabit connections (or 10GbE if you have a 10GbE array). There is no need for, and no benefit to be gained from, bonding on the ESX server side, and you cannot bond the ports on the EqualLogic arrays. The only area where link aggregation/bonding/whatever-you-want-to-call-it is relevant in an EqualLogic switched Ethernet fabric is on the inter-switch links. Those links need to be able to carry concurrent streams that can equal the total number of active EqualLogic ports in your environment - you may need a lot of aggregate bandwidth there even if each point-to-point link between array ports and initiator ports is limited to 1 Gbps.
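On the ProCurve side that simply means making the inter-switch trunk wide enough and tagging the storage VLAN across it. The config in the question already does this with a static four-port trunk; as a sketch, an LACP variant would look something like this (the port numbers and trunk name are only an example):

trunk 21-24 trk2 lacp
vlan 801
   tagged trk2
   exit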

Finally: in an EqualLogic environment traffic from a host (initiator) to an array can and will traverse the inter-switch link. Whether a particular path does so depends on the source and destination IP addresses for that particular path, but each source port can connect to each target port and at least one of those paths will require traversing the ISL. In smaller environments (like this one) all of those paths will be used and active. In larger environments only a subset of possible paths are used, but the same distribution will happen. The aggregate iSCSI bandwidth available to a host (if properly configured) is the sum of all of its iSCSI VMkernel port bandwidth, even if you are connecting to a single array and a single volume. How efficient that may be is another issue, and this answer is already far too long.

Helvick
  • You need to follow this advice! I work for the largest EQL reseller in the midwest and I put these systems in daily. Tagging/trunks is the way to go - let the MEM plugin create your vSwitches for you. Your ISL should be as large as you can afford it to be, connection-wise. We typically use stackable switches. Juniper EX4200s are AWESOME for iSCSI. – SpacemanSpiff Feb 23 '11 at 01:39
  • Wow, awesome answer. Didn't see this message until just now, but I did manage to get it all up and working as expected, and Iometer results show it performs as well as it gets. Still have to check all the redundancy. Thanks a lot for your extremely informative answer! – 3molo Mar 08 '11 at 14:12
6
Since ESXi does MPIO, I am a little uncertain on the configuration for links between the switches. Is a trunk the right way to go between the switches?

ESX/i does its own path management - it won't go active/active on its links unless two or more of its links are either going to the same switch or the switches are in a CAM-sharing mode such as Cisco's VSS - anything else will be an active/passive configuration.

By all means trunk between the switches if you want, but presumably they both have uplinks to some core switch or router? If so, then I'm not entirely sure why you'd trunk between just two switches in this manner, as the ESX/i boxes will just switch to the second switch if the first one goes down (if configured correctly anyway).

I know that the ports for the SAN and the ESXi hosts should be untagged, so does that mean that I want tagged VLAN on the trunk ports?

I don't know where this assumption comes from; ESX/i is just as comfortable working in a tagged or untagged setup, whether for guest or iSCSI traffic. That said, I have had problems with mixing tagged and untagged traffic when using default VLANs, so I always tag everything now and have no default VLAN; it's a very flexible setup and has no discernible performance hit in my experience.
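As a sketch of that tag-everything approach (VLAN 801 taken from the question; the port range, port group and vSwitch names are placeholders): on the ProCurve you tag the host-facing ports rather than untagging them, and on ESX you set the matching VLAN ID on the port group:

vlan 801
   tagged 5-23,Trk1
   exit

esxcfg-vswitch -p iSCSI1 -v 801 vSwitch2    # tag the iSCSI port group with VLAN 801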

Chopper3
  • I'm pretty sure the ESXi documentation for SAN storage states that one should not bond NICs, but instead rely on MPIO both for increased performance and the benefits of redundancy (no matter whether the links go to the same switch or not). Of course there will be no uplinks to core switches; this is a pair of switches for storage only. I also state that I intend to use untagged VLANs to the hosts and SAN, so that still makes my question valid; should I or should I not use TAGGED on the trunk links? – 3molo Feb 15 '11 at 08:48
  • The reasoning for using tagged ports is if you need to carry more than one VLAN down it. Tagging the VLANs gives you the ability to distinguish between them. You also don't need to use LACP to create a link aggregation (trunk for HP, Etherchannel for Cisco) bundle. You can set a static aggregation group and benefit from the switch-side balancing and failover. That said, it's also common to leave the switch side alone and let ESX handle the traffic decision making. – mcmeel Feb 15 '11 at 08:52
  • mcmeel, please consider writing a real answer. It's easier to comment. Not sure what the inter-switch configuration would be if I let ESXi do the decision making? – 3molo Feb 15 '11 at 08:55
  • -1 chopper, I feel you either don't know enough about the subject or you didn't quite read my question. – 3molo Feb 15 '11 at 09:06
  • I'm sorry 3molo but you're wrong - you also seem confused about how 'bonding' and 'MPIO' are related. We have over 8,000 ESXi 4.1 hosts, I'm VCP and VCAP trained, and bonding of NICs is not only supported but significantly encouraged, with bonding able to play an important role in an MPIO configuration. As for "of course there will be no uplinks", how is this apparent from your question? Even without uplinks I don't see how a cross-switch trunk would help unless you only connect the EqualLogic box into a single switch? mcmeel is also a) correct and b) trying to help you here - don't be so abusive. – Chopper3 Feb 15 '11 at 09:18
  • I think it's apparent from my configuration. The EQ will be connected to both switches, each controller to both even. The ESXi hosts will use two pNICs for storage, one connected to each switch. Is bonding still valid in this scenario? – 3molo Feb 15 '11 at 09:28
  • Will a pair of aggregated links on my procurves not help me at all? From EQ PS4000 manual: "For Maximum network bandwidth and availability. Dell recommends that you use four network cables to connect Ethernet 0 and Ethernet 1 on each control module to a different network switch. The switches must be connected with interswitch links that have sufficient bandwidth.". The picture shows two links between the switches. – 3molo Feb 15 '11 at 09:36
  • You're wrong, nothing in your question implies that, and needlessly rude. As for the cabling scenario you finally grace us with, no, bonding won't help in this situation as your switches don't support VSS/VSL or similar modes - you're stuck with only 1 Gbps of throughput per server. – Chopper3 Feb 15 '11 at 09:36
  • Aggregated links could help you, if you wired it that way - you'd need two or more links from one server to one switch, not the way you describe above. So for a bonded 2 x 1Gbps link with resilience you'd need 4 NICs, two going to each switch. As for me "not knowing enough about the subject" or not "quite reading your question", I don't know why I'm trying to help - you're the one who doesn't know how to do this or how to write clear questions - you're not exactly helping yourself here. – Chopper3 Feb 15 '11 at 09:42
  • I'm sorry. I didn't mean to sound rude, and I am grateful for any help. Updated the question with a crude sketch. I realize that you know what you are talking about. Would I not benefit at all performance-wise from having one VMkernel per pNIC, connected to one switch each? The VMware documentation states that one session per pNIC will be established to the SAN. Seems to me that would allow me to utilize more than 1 Gbps? Would the link aggregation in this case not help me at all? – 3molo Feb 15 '11 at 10:01
  • The scenario you're outlining is not link aggregation and has nothing to do with Ethernet; it solely relies on ESXi doing the load-balancing across two independent vmkernel-configured vSwitches. I don't believe there is a policy to allow for this; it would mean the same LUN being opened by two vmkernels on the same host - I'm actually 'lab'ing this right now and will let you know my findings later. – Chopper3 Feb 15 '11 at 11:56
  • Okay Chopper, you don't need to do that. It's more likely that I misunderstood it and I am only able to utilize 1 Gbps. How is it not link aggregation between the switches, btw? – 3molo Feb 15 '11 at 12:03
  • Right, in my lab, admittedly only using the software iSCSI initiator, I can't get the built-in multipathing of ESXi 4.1u1 to go over two adapters in an active/active manner - round-robin yes, but not active/active. This may be different with hardware iSCSI but I don't have the time right now to try that, sorry. – Chopper3 Feb 15 '11 at 13:51
  • Link aggregation requires two or more ports to go to either the exact same switch or to a pair of switches that support VSS/VSL - the reason being that L2 expects a single switch to 'own' a given MAC; aggregation creates a trunk with a single MAC, so it's easy for a single switch to say "these two are actually that MAC", but try doing that across two regular switches - it doesn't work, so you need switches that can work as a pair, such as Cisco Cat 65xx VSS models. Really not sure what the cross-switch trunk is for, you know, no idea at all. – Chopper3 Feb 15 '11 at 13:54
  • The LAG works as expected. I have two 1 Gbit links and the EQ is connected to one switch each. I get up to 235 MB/s sequential reads and writes. Either we didn't understand each other at all, or I was correct in my statements about the setup. Btw, it's round-robin but it states active/active. – 3molo Mar 08 '11 at 14:15
1

It's the SAN array controller that defines how you should connect this. Is it providing the same LUN on both ports on the same controller? Then port 0 goes to switch A, port 1 to switch B, and the same with the next controller.

Why would you want to use LACP/Etherchannel against an iSCSI SAN with 1 Gbit uplink ports? It doesn't help you in any way. Create two vSwitches with a single pNIC in each vSwitch, and connect the first pNIC to switch A and the second pNIC to switch B. This will give you full redundancy against controller/switch/NIC failures.

pauska
  • The Etherchannel/LACP is between the switches only, but that doesn't help me at all? I imagined that connections could traverse between the switches because of MPIO, in case, say, a port fails on the switch that one of the controllers is connected to. – 3molo Feb 15 '11 at 12:04
  • Why would connections traverse between the switches? It makes no sense. – pauska Feb 16 '11 at 08:26
  • Each initiator contacts the group IP, which redirects to the IP of one of the NICs in the group. There is nothing stopping an initiator on switch A connecting to the array on its NIC connected to switch B. Based on the number of connections, a 4 Gb LACP link between the switches should be sufficient to avoid any trouble. I personally prefer to connect all the ports on a controller to one switch. When you split them, you halve your bandwidth at the array in a failure situation. – SpacemanSpiff Feb 23 '11 at 02:00
  • Great answer SpacemanSpiff, I went with 2 LACP links. – 3molo Mar 08 '11 at 14:13