
This is a Canonical Question about iSCSI we can use as a reference.

iSCSI is a protocol that puts SCSI commands as payload into TCP network packets. As such, it is subject to a different set of problems than, say, Fibre Channel. For example, if a link gets congested and the switch's buffers are full, Ethernet will, by default, drop frames instead of telling the host to slow down. This leads to retransmissions, which lead to high latency for a very small portion of the storage traffic.

There are solutions for this problem, depending on the client operating system, including modifying network settings. For the following list of OSs, what would an optimal iSCSI client configuration look like? Would it involve changing settings on the switches? What about the storage?

  • VMWare 4 and 5
  • Windows Hyper-V 2008 & 2008r2
  • Windows 2003 and 2008 on bare metal
  • Linux on bare metal
  • AIX VIO
  • Any other OS you happen to think would be relevant
Basil
  • iSCSI is far more complex than that - but with respect to the IP stack, everything that applies to high-throughput, low-latency IP connections applies here - not much special about it. – Nils May 18 '12 at 20:27

3 Answers


I'm not familiar with VMWare, but I do use Xenserver and I have used Hyper-V (R2).

With my current Xenserver configuration I have:

  • 8 Dell Poweredge 29xx servers
  • 2 Dell Powerconnect 6248 switches
  • 2 Dell MD3000i SAN (iSCSI)

I have set up my switches in a multipath configuration and optimized them for iSCSI by:

  • Separating my switches into 3 VLANS (2 for iSCSI traffic and 1 for management)
  • Using JumboFrames
  • Applying the "iSCSI" optimizations that the powerconnect has

Each server has multiple network cards to provide a connection to each switch, in turn providing redundancy via multipathing between the servers and the iSCSI SAN. The iSCSI VLANs contain no other traffic than iSCSI.
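
As a rough illustration of the kind of end-to-end check this setup benefits from, here is a minimal Python sketch; the interface names, SAN portal addresses and MTU value are placeholders, not the values from the setup above. It confirms each dedicated iSCSI NIC is at a 9000-byte MTU and can reach its SAN portal with jumbo-sized, non-fragmenting pings (Linux, iputils ping).

    #!/usr/bin/env python3
    """Sanity-check jumbo frames and reachability on dedicated iSCSI NICs (Linux).

    All names below are assumptions for the example: two iSCSI interfaces
    (eth2/eth3), one SAN portal per iSCSI VLAN, and a 9000-byte MTU.
    """
    import subprocess

    ISCSI_PATHS = {            # interface -> SAN portal on that VLAN (hypothetical)
        "eth2": "192.168.10.10",
        "eth3": "192.168.11.10",
    }
    MTU = 9000
    PAYLOAD = MTU - 28         # leave room for the 20-byte IP + 8-byte ICMP headers

    def interface_mtu(iface):
        """Read the configured MTU for an interface from sysfs."""
        with open(f"/sys/class/net/{iface}/mtu") as fh:
            return int(fh.read().strip())

    def jumbo_ping(iface, target):
        """Send non-fragmenting, jumbo-sized pings out a specific interface."""
        cmd = ["ping", "-c", "3", "-M", "do", "-s", str(PAYLOAD), "-I", iface, target]
        return subprocess.run(cmd, capture_output=True).returncode == 0

    for iface, portal in ISCSI_PATHS.items():
        print(f"{iface} -> {portal}: "
              f"MTU ok: {interface_mtu(iface) >= MTU}, "
              f"jumbo ping ok: {jumbo_ping(iface, portal)}")

If the ping check fails on one VLAN while the MTU check passes, it is usually a switch port or SAN interface on that path that is still at the default MTU.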

I'm pleased to report that with this configuration the Xenserver "cluster" works brilliantly.

On a side note, I do have a Windows 2008 server connected directly by iSCSI to an HP SAN (an old file server). It used to run Windows 2003 and would regularly drop the connection (even after a reinstall of 2003); however, since I upgraded to Windows 2008 it has stayed connected.

I'll be happy to answer any questions about my setup.

Steve
    Are you using the stacking cables between the two Dell switches? – SpacemanSpiff May 17 '12 at 19:43
  • Why iSCSI? Why not DRBD on directly connected MD3000? – Nils May 18 '12 at 20:31
  • @SpacemanSpiff My switches are not stacked. – Steve May 21 '12 at 18:57
  • @Nils I have not researched DRBD, although I have heard of it. What will DRBD offer over iSCSI for my directly connected storage? – Steve May 21 '12 at 18:57
  • DRBD has no SCSI overhead. The other thing is that you cannot get rid of an iSCSI client process when your iSCSI server dies or is unreachable (the latter should not be a problem in your setup). – Nils May 22 '12 at 19:42

This is not an answer... yet. This is the framework for the Generic Answer. If you have time, please fill in anything you know about. In regards to configuring specific hardware, please post a separate answer for each vendor so we can keep that information organized and separate.

Apply a QoS profile to the ports, turn off storm control, set the MTU to 9000, turn on flow control, and put the ports into portfast.

Throughput and Latency

Updated firmware, drivers, and other systems

MPIO

Jumbo Frames/MTU

As the speed of network links increases, the number of packets potentially generated also increases. This means more and more CPU/interrupt time is spent generating packets, which both unduly burdens the transmitting system and takes up an excessive amount of link bandwidth with framing.

So-called "jumbo" frames are Ethernet frames that exceed the canonical 1518-byte limit. While the numbers may vary based on switch vendors, operating systems and NICs, the most typical jumbo packet sizes are 9000 and 9216 bytes (the latter being the most common). Given that roughly 6X the data can be put into a 9K frame, the number of actual packets (and interrupts) is reduced by a similar amount on the host. These gains are especially pronounced on higher speed (i.e. 10GE) links that send large volumes of data (i.e. iSCSI).
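
To make the "roughly 6X" figure concrete, here is a quick back-of-the-envelope calculation in Python. It is pure arithmetic, assuming 40 bytes of IPv4+TCP headers per packet and 38 bytes of Ethernet framing overhead (header, FCS, preamble, inter-frame gap) and no TCP options:

    # Frame-count arithmetic for a bulk transfer at different MTUs.
    TRANSFER = 1 * 1024**3          # 1 GiB of application data
    ETH_OVERHEAD = 38               # Ethernet header + FCS + preamble + IFG
    IP_TCP_OVERHEAD = 40            # IPv4 + TCP headers, no options

    for mtu in (1500, 9000):
        payload = mtu - IP_TCP_OVERHEAD        # TCP payload carried per frame
        frames = -(-TRANSFER // payload)       # ceiling division
        wire_bytes = frames * (mtu + ETH_OVERHEAD)
        print(f"MTU {mtu}: {frames:,} frames, "
              f"{wire_bytes / TRANSFER:.4f}x wire bytes per payload byte")

Running it shows roughly 735,000 frames for a 1 GiB transfer at a 1500-byte MTU versus roughly 120,000 frames at 9000 bytes, i.e. about a 6:1 reduction in packets and interrupts.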

Enabling jumbo frames requires configuration of both the host and the Ethernet switch, and considerable care should be taken before implementation. Several guidelines should be followed:

1.) Within a given Ethernet segment (VLAN) all hosts and routers should have the same MTU configured. A device without proper configuration will see larger frames as link errors (specifically "giants") and drop them.

2.) Within the IP protocol, two hosts with differing frame sizes need some mechanism to negotiate an appropriate common frame size. For TCP this is path MTU (PMTU) discovery, which relies upon the transmission of ICMP unreachable packets. Make sure that PMTU discovery is enabled on all systems and that any ACLs or firewall rules permit these packets (a simple probe for this is sketched below).
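
One way to spot a device in the path that was missed during jumbo frame configuration is to probe for the largest payload that passes with the DF bit set. A hedged Python sketch using Linux iputils ping (the target address is a placeholder):

    #!/usr/bin/env python3
    """Probe the effective path MTU toward a target by binary-searching the
    largest ICMP payload that passes with the DF bit set (Linux iputils ping)."""
    import subprocess

    TARGET = "192.168.10.10"   # placeholder iSCSI portal address
    ICMP_IP_OVERHEAD = 28      # 20-byte IP header + 8-byte ICMP header

    def passes(payload):
        cmd = ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), TARGET]
        return subprocess.run(cmd, capture_output=True).returncode == 0

    low, high = 0, 9000 - ICMP_IP_OVERHEAD   # search up to a 9000-byte frame
    while low < high:                        # find the largest payload that passes
        mid = (low + high + 1) // 2
        if passes(mid):
            low = mid
        else:
            high = mid - 1

    print(f"Largest unfragmented payload: {low} bytes "
          f"(path MTU about {low + ICMP_IP_OVERHEAD} bytes)")

If this reports roughly 1500 bytes on a path you believe is jumbo-enabled, some device along the way is still at the default MTU.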

Ethernet Flow Control (802.3x)

Despite being recommended by some iSCSI vendors, simple 802.3x Ethernet flow control should not be enabled in most environments unless all switch ports, NICs, and links are totally dedicated to iSCSI traffic and nothing else. If there is any other traffic on the links (such as SMB or NFS file sharing, heartbeats for clustered storage or VMware, NIC teaming control/monitoring traffic, etc.), simple 802.3x flow control should not be used, as it pauses entire ports and the non-iSCSI traffic will also be blocked. The performance gains of Ethernet flow control are often minimal or non-existent; realistic benchmarking should be performed on the entire OS/NIC/switch/storage combination being considered to determine whether there is any actual benefit.

The actual question from a server's perspective is: do I stop network traffic if my NIC or network is overrun, or do I start dropping and retransmitting packets? Turning flow control on allows the NIC's buffers to be emptied on the receiver side, but stresses the buffers on the sender side (normally a network device will buffer here).
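
Whether 802.3x pause is actually negotiated on a given Linux host can be checked with ethtool. A small Python sketch (the interface names are assumptions) that reports RX/TX pause status for the iSCSI NICs:

    #!/usr/bin/env python3
    """Report 802.3x pause (flow control) status for a list of NICs by parsing
    `ethtool -a` output (Linux; requires ethtool and root privileges)."""
    import subprocess

    ISCSI_NICS = ["eth2", "eth3"]   # hypothetical dedicated iSCSI interfaces

    for nic in ISCSI_NICS:
        out = subprocess.run(["ethtool", "-a", nic],
                             capture_output=True, text=True).stdout
        settings = {}
        for line in out.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                settings[key.strip().lower()] = value.strip().lower()
        print(f"{nic}: autoneg={settings.get('autonegotiate', '?')}, "
              f"rx pause={settings.get('rx', '?')}, "
              f"tx pause={settings.get('tx', '?')}")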

TCP Congestion Control (RFC 5681)

TOE (TCP/IP Offload Engines)

iSOE (iSCSI Offload Engines)

LSO (TCP Segmentation/Large Send Offload)

Network Isolation

A common best practice for iSCSI is to isolate both initiators and targets from other non-storage network traffic. This offers benefits in terms of security, manageability and, in many cases, dedication of resources to storage traffic. This isolation may take several forms:

1.) Physical isolation - all initiators have one or more NICs dedicated solely to iSCSI traffic. This may or may not imply dedicated network hardware, depending on the capabilities of the hardware in question and the specific security and operational requirements within a given organization.

2.) Logical isolation - Mostly found in faster (i.e. 10GE) networks, initiators have VLAN tagging (see 802.1q) configured to separate storage and non-storage traffic.

In many organizations additional mechanisms are employed to ensure that iSCSI initiators cannot reach one another over these dedicated networks and, further, that these dedicated networks are not reachable from standard data networks. Measures used to accomplish this include standard access control lists, private VLANs and firewalls.
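
As one illustration of verifying logical isolation on a Linux initiator, here is a hedged Python sketch (the portal address is a placeholder) that warns when the route to an iSCSI portal uses the same interface as the default route, i.e. when storage traffic is not actually leaving on a dedicated NIC/VLAN:

    #!/usr/bin/env python3
    """Check that the route to an iSCSI portal does not share the interface used
    by the default route (Linux; parses `ip route` output)."""
    import subprocess

    PORTAL = "192.168.10.10"   # hypothetical iSCSI target portal

    def dev_of(*route_args):
        """Return the 'dev' token from an `ip -o route ...` command, or ''."""
        out = subprocess.run(["ip", "-o", "route", *route_args],
                             capture_output=True, text=True).stdout.split()
        return out[out.index("dev") + 1] if "dev" in out else ""

    storage_iface = dev_of("get", PORTAL)
    default_iface = dev_of("show", "default")

    if storage_iface and storage_iface == default_iface:
        print(f"WARNING: iSCSI portal {PORTAL} is reached via {storage_iface}, "
              "the same interface as the default route - storage is not isolated.")
    else:
        print(f"iSCSI portal via '{storage_iface}', default route via "
              f"'{default_iface}' - looks isolated.")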

Something about backplane and switching fabric here too.

QoS (802.1p)

vLAN (802.1q)

STP (RSTP, MSTP, etc)

Traffic Suppression (Storm Control, Multi/Broad-cast Control)

Security

Authentication and Security

CHAP

IPSec

LUN Mapping (Best Practices)

Chris S
  • Are there any tunables for RFC 5681 on any device? If not we should delete that section. – Nils May 22 '12 at 19:56
  • Would it be worth adding that jumbo frames are rarely supported for iSCSI replication (since all the intermediary WAN devices would have to support them)? – Jeremy May 24 '12 at 18:45
  • @Jeremy sure - write it up above. Even on LAN - if you forget one device on the way (or if your outsourced network team does misconfigure something) the path MTU will not support jumbo frames. – Nils May 24 '12 at 20:12
  • Agree with Jeremy. Nils, if TCP-CC is available enabling it has possible benefits and consequences, those should be outlined at least. – Chris S May 25 '12 at 02:24

Some considerations you should research, subjective to your own environment:

1) Multi-pathing - your SAN solution and your OS, be it a hypervisor or a bare metal OS, may need vendor-specific software for this to function properly; a quick path-count check is sketched after this list.

2) Initiators - You need to vet whether a software initiator delivers sufficient performance for your demands. Many NICs have iSCSI offloading capabilities, which can significantly improve throughput, but certain older hypervisors have been known to get quite pissy with them support-wise. The more mature offerings (ESXi 4.1+) seem to play nice.

3) Security/Permissions - Be sure to fully vet out which initiators require access to which LUNs... you'll be in for a bad day if an admin on one of your Windows machines does an "initialize disk" on a disk that is really in use by another server as a VMware datastore.
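
For point 1, a quick way to confirm on a Linux initiator that each target really has redundant paths is to count open-iscsi sessions per target IQN. A minimal sketch (assumes open-iscsi's iscsiadm is installed and sessions are logged in):

    #!/usr/bin/env python3
    """Count iSCSI sessions per target IQN using open-iscsi's `iscsiadm -m session`
    and flag targets with fewer than two paths (no multipath redundancy)."""
    import subprocess
    from collections import Counter

    out = subprocess.run(["iscsiadm", "-m", "session"],
                         capture_output=True, text=True).stdout
    # Typical line: "tcp: [1] 192.168.10.10:3260,1 iqn.2001-05.com.example:array0"
    targets = Counter(
        fields[3]
        for fields in (line.split() for line in out.splitlines())
        if len(fields) >= 4
    )

    for iqn, paths in targets.items():
        status = "OK" if paths >= 2 else "WARNING: single path"
        print(f"{iqn}: {paths} session(s) - {status}")

Anything reporting a single session has no multipath redundancy, regardless of what the multipathing layer on top claims.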

SpacemanSpiff
  • With regards to multi-pathing - actually you can achieve this through different networks, too - which is a little bit more tricky with IP than with FC-SAN (where the concept of SAN A/B with different hardware fabrics is quite common). – Nils May 18 '12 at 20:24
  • My experience with multi-pathing has been primarily equallogic, in which case the client is usually given a discovery IP address (the group IP) and then negotiates with that address for the actual target addresses. I suppose this could be done with different networks and the client would either have a path to that, or not, but discovery would go down if that subnet the group IP was on was the one that died. – SpacemanSpiff May 21 '12 at 18:54
  • I tried (native) multipathing on SLES11 on different VLANs. The tricky part was to modify the multipath-configuration, so the iSCSI-targets that went to the same physical storage were seen as the same device. – Nils May 24 '12 at 20:15