
I now need to start planning the replacement of our main ESX cluster. Implementation will be around December time but it suddenly doesn't seem that far away.

Right now I have a pair of ESX hosts, single quad-core Dell PE2950s with 24GB of RAM each, with dual FC HBAs going through a pair of switches into a Dell/EMC AX4.

I have around 17TB raw storage on it right now, and due to the fairly basic disk pool/virtual disk way the AX4 works there is some wastage, but as with any business we're only going to keep needing more storage.

We have a range of VMs, from our main file server, Microsoft SQL Server and Exchange Server down to lots of smaller VMs for specific roles: WSUS, antivirus, print etc.

We're a big site with fibre everywhere, and our immediate "offsite" location is another building a couple of miles away with a 10Gbps fibre link between the two.

Where I'd like to end up is with "smart" SAN level snapshots and replication between units at both ends of the fiber of either everything, or selected LUNs.

I don't need instant failover, what I would like is simply to be in a position where if one room gets wiped out I can get stuff up and running (not necessarily all of it) in the other location in a reasonable (not SLA'd) amount of time.

I'd really appreciate suggestions on what to look at replacing the main cluster/SAN with.

Right now my main contenders are Equallogic and the HP Lefthand P4000.


I would have added a comment but it didn't seem to let me type enough, so...

We use Exchange and SQL but the usage is pretty low. Currently we're on Exchange 2003 but in a few months I hope for us to be on Exchange 2010 so the storage IO requirements should drop quite a bit.

Right now we have a mix in the AX4 of 7.2K SATA and 10K and 15K SAS. The AX4 is our first SAN and our first exercise in using ESX, and in all honesty I suspect I went a bit overboard on the disk specs.

Our busiest period is our backup window, and I've been doing some measurements, admittedly rough. It seems we see an average of around 1400 IOPS - as you say, the main limitation there may be the NIC on our file server, which is a 1Gbps vNIC (the file server is a VM).
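To sanity-check that figure, here's a rough back-of-envelope conversion from IOPS to link bandwidth. The 32KB average I/O size is an assumption for illustration; substitute whatever your measurements actually show:

```python
# Rough sanity check: does the measured backup-window load fit in a single
# GigE link? The 32 KB average I/O size is an ASSUMPTION -- measure yours.

def iops_to_gbps(iops, avg_io_bytes):
    """Convert an IOPS figure to line-rate Gbps (decimal, as link speeds are)."""
    return iops * avg_io_bytes * 8 / 1e9

load = iops_to_gbps(1400, 32 * 1024)
print(f"Estimated load: {load:.2f} Gbps")
print("Fits in GigE" if load < 1.0 else "Exceeds GigE -- plan for MPIO/10GigE")
```

At that assumed I/O size, 1400 IOPS works out to roughly 0.37Gbps, comfortably inside GigE; a larger average I/O size during sequential backup reads would change the picture quickly.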

I hadn't thought to look in the switch GUI for performance metrics but I shall see what I can find (they are Brocade 200E's not rebadged or anything).

I do need to do some digging into how the different products' MPIO drivers work. My understanding with EQL is that it will open multiple connections even to the same iSCSI LUN - not sure if the LeftHand can do that, or if "1 LUN = 1Gbps maximum throughput"?

When the time does come (around December) we'll obviously be going with the latest stable/supported vSphere release.

As for 10GigE I really do like the sound of that, however by the time you factor in redundancy I can't help but think it'll get really damned expensive and part of the issue here is that whilst we aren't trying to be cheap, I do have a limit on what I can spend.

  • I'd recommend the LeftHand unit, but we're an HP shop too. They're willing to give some pretty hefty discounts if you tell them you're currently a Dell shop and considering switching some of your gear. I would imagine Equallogic would compete similarly, but I don't have much experience with them though. – Chris S Jul 18 '10 at 14:55
  • Yeah I think part of the problem is before getting down to bottom line pricing, how to best find out about the pros and cons of the products. Dell seem very keen for our business, with HP I've found resellers to be hard work and that there's little knowledge of the LeftHand kit - quite annoying when you seem to know more than the salesperson you're speaking to because at least you've actually downloaded and played with it whilst they seem to be stuttering their way through. – flooble Jul 18 '10 at 16:10
  • That depends on the HP VAR. We're also an HP shop, and when we went to buy some really cheap storage, we got the full court press for LeftHand. They really knew what it was about, and clearly understood it's niche in the market place. It was about 40% more than we were targeting which is why we didn't use it, but it's a contender the next time we upgrade our middle tier of storage. – sysadmin1138 Jul 18 '10 at 16:15
  • Interested how people see the pitch difference between LeftHand and Equallogic? Both seem similar i.e. full virtualization of storage, both sell by the "node", both offer all the licensing/functionality out the box. Also I'm wondering if I'm cutting my nose off by not looking at the likes of Netapp and Compellent in too much depth? I can't help think there's a huge benefit to having a "solution" from one vendor who is responsible for the lot. I don't want the SAN vendor blaming the HBA vendor who blames the switch vendor etc. – flooble Jul 18 '10 at 16:20
  • One thing that might affect your decision that we overlooked: VMware don't currently support Microsoft Clustering Services on iSCSI, so if you are running or considering Windows/SQL clustering, then you probably want to go for something else. – SteveBurkett Jul 19 '10 at 10:15

1 Answer


Putting Exchange and MSSQL into ESX means you've got some serious storage users in your cluster, and whatever the storage is needs to keep up. You're using the AX4 with fibre, and it's obviously keeping up, but you don't mention what your drives are (SAS, SATA, 7.2K RPM, 15K RPM) or how many you have.

One step I'd take is a close look at the peak transfer rate from the storage device. The FibreChannel switches should have the ability. Last time I looked the Dell FC switches were rebadged Brocade units, and I know that Brocade has a 'performance monitor' in their Java-based GUI. If your peak I/O (which could be during backups) is under 1Gb, then an iSCSI-based system is juuuust fine. If it does peak over 1Gb, then you'll need to take care that your biggest I/O generators are physically housed on different iSCSI units. FC can push as much as 4Gb, where GigE is 1Gb.

Knowing how much storage I/O your ESX nodes are generating is key to finding a best-fit Ethernet-based solution. If peak I/O is over 1Gb, then you may need to use NIC bonding to maintain throughput, but keep in mind that the max bandwidth between IP pairs is still GigE with NIC bonding. 10GigE can fix this, but I'm pretty sure those cards don't yet exist in server-space, and I'm very sure they don't yet exist for LeftHand (though it is coming!). Make sure your big I/O consumers (probably the database and Exchange during backups) are on different nodes talking to different Equallogic/LeftHand nodes, and you should be fine.
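That per-IP-pair ceiling is worth sketching numerically. This toy model assumes standard link aggregation, which hashes each flow (source/destination pair) onto a single physical link, so it shows best-case figures under perfect hash distribution, not vendor-specific MPIO behaviour:

```python
# Why NIC bonding doesn't help a single iSCSI session: standard link
# aggregation pins each flow (src/dst IP pair) to ONE physical link, so a
# lone initiator-target pair still tops out at 1 Gbps. Illustrative only --
# assumes an ideal hash spreading flows evenly across the bond.

def effective_gbps(flows, links, link_gbps=1.0):
    """Best-case aggregate throughput for `flows` concurrent IP flows
    over a bond of `links` GigE ports."""
    usable_links = min(flows, links)   # each flow uses exactly one link
    return usable_links * link_gbps

print(effective_gbps(flows=1, links=4))  # one session: still GigE-bound
print(effective_gbps(flows=4, links=4))  # many sessions: bond aggregates
```

This is also why an MPIO driver that opens multiple iSCSI connections to the same LUN (as EqualLogic's reportedly does) matters: more flows means more links the traffic can actually use.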

As for Equallogic vs LeftHand: when we had that full court press 12 months ago, LeftHand was a really compelling prospect. We'd had the Equallogic presentation about a week before the merger was announced, so we were pretty familiar with both product lines. We thought the LeftHand product was technically superior and handled the virtual storage network very well. The idea of built-in replication (which both can do) was enough to make us grin in anticipated delight.

sysadmin1138