
I'll try as hard as I can to word this so it is not considered a shopping list.

We have been successfully running a dev/test ESXi environment for some time, with a couple of Dell PE2950III servers over an HP MSA2012fc Starter Kit (with the Brocade-based HP Class B SAN switch). This has worked very well for us, but being in dev/test, it comes with various caveats with regards to uptime/performance.

In any case, the perceived success of the dev/test platform has led to calls for a more 'production-ready' virtualisation platform. We are drafting the recommendations at the moment.

However, one of the complaints levelled at the existing stack is a lack of support for other virtualisation technologies (HyperV, Xen, etc), as the SAN LUNs are fully-allocated and formatted as VMFS. This is something that we have been told to overcome but, as is typical, there is no indication of the uptake of HyperV/Xen (and we don't particularly want to waste the 'expensive' storage resource by allocating LUNs to such where it won't be used).

As such, our current line of thinking is to forgo the traditional fibre SAN in favour of a straightforward CentOS box (probably a higher-end HP ProLiant DL380p Gen8) running NFS and Samba/CIFS daemons, with a 10GbE switch (probably Cisco Nexus 5000/5500-series).

The reasoning is that the ESXi heads could talk NFS and the HyperV heads could talk CIFS, but both would ultimately be pointing at the same XFS/RAID1+0 volumes.
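
To make that concrete, what I have in mind is a single volume exported both ways; a rough sketch only (paths, subnet and share name are placeholders, not a finished config):

    # /etc/exports -- present the VM store to the ESXi hosts over NFS
    /srv/vmstore  10.0.10.0/24(rw,sync,no_root_squash)

    # /etc/samba/smb.conf -- the same volume presented to the HyperV side over CIFS
    [vmstore]
        path = /srv/vmstore
        read only = no
        browseable = no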

Now, I'm not green enough to think that 10GbE is going to allow me to get true 10 gigabits of I/O throughput between the heads and the disks, but I don't know the kinds of overheads I can expect to see from the NFS and CIFS implementations (and any other bits that might interfere when more than one host tries to talk to it).

I am hoping to at least get near to the sustained disk read/write speeds of direct-attached disks, though, for as many hosts as I can. Looking at various drive manufacturer websites, I'm roughly anticipating this to be somewhere around the 140-160MB/s mark (if I am way off, please let me know).
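
As a baseline, the sort of sanity check I'd compare against is a simple sequential read/write against a direct-attached disk on an otherwise idle box (device and file names below are just examples):

    # sequential read straight off the raw device, bypassing the page cache
    dd if=/dev/sdb of=/dev/null bs=1M count=8192 iflag=direct

    # sequential write to a test file on the mounted volume
    dd if=/dev/zero of=/srv/vmstore/ddtest bs=1M count=8192 oflag=direct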

What recommendations/guidelines/further reading can anyone offer with regards to Linux/NFS/Samba or 10GbE switch configuration that might help attain this?

ewwhite
jimbobmcgee
  • You're going down the right line with the NFS side of things. I need to know how many spindles you're looking at. – Tom O'Connor Feb 11 '13 at 23:37
  • I'll put some numbers and examples together.... – ewwhite Feb 11 '13 at 23:38
  • Also, can you explain the shortcomings of the MSA/P2000 SAN in your situation? – ewwhite Feb 11 '13 at 23:56
  • @TomO'Connor - thoughts were *either* to start with (HP-specific) a DL380p with 8 disks (on the P420i internal RAID), expand when necessary to 8 more (via a P420 card + drive cage) and then out to 25-disk D2000 arrays (via P421 or P800 cards) *or* to start with the CTO 25-disk HP DL380p case (P420i + SAS expander) and expand out to D2000 arrays (by P421 / P800) – jimbobmcgee Feb 11 '13 at 23:56
  • @ewwhite - been very happy with our MSA and P2000 models, so far. Desire to move to NAS is driven by the facts that I can't figure a way to extend it to HyperV *without* a dedicated LUN (and that we don't *want* to have a whole LUN assigned for what will likely be one or two HyperV VMs), nor can I use the SAN switch to 'talk' IP. Open to other ideas, if that's not what you meant... – jimbobmcgee Feb 12 '13 at 00:02
  • You'll never get 140MB/s out of 8 disks. 25 maybe, but never 8. – Tom O'Connor Feb 12 '13 at 00:07
  • @jimbobmcgee See below. I'd build a ZFS box on the same hardware and use a specific type of SSD for write caching and leverage another for heavy read caching. This will definitely outperform the MSA unit. – ewwhite Feb 12 '13 at 00:15
  • @TomO'Connor - are you basing this on figures or experience (either/both is welcome, references for the former would be nice)? Is my 140MB/s estimate for direct-attached wildly off the mark? Or is it more a case of latency in the network? – jimbobmcgee Feb 12 '13 at 00:21
  • @jimbobmcgee I can give you real numbers from an existing installation. What do you need to know? – ewwhite Feb 12 '13 at 00:25
  • @jimbobmcgee It's all based on this attempt at a previous job to tune a Bluearc array and tape library documented here http://tomoconnor.eu/blogish/my-battle-commvault/#.URoB01rC-TY – Tom O'Connor Feb 12 '13 at 08:49

3 Answers


I understand the desire to move away from pure block storage to something more flexible.

However, I would avoid using a straight-up Linux storage stack for this when several storage appliance software offerings are available right now. A Linux approach could work, but the lack of management features/support, the XFS tuning needed (here and here) and the fact that it's not a purpose-built storage OS are downsides.
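
For context, the tuning in question is mostly about matching the filesystem to the RAID geometry and picking sensible mount options; a rough illustration only (the stripe values below are examples for one particular RAID 1+0 layout, not a recommendation):

    # align XFS to the underlying RAID stripe: su = controller stripe size, sw = number of data spindles
    mkfs.xfs -d su=256k,sw=8 /dev/sdb
    # common mount options for a large VM/file store
    mount -o noatime,inode64,logbsize=256k /dev/sdb /srv/vmstore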

Add to that, some nagging issues with the XFS/RHEL code maintainer and a nasty kernel bug that's impacting system load average, and the Linux combination you describe becomes less-appealing.

A pure Linux setup could be made to work well for this purpose, but it would certainly be outside the norm and might rely on esoteric solutions like ZFS on Linux or the not-so-ready-for-primetime Btrfs. More details on those later.

I do this often, opting for NFS on ZFS-based storage for most of my VMware deployments instead of an entry-level SAN like the HP P2000 array. I augment the ZFS installation with SSD-based L2ARC (read) and ZIL (write) cache devices, on top of the DRAM cache. In addition, I've been using 10GbE with this type of setup for four years.
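
As a rough sketch of how those cache devices hang off a pool (pool and device names are examples; NexentaStor drives this from its UI, but it maps to standard ZFS commands):

    # mirrored SSDs as the ZIL/SLOG (separate log for synchronous writes)
    zpool add vol0 log mirror c1t4d0 c1t5d0
    # a larger SSD as L2ARC (second-level read cache)
    zpool add vol0 cache c1t6d0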

I'll focus on NexentaStor for the moment, as that's the appliance software I use most of the time...

I've built numerous HP ProLiant-based systems for ZFS storage, from all-in-one VMware hosts to standalone DL380 storage "appliances" to full-on multipath SAS connections to cascaded JBOD storage units (front and rear).

NexentaStor and NFS/CIFS.

Nexenta supports the presentation of file AND block storage to external systems. I can take a pool of 24 disks and provide iSCSI storage to hosts that need native block storage, NFS to my VMware ESXi infrastructure and CIFS to a handful of Windows clients. The space is used efficiently and is carved out of the pool's storage; there are no artificial caps. Compression is transparent and helps tremendously in VM scenarios (less data to move over the wire).
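
In ZFS terms, that carving looks something like this (names and sizes are examples; on NexentaStor you would normally do it through the web UI, and the iSCSI zvol still has to be mapped to a target afterwards):

    # NFS dataset for the VMware ESXi datastores
    zfs create -o sharenfs=on -o compression=on vol0/vmware
    # CIFS/SMB dataset for the Windows clients
    zfs create -o sharesmb=on -o compression=on vol0/shares
    # a sparse zvol to present over iSCSI to hosts that want raw block storage
    zfs create -s -V 500G vol0/blocklun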

10GbE helps but it depends on what you're presenting to your virtualization hosts. Will they be 1GbE or 10GbE as well?

Benchmarks:

I'll run a quick test of a guest virtual machine running on an ESXi host connected via 10GbE to a NexentaStor SAN.

This is going to a 6-disk array (in an HP D2600 enclosure - 600GB 15k SAS).

[root@Test_VM /data]# iozone -t1 -i0 -i1 -i2 -r1m -s6g 
        Iozone: Performance Test of File I/O

        Run began: Mon Feb 11 18:25:14 2013
        Record Size 1024 KB
        File size set to 6291456 KB
        Command line used: iozone -t1 -i0 -i1 -i2 -r1m -s6g
        Output is in Kbytes/sec

        Children see throughput for  1 initial writers  =  128225.65 KB/sec
        Children see throughput for  1 readers          =  343696.31 KB/sec 
        Children see throughput for 1 random readers    =  239020.91 KB/sec
        Children see throughput for 1 random writers    =  160520.39 KB/sec
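
(For reference, the iozone flags here mean: -t1 runs a single thread in throughput mode, -i0/-i1/-i2 select the sequential write, sequential read and random read/write tests, -r1m uses a 1 MB record size, and -s sets the test file size.)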

This is going to a busy 16-disk array (in an HP D2700 enclosure - 300GB 10k SAS).

[root@Test_VM2 /data]# iozone -t1 -i0 -i1 -i2  -r1m -s4g
        Iozone: Performance Test of File I/O

        Run began: Mon Feb 11 16:33:53 2013
        Record Size 1024 KB
        File size set to 4194304 KB
        Command line used: iozone -t1 -i0 -i1 -i2 -r1m -s4g
        Output is in Kbytes/sec

        Children see throughput for  1 initial writers  =  172846.52 KB/sec
        Children see throughput for  1 readers          =  366484.00 KB/sec
        Children see throughput for 1 random readers    =  261205.91 KB/sec
        Children see throughput for 1 random writers    =  152305.39 KB/sec

The I/O graphs from the same run... Kilobytes/second and IOPS measures.


ewwhite
  • Some detailed stuff here -- much appreciated. It looks like (at least in your setup) you are getting close to what I guessed/hoped I *might* be able to get. I *will* need to spend a little time deciphering what the output of your tools are telling us, though (e.g. initials, rewriters, etc are new to me)... – jimbobmcgee Feb 12 '13 at 00:28
  • I'm still editing... will clean it up... – ewwhite Feb 12 '13 at 00:32
  • In the meantime, I will add here that I'm not *wedded* to Linux, although I did briefly look at OpenFiler (at least so I wasn't *completely* rolling my own). I can't see much in the way of pricing on the Nexenta site, though (which usually means 'too expensive for me!') -- cost is definitely a factor as far as my lot are concerned. – jimbobmcgee Feb 12 '13 at 00:35
  • And I'm not *wedded* to XFS, either -- it just looked a good fit for large data files. ZFS was my other option (I've anecdotally heard it's the best for this kind of thing), but I was a little put off by the out-of-box support in Linux (or lack thereof) and marginally worried by the Oracle 'ownership'. I just figured that either would *surely* be more appropriate than EXT or NTFS (even though my experience is with those). – jimbobmcgee Feb 12 '13 at 00:40
  • Would be interested to understand how you actually use the SSDs. Outside of the flash-backed caches that come with the RAID cards, I've not dealt with storage caching... – jimbobmcgee Feb 12 '13 at 00:42
  • NexentaStor has a free Community Edition for up-to 18TB usable space. Otherwise, it's based on RAW capacity in 8TB chunks. My own systems are the commercial edition, but I've deployed plenty of the Community Edition. – ewwhite Feb 12 '13 at 00:42
  • @jimbobmcgee This would be best for chat in the [**Server Fault Comms Room**](http://chat.stackexchange.com/rooms/127/the-comms-room). I could answer specific questions there. – ewwhite Feb 12 '13 at 00:45
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/7479/discussion-between-jimbobmcgee-and-ewwhite) – jimbobmcgee Feb 12 '13 at 00:48

Using a Linux host to provide CIFS storage for Hyper-V hosts is not reasonable, and it is definitely not supported by Microsoft. When you're talking about something as important as virtualization for business-critical infrastructure, you definitely want vendor support.

You will either need to provide more traditional iSCSI or Fibre Channel storage to your Hyper-V servers or, if you plan on running Windows Server 2012, you could use its built-in storage services to provide iSCSI to your hosts.

Another possibility is running Windows 2012 or something like Nexenta as a virtual guest in your VMware infrastructure to provide iSCSI for your Hyper-V guests. It's not the most performant configuration, but it's also not bad. Since your Hyper-V footprint is small to nonexistent, this could be a good compromise for maximum flexibility without dedicating a LUN.

Otherwise you'll need to go with something that completely virtualizes your LUNs, like an HP LeftHand SAN. With LeftHand, disks are not dedicated to a LUN; instead, all LUNs are striped across all disks. It sounds a bit strange, but it's a good product.

longneck
  • Absolutely acknowledged and appreciated. As far as I know, the HyperV thing is not definite (or even likely), but the powers that be don't like to be told that we *can't* do something (you know how it is). I fully expect any HyperV offering based on this to be a second level of 'test' rather than real honest-to-$DEITY 'production' and will caveat any proposal that centres around HyperV with the non-availability of formal support (not that that's stopped them before -- after all, that's what *I'm* for!!) – jimbobmcgee Feb 12 '13 at 00:11
  • See my edit with additional suggestions. – longneck Feb 12 '13 at 00:56

It's probably partly my background and experience talking here, but I wouldn't recommend a home-brew server solution for storage for virtual machines in what could be considered a "production" or "enterprise" environment.

I'd be looking at the mainstream storage vendors who could provide a SAN solution, but with a pair of high-availability NAS heads to export the underlying filesystem as NFS/CIFS in a supported, certifiable manner.

Tom O'Connor
  • Are you including HP's supposedly 'Enterprise'-grade server/disk offerings in your definition of 'home-brew'? I'm not intending to cobble together bits into an unbranded case and call it a NAS, but appreciate that I'm not going all EMC on this, either... – jimbobmcgee Feb 12 '13 at 00:15
  • And would I not *still* have the issue of getting the VM hosts to talk to the NAS 'heads' in an expedient fashion? – jimbobmcgee Feb 12 '13 at 00:16
  • If it doesn't come as an appliance running some storage server, then it's just a box full of disks running Linux. And it's all down to you to fix. – Tom O'Connor Feb 12 '13 at 08:50