11

Currently we use an iSCSI SAN as storage for several VMware ESXi servers. I am investigating the use of an NFS target on a Linux server for additional virtual machines. I am also open to the idea of using an alternative operating system (like OpenSolaris) if it will provide significant advantages.

What Linux-based filesystem favours very large contiguous files (like VMware's disk images)? Alternatively, how have people found ZFS on OpenSolaris for this kind of workload?

(This question was originally asked on SuperUser; feel free to migrate answers here if you know how).

mlambie

5 Answers

13

I'd really recommend you take a look at ZFS, but to get decent performance you're going to need to pick up a dedicated device as a ZFS Intent Log (ZIL). Basically this is a small device (a few GB) that can write extremely fast (20-100K IOPS), which lets ZFS immediately confirm that writes have been synced to storage while waiting up to 30 seconds to actually commit the writes to the hard disks in your pool. In the event of a crash/outage, any uncommitted transactions in the ZIL are replayed upon mount. As a result, in addition to a UPS you may want a drive with an internal power supply/super-capacitor so that any pending IOs make it to permanent storage in the event of a power loss. If you opt against a dedicated ZIL device, writes can have high latency, leading to all sorts of problems. Assuming you're not interested in Sun's 18GB write-optimized SSD "Logzilla" at ~$8200, some cheaper alternatives exist (attaching one as a log device is sketched after the list):

  • DDRDrive X1 - 4GB DDR2 + 4GB SLC Flash in a PCIe x1 card designed explicitly for ZIL use. Writes go to RAM; in the event of power loss, it syncs RAM to NAND in <60sec powered by a supercapacitor. (50k-300k IOPS; $2000 Direct, $1500 for .edu)
  • Intel X25-E 32GB 2.5inch SSD (SLC, but no super cap, 3300 write IOPS); $390 @ Amazon.
  • OCZ Vertex 2 Pro 40GB 2.5inch SSD (supercap, but MLC, 20k-50k write IOPS); $435 @ Amazon.
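
A hypothetical sketch of attaching one of these as a dedicated log device; the pool name tank and the cXtYd0 device names are placeholders for whatever format reports on your system:

    # attach a single SSD as the pool's separate intent log (slog)
    zpool add tank log c4t0d0

    # or mirror the log across two SSDs so a log-device failure can't bite you
    zpool add tank log mirror c4t0d0 c4t1d0

    # the device shows up under its own "logs" section
    zpool status tank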

Once you've got OpenSolaris/Nexenta + ZFS set up, there are quite a few ways to move blocks between your OpenSolaris and ESX boxen; what's right for you depends heavily on your existing infrastructure (L3 switches, Fibre cards) and your priorities (redundancy, latency, speed, cost). But since you don't need specialized licenses to unlock iSCSI/FC/NFS functionality, you can evaluate anything you've got hardware for and pick your favorite (rough sketches of the NFS and iSCSI paths follow the list):

  • iSCSI Targets (CPU overhead; no TOE support in OpenSolaris)
  • Fibre Channel Targets (Fibre Cards ain't cheap)
  • NFS (VMware + NFS can be finicky, limited to 32 mounts)
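
For illustration, minimal sketches of the NFS and COMSTAR iSCSI paths on 2009.06-era bits; the pool/dataset names and the 200G size are made up, and the iSCSI side assumes the COMSTAR packages are already installed:

    # NFS: share a dataset to the ESX hosts
    zfs create tank/vmstore
    zfs set sharenfs=on tank/vmstore

    # iSCSI via COMSTAR: carve out a zvol and expose it as a LUN
    svcadm enable stmf
    zfs create -V 200G tank/esx-lun0
    sbdadm create-lu /dev/zvol/rdsk/tank/esx-lun0
    stmfadm add-view 600144f0...   # use the GUID printed by sbdadm create-lu
    itadm create-target            # bring up an iSCSI target for the ESX initiators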

If you can't spend $500 for evaluation, test with the ZIL enabled and then disabled (sketched below) to see whether the ZIL is a bottleneck. (It probably is.) Don't do this in production. Don't mess with ZFS deduplication just yet unless you also have lots of RAM and an SSD for L2ARC. It's definitely nice once you get it set up, but you should definitely try some NFS tuning before playing with dedup. Once you get it saturating 1-2 Gb links, there are growth opportunities in 8Gb FC, 10GigE and InfiniBand, but each requires a significant investment even for evaluation.
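
On OpenSolaris builds of that era the ZIL could be switched off globally through a kernel tunable (again: evaluation only, never production), and adding an L2ARC cache device is a one-liner; pool and device names below are placeholders:

    # disable the ZIL on a running system for testing (reverts on reboot)
    echo zil_disable/W0t1 | mdb -kw
    # re-enable it when done
    echo zil_disable/W0t0 | mdb -kw

    # add an SSD as an L2ARC cache device
    zpool add tank cache c5t0d0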

notpeter
2

I wouldn't do exactly this. In my experience, Linux (specifically CentOS 3/4/5) is a generally poor choice for an NFS server. I have had several, and found that under load, latency and throughput tend to drop for reasons we could never quite get our heads around.

In our case, we were comparing Linux's performance back-to-back against Solaris (on UltraSPARC) and NetApp; both of the latter returned better results, both in apples-to-apples performance terms and in the nebulous terms of "engineers not complaining nearly as much about latency when the server was under load". There were multiple attempts to tune the Linux NFS server; both the NetApp and Solaris systems ran as-is out of the box. And since both the Solaris and NetApp systems involved were older, the Linux servers could be argued to have had every advantage and still failed to be convincing.

If you have the time, it would be a worthwhile experiment to set up the same hardware with OpenSolaris (now that Solaris is effectively too expensive to use), Linux, and perhaps a BSD variant or two, and race them. If you can come up with some performance metrics (disk I/O counts in a VM hosted off the store, for example) it might make for an interesting white paper or internet article. (If you have the time.)
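
As one entirely hypothetical way to collect such a metric: push sequential writes from a guest whose disk lives on the datastore under test, and watch the pool from the host; the file name, size, and pool name here are arbitrary:

    # inside a Linux guest on the datastore under test
    dd if=/dev/zero of=/root/testfile bs=1M count=4096 oflag=direct

    # meanwhile, on the storage host, watch per-device throughput
    zpool iostat -v tank 5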

Regarding NFS in general, the NetApp people told me several times that their benchmarks showed NFS costing only 5 to 10% in performance for VMs -- and if your application was sensitive enough for that to be a problem, you shouldn't be virtualizing it in the first place.

But I should confess that after all that time and tears, our non-local production VM stores are all fed by iSCSI, mostly from NetApp.

David Mackintosh
  • I think it's NetApp that started out with NFS, then bolted on iSCSI support later, hence their products always see 'best case' NFS performance vs 'worst case' iSCSI... Seconding avoiding NFS though - You can use iSCSI on Linux and that's a better choice IMO. – Chris Thorpe May 24 '10 at 03:01
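
(If you take the Linux iSCSI route Chris suggests, a minimal hypothetical export with scsi-target-utils might look like the following; the IQN and the backing LVM volume are made up:)

    # define a target and expose an LVM volume as LUN 1 (tgtd must be running)
    tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2010-05.local.san:vmstore1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vg0/esx_lun0
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL   # lab only; restrict initiators in production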
2

We're using OpenSolaris 2009.06 with a RAID 10 ZFS config to provide NFS to our VMware ESXi server. It works fairly well for our needs so far. We are using SATA RAID-class drives (Seagate ES.2 1TB). We still have some tuning to do, however.
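
For context, ZFS "RAID 10" just means striped mirrors; a hypothetical sketch with made-up device names:

    # three two-way mirrors striped together (RAID 10 style)
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0
    zfs create tank/esx
    zfs set sharenfs=on tank/esx   # export to the ESXi host over NFS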

tegbains
2

I am a big fan of NFS datastores for VMware; NetApp has an excellent implementation.

TR-3808 compares the scaling of NetApp FC, iSCSI, and NFS connected shared datastores; it's an excellent read.

MDMarra
chamdor
-2

You might want to consider the ZFS ARC bug that has persisted for 3+ years before jumping in too deep with ZFS...

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6522017

(This one is nasty, as it will also blow past the memory limits a hypervisor sets for the VM!)
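
A commonly cited workaround (not a fix) is capping the ARC via /etc/system; the 4GB value here is only an example:

    # /etc/system -- cap the ZFS ARC at 4 GB (0x100000000 bytes), then reboot
    set zfs:zfs_arc_max = 0x100000000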

user48838
  • You've copied/pasted this same "answer" to at least two different Nexenta-related questions. While this is a serious bug, one would only run into it in a very rare set of circumstances. As such, your actions seem a bit excessive. The benefits of running ZFS *far* outweigh the very slim chance that you'll hit this bug. – EEAA Jul 26 '10 at 02:12
  • Okay, make that 8 separate questions you've pasted this same answer into. – EEAA Jul 26 '10 at 02:23
  • They are related, but that is your opinion. I agree with the benefits, but the impact of this outstanding/ongoing bug is significant, as it will bring the entire OS to a grinding halt - no benefits then, when you cannot reliably access the stored data. – user48838 Jul 26 '10 at 02:37
  • For the folks that truly want to rate this fairly for the overall usefulness of this forum/format, please read through the comment on the following first: http://serverfault.com/questions/162693/my-opensolaris-server-hangs-when-writing-large-files-after-upgrading-zpool/163955#163955 – user48838 Jul 26 '10 at 17:53
  • ErikA will NOT identify his ZFS rig, so the comments made by this person about the situation identified in the referenced question occurring under a "very rare set of circumstances" cannot be substantiated by this person... The choice to ignore requests to identify the basis of their statement/position is on those comments too. – user48838 Jul 27 '10 at 12:48