8

Looking to make use of an old server lying idle as a proof of concept. Here are the specs: Dell PE 2900, Xeon 5110 (2P), 12 GB RAM, 8x 300 GB 15K drives, PERC 5/i with 256 MB cache.

What additional hardware would be needed on the server and hosts? 1 Gb Ethernet cards, a 1 Gb switch?

There are 4 ESX servers which may connect to this storage server (iSCSI or NFS).

What software is recommended? OpenSolaris? Nexenta Community Edition? FreeNAS?

Appreciate any links to guides or tutorials.

Maruti

JMS77
  • Now that you've edited the question so much that it bears no resemblance to the original version perhaps you should update the title as well. – John Gardeniers Jul 10 '10 at 03:44
  • This isn't a complete answer, but OpenFiler is doing excellent iSCSI serving duties for 9 ESX hosts at our site, running on an old proliant with a couple of Gb RAM. – Chris Thorpe Jul 10 '10 at 09:34
  • could i ask what hardware is in-use? like network devices: switch etc? and what i/o speeds are you experiencing? – JMS77 Jul 10 '10 at 16:06

6 Answers

26

For ZFS, there are a number of factors which contribute to the overall cost, performance and your satisfaction with the system you've built.

SUPPORTABILITY If you need to be able to call someone when you have problems, don't DIY; buy a Sun 7000 Unified Storage appliance. They're a little pricey, but you get what you pay for: high quality hardware, with recent OpenSolaris code in an appliance form...oh, and Analytics to die for. It's the only way you can buy OpenSolaris support from Oracle, so if you've got relatively deep pockets, talk to your Oracle rep; it might be worth it (it was for me at work).

SOFTWARE Since Solaris 10 doesn't have the cool cutting edge ZFS features (dedup, non-mirrored ZIL, COMSTAR iSCSI/FibreChannel target, etc) you're gonna want to run something based on the OpenSolaris bits. Since OpenSolaris itself is dead and there isn't a full distribution around Illumos yet, consider Nexenta. It's basically OpenSolaris Kernel + Debian userland (apt). Nexenta Core Platform is free for unlimited use, but if you're willing to pay for support, consider NexentaStor although I'm not a fan of $$ per TB (perpetual licenses start at $800 + $75/TB).

MIRRORED vs RAIDZ1/RAIDZ2 Basically a struggle between IOPS and capacity given the same number of drives. With big disks (1-2TB) if you decide mirroring is too expensive, definitely go with double parity (RAIDZ2) as rebuild times with Multi-TB arrays can easily be longer than a day. (More: ZFS: Mirror vs. RAID-Z). Don't forget redundancy != backups.
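To make the capacity side of that trade-off concrete, here is a rough sketch in Python. It assumes 2-way mirrors or a single raidz vdev and ignores ZFS metadata and reservation overhead, so treat the numbers as upper bounds; the function name and the six 300 GB data disks are only for illustration.

```python
def usable_capacity_gb(layout, disks, disk_gb):
    """Rough usable capacity in GB, ignoring ZFS metadata/reservation overhead."""
    if layout == "mirror":
        # Striped 2-way mirrors: half of the raw space is usable.
        if disks < 2 or disks % 2:
            raise ValueError("2-way mirrors need an even number of disks (>= 2)")
        return (disks // 2) * disk_gb
    # Single raidz vdev: one or two disks' worth of space goes to parity.
    parity = {"raidz1": 1, "raidz2": 2}[layout]
    if disks <= parity:
        raise ValueError("not enough disks for this raidz level")
    return (disks - parity) * disk_gb


for layout in ("mirror", "raidz1", "raidz2"):
    print(layout, usable_capacity_gb(layout, disks=6, disk_gb=300), "GB")
```

With six 300 GB disks this gives 900 GB for mirrors, 1500 GB for raidz1 and 1200 GB for raidz2; the mirrors give up capacity in exchange for IOPS.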

DRIVES I recommend you think about breaking your storage out from your server enclosure. SuperMicro make some nice cases, but inevitably you're going to want more storage than fits in your case, so why not start with a decent SAS enclosure and buy another when you expand? I'd buy 7200RPM SATA drives over 10k-15k SAS drives; more mirrored spindles will outperform fast, expensive disks with ZFS for the same $$.

MEMORY Buy lots of RAM. 12-16GB minimum, double/triple that if you want to consider dedup.

SSDs If you're using iSCSI or NFS for virtual machine storage, definitely buy a high end device for ZIL to speed up synchronous writes (see: my answer to a previous question). Buy one/multiple decent MLC SSDs for L2ARC to act as a secondary read cache; if you're doing dedup you'll want SSDs for L2ARC big enough to fit your deduplication tables.

PROVISIONING ZFS makes thin provisioning of a volume as simple as creating a directory in most environments: zfs create -s -V 40g pool/fsname (the -s flag makes the volume sparse, i.e. thin provisioned; without it the full 40g is reserved), then zfs set shareiscsi=on pool/fsname and you're done. Cloning an existing system is similarly easy with a snapshot: 'zfs snapshot pool/fsname@snapname; zfs clone pool/fsname@snapname pool/newfsname'. These operations are quick (0-5 secs).

Update 7/10/2010 to reflect recommendations for how to use your hardware:

Since the PERC doesn't support passing the disks directly as just a bunch of disks (discussion), you'll have to create 8 single-disk RAID 0 arrays. Use two as a mirrored pair and install your root volume there. Use the remaining six as a striped set of 3 mirrored pairs (think RAID10) after first boot by running zpool create poolName mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0 (substitute your disk IDs by looking at the output of the 'format' command). Note: since the PERC may renumber if a failed disk (and thus its associated RAID 0 set) is missing after reboot, you should note drive serials/cXtXdX/slots and document/label accordingly. Hopefully you won't ever need it, but having that info makes it less painful should you ever have to migrate the disks or, God forbid, perform recovery.
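If you'd rather not hand-type the device names, a small Python sketch like the one below can assemble the same zpool create line from a list of disk IDs. The helper name is made up for illustration, the c0tNd0 IDs are placeholders (take yours from 'format'), and it only prints the command rather than running it.

```python
def zpool_create_mirrors(pool, disks):
    """Build a 'zpool create' command that stripes across 2-way mirrors."""
    if len(disks) < 2 or len(disks) % 2:
        raise ValueError("need an even number of disks (>= 2)")
    args = ["zpool", "create", pool]
    # Pair the disks up in order: (0, 1), (2, 3), (4, 5), ...
    for first, second in zip(disks[0::2], disks[1::2]):
        args += ["mirror", first, second]
    return " ".join(args)


# Placeholder disk IDs; substitute the ones 'format' reports on your system.
data_disks = [f"c0t{i}d0" for i in range(6)]
print(zpool_create_mirrors("poolName", data_disks))
```

Review the printed command against your slot/serial notes before running it, since pairing the wrong devices is painful to undo.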

Before the Oracle acquisition I would've definitely recommended OpenSolaris over Nexenta Core Platform, but now I'd definitely lean towards Nexenta CP. They are basically the only folks continuing regular updates since OpenSolaris b134 was released in March 2010. Migrating your ZFS pool between them is possible; it depends only on the ZFS on-disk version, which you can specify at pool creation time (discussion, see 3rd msg). I've never used FreeNAS or EON, so can't comment on them.

As for NFS vs COMSTAR iSCSI, you should test both over gigabit with jumbo frames. AFAIK, OpenSolaris/Nexenta don't support hardware TOE for NICs, but if you've got TOE-enabled NICs on the VMware side they will reduce CPU overhead for iSCSI. You can test with direct-cabled crossovers, but for multi-host you'll want a gigabit switch that supports jumbo frames (optimally an iSCSI-optimized VLAN on a Layer 3 switch). If you've got a Fibre Channel card, test COMSTAR Fibre Channel targets too.

To leverage the hybrid storage capabilities of ZFS (HDD + SSD), I'd simulate your usage without a dedicated ZIL device and see if performance is good enough (striped/mirrored 15k SAS disks might be enough). If not, with one or more NON PRODUCTION VMs set up, temporarily disable the ZIL and measure performance again. If your performance is much better, then the ZIL is the bottleneck for your setup and a dedicated ZIL device would be worth the money. The DDRDrive X1 ($2000, $1500 .edu) is designed for ZIL use and takes just a PCI-E x1 slot instead of a drive bay. Alternatively, you could consider replacing your mirrored boot disks with two non-redundant 2.5 inch SATA SSDs: a super-capacitor backed SSD dedicated to ZIL use (Vertex2Pro 32GB $435) and a decent MLC SSD (like the Intel X25-M 80GB $230) split with one small partition for root and the rest for L2ARC. More RAM is well used by the ZFS ARC, but 12GB should be enough to start.

I'll leave suggestions for benchmarking tools to another question (heavily dependent on your storage->vm path, guest OSes and workload), but DTrace probes can yield a lot of useful data despite the learning curve (this is where the Sun 7000 Series Analytics shines). Two final notes: update your PERC firmware & BIOS before starting, and if you get an SSD for L2ARC, it can take hours to get hot, so don't just bench it cold.

notpeter
  • Sun hardware is out of budget. Want to start with experiment using free tools so that i can convince the bosses around the investment. Please suggest h/w, software and any tutorials – JMS77 Jul 10 '10 at 03:02
  • Extremely comprehensive and well-written, +1 – pauska Jul 11 '10 at 00:55
  • could not afford GB Switch, is it good to start with 100Mbps switch and 2 hosts? what iscsi speeds could be expected? 50Mbps, 100Mbps? – JMS77 Jul 14 '10 at 09:08
  • You're gonna want gigabit. 100mbit will limit you to 12MB/sec max throughput. You can get a cheap unmanaged switch that'll do Gigabit + Jumbo frames for <$200. – notpeter Jul 14 '10 at 19:04
  • "more of mirrored 7.2K SATA outperform SAS 15K" what is the critical no of SATA drives? Mirror RAID1 is good for reads but bad for write perf? are you suggesting RAID10? – JMS77 Jul 15 '10 at 10:10
  • Mirrored with >2 disks, think RAID10. Build your pool as a stripe of three mirrored pairs. Read/Write 'Performance' is 3 figures: throughput, latency and IOPS. Yes, a 6 disk mirror may offer lesser write throughput than a raidz/raidz2 set of the same disks, but it comes with the benefit of 2x read IOPS, 3x write IOPS, higher read throughput, and lower read latency under load all with lower CPU overhead. All likely more important than a little better write throughput. IMHO: SSDs + mirrored 7.2k disks will almost always have better performance & capacity than the same money spent on 15k disks. – notpeter Jul 15 '10 at 14:10
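
To put the 100 Mbit vs. gigabit comment above in numbers, here is a back-of-the-envelope sketch in Python. It only converts raw link speed from megabits to megabytes per second and ignores Ethernet, TCP/IP and iSCSI/NFS overhead, so real-world throughput will be somewhat lower; the function name is just for illustration.

```python
def max_throughput_mb_per_s(link_mbit_per_s):
    """Upper bound on payload throughput: 8 bits per byte, no protocol overhead."""
    return link_mbit_per_s / 8


for link in (100, 1000):
    print(f"{link} Mbit/s link: at most {max_throughput_mb_per_s(link)} MB/s")
```

A 100 Mbit link tops out around 12.5 MB/s before overhead, which is less than a single modern disk can stream, while gigabit raises the ceiling to about 125 MB/s.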
8

Try this Recipe from SUN... ahh.. Oracle:

http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html

lepole
4

No need for RAID hardware...raid is essential ;)
Supermicro had a nice 8 port sata card (no raid) that was well supported by solaris.

About DIY, have a look here:
http://www.greenm3.com/2009/10/opensolaris-green-home-server-low-power-and-small.html

PiL
4

Install a recent development build of OpenSolaris (b134).

If you want performance, create 4 mirrored vdevs with those eight disks you have.

For even better performance, use two mirrored SLC SSDs as a log device and an additional SSD as cache.

Giovanni Tirloni
  • Opensolaris dev build: what does "ai" stand for? – JMS77 Jul 11 '10 at 17:36
  • "AI" stands for "Automated Installer" which is a system similar to Jumpstart. It's experimental at this point so, unless you want to test it or do automated network install, you can stick to the non-AI ISOs. – Giovanni Tirloni Jul 12 '10 at 14:21
2

Waiting for dedup to show up on FreeNAS' ZFS... The RAM requirements may become more reasonable when that occurs.

user48838
  • I imagine memory requirements will be similar. Deduplication tables must be accessible quickly, if not in main memory (ARC) then in some form of fast secondary cache (L2ARC) like an SSD. Of course, if you have a smaller filesystem with a large average block size, your dedup tables won't be that large to begin with, but I doubt the FreeNAS implementation will impact dedup cache requirements. – notpeter Jul 26 '10 at 03:26
  • Just holding out hope that it might be less memory and processor-type demanding with a little better/broader hardware support than OpenSolaris. – user48838 Jul 26 '10 at 06:03
0

As a complement to the other answers about FreeNAS, FreeBSD and the latest versions of ZFS (I can't comment in the thread now).

FreeBSD 9 (beta1 now) will support ZFS v28 http://hub.opensolaris.org/bin/view/Community+Group+zfs/28.

Rufo El Magufo