18

My group currently has two largish storage servers, both NAS boxes running Debian Linux. The first is an all-in-one 24-disk (SATA) server that is several years old; it has two hardware RAIDs with LVM over them. The second server holds 64 disks divided over 4 enclosures, each a hardware RAID 6, connected via external SAS; we run XFS over LVM on top of that to create 100TB of usable storage. All of this works pretty well, but we are outgrowing these systems. Having built two such servers and still growing, we want something that gives us more flexibility for future growth and backup options, that behaves better under disk failure (checking the larger filesystem can take a day or more), and that can stand up in a heavily concurrent environment (think small compute cluster). We do not have system administration support, so we administer all of this ourselves (we are a genomics lab).
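For context, here is a rough sketch of the LVM-over-hardware-RAID-plus-XFS layout on the second server (device names, sizes, and mount points are placeholders, not our actual configuration):

    # Hardware RAID 6 volumes exposed by the controller as block devices
    pvcreate /dev/sdb /dev/sdc                     # mark them as LVM physical volumes
    vgcreate vg_storage /dev/sdb /dev/sdc          # pool them into one volume group
    lvcreate -l 100%FREE -n lv_data vg_storage     # one large logical volume across both
    mkfs.xfs /dev/vg_storage/lv_data               # XFS on top
    mount /dev/vg_storage/lv_data /srv/data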

So, what we seek is a relatively low-cost, acceptable-performance storage solution that allows future growth and flexible configuration (think ZFS with different pools having different operating characteristics). We are probably outside the realm of a single NAS. If we do it ourselves, we have been thinking about ZFS (on OpenIndiana, for example) or btrfs per server, with GlusterFS running on top of that. What we are weighing that against is simply biting the bullet and investing in an Isilon or 3Par storage solution.
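To illustrate what we mean by "different pools with different operating characteristics", a hypothetical ZFS layout might look like this (pool names, disk names, and property choices are made up, not a tested design):

    # A mirrored pool for small, hot, IOPS-heavy data and a raidz2 pool for bulk sequence data
    zpool create fastpool mirror disk0 disk1 mirror disk2 disk3
    zpool create bulkpool raidz2 disk4 disk5 disk6 disk7 disk8 disk9
    zfs set compression=on bulkpool    # transparent compression for sequence data
    zfs set atime=off bulkpool         # skip access-time updates on large archives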

Any suggestions or experiences are appreciated.

seandavi
  • 283
  • 2
  • 6

3 Answers

16

I hope this helps a little. I tried not to let it turn into a full wall of text. :)

3Par/Isilon

If you can and will dedicate a fixed amount of man-hours to someone who takes on the SAN admin role, and you want a painless life with sleep at night instead of work at night, then this is the way I'd go.

A SAN lets you do all the stuff where a single "storage" box would limit you (e.g. connect a Pure Storage flash array and a big 3Par SATA monster to the same server), but you also have to pay for it and keep it well maintained the whole time if you want to make use of that flexibility.

Alternatives

Amplidata

Pros: Scales out, cheap, designed around a nice concept with dedicated read/write cache layers. This might actually be the best thing for you.

RisingTideOS

Their target software is used in almost all Linux storage appliances now, and it allows for somewhat better management than plain Linux/Gluster setups do (IMHO). The commercial version might be worth a look.

Gluster/btrfs

Pro: Scales out, and "bricks" give you an abstraction layer that is very good for management.

Con: Gluster has been a total PITA for me. It was not robust, and failures could be either local to one brick or take out everything. Now, with Red Hat in control, it might actually turn into something that works, and I've even met people who can tame it so that it runs for years. Btrfs is still half-experimental; normally a filesystem needs 3-4 years after it's "done" until it's proven and robust, so if you care about the data, why would you ever consider it? Speaking of experimental, commercial support for Ceph is almost out now, but you'd need to stick to the "RBD" layer; the filesystem is just not well-tested enough yet. I want to make it clear, though, that Ceph is much more attractive in the long run. :)

ZFS

Pro: Features that definitely put a nail in other stuff's coffin. Those features are well designed (think L2ARC), and compression/dedup is fun (sketched below). You can run several smaller "storage clusters", which also means small, contained failures instead of one large consolidated boom.

Con: You're maintaining many small software boxes instead of one real storage system. You need to integrate them and spend costly hours to get a robust setup.
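A hypothetical illustration of those features (pool, dataset, and device names are made up):

    zpool add tank cache c2t0d0                 # SSD as an L2ARC read cache
    zpool add tank log mirror c2t1d0 c2t2d0     # optional mirrored SLOG for sync writes
    zfs set compression=on tank/projects        # transparent compression
    zfs set dedup=on tank/refdata               # dedup trades RAM for space; only worth it on highly redundant data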

Kyle Smith
  • 9,563
  • 1
  • 30
  • 32
Florian Heigl
  • 1,440
  • 12
  • 19
  • 3
    +1. I hope you don't mind that I made it a bit less wall-y. – Kyle Smith Mar 25 '12 at 14:01
  • @florian-heigl Could we have a few links to follow, as I am having no luck finding some of the solutions you mention (e.g. 3Par, Isilon, RisingTideOS)? TIA. – ossandcad Mar 28 '12 at 15:45
7

XFS + LVM has indeed been one of the best options for a scaled-out, pure-Linux storage solution over the past few years, so I'm encouraged you're there already. Now that you need to grow further, you have a few more options available to you.

As you know, the big hardware vendors do have NAS heads for their storage. This would indeed give you a single vendor to work with to make it all happen, and it would work pretty well. They're easy solutions to get going (compared to DIY), and their maintenance burden is lower. But they cost quite a lot. On the one hand, you'll have more engineering resources for solving your main problems rather than infrastructure problems; on the other hand, if you're like most university departments I've known, man-power is really cheap relative to paying cash for things.

If you go the DIY route, you already have a good appreciation of the options available to you. ZFS/BTRFS are the obvious upgrade path from XFS + LVM for scaled-out storage. I'd steer clear of BTRFS until it gets declared 'stable' in the mainline Linux kernel, which should be pretty soon now that several of the major free distros are using it as their default filesystem. For ZFS, I'd recommend using a BSD base rather than OpenIndiana, simply because it's been around longer and has the kinks (more) worked out.

Gluster was designed for the use case you describe here. It can do replication as well as present a single virtual volume with lots of storage attached. Their Distributed Volumes sound like exactly what you're looking for, since they spread the files over all the storage servers in the declared volume. You can continue to add discrete storage servers to expand the visible volume. Single namespace!
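A rough sketch of creating and later growing such a distributed volume (hostnames and brick paths here are invented):

    gluster volume create labvol transport tcp server1:/export/brick1 server2:/export/brick1
    gluster volume start labvol
    # later, to grow the namespace with another server:
    gluster peer probe server3
    gluster volume add-brick labvol server3:/export/brick1
    gluster volume rebalance labvol start   # spread existing files onto the new brick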

The gotcha with Gluster is that it works best when your clients can use the Gluster client to access the system rather than the CIFS or NFS options. Since you're running a small compute cluster, you may just be able to use the GlusterFS client.
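Mounting from a compute node with the native client would look something like this (server and volume names are the same invented ones as above):

    mount -t glusterfs server1:/labvol /mnt/labvol
    # or in /etc/fstab:
    # server1:/labvol  /mnt/labvol  glusterfs  defaults,_netdev  0 0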

You're on the right track here.

sysadmin1138
  • 131,083
  • 18
  • 173
  • 296
  • A do it yourself solution will mean that if you break it yourself, you have to fix it yourself. This becomes expensive as you grow past the limits of a couple of servers. If there's any kind of business pressure to make this storage highly available, you'll spend less money buying a wheel than re-inventing one yourself. Storage software running on servers can be made to do anything real storage can do, but not more cheaply. – Basil Mar 26 '12 at 16:53
1

As far as I understand it, you could use a SAN solution based on Linux SCST plus FibreChannel or InfiniBand, which is something I'm building right now. As a base for the LUNs you could use LVM on top of hardware RAIDs and handle snapshots/replication (take DRBD as an example) below the filesystem level. As a filesystem, I'm not aware of a good solution for concurrency, since I'm putting ESXi on top of the nodes, so the datastores are managed by ESX's concurrent filesystem (VMFS). I think GFS2 might work in that environment, but I'm not 100% sure; you should check your precise requirements. Anyway, once you have a robust SAN underneath your nodes, it's pretty easy to get things done.
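For the DRBD part, a minimal sketch of a resource definition (hostnames, addresses, and LV paths are invented) that would sit between the LVM volume and whatever SCST exports:

    cat > /etc/drbd.d/r0.res <<'EOF'
    resource r0 {
        protocol C;                      # synchronous replication
        device    /dev/drbd0;
        disk      /dev/vg_san/lv_lun0;   # LVM volume on top of the hardware RAID
        meta-disk internal;
        on node-a { address 10.0.0.1:7789; }
        on node-b { address 10.0.0.2:7789; }
    }
    EOF
    drbdadm create-md r0    # initialise metadata on both nodes
    drbdadm up r0           # bring the resource up; /dev/drbd0 then backs the LUN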

Martino Dino
  • 1,145
  • 1
  • 10
  • 17