
We are planning to build a NAS solution which will be used primarily via NFS and CIFS, with workloads ranging from archival applications to more "real-time" processing. The NAS will not be used as block storage for virtual machines, so access will always be file-oriented.

We are considering primarily two designs, and I'd like to kindly ask for any thoughts, views, insights, and experiences.

Both designs utilize distributed storage software at some level. Both would be built from commodity servers and should scale as we grow. Both involve virtualization for instantiating "access virtual machines" which will serve the NFS and CIFS protocols, so in this sense the access layer is decoupled from the data layer itself.

The first design is based on a distributed filesystem like Gluster or CephFS. We would deploy this software on the commodity servers, mount the resulting filesystem on the "access virtual machines", and serve the mounted filesystem from there via NFS/CIFS.
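
A rough sketch of what each access VM would do in this design (assuming a Gluster volume named "tank"; the host names, client network, and paths are placeholders, not anything from the original post):

    # mount the distributed filesystem inside the access VM
    mount -t glusterfs gluster-node1:/tank /srv/share

    # export it over NFS: /etc/exports entry, then reload
    # /srv/share  10.0.0.0/24(rw,sync,no_subtree_check)
    exportfs -ra

    # export it over CIFS: smb.conf share section
    # [share]
    #     path = /srv/share
    #     read only = no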

The second design is based on distributed block storage using Ceph. We would build distributed block storage on the commodity servers and then, via virtualization (e.g. OpenStack Cinder), allocate block volumes to the access VM. Inside the access VM we would deploy ZFS to aggregate the block volumes into a single filesystem, and this filesystem would be served via NFS/CIFS from the very same VM.
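
Inside the access VM, the ZFS part could look roughly like this (only a sketch; it assumes the Cinder-attached volumes show up as /dev/vdb, /dev/vdc, ... and the pool/dataset names are made up):

    # aggregate the attached Ceph-backed volumes into one pool
    zpool create tank /dev/vdb /dev/vdc

    # create the dataset that will be exported via NFS/CIFS
    zfs create tank/share
    zfs set sharenfs=on tank/share

    # later growth: attach another Cinder volume, then add it to the pool
    zpool add tank /dev/vdd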

Any advice and insights are highly appreciated. I should also say that we are internally inclined towards the "monster VM" approach due to its seemingly simpler architecture (data distribution at the block layer rather than at the filesystem layer).

Cheers, Prema

prema
    If you need only file storage, why do you consider block storage as a possible solution? It sounds much more complex. – Michael Hampton Nov 12 '18 at 13:41
  • Well, it sounded easier to build a distributed block layer and then just keep adding block volumes to a single VM with a normal non-distributed filesystem. I mean, it sounded like a distributed filesystem is a more difficult and more fragile paradigm than distributed block storage. We don't have enough experience with either of them, so it is just our impression. – prema Nov 12 '18 at 14:04
  • How large of a system? Attaching 100 TB to a single non-distributed host and exporting it is possible, and simpler. Ceph is great for petabyte and exabyte scale systems; its complexity may not be worth it for mere tens of TB of NFS shares. – John Mahowald Nov 13 '18 at 05:06
  • John, what if we want a single share to gradually grow to up to 1 PB? Do you think doing this with a non-distributed host would be reasonable? The growth would be by means of attaching additional block volumes over time and expanding the filesystem. – prema Nov 13 '18 at 09:24
  • A key element of such a design is how much storage you plan to use. Scale-out systems like Ceph and Gluster are ideal if you plan to scale the storage, but they add complexity that is not needed if you just need a fixed amount of storage. Also, using software-defined storage for less than 100 TB is kind of overkill. – 0xF2 Nov 18 '18 at 04:03

1 Answer


First design:

Gluster + (kernel NFS or NFS-Ganesha) in a cluster

No access VM. In this case Gluster has a simpler architecture than CephFS. Gluster has some rules regarding adding nodes and capacity; that's OK, just plan for it from the start.
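
For example, a replica-3 volume has to be grown in multiples of the replica count, which is the kind of rule to plan for (a sketch with hypothetical node and brick names):

    # create a replicated volume across three nodes and start it
    gluster volume create tank replica 3 \
        node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1
    gluster volume start tank

    # grow later by adding a full replica set, then rebalance
    gluster peer probe node4    # repeat for node5, node6
    gluster volume add-brick tank replica 3 \
        node4:/bricks/b1 node5:/bricks/b1 node6:/bricks/b1
    gluster volume rebalance tank start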

Second design:

If your goal is to have a single access VM providing NFS/CIFS, Linux can mount Ceph as a block device. So you have a stack like this:

NFS/CIFS in Linux -- Ceph RBD
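
Assembled on the export host, that stack could look roughly like this (a sketch; the pool/image names, size, and client network are placeholders):

    # create and map an RBD image on the host that will export it
    rbd create rbd/nasvol --size 10T
    rbd map rbd/nasvol                 # shows up as e.g. /dev/rbd0

    # put a regular local filesystem on it and export it
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /srv/share
    echo '/srv/share 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra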

If you require HA for the access VM, then add an HA cluster:

NFS/CIFS in Linux HA cluster -- Ceph RBD
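
With Pacemaker/Corosync that usually means a floating IP plus the filesystem and NFS server managed as cluster resources, something along these lines (only a sketch; the device path, mount point, and IP are placeholders):

    pcs resource create share_fs ocf:heartbeat:Filesystem \
        device=/dev/rbd/rbd/nasvol directory=/srv/share fstype=xfs
    pcs resource create nfs_srv ocf:heartbeat:nfsserver
    pcs resource create nas_vip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24
    pcs constraint colocation add nfs_srv with share_fs
    pcs constraint colocation add nas_vip with nfs_srv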

Or, instead of Ceph RBD, you can use the Ceph iSCSI gateway.

Things to consider:

  1. scaling up
  2. data protection: 2 or 3 copies, or erasure coding (see the sketch after this list)
  3. for decent performance use enterprise SATA and SSD disks
  4. online/offline upgrade
  5. other solutions: e.g. DRBD
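
On the Ceph side, the data protection choice in point 2 comes down to how the pool is created, e.g. (a sketch; pool names, PG counts, and k/m values are just examples):

    # replicated pool with 3 copies
    ceph osd pool create nas_rep 128 128 replicated
    ceph osd pool set nas_rep size 3

    # or an erasure-coded pool, here 4 data chunks + 2 coding chunks
    ceph osd erasure-code-profile set ec42 k=4 m=2
    ceph osd pool create nas_ec 128 128 erasure ec42
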
dario
  • Thanks - just an update - we are currently leaning towards Design 2 but we want to use BTRFS instead of ZFS due to its ability to dynamically resize the filesystem. – prema Dec 24 '18 at 17:40
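
For reference, that kind of online growth differs slightly between the two filesystems: ZFS grows by adding devices to the pool (zpool add), while Btrfs can both absorb new devices and be resized in place (a sketch; the device and mount point are placeholders):

    # if the existing volume itself was enlarged (e.g. via Cinder), resize online
    btrfs filesystem resize max /srv/share

    # or attach an additional volume and add it to the filesystem
    btrfs device add /dev/vdd /srv/share
    btrfs balance start /srv/share     # optional: spread existing data onto the new device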