
We are preparing to replace our storage servers (iSCSI+NFS). The current servers run Debian Wheezy, using mdadm+lvm2 for storage and drbd with heartbeat for failover (I never got heartbeat to work).

For our replacement servers, I would like to use ZFS, but it has the limitation of not being able to reconfigure the RAID set live. The server would only be partially populated initially. To add a drive, I would need to export the whole file system, reconfigure, then import the file system.
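
For context, when a raidz vdev cannot be reshaped in place, the usual way a pool grows is by adding a whole new redundant vdev rather than a single disk. A minimal sketch, with made-up pool and device names:

    # Initial pool built from the first batch of drives as one raidz2 vdev
    zpool create tank raidz2 da1 da2 da3 da4 da5 da6

    # The vdev itself cannot be grown by one disk; the pool is instead
    # expanded later by adding another complete redundant vdev
    zpool add tank raidz2 da7 da8 da9 da10 da11 da12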

I was originally planning on a pure FreeBSD system, using HAST+CARP to handle the nodes. HAST can only run on GEOM devices, which leaves out a zpool, so it would probably have to be run on a per-drive basis. There is a limit to the number of HAST devices on a system, but I have not been able to find out what this limit is.
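
For reference, running HAST per drive means one resource per physical disk in /etc/hast.conf, with the zpool then built on the resulting /dev/hast/* providers on whichever node is primary. A rough sketch, where the hostnames, addresses and device names are placeholders:

    # /etc/hast.conf -- one resource per physical disk
    resource disk0 {
            on storage-a {
                    local /dev/da1
                    remote 10.0.0.2
            }
            on storage-b {
                    local /dev/da1
                    remote 10.0.0.1
            }
    }

    resource disk1 {
            on storage-a {
                    local /dev/da2
                    remote 10.0.0.2
            }
            on storage-b {
                    local /dev/da2
                    remote 10.0.0.1
            }
    }

    # On the node currently acting as primary:
    #   zpool create tank raidz2 /dev/hast/disk0 /dev/hast/disk1 ...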

Instead, I have come up with what could be a total kludge or could be a good answer. Here is the proposed system. It has one enterprise-grade SSD for the OS, and 25 available hot-swap bays for data.

I build the machine using Linux+mdadm: 2x120G SSDs as RAID-1 and 8x500G SSDs as RAID-6. Install Xen, and build a FreeBSD virtual machine that has the RAID-1 for the operating system and the RAID-6 as a device for a zpool. This virtual machine is the storage server.
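
Roughly what that build would look like on the Linux side, with device names assumed purely for illustration:

    # OS mirror from the two 120G SSDs
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

    # Data array from the eight 500G SSDs
    mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[c-j]

    # In the Xen guest config for the FreeBSD storage domU, both md devices
    # would be passed through as raw disks, e.g.:
    #   disk = [ 'phy:/dev/md0,xvda,w', 'phy:/dev/md1,xvdb,w' ]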

This gives the benefits of mdadm, ZFS, HAST and CARP, at the cost of an additional layer (Xen) using resources. HAST would run on the RAID-6 device, replicating to the secondary machine (built exactly the same).

I'd love any feedback, from "you are an idiot" to "sounds ok to me," but preferably with reasoning behind it.

Thank you,

Rod


1 Answer


I think it's too many layers of abstraction to be tenable.

ZFS is fine. You don't necessarily need all of the data in a single namespace, do you?

You can expand ZFS zpools if done carefully, but you should plan around your desired storage needs and growth.
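
One careful expansion path, sketched with assumed pool and device names, is to swap each member of a vdev for a larger disk and let the vdev grow once the last resilver finishes; the other is to add whole additional redundant vdevs to the pool:

    zpool set autoexpand=on tank
    zpool replace tank da1 da7    # repeat per member, one disk at a time
    zpool status tank             # wait for each resilver before the next swap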

High availability in ZFS is also possible. See: https://github.com/ewwhite/zfs-ha/wiki

ewwhite
  • Too many layers of abstraction is what I was worried about, plus the added complexity. But when I thought about it more, mdadm+lvm2 has a lot of abstraction as well, especially since I ended up creating a VG, then an LV which became the PV for a second VG holding the actual exported LVs. This was to get around issues with drbd, though there might have been a better way. – Rod Jun 08 '19 at 20:04
  • Sorry, it would not let me edit the above. Under mdadm, you can take a RAID-5 of 4 drives, add a fifth drive, and expand the md to be a RAID-5 of 5 drives while the drives are in use (a sketch of that reshape is shown after these comments). You maintain the redundancy, but only the minimum. RAID-Z does not have that capability; with ZFS RAID-Z, you would need to create an independently redundant set, then expand the zpool with that. I've looked at the link (found it during my research) and it is great, but it still gives you a single point of failure. – Rod Jun 08 '19 at 20:11
  • What is the single point of failure? – ewwhite Jun 08 '19 at 20:26
  • The shared JBOD container, i.e. the D3600 JBOD in his HP config. That is a single piece of shared storage hardware which, if it dies, takes down the entire setup. They don't die very often, and I did consider this when I was doing the initial planning, since that setup does allow you to perform possibly dangerous work on one of the nodes while keeping the system online. But, from what I understand from the article, if the D3600 JBOD dies, everything goes down. – Rod Jun 08 '19 at 20:47
  • @Rod Why would the JBOD enclosure "die"? There are internal redundancies in any unit like that. You're designing something that has more fault domains and operational/support complexity. It's more likely to have problems than a simple shared JBOD cluster. – ewwhite Jun 08 '19 at 21:50
  • There is one controller handling 8 drives. If the controller were to die, access to those 8 drives would go away. Very unlikely since they are HP, but still possible, and I had it happen about 10 years ago (not an HP, though). I'm also assuming there are other components (CPU, RAM) which are common to the system. Mainly, I don't know. – Rod Jun 09 '19 at 01:25
  • For anyone reading this later, just so you know, the article he mentions is excellent. I recommended that configuration to a client after reading it a few months ago. I'm just paranoid for this particular application. – Rod Jun 09 '19 at 01:26
  • I wrote that article. You shouldn’t be more paranoid than that. If the requirement is more critical than what my solution provides, a commercial storage appliance with support and an SLA is the responsible choice. – ewwhite Jun 10 '19 at 19:23
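
For readers following along, the online mdadm reshape Rod describes in the comments above looks roughly like this (array and device names are assumptions):

    # Add a fifth drive to a hypothetical 4-drive RAID-5 and reshape it
    # in place while the array stays online
    mdadm --add /dev/md2 /dev/sde
    mdadm --grow /dev/md2 --raid-devices=5
    cat /proc/mdstat              # shows reshape progress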