Will FreeNAS/ZFS let me manage my storage this way?

I'm considering FreeNAS and want to confirm beforehand that my understanding of what I can and can't do with storage in it is correct. My plan is as follows (a rough sketch of the zpool commands I have in mind comes after the list):

  1. Initial build-out: 1 storage pool with 2 x 6 TB drives mirroring each other, for a total effective capacity of 6 TB (ignoring rounding and overheads).

  2. First expansion (2.5 - 3 years out): Add 2 x 12 TB drives to the server. Configure them as a second mirrored pair and add them to the existing storage pool, increasing my available storage to 18 TB.

  3. Second expansion, phase 1 (5.5 - 7.5 years out): Add 2 x 24 TB drives to the server. Configure them as a mirrored pair and add them to the existing storage pool, increasing my available storage to 42 TB.

  4. Second expansion, phase 2 (immediately after phase 1): Have all data re-balanced off of the 6 TB drives, remove them from the pool, and then remove them physically from the server. Remaining available storage: 36 TB.
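
Roughly, I picture the above mapping onto zpool commands something like this (the pool name tank and the device names are just placeholders, and step 4 is the part I'm least sure ZFS actually supports):

    # 1. Initial build-out: one mirrored pair of 6 TB drives
    zpool create tank mirror /dev/ada0 /dev/ada1

    # 2. First expansion: add a second mirrored pair (12 TB drives)
    zpool add tank mirror /dev/ada2 /dev/ada3

    # 3. Second expansion, phase 1: add a third mirrored pair (24 TB drives)
    zpool add tank mirror /dev/ada4 /dev/ada5

    # 4. Second expansion, phase 2: empty out and remove the 6 TB pair
    #    (assuming ZFS lets me do this at all - that is part of the question)
    zpool remove tank mirror-0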

My thoughts here are:

  • A doubling of needed storage capacity every three years or slightly less is a continuation of my experience with WHS servers from 2008 to the present.

  • After adding the 24 TB drives, the 6 TB ones will only provide a negligible fraction of total storage (1/7th) and are getting old enough that reliability will become a more significant concern if I keep them in (wrong side of the bathtub curve). Even if they survived, at my rate of growth they would only extend the time before I'd need to buy 48 TB drives by a bit more than half a year, so they wouldn't really buy me much time.

  • Limiting myself to 4 drives lets me use a compact mini-ITX form factor for my NAS. Going above that means a larger and more expensive setup. (2 drives sitting on top of an open case with wires snaking out is acceptable for a day or two of transition time, but not longer term.)

  • I'm also assuming business as usual for the availability of larger drives, as there has been for my previous roughly-3-year upgrades: 1.5 TB to 3 TB (2012) and 3 TB to 6 TB (now/near future). And I'm assuming that whatever new drives do become available will be reliable enough to be usable (i.e. the RAID apocalypse never happens).

Dan is Fiddling by Firelight

Posted 2015-12-05T19:36:02.730

Reputation: 2,677

It's also possible to replicate ZFS volumes. This is used for backing up ZFS, and can be done cross-machine or on the same machine. You ought to ask this on the FreeNAS forum. – JDługosz – 2015-12-06T10:06:57.177

Answers

First off: I'm not going to speculate about developments 6-7 years into the future. This is about today and the near future. For the TL;DR, see the bottom of this answer.

ZFS today does not allow you to remove a vdev from a pool. It also does not have any native "rebalance" functionality (search for block-pointer rewrite, bp rewrite or bpo rewrite to learn more about this). ZFS does allow you to reduce the redundancy level of a mirrored, but not raidzN, vdev, but that's not what you want. (In ZFS, striping is what you get when you don't say anything.)
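
To make that striping default concrete, here is a minimal sketch (the pool name and device names are placeholders of my choosing, not from the question):

    # A bare list of devices gives a striped pool with no redundancy at all:
    zpool create tank /dev/ada0 /dev/ada1

    # The mirror keyword instead turns the same two devices into one redundant vdev:
    zpool create tank mirror /dev/ada0 /dev/ada1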

Basically, a pool can be thought of as a striped set of one or more vdevs, where each vdev is in turn made up of one or more storage devices; only within a vdev can the devices be arranged in a redundant configuration. You can configure each vdev for arbitrarily high levels of redundancy, but every vdev in the pool has to remain above its redundancy threshold in order for the pool to be fully functional. If a vdev fails, at best you lose only the data that was stored on that vdev, and there is no way to actively control which data gets stored on which vdevs (other than putting them in separate pools, which has other drawbacks).

When you have a "mirrored" pool such as the one you describe after the first expansion, what you really have is two vdevs, each made up of a mirrored pair of physical storage devices, with the two vdevs striped together to form the pool. A pool of two two-drive mirrors can be brought down by one failed drive plus one unfortunate error on the other drive in the same mirror set, even if the other mirror set is working perfectly. In the case of such a failure, no matter what else happens, you lose some data.
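
Schematically, your post-expansion pool would look something like this (my own sketch, with placeholder device names; sizes per your plan):

    pool "tank"  (data striped across both vdevs)
    ├── mirror-0: ada0 + ada1   (2 x 6 TB  -> 6 TB usable)
    └── mirror-1: ada2 + ada3   (2 x 12 TB -> 12 TB usable)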

The normal way of raising capacity in a ZFS pool is to replace the drives in-place with larger ones, allow the pool to resilver onto the new drive, and then physically remove the old drive that is no longer used. Normally one wants to use zpool replace for this with both the old and the new drive attached, to maintain the desired redundancy level throughout the process. The normal alternative is to add vdevs with the same redundancy level as the existing vdevs in the pool. Again, note that since ZFS doesn't support removing a part of a striped set, and pools are made up strictly of striped vdevs, you can't remove a vdev once you have added it. Lots of newcomers to ZFS fall into this trap, and it's easy to mess up if you don't pay attention to the exact commands you use.
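
As a sketch of that in-place replacement route, one drive at a time (the autoexpand property and the device names are my additions, not something your plan specified):

    # Allow each mirror vdev to grow once all of its members have been replaced
    zpool set autoexpand=on tank

    # Replace one 6 TB drive with a larger one while both stay attached,
    # wait for the resilver to complete, then repeat for its partner
    zpool replace tank /dev/ada0 /dev/ada4
    zpool status tank    # watch the resilver finish before touching the next drive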

Because of how ZFS resilvers work, resilvering is excruciatingly painful for the drives involved. While traditional RAID rebuilds are usually mostly sequential, with small amounts of random I/O from user activity interspersed, ZFS resilvers are almost completely random I/O, interspersed with yet more random I/O from user activity. A mirror set has seen largely the same activity throughout its life; if one drive is marginal, or has even died, it is quite possible that the other drive is also at best marginal. Having it suffer through the resilvering ordeal for days on end may very easily push it over the edge. (Based on personal experience, I would guesstimate that, to resilver 6 TB of data, you are looking at an approximately week-long resilver process. Even if you turn all the knobs up to 11 and assume pure sequential throughput -- which is totally unrealistic given ZFS' on-disk format and resilvering strategy -- you are looking at at least around 17 hours of sheer terror for the drive. My best guess is that there would be no way to get a 6 TB resilver down to less than maybe twice that, or a day and a half, of outright drive abuse.)
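
For a rough idea of where those figures come from, here is my own back-of-envelope arithmetic, assuming a sustained ~100 MB/s per drive (already generous for the random-heavy I/O of a real resilver):

    6 TB ≈ 6,000,000 MB
    6,000,000 MB / 100 MB/s = 60,000 s ≈ 17 hours    (idealized, purely sequential)
    at a realistic fraction of that rate, the resilver stretches into multiple days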

I would also have very serious doubts about a 2x24TB or even a 2x12TB mirror configuration, assuming such drives materialize; we are already pushing the boundaries of physics a bit (no pun intended) with current drive sizes. Assuming drive reliability, in terms of unrecoverable read errors (URE), remains similar to today (one unrecoverable sector per 10^14 to 10^15 bits read, which is where it has hovered for practically forever in manufacturer data sheets), statistically you won't be able to do a full read of your 12 TB drive without encountering at least one error (12 TB is approximately 9.6×10^13 bits, which rounds nicely to 10^14). By the time you push this to 24 TB drives, statistically you will hit one or two full-sector errors in each full read pass (because you are reading about 2×10^14 bits). Even if you go with 10^15-class URE drives, that won't buy you very much (statistically, you are then looking at one error per roughly five read passes, rather than one every half pass). This, of course, is statistics and specifications; in practice, read errors have a tendency to cluster together, and a drive may give trouble-free service for many, many full reads before it, at some point, starts throwing errors left, right, and center.
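
The same sort of back-of-envelope arithmetic for the URE figures (data-sheet values only; as noted above, real drives do not fail this evenly):

    12 TB = 12 × 8 × 10^12 bits ≈ 9.6 × 10^13 bits
    expected errors per full read at 1 per 10^14 bits: 9.6 × 10^13 / 10^14 ≈ 1
    24 TB ≈ 1.9 × 10^14 bits  -> roughly 2 expected errors per full read
    with 1-per-10^15 drives:  roughly 1 expected error per 5 full reads of 24 TB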

Given that most non-server spinners are warrantied for only 1-3 years, I wouldn't count on any set to keep working for much longer than that, while your plan calls for them to remain functional for at least six years before being superseded. Even server spinners are usually warrantied for only five years, although that normally covers five years of 24x7 operation. My personal experience is that high-end consumer drives, such as Seagate's Constellation ES series, can provide nearly this level of duty before giving up the ghost and going to data heaven. (Personal anecdote: a 1 TB Constellation ES.2 of mine had something like 4 years 9-10 months under its belt by the time it started acting funny, although I never ran it to failure.)

In general, consider that lots of people who run ZFS go with double redundancy once spinner sizes hit the 3-4 TB mark, or even at smaller sizes for more important data. There is good reason for this: storage devices fail, and it happens with alarming frequency. In the case of ZFS, you are also looking at far more expensive data recovery services if you want anything beyond getting back the raw bits that happen to still be readable off the platters, and I'm not aware of any data recovery software that can even remotely handle ZFS pools. If ZFS can't import your pool, the most practical answer is often to consider as lost any data on it that you didn't back up elsewhere.
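
For comparison, double redundancy within the four-drive limit from your question would look something like one of these (again just a sketch with placeholder device names):

    # One raidz2 vdev of four drives: capacity of two drives, any two may fail
    zpool create tank raidz2 /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

    # Or a three-way mirror: capacity of one drive, any two may fail
    zpool create tank mirror /dev/ada0 /dev/ada1 /dev/ada2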

To answer the title question, and TL;DR

Will FreeNAS/ZFS let me manage my storage this way?

In a strictly technical sense, yes, you can get to the storage layout you describe, though not by the exact add-then-remove steps in your plan; you would grow the pool with in-place zpool replace instead, as described above. However, I really wouldn't recommend it at all. You are pushing the operational envelope of the hardware far too much for me to feel comfortable with it, let alone endorse it. (Note that this answer would be pretty much exactly the same regardless of the file system you were asking about.)

a CVn

Posted 2015-12-05T19:36:02.730

Reputation: 26,553

If real-world URE rates on current HDDs were as bad as the spec-sheet rates, rebuilds of 4-drive RAID 5 arrays would already be failing on a regular basis. The silence in the tech press makes it clear that the RAID apocalypse hasn't happened. – Dan is Fiddling by Firelight – 2015-12-05T21:05:29.043

While predicting the future isn't possible, assuming HDD makers continue to sell larger drives, real-world URE rates will need to stay much lower than one error per full drive read, or the products will be torn apart in reviews. At the point where drive makers can no longer sell larger drives, there are enough different possibilities that planning beyond the level of "I'll have to do something different" isn't really feasible. – Dan is Fiddling by Firelight – 2015-12-05T21:05:34.437

Am I correct in understanding that, give or take what real-world URE rates look like at the time, I could accomplish the drive replacements in my second expansion phase by doing in-place replacements of the 6 TB drives, rather than the add-then-remove plan I was originally contemplating? – Dan is Fiddling by Firelight – 2015-12-05T21:10:53.037

You could always use a temporary 3-way mirror to minimize your risk. – qasdfdsaq – 2015-12-08T13:55:14.900

@qasdfdsaq That's basically what zpool replace does, automatically. – a CVn – 2016-01-15T09:17:16.753