
Obviously, if the entire drive dies, then RAID-Z on a single disk will not help. But what about other types of errors?

From my experience, I sometimes have a file that I cannot read. On Mac OS X, the system will hang for a while and then come back with an error. I move the file somewhere out of the way, and I assume that the file has a bad sector or a bad block, or perhaps even an entire bad track.

I date back to the floppy-disk days, when managing disk failures by hand was a common activity. Of course you would replace a bad floppy as soon as possible, but sometimes you could not do that immediately, so the practice was to find the bad area, allocate it to a file, and then never delete that file.

The first question is: how do hard drives fail? Are my assumptions above valid? Is it true that a block goes bad while the rest of the drive remains mostly usable? If so, it seems like RAID-Z could repair the bad block or bad area of the disk using the parity from the other blocks (areas).

The use case is for backup. If I push data off to an 8 TB drive once a week, would it make sense to treat it as 7 TB of data plus 1 TB of parity, in the hope that the extra parity will help me recover from bit rot, bad sectors, or other localized drive failures?

If the theory isn't technically flawed, can ZFS be configured to do this?

Edit: I saw the other question before I posted this one. Splitting the drive into separate partitions that are then grouped into one pool is one option. But in concept, the block maps of the N partitions could be interleaved with one another so that a stripe, while logically spanning N partitions, would physically sit very close together on the disk. That was the gist of my question "can ZFS be configured to do this?", i.e. plain ZFS, not ZFS with partition trickery.

pedz
  • If you want single-disk redundancy, consider making [par2](https://en.wikipedia.org/wiki/Parchive) archives. With ZFS, you *could* set `copies=2`, but you'll incur a 50% storage penalty by doing that. Additionally, I'm not a ZFS expert, but my intuition (which may be wrong) tells me that ZFS would not be happy with your proposed solution. PAR2 is a mature, flexible technology. Using it would not only satisfy your parity requirement, but would also let you set the amount of parity on a per-archive basis if desired (see the sketch after these comments). – EEAA May 22 '17 at 13:32
  • Possible duplicate of [Drawbacks of a single drive split into partitions and partitions joined into a ZFS raidz1, vs. single drive ZFS with data duplication?](https://serverfault.com/questions/709939/drawbacks-of-a-single-drive-split-into-partitions-and-partitions-joined-into-a-z) – At least the answer there includes everything you should know. – Esa Jokinen May 22 '17 at 13:41
  • The vertically aligned tracks across an HDD's platters are together called a cylinder (since there may be more than one platter). – LawrenceC May 22 '17 at 17:52
  • When drives go bad, it tends to be an accelerating process: the rate of sectors going bad increases until the whole drive is unusable. If the drive starts to go bad and RAID-Z silently compensates, you could have a successfully written backup that is past the point of usability by the time you go to recover from it. I'd rather my backup device fail loudly while I'm backing up to it, as soon as it starts to go bad, so I can replace it and continue to trust that what is successfully written is confidently readable. – Anthony X May 22 '17 at 22:14
  • @EEAA -- sounds like what I will use. Thank you. And, thank you to all who commented. – pedz May 27 '17 at 14:31
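
For reference, a minimal sketch of the approach EEAA suggests, assuming a hypothetical archive `backup.tar` and a hypothetical dataset `tank/backup`; the 10% redundancy level is likewise illustrative:

```sh
# Create PAR2 recovery files with ~10% redundancy for the archive.
par2 create -r10 backup.tar.par2 backup.tar

# Later: verify the archive against the recovery data, and repair
# damaged blocks if verification fails.
par2 verify backup.tar.par2
par2 repair backup.tar.par2

# The ZFS alternative mentioned in the comment: keep two copies of
# every block in the dataset (at the cost of doubled space usage).
zfs set copies=2 tank/backup
```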

1 Answer


Since RAID-Z parity works by putting the parity block on another device in the pool, you would need to partition your drive into N+1, N+2, or N+3 equal-sized partitions, where N partitions hold the data and the remaining 1, 2, or 3 hold the parity.
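
As a concrete illustration (the device name `/dev/sdb` and the choice of eight 1 TB partitions on an 8 TB drive are assumptions, not a recommendation), the partitioning step might look like this:

```sh
# Carve a single 8 TB drive into 8 equal ~1 TB partitions.
# With RAID-Z1, 7 of them would carry data and 1 would carry parity.
parted --script /dev/sdb mklabel gpt
for i in $(seq 0 7); do
    parted --script /dev/sdb mkpart "zpart$i" "$((i * 1000))GB" "$(((i + 1) * 1000))GB"
done
```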

On top of these partitions you would create a zpool with the RAID-Z level you have selected, and then create a filesystem on that zpool.
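
Continuing the hypothetical example above (pool and dataset names are illustrative), the pool and filesystem creation would then be:

```sh
# Build a single-disk RAID-Z1 pool from the 8 partitions:
# 7 partitions' worth of data capacity plus 1 partition's worth of parity.
zpool create backuppool raidz1 \
    /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 \
    /dev/sdb5 /dev/sdb6 /dev/sdb7 /dev/sdb8

# Create a filesystem on the pool to hold the weekly backups.
zfs create backuppool/weekly
```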

So, in theory this works. In practice it would make filesystem performance terrible, since consecutive blocks of a file would be located on different zpool devices, which sit in different partitions. After reading one block, the HDD heads must seek to a different area of the platters to read the next block, and so on.

The answer is: Yes, it would be stupid.

Tero Kilkanen
  • What about doing this on an SSD? – JFL May 22 '17 at 18:05
  • @JFL: NAND flash, which is used in SSDs, has exactly the problem this solution tries to solve, namely individual blocks dying. In fact, with NAND it's such a big problem that all SSD controllers explicitly manage it for you. As a result, the most common _visible_ fault with SSDs is total loss of the entire device, probably followed by a device failing into read-only mode. Neither problem benefits from the solution presented here. – MSalters May 22 '17 at 21:16
  • Moreover, SSDs reorder blocks. One of the wear-leveling optimizations applied is to group writes together. Consecutive writes to different partitions would end up in the same erase block, so in case of a failure of an erase block, you lose exactly the benefit of storing the same data multiple times. – liori May 23 '17 at 01:21