
I have a NAS server with 4x 2TB WD RE4-GP drives in a RAID10 configuration (4TB usable). I'm running out of space (less than 1TB left), and I have $0 to spend on bigger or additional drives/enclosures.

I like what I've read about the data-integrity features of ZFS, which - on their own - are enough for me to switch from my existing XFS (software) RAID10. Then I read about ZFS's superior implementation of RAID5, so I thought I might even get up to 2TB more usable space in the bargain using RAIDZ-1.

However, I keep reading more and more posts saying to essentially never use RAIDZ-1; only RAIDZ-2 and up is reliable enough to handle "real world" drive failures. In my case RAIDZ-2 doesn't make sense: with four drives it yields the same usable space as mirroring, so it'd be much better to use two mirrored vdevs in a single pool (RAID10).

Am I crazy wanting to use RAIDZ-1 for 4x 2TB drives?

Should I just use a pool of two mirrored vdevs (essentially RAID10) and hope the compression gives me enough extra space?

Either way, I plan on using compression. I only have 8GB of RAM (maxed), so dedup isn't an option.
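
For reference, the two layouts I'm deciding between would be created roughly like this (disk names are placeholders, and the compression setting is just what I have in mind):

```sh
# Option A: pool of two mirrored vdevs (RAID10-equivalent), ~4TB usable
zpool create tank mirror da0 da1 mirror da2 da3

# Option B: single RAIDZ-1 vdev, ~6TB usable
zpool create tank raidz1 da0 da1 da2 da3

# Either way: compression is cheap on CPU and needs none of dedup's RAM
zfs set compression=lz4 tank
```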

This will be on a FreeNAS server (about to replace the current Ubuntu OS) to avoid the stability issues of ZFS-on-Linux.

Andrew Ensley
  • Not sure how this is off-topic. I'm asking for advice about the proper file system configuration for a server. – Andrew Ensley Oct 07 '14 at 19:15
  • Also, RAIDZ1 or 2 write speed sucks compared to RAID10 – JamesRyan Dec 16 '14 at 23:52
  • If you have enough CPU to calculate parity without slowdown, RAIDZ should be as fast as or faster than RAID10 for most writes. RAIDZ writes everything in a full RAID stripe; there is no read-modify-write cycle like with RAID5. So you'll get more disk bandwidth (more data, less overhead), and the writes should be faster than RAID10. However, this has the disadvantage that *reads* often end up slower: "write a full stripe every time" leads to fragmentation, and doesn't give you the benefit of reading only a subset of the disks for many small reads. This was a conscious design decision. – Dan Pritts Dec 23 '14 at 16:33
  • What I said above is only partly true. RAID10 will be much faster if you have concurrent small writes, e.g., a database server. In RAIDZ all disks are active for all writes; RAID10 splits them up. The point I was trying to get across was that RAIDZ does away with the performance-killing and potentially unsafe read-modify-write cycle of RAID5. – Dan Pritts Mar 03 '15 at 15:17
  • RAIDZ2 is more reliable than RAID10. With RAIDZ2, any two disks can fail and you will still have your data. With RAID10, two failed disks (in a four disk array) *may* cause data loss. – Klaws Jan 13 '19 at 10:22

3 Answers


Before we go into specifics, consider your use case. Are you storing photos, MP3s, and DVD rips? If so, you might not care whether you permanently lose a single block from the array. On the other hand, if it's important data, that could be a disaster.

The claim that RAIDZ-1 is "not good enough for real world failures" boils down to this: when reconstruction time comes, you are likely to hit a latent media error on one of your surviving disks. The same logic applies to RAID5.

ZFS mitigates this failure mode to some extent. If a RAID5 device can't be reconstructed, you are pretty much out of luck: copy your (remaining) data off and rebuild from scratch. ZFS, on the other hand, will reconstruct everything but the bad chunk and let the administrator "clear" the errors. You'll lose a file, or a portion of a file, but you won't lose the entire array. And ZFS's block checksumming means you will be reliably told that there's an error; otherwise, I believe it's possible (although unlikely) for multiple errors to let a rebuild apparently succeed while handing you back bad data.
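
In practice, that recovery flow looks something like this (the pool name is just an example):

```sh
# after a failed/partial rebuild, list the files with permanent errors
zpool status -v tank

# once the damaged files are restored from backup or deleted,
# acknowledge and reset the error counters
zpool clear tank
```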

Since ZFS is a "Rampant Layering Violation," it also knows which areas of the disks have no data on them, and can skip them during the rebuild. So if your array is half empty, you're half as likely to hit a rebuild error.

You can reduce the likelihood of these kinds of rebuild errors on any RAID level by doing regular "zpool scrubs" or "mdadm checks" of your array. Other RAID systems have similar commands/processes; e.g., LSI/Dell PERC RAID cards call this "patrol read." These read everything, which helps the disk drives find failing sectors and reassign them before the failures become permanent. If a sector is already unreadable, the RAID layer (ZFS/md/RAID card/whatever) can rebuild the data from parity.
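
Concretely (pool and md device names are examples):

```sh
# ZFS: read and verify every allocated block, repairing from redundancy
zpool scrub tank
zpool status tank          # watch progress and error counts

# Linux md: kick off a consistency check
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat           # watch progress
```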

Even if you use RAIDZ2 or RAID6, regular scrubs are important.
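
If your platform doesn't have a built-in scheduler for this, a plain cron entry does the job (pool name is an example):

```sh
# crontab entry: scrub on the 1st of every month at 3am
0 3 1 * * /sbin/zpool scrub tank
```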

One final note: RAID of any sort is not a substitute for backups. It won't protect you against accidental deletion, ransomware, etc. Regular ZFS snapshots can, however, be part of a backup strategy.
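
As a sketch (dataset and host names are placeholders):

```sh
# cheap point-in-time snapshot; protects against "oops", not disk failure
zfs snapshot tank/data@2014-12-09

# replicate the snapshot to another machine for an actual backup
zfs send tank/data@2014-12-09 | ssh backuphost zfs receive backup/data
```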

Dan Pritts
  • Thanks for that explanation. That makes a lot of sense and matches with what I've learned about ZFS so far. I have actually reloaded my server with FreeNAS already and went with the RAIDZ-1 configuration. I have it set to scrub once a month. Do you think that is often enough or would you recommend more frequent scrubs? The rampant layering violation is my favorite feature of ZFS :-) – Andrew Ensley Dec 09 '14 at 17:20
  • I have a raidz1 running on 7 consumer drives of various ages. I have it scrub every 2 weeks. It often finds an error and corrects it. I recently lost a drive and lost a file which had a latent error. Luckily, it was a media file that I can easily replace. For my important data I still, of course, have backups. – Dan Pritts Dec 12 '14 at 20:06
  • I will point out - 'home' drives have 2 orders of magnitude worse unrecoverable bit error rate when squared off against 'enterprise' grade. I'm still quite happy that the compound failure rate on RAID-5 is acceptable on decent FC/SAS drives. Wouldn't do it on SATA though. – Sobrique Dec 23 '14 at 16:33
  • Only one order of magnitude comparing two Seagate drives: ST2000DM001: 1 in 10^14. ST2000NM0033: 1 in 10^15. Really, though, it's tough to say for sure whether the drive mechanisms are any different. I've heard credible sources give opposing answers. – Dan Pritts Dec 23 '14 at 16:41
  • I discovered a bad SATA cable on my system - since it was replaced, my scrubs have found zero errors. – Dan Pritts Sep 20 '15 at 16:08
  • I'd like to challenge the assumptions of your "might not care" use case here. Sure, mp3s and dvd rips might be easily-replaceable (assuming you're saving the original media), but most people consider photos to be irreplaceable -- and here's an example of how a single bit flip can destroy a photo irreparably: https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Bitrot_in_JPEG_files%2C_1_bit_flipped.jpg/300px-Bitrot_in_JPEG_files%2C_1_bit_flipped.jpg – ghostly_s Feb 28 '22 at 02:20
  • I'll stand by "MIGHT not care." – Dan Pritts Mar 05 '22 at 00:34

There is a little bit of a misconception at work here. A lot of the advice you're seeing is based on an assumption that may not be true for your drives: the unrecoverable bit error rate.

A cheap 'home user' disk has an unrecoverable read error rate of about 1 per 10^14 bits read:

http://www.seagate.com/gb/en/internal-hard-drives/desktop-hard-drives/desktop-hdd/#specs

This is at a level where you're talking about a significant likelihood of an unrecoverable error during a RAID rebuild, and so you shouldn't do it. (A quick and dirty calculation suggests that a 5x 2TB disk RAID-5 set will actually have around a 60% chance of this.)

However, this isn't true of more expensive drives: http://www.seagate.com/gb/en/internal-hard-drives/enterprise-hard-drives/hdd/enterprise-performance-15k-hdd/#specs

1 per 10^16 is 100x better, meaning a 5x 2TB set has a <1% chance of a failed rebuild. (Probably even less in practice, because for enterprise usage 600GB spindles are generally more useful.)
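
For the curious, here's a rough version of that back-of-the-envelope number, using a Poisson approximation (a 5x 2TB RAID-5 rebuild has to read the 4 surviving disks; treat the output as ballpark - this variant lands somewhat under my 60% figure, but it's the same order of magnitude):

```sh
# P(at least one URE during rebuild) ~= 1 - exp(-bits_read * error_rate)
awk 'BEGIN {
    bits = 4 * 2e12 * 8    # 4 surviving 2TB disks, in bits
    printf "1 in 10^14 (desktop):    ~%.0f%%\n", 100 * (1 - exp(-bits * 1e-14))
    printf "1 in 10^16 (enterprise): ~%.2f%%\n", 100 * (1 - exp(-bits * 1e-16))
}'
```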

So personally, I think both RAID-5 and RAID-4 are still eminently usable on drives like that, for all the reasons RAID-0 is still fairly common. Don't forget: the problem with RAID-6 is its hefty write penalty. You can partially mitigate this with lots of caching, but you've still got some pain built in, especially when you're working with slow drives in the first place.

And more fundamentally: NEVER, EVER trust your RAID to give you full resilience. You'll lose data more often to an 'oops' than to a drive failure, so if you care about your data you NEED a decent backup strategy anyway.

Sobrique
  • I'm using 4 WD RE4-GP drives, which have <1 in 10^15 non-recoverable read errors. – Andrew Ensley Dec 23 '14 at 19:51
  • The RAID6 write penalty is very real. However, RAID-Z2 does not suffer from it; ZFS makes all writes full-stripe. This has other negative effects, though - it tends to reduce read performance for several reasons. – Dan Pritts Jan 09 '19 at 16:50
  • Wow, an exceptional reminder about unrecoverable error rates. Thinking about that would normally make my head spin, but this is a very important warning! – Tmanok Feb 16 '21 at 21:59

Hmmm, some bad information here. For 4 disks, there's really nothing wrong with XFS. I tend to avoid ZFS RAIDZ for performance and expandability reasons (poor random read/write performance, and a RAIDZ vdev can't be expanded). Use ZFS mirrors if you can. However, with 4 disks and nowhere to place your OS, you'll either lose a lot of capacity or have to go through odd partitioning games to fit your OS and data onto the same four disks.
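
For example, a mirrored pool grows by simply adding another pair later (device names below are placeholders):

```sh
# start with two mirrored pairs...
zpool create tank mirror sda sdb mirror sdc sdd

# ...and grow the pool later by adding a third pair
# (you cannot add a disk to an existing RAIDZ vdev this way)
zpool add tank mirror sde sdf
```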

I'd probably not recommend ZFS for your use case. There's nothing wrong with XFS here.

ewwhite
  • Forgot to mention that the OS lives on a separate drive. Sorry. What I want from ZFS that the XFS RAID10 doesn't have is checksummed data verification that can detect (and transparently fix) silent data errors - a bit flipped on the drive that the HDD has no idea about. I don't believe XFS is able to do this. – Andrew Ensley Oct 07 '14 at 18:56
  • For four disks, use ZFS mirrors if there's any chance you'll need to expand or if performance matters. I'd also avoid FreeNAS and just use straight ZFS on Linux. – ewwhite Oct 07 '14 at 21:20
  • Why avoid FreeNAS? The reason I intend to switch is because ZFS on Linux uses the Solaris Emulation Layer, which can break with a simple Linux kernel update and potentially nuke the zpool. ZFS runs natively on Unix-/BSD-based OSes and doesn't have that problem. http://confessionsofalinuxpenguin.blogspot.com/2012/09/btrfs-vs-zfsonlinux-how-do-they-compare.html – Andrew Ensley Oct 07 '14 at 23:53
  • DKMS takes care of the kernel updates and ZFS package changes in Linux. I've been using ZFS on Linux in production since 2012, though. FreeNAS does some [quirky things to the pool disks](http://serverfault.com/a/586917/13325), and we've had a ton of misconfiguration and questions about weird FreeNAS failure modes. I don't think it's worth using just to get a GUI. Just an opinion, though. ZFS on Linux works well. – ewwhite Oct 07 '14 at 23:59
  • I'm a terminal guy myself, so I'm definitely not switching for the GUI. Mostly, I just need a stable file system that (as much as is possible) guarantees the integrity of the files stored on it. And I was hoping to gain some space in the process. I've seen a lot of issues reported for ZoL, many of them relating to Ubuntu OS upgrades. https://groups.google.com/a/zfsonlinux.org/forum/#!searchin/zfs-discuss/ubuntu$20upgrade Not trying to be a pain. Just explaining why I think what I think. I'm certainly open to correction. – Andrew Ensley Oct 08 '14 at 00:09
  • That's fine. I've seen far more issues with FreeNAS (not FreeBSD), so it goes both ways. There's info out there. I don't use Ubuntu, [but I *do* know ZFS](http://serverfault.com/search?tab=newest&q=user%3a13325%20zfs). My ZFS on Linux is usually with RHEL or CentOS. Here's [a sample workflow](http://serverfault.com/questions/617648/transparent-compression-filesystem-in-conjunction-with-ext4/617791#617791). – ewwhite Oct 08 '14 at 00:14
  • Thank you for the link to your workflow. Very informative. I now feel more undecided than ever, ha. – Andrew Ensley Oct 08 '14 at 00:25
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/17725/discussion-between-andrew-and-ewwhite). – Andrew Ensley Oct 08 '14 at 20:59
  • Thanks for your help in Chat. Going to go with RAIDZ since I need the space and performance isn't as much of a concern. – Andrew Ensley Oct 09 '14 at 18:16
  • 1
    I use ZFS on Linux and Centos 6. I don't allow automatic updates of the kernel or of ZFS. I've had issues with ZFS/SCL borking, but I have never had data loss. For the record, btw, FreeBSD has a similar set of solaris compatibility routines, but they and ZFS are fully integrated into the distribution, which makes it a lot simpler to make things all work together. If I only wanted ZFS and file service, I'd probably run FreeBSD. In fact, that's what I used to do, but I use the box for other random stuff, which made ZoL more appealing. – Dan Pritts Dec 12 '14 at 20:09