
I am setting up a JBOD containing 44 4TB 7200 RPM SAS HDDs. I chose RAID 60 because I prefer drive failure protection over the performance improvements offered by RAID 10. My issue is how to choose the number of disks per span that results in a reasonable rebuild time. For example, assuming I leave 4 hot spares, that leaves 40 disks for the following possible RAID setups (the sketch after the list shows the arithmetic):

  • 2 spans with 20 disks, ~144 TB usable capacity.
  • 4 spans with 10 disks, ~128 TB usable capacity.
  • 5 spans with 8 disks, ~120 TB usable capacity.
  • 8 spans with 5 disks, ~96 TB usable capacity.
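
For reference, a minimal Python sketch of the arithmetic behind these figures (assuming 4 TB per disk, 2 parity disks per RAID 6 span, and 40 disks after the 4 hot spares are set aside):

```python
# Usable capacity of a RAID 60 layout: each RAID 6 span gives up 2 disks
# to parity, so usable = spans * (disks_per_span - 2) * disk_size.
# Assumes 4 TB disks and 40 disks available (44 minus 4 hot spares).
DISK_TB = 4
TOTAL_DISKS = 40

for disks_per_span in (20, 10, 8, 5):
    spans = TOTAL_DISKS // disks_per_span
    usable_tb = spans * (disks_per_span - 2) * DISK_TB
    efficiency = usable_tb / (TOTAL_DISKS * DISK_TB)
    print(f"{spans} spans x {disks_per_span} disks: "
          f"{usable_tb} TB usable ({efficiency:.0%})")
```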

I am leaning towards 4 spans of 10 disks, as it seems to offer the best balance of fault tolerance (2 drive failures tolerated per 10-disk span) and usable capacity (80%, down from 90% for 2 spans of 20 disks).

However, what rebuild time can I expect for a single 10-disk span? Web searching suggests that even a 10-disk span might not be feasible, as the rebuild may take so long that it risks additional drive failures during the rebuild. However, many resources on the internet are based on fewer disks or lower-capacity disks.
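
As a back-of-envelope estimate only: the floor on rebuild time is set by writing the full 4 TB to the replacement disk, so a minimal sketch assuming ~120 MB/s sustained throughput (a figure a commenter below also uses; real controllers under load are often several times slower) looks like this:

```python
# Theoretical lower bound on rebuild time for one failed disk: the hot spare
# must be written in full, so time ~ disk_size / sustained rebuild rate.
# Assumes a 4 TB disk and ~120 MB/s; actual controller behaviour and host
# I/O load typically push this to a multiple of the theoretical figure.
DISK_BYTES = 4e12
REBUILD_MB_PER_S = 120

hours = DISK_BYTES / (REBUILD_MB_PER_S * 1e6) / 3600
print(f"Theoretical minimum rebuild time: {hours:.1f} hours")  # ~9.3 hours
```

Span size mostly changes how many surviving disks must be read and how much parity math the controller does per stripe, not this write-side floor, which is consistent with the ~30 hour real-world figures reported below.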

Any thoughts on the optimal setup for this relatively large number of disks?

NOTE: There is a backup policy for about 10 TB of the data, but it is not feasible to back up all of it; hence my leaning towards RAID 60 over RAID 10. I realize this is not a substitute for backup, but better recovery from drive failure does make the system more robust, providing an opportunity to rebuild and then migrate data to other storage should multiple disk failures occur.

EDIT: Specifications:

  • Disks: Seagate 4TB SAS 3.5" HDD 7200 RPM, enterprise grade.
  • Controller: ServeRAID M5016 controller with RAID 6 enabled, LSI 2208 chipset. See: https://www.broadcom.com/products/storage/raid-on-chip/sas-2208.
  • Enclosure: Supermicro 4U storage JBOD 45x3.5 with 2x1400W redundant power modules.
  • OS: CentOS Linux release 7.1.1503 (Core).

Thank you for the help.

Vince
  • Honestly, 4TB 7200 RPM disks are never going to have a "reasonable" rebuild time (depending on your classification of "reasonable" I guess). I guarantee the rebuild time is going to be "long" – Mark Henderson Dec 08 '17 at 16:14
  • I am OK with system downtime of many hours or even 1 day. The issue is a rebuild of a few days to weeks. I will edit the question for clarity once we can determine whether RAID 60 is even feasible given this definition of "long". – Vince Dec 08 '17 at 16:21
  • No mention of the hardware type, controllers, HBAs/RAID controllers, enclosure, operating system or anything. More details are better. How can anyone give a specific recommendation? – ewwhite Dec 08 '17 at 16:22
  • Thanks @ewwhite, added details. Please advise if I missed anything. – Vince Dec 08 '17 at 16:35
  • Don't discount RAID5. If 8+2 RAID6 is acceptable, 4+1 RAID5 should be, too. I'd think a 4+1 RAID5 array would rebuild faster than an 8+2 RAID6 array. I'd think the odds of 2 disk failures in a 4+1 RAID5 array wouldn't be much if any larger than the odds of 3 disk failures in an 8+2 RAID6 array. 7 spans of 4+2 RAID6 might also be an option that would give faster rebuild times. That'd give you 112 TB usable space, but you'd only get a couple of hot spares. – Andrew Henle Dec 08 '17 at 16:49
  • One part of me says to use ZFS... or at least the rules of ZFS. But in lieu of that, 5 spans of 8 disks. – ewwhite Dec 08 '17 at 17:14
  • Clearly, an 8+2 RAID6 is equivalent in capacity efficiency to a 4+1 RAID5. All things being equal, it would seem a 4+1 RAID5 will rebuild faster, so is it the preferred option of the two? 7 spans of 4+2 RAID6 is also appealing, albeit less so, as it further reduces capacity and the number of hot spares and seems overly conservative about drive failure (which may be good!). – Vince Dec 08 '17 at 17:50
  • Do you have the time on a system of this spec to test rebuild in multiple configurations? As a bonus, any drives that survived such a benchmark probably are not prone to early failure. – John Mahowald Dec 09 '17 at 13:41
  • Is a 3-parity, 2-span array on the table? What would your server load be during a rebuild? It would be rebuilding for a long time, but with triple parity it should protect against write holes and UREs as well, due to the "voting" the controller can do for just about any error. I would probably implement this in software via ZFS if possible. Also, you might segregate your arrays depending on how your file structure is laid out. – Damon Dec 10 '17 at 04:47
  • @john-mahowald, there was a rebuild using 20-disk spans that took 30 hours (the reason I asked this question). I assume that if I bring the span size down to 4+1, the rebuild time will decrease roughly linearly? – Vince Dec 11 '17 at 15:45
  • A 4+1 RAID5 seems like the optimal solution, as the rebuild will be shorter than for 8+2 ... unless there is a reason to pick 8+2 that I am not seeing. – Vince Dec 11 '17 at 15:57
  • @Damon, what is the CPU overhead associated with parity on ZFS? I have 3 JBODs on a single file server (2 on a SAS2 controller, and the other on its own SAS3 controller). I would prefer using hardware parity, but I am open to thinking about the benefits of ZFS as I hear good things about it; I have been reluctant to use it since XFS is the "de facto" filesystem on CentOS. – Vince Dec 11 '17 at 16:22
  • I am not sure directly, as I do not use ZFS; really this would be heavily dependent on your actual write loads. ZFS should give maximum space, though. It is robust, but from past reading it still has its caveats. In reality, it would be best to try some test setups and benchmark under a variety of simulated scenarios. – Damon Dec 11 '17 at 17:21

4 Answers


With 4 TB 7.2k drives, I'd recommend making the subarrays as small as possible - actually, 5 drives don't really justify using RAID 6 at all.

My 2c is to use RAID 10, where you can expect a rebuild to finish within 12 hours - which a 5-drive 20 TB RAID 6 array most probably won't.

Make sure you enable monthly data scrubbing/media patrol/whatever-it's-called-here to detect read errors before they have a chance to stop a rebuild. Most often when a rebuild fails, the cause is not a completely failing drive but a rather old, yet undetected read error that could have been repaired with a regular scrubbing.
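
To put a rough number on that risk, here is a minimal sketch of the expected unrecoverable read errors (UREs) hit while reading the surviving disks of one span during a rebuild. It assumes the common enterprise-class spec of 1 URE per 10^15 bits read (an assumption for illustration; check the actual drive datasheet):

```python
# Expected UREs during a single-disk rebuild: all surviving disks in the
# span are read in full, so bits_read = (span_size - 1) * disk_size_in_bits.
# Assumes 4 TB disks and a URE rate of 1 per 1e15 bits (enterprise-class
# spec); consumer drives are commonly quoted at 1 per 1e14 bits.
URE_PER_BIT = 1e-15
DISK_BITS = 4e12 * 8  # 4 TB in bits

for span_size in (5, 8, 10, 20):
    bits_read = (span_size - 1) * DISK_BITS
    print(f"{span_size}-disk span: ~{bits_read * URE_PER_BIT:.2f} "
          f"expected UREs per rebuild")
```

With RAID 6, a URE encountered while rebuilding a single failed disk can still be corrected from the second parity; with RAID 5 (the 4+1 option discussed in the comments) it cannot, which is exactly why regular scrubbing matters.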

Zac67
  • Thanks for the tip on patrol read. This was enabled, but it is always good to know others think it is important. I am leaning towards 4+1 RAID5, with the caveat that a rebuild may take more than 12 hours, but likely less than 1 day. – Vince Dec 11 '17 at 16:00
  • _In theory_, a 5x 4TB R5 rebuild could be done within 10 hours - assuming 120 MB/s average from/to each drive and that the controller can handle the flows in parallel. _In practice_, the controller is much slower and I'd expect some 30 hours. – Zac67 Dec 11 '17 at 18:34

Based on the excellent comments received, I have attempted a RAID 60 consisting of 5 spans of 8 disks each, for the following reasons:

  1. Based on a recent rebuild that involved 2 spans of 20 disks, I estimate the rebuild time for the 8+2 configuration will be reasonable.

  2. Usable capacity is reduced only marginally compared to spans with a larger number of disks (e.g. 10 or 20 disks per span). While the loss of 20 TB seems considerable, the smaller span size means a rebuild will actually be achievable, which is an acceptable trade-off.

I will update this answer with any additional information I gather.

Edit: Removed RAID5 as a viable option.

Vince
  • Don't EVER use RAID-5. RAID-5 is not to be used under ANY circumstances with disks larger than 1 TB. Double parity is obligatory with very large disks and it has been so for a full decade, see http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/ – wazoox Dec 14 '17 at 21:15
  • You can use RAID5 with 1.2TB and 1.8TB enterprise SAS (10k) disks... but yeah, in general, don't do it for large, slow SATA and near line SAS drives. – ewwhite Dec 14 '17 at 22:54
  • If you care for some real world info, my 5 span 12 disk (2TB 7.2k NL SAS) RAID60 on a ServeRAID card will rebuild a disk in 12-16 hrs. – brent Dec 15 '17 at 17:34

With modern hardware RAID controllers from Avago (LSI) or Microsemi (Adaptec), 20+2 disk RAID arrays are perfectly fine. The rebuild time is reasonable (less than 24 hours). Current drives have very low failure rates anyway. I'd definitely use 2 spans.

wazoox
  • My past experience would tend to agree with this: when I migrated the 44-disk JBOD to a 20+2 config, one of the drives failed and the rebuild took ~30 hours. However, your advice is in contrast to other recommendations here. It would be interesting to get your take on the disadvantages of smaller RAID6 spans, such as 8+2 or 10+2, apart from the loss of disk capacity. Keep in mind that the JBOD and drives are 3 years old. – Vince Dec 15 '17 at 00:45
  • @Vince yes, I forgot that you are using older hardware. But in my experience over the past 15 years, apart from the horrible Seagate Barracuda debacle, I have never had any noticeable problem with several hundred 20-24 drive RAID arrays. Furthermore, for the past 8 years I've used only HGST drives and the reliability is so ridiculously high that I don't sweat it anymore (with Helium drives it's even more ridiculous; not a single one has failed on me in the past 3 years). – wazoox Dec 18 '17 at 15:34
  • Thanks. The drive models in each JBOD are WD WD4001FYYG and TOSHIBA MG03SCA400, both enterprise level. I have 4 hot spares per JBOD, plus an additional on-hand disk. Good to know you have had good experience with spans of 20 disks. I have as well, albeit limited to a handful of 44-disk JBODs. We also have a policy here of replacing JBODs and storage every 5-6 years. – Vince Dec 18 '17 at 16:25
  • I will accept this as the answer. However, others who read this should note that it depends on: (1) using a good hardware RAID controller, and (2) enterprise-level disks. Also, note that the rebuild time is **relatively** lengthy. – Vince Dec 18 '17 at 16:28

On such a big array, I would really use RAID10, or the equivalent ZFS mirrored setup. You could set up a 42-disk RAID10 + 2 global hot spares (for ~82 TB usable space), and it would provide excellent protection against disk failures with very fast rebuild times.

If you really, really want to use RAID6, I lean toward 5x 10-disk spans.

shodanshok
  • I was using RAID10 previously and decided on RAID60 because, from what I understand, RAID10 does not support more than 1 disk failure with a 100% guarantee of recovery, i.e. if 2 disks fail and both contain the same data. Is this correct? I assumed it is, and that led me to decide in favor of RAID60 with an 8+2 span size. – Vince Dec 19 '17 at 01:19
  • A RAID10 array will fail only if two *paired* disks concurrently fail. The odds of such an event are low and constant, not depending on array size. On large arrays this means RAID10 matches or exceeds the resiliency of a similar RAID6 array, with greater performance and lower resilver times. On the other hand, you lose much more space than with an equivalent RAID6. – shodanshok Dec 19 '17 at 22:35
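
To put a rough number on the paired-failure point above, here is a minimal sketch of the chance that, once one disk in a 2-way-mirror RAID 10 has failed, a second random failure lands on that disk's mirror partner (assuming independent failures and that the hot-spare rebuild has not yet completed):

```python
# In a 2-way-mirror RAID 10 of n disks, after one disk fails, only 1 of the
# remaining n-1 disks is its mirror partner, so a second random failure
# breaks the array with probability 1/(n-1). Assumes independent failures.
for n in (10, 20, 42):
    p = 1 / (n - 1)
    print(f"{n}-disk RAID10: P(second failure hits the partner) = {p:.1%}")
```

The short per-disk rebuild (a straight mirror copy) keeps that exposure window small, which is the core of the argument for RAID 10 here.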