What are the pitfalls of having an "unbalanced" RAID1?

Question

Context: I have a server with two 3TB NAS drives in a RAID1 (Linux dmraid) and I am looking to double storage capacity, but I only have one free drive bay. I could purchase two 6TB drives, but I had the thought that I could possibly get away with purchasing only one and reshaping the array to be:

6TB RAID1
- 6TB disk (new)
- 6TB RAID0
  - 3TB disk (existing)
  - 3TB disk (existing)

I am not concerned about the reshaping process as that should be rather straightforward:

1. Back up the contents, of course.
2. Grow the existing RAID1 to 3 devices by adding the 6TB disk.
3. Wait for resync to complete.
4. Fail the two 3TB devices out of the array. (Array becomes degraded.)
5. Reduce the array to 2 devices.
6. Create the RAID0 across the two 3TB devices.
7. Add the RAID0 as the second device to the RAID1.
8. The RAID1 resyncs and is no longer degraded.
9. Resize the RAID1, growing it from 3TB to 6TB.

This should all be doable online.
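For readers who want to see the reshaping steps spelled out, here is a rough dry-run sketch. It assumes the array is managed with `mdadm` (Linux md), which the write-mostly flag mentioned further down suggests. The device names (`/dev/md0`, `/dev/md1`, `/dev/sda1`, `/dev/sdb1`, `/dev/sdc1`) are hypothetical placeholders, and the function only builds and prints the command strings; it does not run anything and has not been tested against a real array.

```python
def migration_plan(md="/dev/md0", stripe="/dev/md1",
                   new="/dev/sdc1", old=("/dev/sda1", "/dev/sdb1")):
    """Return the mdadm commands for the reshape, in order (dry run only).

    All device names are hypothetical placeholders, not taken from the question.
    """
    a, b = old
    return [
        f"mdadm {md} --add {new}",               # add the 6TB disk
        f"mdadm --grow {md} --raid-devices=3",   # grow the mirror to 3 devices
        f"mdadm --wait {md}",                    # wait for the resync
        f"mdadm {md} --fail {a} --remove {a}",   # drop the first 3TB disk
        f"mdadm {md} --fail {b} --remove {b}",   # drop the second 3TB disk
        f"mdadm --grow {md} --raid-devices=2",   # shrink the mirror to 2 devices
        f"mdadm --zero-superblock {a} {b}",      # clear old RAID1 metadata
        f"mdadm --create {stripe} --level=0 --raid-devices=2 {a} {b}",
        f"mdadm {md} --add {stripe}",            # RAID0 becomes the second mirror half
        f"mdadm --wait {md}",                    # wait for the second resync
        f"mdadm --grow {md} --size=max",         # grow the mirror from 3TB to 6TB
    ]


plan = migration_plan()
for step in plan:
    print(step)
```

The backup in step 1 and any filesystem resize afterwards are outside this sketch.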

However, I want to make sure this is sane. All drives would be the same manufacturer and series (WD Red). Performance is less of a concern to me than reliability.

What problems might I encounter operating and maintaining such an array? (I am not asking about the migration/reshaping process; I'm quite comfortable with that procedure.)

Would there be a performance or reliability advantage to having either RAID1 device (the 6TB disk or the 3TB+3TB RAID0) flagged as write-mostly? For example, since the RAID0 contains older drives, would write-mostly on the RAID0 device extend the life of those drives?

score 1 · Answer 1 · 2018-09-11T06:33:54.140


tl;dr: Make sure those smaller drives are extra reliable.

You'd be doing like the old SunOS metadisk, with interesting upgrades, when SCSI drives were prohibitively $$$. ;) Whether drives are of the same mfgr has little-to-no impact on what the OS cares about. It's nice to standardize on one exact model, factory origin and board rev because then controller boards can be swapped should a board go out. (If you have a clean box like Louis Rossmann, you can even swap platters between drives.)

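Pf = probability of failure

D# = disk number #

RAID0 (striping) fails if either disk fails: Pf(RAID0) = 1 - (1 - Pf(D0)) * (1 - Pf(D1))

RAID1 (mirroring) fails only if both disks fail: Pf(RAID1) = Pf(D0) * Pf(D1)

Which leads to an overall Pf = Pf(D0) * (1 - (1 - Pf(D10)) * (1 - Pf(D11))), where D0 is the 6TB disk and D10 and D11 are the two 3TB disks in the RAID0.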
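To put rough numbers on this, here is a small sketch. It uses the usual model in which a striped set is lost when any member fails and a mirror is lost only when every side fails. It assumes independent failures and a made-up per-disk failure probability of 5% over some period; that figure is purely illustrative, not a measured value for any drive.

```python
def stripe_pf(*member_pf):
    """Failure probability of a RAID0: lost if ANY member fails."""
    survive = 1.0
    for p in member_pf:
        survive *= (1.0 - p)
    return 1.0 - survive


def mirror_pf(*side_pf):
    """Failure probability of a RAID1: lost only if ALL sides fail."""
    fail = 1.0
    for p in side_pf:
        fail *= p
    return fail


p = 0.05                         # assumed per-disk failure probability (illustrative)
stripe = stripe_pf(p, p)         # the 3TB+3TB RAID0 side
layout = mirror_pf(p, stripe)    # 6TB disk mirrored against the RAID0
plain = mirror_pf(p, p)          # ordinary two-disk mirror, for comparison

print(f"RAID0 side:       {stripe:.6f}")
print(f"unbalanced RAID1: {layout:.6f}")
print(f"plain RAID1:      {plain:.6f}")
```

Under these assumptions the unbalanced layout is roughly twice as likely to lose data as a plain two-disk mirror, because the striped side fails about twice as often as a single disk. The model ignores the rebuild window and correlated failures.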

For future reference, check Backblaze's blog for drive models that are currently proving reliable, so you can get quality drives relatively cheaply and avoid problematic ones. Some "retail consumer" drives have demonstrably better MTTF/MTBF figures than enterprise drives, and they cost a lot less too.

References

http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm#.W5deNaRlCEc

https://www.backblaze.com/blog/

edited Sep 11 '18 at 06:33

answered Sep 11 '18 at 06:27

The 3TB drives would be 2+ years older than the 6TB drive. This probably disqualifies them as "extra reliable," but is there any wisdom in "running them into the ground" so to speak such that I still get usage out of them, but that one of them is more or less guaranteed to fail while the 6TB drive is still young? At that point I can buy another 6TB drive to replace them. I've always been a bit suspicious of the advice that you want all drives in a mirror to be from the same batch and therefore the same age -- that sounds like you want them to all fail at the same time. – cdhowie Sep 12 '18 at 01:44
Leave the paranoia for InfoWars. :) I never said anything about batches, and why would I want to sabotage anything or anyone with failures? That accusation is professionally insulting. 2-year-old drives are new enough unless they're turned on and off or power-cycled often. If you don't want the help of someone with 30+ years of experience in sysadmin, devops, embedded systems engineering, then look the gift horse in the mouth. – Sep 12 '18 at 01:54
What did I say that's insulting? Where did I imply that I don't want your advice? I'm legitimately confused by your reaction. I'm only asking for your thoughts, and if you read any insult or derision into my reply then it was unintended. I'm not trying to argue, I'm trying to understand. If you spell out exactly what I said that is offensive to you, perhaps I can explain what I meant by it. – cdhowie Sep 12 '18 at 02:52

score 1 · Answer 2 · answered Sep 28 '18 at 07:43

What you are doing is exactly what we did in our (small) company. We increased the size of our RAID1 array from an initial 1TB to 4TB similarly to what you are describing, initially buying a larger drive or two, and then completing the new big array with more larger drives as some of the smaller ones were removed from the array because of a failure, or because they were needed in order to replace other (even smaller) failed drives in other PCs in the company.

We aren't too worried about performance, and we have seen different performance over those years, so I cannot speak to that.

We are more worried about reliability (like you). Our array is a 3-drive RAID1, plus an external spare that gets resynced every day and swapped out with one of the three array drives. The external spare is for disaster recovery: if some event destroys all the drives in the array, we can start immediately with the previous day's data.

With a 2-drive RAID1, you have to be worried about the resync time needed when you put in a new drive to replace a failed one, which can be several hours for a 6TB array. You are left with the reliability of a single drive alone during those hours. With those big drives, I think it's better to have at least a 3-drive RAID1.
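As a back-of-the-envelope check on the "several hours" figure, here is a small sketch. The sustained resync throughput (150 MB/s) and the annualized failure rate (2%) are assumptions picked for illustration, not measurements of any particular drive. The estimate ignores unrecoverable read errors and correlated failures, so treat the risk number as a floor rather than a prediction.

```python
HOURS_PER_YEAR = 8766.0  # average year length, including leap years


def resync_hours(capacity_tb, mb_per_s):
    """Hours needed to copy capacity_tb (decimal TB) at a sustained mb_per_s."""
    total_bytes = capacity_tb * 1e12
    return total_bytes / (mb_per_s * 1e6) / 3600.0


def p_failure_within(hours, annual_failure_rate):
    """Chance a single drive fails within `hours`, given a constant annual rate."""
    return 1.0 - (1.0 - annual_failure_rate) ** (hours / HOURS_PER_YEAR)


hours = resync_hours(6, 150)          # assumed 150 MB/s sustained
risk = p_failure_within(hours, 0.02)  # assumed 2% annualized failure rate

print(f"resync takes about {hours:.1f} hours")
print(f"chance the surviving drive fails in that window: {risk:.2e}")
```

At the assumed throughput a 6TB resync takes about 11 hours, and it takes longer if the array is serving other I/O at the same time.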
