
I'm setting up a RAID1 array of two new 4TB hard drives.

I've heard that building a RAID1 array from new, identical hard drives bought at the same time increases the chance that they will fail at a similar point in time.

I am therefore considering using one of the hard drives on its own for a period of time (maybe a couple of weeks), in an attempt to reduce the likelihood of both failing within a short span of each other. (The unused drive would be kept disconnected in a drawer.)

Does this seem like a reasonable approach, or am I more likely just wasting my time?

a_henderson
  • It is an often-heard claim, but I have yet to see any documentation supporting it. A much more real risk is that one of your disks may develop some bad sectors, which go unnoticed for a while. But once the other disk fails, you are going to notice those bad sectors during the rebuild. – kasperd Mar 17 '15 at 13:00
  • If you were working with dozens of drives, it might be worth considering sourcing from a few batches. For a two drive set, it's not worth the hassle to do this. The failure rate just isn't that similar or predictable... one could last 3 months, the other could last 5 years. – jlehtinen Mar 17 '15 at 13:14
  • I personally wouldn't raid with just two drives. Using more drives gives better capacity. For example, 3 drives would yield 8 TB of total storage, unlike 2 drives, giving only 4 TB. Any one drive can fail in the set of three, and if they come from three sources, odds of failure at the same time are low. – phyrfox Mar 17 '15 at 14:45
  • @phyrfox - RAID-5 (and -6) has different performance characteristics than RAID-1 that may not be compatible with his application. With large drives (especially consumer quality drives), if I were going to use higher RAID levels, I'd definitely go with RAID-6 to protect against a second disk failure while rebuilding the array after a single disk failure. I've been running a 5 disk RAID-6 array for 2 years using a set of drives purchased at the same time -- one disk failed a month in, all of the rest haven't shown any problem. – Johnny Mar 17 '15 at 15:36
  • @phyrfox RAID5 will decrease the cost per megabyte but will actually INCREASE the chance of experiencing a failure as there are more drives to fail. – Caltor Mar 17 '15 at 20:37
  • Nasty consumer SATA drives have a 100x higher UBER rate than enterprise FC/SAS drives, so the case for RAID 6 there is considerably stronger. But it's a bit of a moot point IMO. You can't protect against 'oops' scenarios, so your impact is outage/restore, not 'total loss of data'. – Sobrique Mar 18 '15 at 10:20
  • I don't run in new disks, but replacement disks that were 'repaired' by the manufacturer have been unreliable for me in the past, so I 'stress test' them for 2 days; see also this answer on a related question: http://serverfault.com/questions/501838/best-way-to-test-new-hdds-for-a-cheap-storage-server/502874#502874 – Jens Timmerman Mar 23 '15 at 08:40
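For anyone who wants to follow the stress-test route from the last comment, a rough sketch is below. It assumes a Linux box with badblocks and smartmontools installed; the device path is a placeholder, and the badblocks write pass is destructive, so only run it before the drive holds any data.

```python
#!/usr/bin/env python3
"""Rough burn-in sketch: destructive badblocks pass plus a SMART self-test.

WARNING: badblocks -w overwrites every sector; only run it on an empty drive.
The device path below is a placeholder -- adjust it for your system.
"""
import subprocess

DEVICE = "/dev/sdX"  # placeholder: replace with the drive under test

def run(cmd):
    """Run a command, echo it, and stop the script if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Full write/read/verify pass over the whole drive (takes many hours on 4TB).
    run(["badblocks", "-wsv", DEVICE])
    # Kick off a long SMART self-test; inspect the results later with `smartctl -a`.
    run(["smartctl", "-t", "long", DEVICE])
    print(f"Burn-in pass finished for {DEVICE}; check `smartctl -a {DEVICE}` "
          "once the self-test completes.")
```

Whether this actually shifts the failure window is exactly what the question is asking, but it will at least weed out early-life duds before they hold your only copy of the data.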

6 Answers


It's a waste of time.

You won't be able to induce failure or stress the drives in a meaningful manner. You have RAID, and that's a good start. Just make sure you have monitoring in place to actually detect failures as they occur and backups to protect against disaster.
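As a concrete illustration of the monitoring part, here is a minimal sketch for Linux software RAID that flags degraded md arrays by parsing /proc/mdstat. It is only an example of the idea, not a replacement for mdadm --monitor or proper alerting, and hardware RAID controllers need their vendor tools instead.

```python
#!/usr/bin/env python3
"""Minimal md RAID health check: flag arrays whose member status isn't all 'U'."""
import re
import sys

def degraded_arrays(mdstat_text):
    """Return (array, status) pairs where some member is missing or failed."""
    problems = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:", line)
        if m:
            current = m.group(1)
            continue
        # Status lines look like '... [2/2] [UU]'; an underscore means a
        # missing or failed member, e.g. '[2/1] [U_]'.
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            problems.append((current, status.group(1)))
    return problems

if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        bad = degraded_arrays(f.read())
    for name, status in bad:
        print(f"DEGRADED: {name} member status [{status}]")
    sys.exit(1 if bad else 0)
```

Run it from cron (or your monitoring system of choice) and alert on a non-zero exit code.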

ewwhite
  • Agree for conventional HDDs, but for SSDs it's a very different story. Thought this was worth noting now, before 4TB SSDs become cheap and available and readers don't realise we're talking about spinning rust here, but maybe by then they'll handle more writes. – symcbean Mar 17 '15 at 23:32
  • Yes - certainly any 'enterprise' drive will have already been soak tested to get it past the early-life failures on the bathtub curve anyway. Although I do know that if you buy a pair of generators, the advice is to alternate 66%/33%, because that way they don't both wear out concurrently. With drives, though, the MTBF has quite a large standard deviation, so it's much less of a concern. – Sobrique Mar 18 '15 at 10:18

It may be better to use different brands or series of disks together if you're worried about this.

I have seen disks of similar type and age fail in clusters, so IMHO it's not an urban legend.

wurtel

Great question - however, unlike automobile headlights, this is a waste of time. The MTBF [mean time between failures] rating for 4 TB drives [WD Red in this example] is 1,000,000 hours. Two drives going bad in a mirror at the same time is extremely rare. When I have seen it happen, it has been because the first drive failed without anyone noticing. It is more useful to protect with backups than to bother burning in one drive first. If you do mix drive types, make certain the drives are the same speed. If you are paranoid, then RAID 10 is for you.
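As a rough sanity check on that figure, the arithmetic below converts the quoted MTBF into an annualized failure rate and then into the chance of losing both mirror members in the same year. It assumes a constant failure rate and independent drives, which, as the comments below point out, is optimistic.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope check on the MTBF claim in this answer.

Assumes a constant (exponential) failure rate and *independent* drives -- the
very assumptions the comments below take issue with -- so treat the numbers as
an optimistic lower bound, not a prediction.
"""
import math

MTBF_HOURS = 1_000_000          # vendor figure quoted above (WD Red)
HOURS_PER_YEAR = 24 * 365.25

# Annualized failure rate for one drive under the exponential model.
afr = 1 - math.exp(-HOURS_PER_YEAR / MTBF_HOURS)
print(f"Single-drive AFR: {afr:.3%}")            # ~0.87%

# Chance that both mirror members fail within the same year, if independent.
print(f"Both drives in one year: {afr**2:.5%}")  # ~0.0076%
```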

DocB
  • MTBF assumes the disks are independent, which they are not in the same RAID set. There are other reasons this is a waste of time, but a ridiculous number released by the manufacturer which has a weak correlation with reality is not one of them. – HopelessN00b Mar 17 '15 at 21:08
  • If a HDD really did have the stated *mean* time between failures, then why are warranty periods so short? 1M hours is 114 years, give or take. The WD Red Pro (because I picked one out of the lot) looks to come with a five-year warranty. Even if you take *half* the mean time to failure, Western Digital still doesn't believe it'll be reliable for more than about *one tenth* of the stated MTBF period. Now, which would you be more inclined to believe; some random statistic with no obligations, or where the money actually is? (Warranty returns, refunds, refurbs and replacements cost real money.) – user Mar 17 '15 at 22:06
  • @MichaelKjörling: If they warrantied the MTBF, they'd be replacing over 50% (yes, over -- long tail on the distribution) of the drives under warranty. Sure you should look where the money is, but I see no reason to believe the MTBF isn't an order of magnitude longer than the warranty, and several to believe that it is. – Ben Voigt Mar 17 '15 at 23:59
  • @MichaelKjörling I have seen hardware with a published MTBF of 100k hours which would consistently wear out after 1k hours of operation. Next generation of the hardware had a published MTBF of 200k hours. When the first batch of the new hardware had been in operation for 48 hours more than 50% of them had failed. – kasperd May 15 '15 at 07:12

While it makes sense in theory, the data doesn't support the need to burn in your drive.
Not only will a few weeks not make much of an impact; the failure percentages don't mean much when you're looking at only two drives.

There is some indication that failure rates correlate among drives of the same model and vintage, though. One large-scale study of drive failures notes:

"Most age-related results are impacted by drive vintages... Interestingly, this does not change our conclusions. In contrast to age-related results, we note that all results shown in the rest of the paper are not affected significantly by the population mix." (emphasis mine)

As such, age-related failures, which are only a small subset of failures, can be somewhat correlated with drive vintage; the majority of failures can't be.
If you add to this the overall failure percentages, which can peak at around 8% for a given year, the odds of both drives failing in the same year are small, and the odds of them failing in the same week are negligible.
And that is when you count every possible cause of failure, not only age-related ones.
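To put rough numbers on that, here is a back-of-the-envelope calculation under the (optimistic) assumption that the two drives fail independently and each has the ~8% worst-case annual failure rate mentioned above.

```python
#!/usr/bin/env python3
"""Rough numbers behind the 'negligible' claim, assuming independent drives
and the ~8% worst-case annual failure rate mentioned above."""

AFR = 0.08                     # pessimistic annual failure rate per drive
WEEKS = 52

p_week = AFR / WEEKS           # ~0.15% chance a given drive fails in a given week

# Probability both drives fail during the *same* week, summed over a year.
p_same_week = WEEKS * p_week ** 2
print(f"Both drives, same week, within a year: {p_same_week:.4%}")   # ~0.012%

# For comparison: both failing somewhere within the same year.
print(f"Both drives, same year: {AFR ** 2:.2%}")                     # ~0.64%
```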

If you want to minimize the risk, buy two drives of a different vintage.
If you want assurances, buy insurance.
And as ewwhite's answer already stated, backups and monitoring are a must.

Reaces

This is usually an argument for SSDs more than HDDs, in my experience. SSDs have limited write cycles, so if you use RAID1 with two SSDs of the same model, both should run out of write cycles at around the same time.

As for general failures, unless you have a serious issue like heavy vibration, static, or high heat, I don't expect you'll see 2 out of 2 drives fail at the same time.

A main concern with RAID1 (and RAID10) on larger drives like 4TB is the rebuild. With a two-drive mirror, when one drive fails, the other drive is carrying twice the workload. Then, when you rebuild, that drive is getting even more load. If there was anything wrong with that drive, it is likely to fail under those conditions, especially considering that rebuilding a 4TB mirror under load can take a long time.
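To get a feel for how long such a rebuild takes, here is a ballpark estimate; the throughput figures are illustrative assumptions, not measurements, and real rebuilds on a busy array can be slower still.

```python
#!/usr/bin/env python3
"""Ballpark rebuild time for a 4TB mirror. The throughput figures are
illustrative guesses -- real numbers depend on the drives and on how much
foreground I/O the array has to keep serving during the rebuild."""

DRIVE_BYTES = 4 * 10**12       # 4 TB (decimal, as marketed)

def rebuild_hours(throughput_mb_s):
    """Hours to copy the whole drive at a given sustained rate (MB/s)."""
    return DRIVE_BYTES / (throughput_mb_s * 10**6) / 3600

print(f"Idle array @150 MB/s: {rebuild_hours(150):.1f} h")   # ~7.4 h
print(f"Busy array @ 50 MB/s: {rebuild_hours(50):.1f} h")    # ~22.2 h
```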

Devon

You can do that, but it won't help much.

For example, if there is a spike in the input power, the same spike will kill both disks.

What is important is that you have a good backup. RAID does not make up for a good backup. In fact, if you have a good backup, a mirroring RAID may not even be necessary (if you can tolerate a system outage roughly once every 2-3 years).

peterh
  • RAID is about availability, not about backing up data. The point is to keep the system available if a drive fails, not to protect the data on the drive. – HopelessN00b Mar 18 '15 at 15:33
  • @HopelessN00b This is exactly what I tried to explain in the answer; maybe I wasn't clear enough? – peterh Mar 19 '15 at 14:55
  • Your sentence at the end there muddies the waters. – HopelessN00b Mar 19 '15 at 15:12
  • @HopelessN00b RAID also protects against data loss caused by disk failures. This often leads to the false conclusion that it can be used as a backup. But using RAID and using backups are situation-dependent choices; there are cases where even a professional sysadmin environment doesn't need both of them. In my opinion, the goal is not to force _both_ on an inexperienced sysadmin, but to make it clear to them that mirroring the disks and backing up their data are different solutions for different problems. – peterh Mar 19 '15 at 15:28