Safely replacing two failing disks in a RAID 6

In a Synology DS1812+ with eight 4TB disks set up in a RAID 6, two of the disks report that they are failing. Both have almost 2000 reallocated sectors according to SMART; one has ~400 pending sectors and the other ~40.

I'd like to replace them as safely as possible. Is either of the following options better than the other, and if so, how?

  • Replace both at the same time.
  • Replace them one at a time.

The NAS ran pretty much non-stop for years until a year or so ago. Since then I've rarely started it.

Daniel Beck

Posted 2017-11-17T10:08:58.483

Reputation: 98 421

That would be up to you to decide. Your RAID level has a failure tolerance of two drives. If you remove two drives and another fails, you've got a huge problem. You would also need to consider its current workload (sounds like it would be low), as removing a drive might impact performance. – Seth – 2017-11-17T10:12:17.023

If there's any data on there you care about that's not backed up, make a backup immediately before you do anything else. – David Schwartz – 2017-11-17T10:17:35.637

@DavidSchwartz The irreplaceable stuff I have backed up elsewhere, but a data loss would still incur quite a bit of work I wish to avoid. Hence this question. – Daniel Beck – 2017-11-17T11:33:58.750

@Seth So you'd replace them one at a time, with the leftover "failing" drive serving to protect against a sudden failure of an entire drive? Should this be an answer? – Daniel Beck – 2017-11-17T11:35:32.497

Personally yes, I'd likely do this assuming there is no clicking or other values/sounds that would worry me. But it's more of an opinion and something for you to think about. Hence I didn't make it an answer. – Seth – 2017-11-17T12:00:32.310

And if you have a sense of which drive is "worse", replace that one first. – David Schwartz – 2017-11-20T11:36:57.657

@Seth I've gone ahead and replaced one drive first (the one that 'seemed worse'). In hindsight I can't believe I considered replacing both at the same time, as they're only (pre-)failing, not failed, and having no protection for days while the new drives are integrated is a bad idea. Since you brought this up first, I suggest you post this as an answer. – Daniel Beck – 2017-11-20T13:30:19.677

Answers

This is only my opinion: you're the one carrying the risk, and as the comments show, there are multiple ways to handle it. After all, you already presented two of your own.

The Wikipedia article on RAID 6 states:

According to the Storage Networking Industry Association (SNIA), the definition of RAID 6 is: "Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures. Several methods, including dual check data computations (parity and Reed-Solomon), orthogonal dual parity check data and diagonal parity, have been used to implement RAID Level 6."

So the RAID level should be fine even if you remove two disks at once. That said, I wouldn't do it. Removing disks could affect the performance of the disk array (depending on the load), and with both disks out there is no redundancy left, so a single additional disk failure could jeopardize the whole array.

As such, I'd opt to replace the disk that seems worse, wait for the rebuild to finish, and then replace the second disk.
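
To make the trade-off concrete, here is a minimal Python sketch of the arithmetic. It only counts how many further disk failures the array could survive while a rebuild is running; the function name and constant are made up for illustration and say nothing about how the Synology handles this internally.

```python
# Illustrative arithmetic only; names here are hypothetical, not a Synology API.
RAID6_FAILURE_TOLERANCE = 2  # RAID 6 survives any two concurrent disk failures


def remaining_tolerance(disks_removed: int,
                        tolerance: int = RAID6_FAILURE_TOLERANCE) -> int:
    """Return how many more disks may fail while `disks_removed` are rebuilding."""
    if disks_removed < 0:
        raise ValueError("disks_removed must not be negative")
    if disks_removed > tolerance:
        raise ValueError("more disks removed than the array can tolerate")
    return tolerance - disks_removed


both_at_once = remaining_tolerance(2)   # both failing disks pulled together
one_at_a_time = remaining_tolerance(1)  # only the worse disk pulled

print(f"Replace both at once: survives {both_at_once} further failure(s)")
print(f"Replace one at a time: survives {one_at_a_time} further failure(s)")
```

Note that in the one-at-a-time case the remaining margin rests partly on the second pre-failing disk, so it is weaker than the number suggests, but it is still better than none.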

Seth

Posted 2017-11-17T10:08:58.483

Reputation: 7 657