
We woke up this morning to two failed disks in a RAID 5 array configured with a single hot spare.

The hot spare didn't replace either failed disk, maybe because both disks failed at the same time.

However, I added two new disks and the parity is initializing now, but the partition's filesystem changed to RAW. Should I wait for the initialization to finish, or have I lost all the data on the logical volume? Do you recommend using commercial recovery software to restore (VHDX files) from RAW FS? Please advise.

Maroon

2 Answers


Do you recommend using commercial recovery software to restore (VHDX files) from RAW FS? please advise

RAID 5 can't recover from two disks failing at once. It's also dangerously risky these days, so please don't use it again. You can try to recover the files, but it'll be expensive, take a while, and is unlikely to help. Best to just restore from backup, which is much quicker, and onto RAID 1/10 or RAID 6/60 please :)
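To illustrate why single parity can't survive a double failure, here is a minimal sketch using plain XOR parity over one stripe. The block contents and the helper name `xor_blocks` are made up for illustration; a real controller works on much larger stripes and rotates the parity across disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks forming one stripe, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk lost: XOR of the survivors and the parity gives the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two disks lost: the same XOR only yields the two missing blocks combined,
# and one equation is not enough to separate two unknown blocks.
combined = xor_blocks([data[2], parity])
assert combined == xor_blocks([data[0], data[1]])
```

With one parity block per stripe there is only one equation to solve, so a second missing block leaves the stripe undetermined. RAID 6 stores a second, independent parity, which is why it can tolerate two failures.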

Chopper3

It's not uncommon for one drive in a RAID 5 to fail and then for a second drive to fail during the rebuild, especially if you haven't taken care of the array.

The core of the problem is that data blocks that are rarely read may slowly degrade (bit rot). The damage isn't detected (and automatically repaired or remapped by the drive) because the block hasn't been read back. On a rebuild, however, all data needs to be read, and if any block can't be read, the rebuild fails. Bummer.
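As a rough back-of-the-envelope sketch of why a full read during rebuild is risky: assume each bit read fails independently with a fixed unrecoverable read error (URE) rate. The figures below (three surviving 4 TB disks, one error per 10^14 bits, a rate often quoted on consumer drive datasheets) are assumptions for illustration, not measurements from this array, and the function name is hypothetical.

```python
import math

def rebuild_failure_probability(surviving_disks, disk_bytes, ure_per_bit):
    """Chance of hitting at least one unreadable bit while reading every
    surviving disk in full, assuming independent errors at a fixed rate."""
    bits_to_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))

# Assumed example: 4-disk RAID 5 with one disk failed, so 3 disks are read in full.
p = rebuild_failure_probability(surviving_disks=3, disk_bytes=4e12, ure_per_bit=1e-14)
print(f"Estimated chance of a read error during rebuild: {p:.0%}")
```

Under these assumptions the estimate comes out at roughly 60%. Real drives often do better than their datasheet rate, so treat this as a pessimistic bound rather than a prediction, but it shows why larger disks make single-parity rebuilds increasingly fragile.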

Using RAID levels with double redundancy, such as 6 or 60, is a good way to avoid this kind of problem. In short: RAID 6 is practically immune to bit rot and a much better choice than RAID 5 + hot spare.

RAID levels 1 and 10 can also exhibit the bit-rot problem, but the probability is lower than with RAID 5.

Sometimes, you cannot run anything but RAID levels 5 or 50. In that case it's essential (and a good idea for the other RAID levels as well) that you run a regular media scan, also known as disk scrubbing, media patrol, patrol read, or surface scan. That ensures that all soft errors are fixed before they become hard errors. Strangely, scrubbing is not active by default on most controllers.

In your case, the data has either been corrupted or zeroed anyway. Simply recreate the partitions, format, and restore from backup. Of course, a regular backup is even more essential than disk scrubbing. Even a well-groomed RAID is no replacement for a good backup strategy.

Zac67