
I have a QNAP TS-1679U-RP running RAID 6. It reported errors on 2 disks, so I replaced the 2 disks (Disk 3 and Disk 13) with new disks of the same model and capacity.

It failed to rebuild.
1. I tried putting the old disks back, but I had forgotten their order, so I inserted them into slots 3 and 13 at random.
2. It failed to start up.
3. I swapped their locations and tried again. It failed to start up.
4. I connected a VGA monitor to the QNAP to see the console screen.
5. It said it could not continue because of a read error on Disk 6.
6. I pulled out all 3 disks (Disk 3, Disk 13, Disk 6). The unit started up and the web interface was finally available, but the disk format was not recognized, so I couldn't access my data.
7. With the power on, I pushed Disk 3, Disk 13 and Disk 6 back in. I still cannot access my data.

I guess I have messed up my RAID configuration. Will I lose my data? How can I recover from this failure?

There are 16 disks in total (16 x 4 TB), and I used all 16 for a single RAID 6 volume.

Journeyman Geek
  • Welcome to the world where mathematics matter. Raid 6 over 16 discs = high chance of disaster. You won (the disaster lottery). Do not do another Raid 6 over 16 discs. This is where Raid 60 comes in handy - split that into smaller independent groups. Your data at this point is lost, but you should not care about that because - being a professional (and remind yourself, this here is a place for professionals) you have... backups. – TomTom Nov 21 '14 at 14:43
  • Unfortunately I have no backups. May I know if there is any way to recover the data? – user1093137 Nov 21 '14 at 14:45
  • Nope. Take it as a lesson that ignoring common sense and common knowledge will come back and bite you. You got what you paid for - a high risk data storage with no security. Now pay up. – TomTom Nov 21 '14 at 14:51
  • You're far from alone, though I suspect that's small comfort right now. Lots of people have blind faith in RAID like it's a magical "anti-data corruption" spell. We all know that isn't the case (especially you, now) but there are lots of people out there that _still_ don't get it no matter how many times it's said. To use a car analogy (because what's an IT discussion without one of those), protecting your data with RAID is like wearing a seatbelt in your car. It might save your ass if something bad happens, but it isn't a reason to stop watching the road and driving safely in the first place. – Rob Moir Nov 22 '14 at 09:57
  • @TomTom That might be a little harsh... but definitely not wrong. – Craig Tullis Jan 10 '17 at 22:25

2 Answers


With RAID arrays, as often as not, if you can't get the array to rebuild itself, you're finished. It sounds like disk 6 might have failed as well. With the loss of three disks (even if the RAID controller is only hallucinating that loss), your data is pretty much gone.

I see you have no backups. That's too bad. But, for the rest of your career, I imagine you might start using RAID properly. RAID is many things: a way of distributing workload to improve performance, and a way to reduce the immediate operational impact of a failure that would otherwise require a restore from backup. It can even limit data loss in the event of a failure, short-term (i.e. over less than your backup interval). But RAID is not:

  1. A substitute for backups. You may have a severe disk failure or the RAID controller might fail, or your data could be lost for innumerable other reasons that result in software or nature destroying it.
  2. A license to ignore disk failures or to use suspect disks. When you suspect a disk failure, you must correct it immediately.

When you design RAID arrays in the future, you should consider very carefully the odds of a catastrophic failure happening before you can correct it. With a RAID 1 array of two disks, the odds of both failing at the same time are pretty low, but in your setup only three out of 16 (19%) had to fail. Basic probability suggests that such an array is fragile. Use arrays with fewer disks or a higher number of tolerable failures, or aggregate multiple smaller arrays using compound levels like RAID 10 and RAID 60. A RAID 60 array would have tolerated up to 4 failures (no more than 2 in each half), and you would most likely have been OK.
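As a rough sketch of that idea, assuming Linux mdadm and 16 member partitions named /dev/sda3 through /dev/sdp3 (as on this QNAP) - the device names, partition number and options here are illustrative, not a tested recipe:

# Two independent 8-disk RAID 6 sets, each of which tolerates 2 failures...
mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[a-h]3
mdadm --create /dev/md2 --level=6 --raid-devices=8 /dev/sd[i-p]3
# ...striped together into one RAID 0 volume: that is the "60".
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

You give up two more disks' worth of capacity (12 data disks instead of 14), but a third failure in one half no longer takes the whole volume with it.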

To extend that concept a little, when you are using RAID, consider using hot spares. Hot spares are awesome because the array can start rebuilding immediately, and get out of the degraded state that much faster. They basically add disks to your array's failure tolerance, as long as the failures aren't so tightly clustered as to prevent rebuilding in time.
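For example, with mdadm (the names /dev/md0 and /dev/sdq3 are placeholders), adding a disk to a healthy array leaves it sitting as a hot spare:

# Added to a non-degraded array, the disk becomes a hot spare;
# mdadm starts rebuilding onto it automatically when a member fails.
mdadm /dev/md0 --add /dev/sdq3
# It should now show up with the "spare" role here:
mdadm --detail /dev/md0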

Also, consider the time it will take the array to rebuild. It takes a while to copy a 4TB disk, which is one reason disk arrays are usually built with smaller disks than that (there are other reasons).
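As a back-of-the-envelope illustration (the ~150 MB/s sustained rate is an assumption, and a best case):

# Best case: rewriting one 4 TB member at ~150 MB/s with no competing I/O.
# Real rebuilds on a busy 16-disk RAID 6 are usually several times slower.
awk 'BEGIN { printf "%.1f hours minimum\n", 4e6 / 150 / 3600 }'

During all of those hours the array runs degraded, with less protection than you designed for.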

Finally:

  • Use high-quality disks. Check out the MTTF, if quoted. Use enterprise-class ones. The premium price is there for a reason. Avoid "green" ones that cycle excessively to save power, or similar.
  • Label your disks. Then, you won't forget which order they go in.

Hopefully this lesson wasn't too expensive.

Falcon Momot

Fortunately, I managed to recover my data. Here is how I did it:

  1. I typed vi /etc/raidtab to see the disk order (a rough sketch of that file follows after the log below), and managed to swap Disk 3 and Disk 13 back into their correct slots.
  2. The error for Disk 6 was:
[ 984.796055] ata1.00: cmd 25/00:20:60:04:5a/00:00:5a:00:00/e0 tag 2 dma 16384 in    
[ 984.796055] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)    
[ 984.796058] ata1.00: status: { DRDY }    
[ 984.796066] ata1.00: hard resetting link    
[ 985.520017] ata1.01: hard resetting link    
[ 985.996057] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)    
[ 985.996068] ata1.01: SATA link down (SStatus 4 SControl 300)    
[ 986.012323] ata1.00: configured for UDMA/133    
[ 986.012331] ata1.00: device reported invalid CHS sector 0    
[ 986.012340] ata1: EH complete    

Drive 6 was having trouble bringing its SATA link up, so I suspected a loose connection. I pushed the drive in slightly harder and tried again. Interestingly, the link came up this time! That left me with only 2 failed drives.
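For anyone trying the same, /etc/raidtab maps each array position to a device. The excerpt below is only an illustration in the classic raidtools style of the kind of entries to look for, not my exact file (the QNAP version may differ in detail):

raiddev /dev/md0
        raid-level              6
        nr-raid-disks           16
        persistent-superblock   1
        device                  /dev/sda3
        raid-disk               0
        device                  /dev/sdb3
        raid-disk               1
        # ... one device / raid-disk pair per slot, up to raid-disk 15

The raid-disk numbers are the order the array expects; putting a physical disk back in the wrong slot changes which device name it gets and breaks that mapping.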

  3. I typed mdadm -E /dev/sda3 to check the status of the disk, and did this for all 16 disks (sda3 to sdp3); the commands are sketched after this list. Disk 3 and Disk 13 were marked as failed.

  4. I typed storage_boot_init 2 to assemble all 16 disks. Very luckily, the data was finally available under /share/MD0_DATA and the /share/ folder.
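Roughly, the commands in steps 3 and 4 were as follows. The loop and the commented-out mdadm/mount lines are only an illustration of the generic Linux equivalent; storage_boot_init is QNAP's own script:

# Check the md superblock on every member partition and look for
# members reported as failed/faulty.
for d in /dev/sd[a-p]3; do
    echo "== $d =="
    mdadm -E "$d" | grep -i -E 'state|faulty'
done

# QNAP's own init script, which reassembled the volume for me:
storage_boot_init 2

# Rough generic-Linux equivalent (illustrative only):
# mdadm --assemble --run /dev/md0 /dev/sd[a-p]3
# mount -o ro /dev/md0 /share/MD0_DATA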

I must admit I previously didn't know much about RAID 6 and used it blindly. Now that I have my data back, I will back it up somewhere else before I rebuild the array with the 2 replacement disks. I have already labelled the order of all my disks. Lesson learnt! This was a real data nightmare for me!