
A little background to this question first: I am running a RAID-6 within a QNAP TS869L external RAID/NAS system. I started with 5 disks of 3 TB each back in the day, and later added another 2 disks of 3 TB to the RAID. The QNAP internals handled the growing and re-syncing etc., and everything seemed to be perfectly fine.

About 2 weeks ago, one of the disks failed (disk #5; disk #2 has since gone bad as well), and somehow (I have no idea why) disks #1 and #2 also got kicked out of the array. I replaced disk #5, but the RAID didn't start working again.

After some calls to QNAP technical support, they re-created the array (using mdadm --create --force --assume-clean ...), but no filesystem could be found on the resulting array, and I was kindly referred to a data recovery company that I can't afford.

After some digging through old log files, resetting the disk to factory default, etc., I found a few errors that were made during this re-create - I wish I still had some of the original metadata, but unfortunately I don't (I definitely learned that lesson).

I'm currently at the point where I know the correct chunk size (64K) and metadata version (1.0; the factory default was 0.9, but from what I read 0.9 can't handle disks over 2 TB, and mine are 3 TB), and I can now find the ext4 filesystem that should be on the disks.

The only variable left to determine is the right disk order!

I started using the description found in answer #4 of "Recover RAID 5 data after created new array instead of re-using", but I am a little confused about what the order should be for a proper RAID-6. RAID-5 is pretty well documented in a number of places, but RAID-6 much less so.
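
For reference, this is roughly the procedure I'm following, adapted from that answer: re-create the array with one candidate disk order (no resync thanks to --assume-clean), then check read-only whether a plausible ext4 filesystem shows up. The device order below is just my current best guess, not established fact:

# Only run this against copies/images of the disks if at all possible!
mdadm --stop /dev/md0

# one candidate order; "missing" stands in for the two absent members
mdadm --create /dev/md0 --assume-clean \
      --level=6 --raid-devices=7 --chunk=64 --metadata=1.0 \
      /dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdg3 /dev/sdf3

# read-only sanity checks: does a sane ext4 superblock show up, and what does fsck say?
dumpe2fs -h /dev/md0
e2fsck -n /dev/md0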

Also, does the layout, i.e. the distribution of parity and data chunks across the disks, change after growing the array from 5 to 7 disks, or does the re-sync re-organize them the way a native 7-disk RAID-6 would have laid them out?

Thanks


Some more mdadm output that might be helpful:

mdadm version:

[~] # mdadm --version
mdadm - v2.6.3 - 20th August 2007

mdadm details from one of the disks in the array:

[~] # mdadm --examine /dev/sda3 
/dev/sda3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
           Name : 0
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
   Raid Devices : 7

  Used Dev Size : 5857395112 (2793.02 GiB 2998.99 GB)
     Array Size : 29286975360 (13965.12 GiB 14994.93 GB)
      Used Size : 5857395072 (2793.02 GiB 2998.99 GB)
   Super Offset : 5857395368 sectors
          State : clean
    Device UUID : 7c572d8f:20c12727:7e88c888:c2c357af

    Update Time : Tue Jun 10 13:01:06 2014
       Checksum : d275c82d - correct
         Events : 7036

     Chunk Size : 64K

    Array Slot : 0 (0, 1, failed, 3, failed, 5, 6)
   Array State : Uu_u_uu 2 failed

mdadm details for the array in the current disk order (based on my best guess, reconstructed from old log files):

[~] # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
     Array Size : 14643487680 (13965.12 GiB 14994.93 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 7
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jun 10 13:01:06 2014
          State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : 0
           UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
         Events : 7036

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       0        0        2      removed
       3       8       51        3      active sync   /dev/sdd3
       4       0        0        4      removed
       5       8       99        5      active sync   /dev/sdg3
       6       8       83        6      active sync   /dev/sdf3

Output from /proc/mdstat (md8, md9, and md13 are internally used RAIDs holding swap, etc.; the one I'm after is md0):

[~] # more /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid6 sdf3[6] sdg3[5] sdd3[3] sdb3[1] sda3[0]
      14643487680 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/5] [UU_U_UU]

md8 : active raid1 sdg2[2](S) sdf2[3](S) sdd2[4](S) sdc2[5](S) sdb2[6](S) sda2[1] sde2[0]
      530048 blocks [2/2] [UU]

md13 : active raid1 sdg4[3] sdf4[4] sde4[5] sdd4[6] sdc4[2] sdb4[1] sda4[0]
      458880 blocks [8/7] [UUUUUUU_]
      bitmap: 21/57 pages [84KB], 4KB chunk

md9 : active raid1 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0] sdb1[1]
      530048 blocks [8/7] [UUUUUUU_]
      bitmap: 37/65 pages [148KB], 4KB chunk

unused devices: <none>
rkotulla
  • When you ask "does the layout, i.e. distribution of parity and data chunks across the disks, change after the growing of the array from 5 to 7 disks" and say "or does the re-sync re-organize them in such a way a native 7-disk RAID-6 would have been?" it's actually the same question twice. The answer is Yes & Yes. The layout does change when you grow the array (the first Yes) and when complete the resync has adjusted the layout to be the way a native 7 disk RAID6 would have been (the second Yes). – Ian Macintosh Jun 11 '14 at 15:12
  • Are you saying you lost Disks #1,2, and 5 in the array at the same time? – Rex Jun 11 '14 at 15:25
  • First, I lost disk #5, which is what triggered this mess. #1 & #2 somehow got kicked out of the array at the same time, or at least were marked as "missing" by the time I installed the replacement for #5. Both drives were working just fine at the time. In the meantime, #2 has developed some read-issues, so it will be replaced soon as well. #1 is perfectly fine, as far as I can tell. – rkotulla Jun 11 '14 at 18:15
  • That should have been #3 with the read-issues in the comment above, that's why #3 and #5 are listed as "missing" in the mdadm output. – rkotulla Jun 11 '14 at 18:21
  • You said "best guess reconstructed from old log files"? Share the relevant portion of these old log files with us! :-) – Ian Macintosh Jun 12 '14 at 15:06

1 Answer


I would suggest using the same order as the other arrays, because they were most likely created under the same conditions as the array in question.

Remember to always use --assume-clean on any create - you probably know this well enough, but it's worth re-mentioning.
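
If your mdadm supports it, you can also mark the array read-only at the md level until a candidate order checks out, and only flip it back once you're confident (illustrative only):

# belt and braces: no writes to the array until you are sure of the order
mdadm --readonly /dev/md0
# ...and only once a candidate order verifies cleanly:
mdadm --readwrite /dev/md0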

Ideally you should be working off images (dd) of the original drives, not the drives themselves. I realise things aren't always ideal :-)
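
Something along these lines, for example - the target paths here are placeholders, and you need roughly 3 TB of space per member:

# image each member partition once, then experiment only on the copies
dd if=/dev/sda3 of=/mnt/backup/sda3.img bs=1M conv=noerror,sync
dd if=/dev/sdb3 of=/mnt/backup/sdb3.img bs=1M conv=noerror,sync
# ...repeat for the remaining members...

# expose the images as block devices so test arrays can be built from them
losetup /dev/loop1 /mnt/backup/sda3.img
losetup /dev/loop2 /mnt/backup/sdb3.img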

Finally, if you can, mount with "-o ro" for just another level of "don't write to the drives, please" security :-)
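
i.e. once a candidate order produces something that looks like a filesystem, something like this (the mount point is a placeholder; "noload" skips the journal replay, which would otherwise write to the device):

mkdir -p /mnt/recovery
mount -t ext4 -o ro,noload /dev/md0 /mnt/recovery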

Ian Macintosh
  • Good point on --assume-clean - so far I've religiously used it when creating the array. As for using the same order as the other disks, md9 and md13 already have different orders (md9: abcdefg, md13: abcgfed), so which one do I pick? I tried running e2fsck -n /dev/md0 in both configurations, and both turned up a TON of checksum errors, etc. – rkotulla Jun 11 '14 at 18:20
  • It just strikes me that mdadm --monitor normally emails root when the RAID suffers a failure. You should still have that email - it contains the original disk layout. Try cat /var/mail/root or check your email logs? – Ian Macintosh Jun 12 '14 at 15:03