
Let me acknowledge first off that I have made mistakes, and that I have a backup for most but not all of the data on this RAID. I still have hope of recovering the rest of the data. I don't have the kind of money to take the drives to a data recovery company.

Mistake #0, not having a 100% backup. I know.

I have an mdadm RAID5 array of 4x3TB drives. The drives are /dev/sd[b-e], each with a single partition /dev/sd[b-e]1. I'm aware that RAID5 on very large drives is risky, yet I did it anyway.

Recent events

The RAID became degraded after a two-drive failure. One drive [/dev/sdc] is really gone; the other [/dev/sde] came back up after a power cycle, but was not automatically re-added to the RAID. So I was left with a 4-device RAID with only 2 active drives [/dev/sdb and /dev/sdd].

Mistake #1, not using dd copies of the drives for restoring the RAID. I did not have the drives or the time. Mistake #2, not making a backup of the superblocks and the mdadm -E output of the remaining drives.
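
For anyone reading this later, a minimal sketch of what such a backup could have looked like (the output filenames are arbitrary):

mdadm --examine /dev/sd[b-e]1 > raid-examine.txt      # per-device superblock details
mdadm --detail /dev/md0 > raid-detail.txt             # array layout, device order, chunk size
cat /proc/mdstat > raid-mdstat.txt                    # snapshot of the running array
dd if=/dev/sdb1 of=sdb1-first-mib.img bs=1M count=1   # raw copy of the area holding the 1.2 superblock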

Recovery attempt

I reassembled the RAID in degraded mode with

mdadm --assemble --force /dev/md0 /dev/sd[bde]1

I could then access my data. I replaced /dev/sdc with a spare, empty, identical drive.

I removed the old /dev/sdc1 from the RAID

mdadm --fail /dev/md0 /dev/sdc1

Mistake #3, not doing this before replacing the drive.

I then partitioned the new /dev/sdc and added it to the RAID.

mdadm --add /dev/md0 /dev/sdc1
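
For completeness, one common way to partition a replacement identically to a surviving member (assuming GPT, which 3 TB drives require; the commands below are a sketch, not necessarily what I ran):

sgdisk --replicate=/dev/sdc /dev/sdb   # copy sdb's GPT partition table onto the new sdc
sgdisk -G /dev/sdc                     # give the copied table fresh random GUIDs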

It then began to rebuild the RAID, with an ETA of 300 minutes. I followed the progress via /proc/mdstat up to 2% and then went to do other stuff.
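
(Following the rebuild is just a matter of re-reading /proc/mdstat; something like the lines below, where the refresh interval is arbitrary.)

watch -n 60 cat /proc/mdstat                # re-display the rebuild progress every minute
mdadm --detail /dev/md0 | grep -i rebuild   # or ask the array directly for its rebuild status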

Checking the result

Several hours (but less than 300 minutes) later, I checked on the rebuild. It had stopped due to a read error on /dev/sde1.

Here is where the trouble really starts

I then removed /dev/sde1 from the RAID and re-added it. I can't remember why I did this; it was late.

mdadm --manage /dev/md0 --remove /dev/sde1
mdadm --manage /dev/md0 --add /dev/sde1

However, /dev/sde1 was now marked as a spare. So I decided to recreate the whole array with --assume-clean, using what I thought was the right order, and with /dev/sdc1 missing.

mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

That worked, but the filesystem was not recognized when I tried to mount it. (It should have been ext4.)
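
(A few non-destructive ways to see what, if anything, is recognized on the array; this is a sketch, not a transcript of what I ran at the time.)

blkid /dev/md0          # prints the filesystem signature, if one is found
file -s /dev/md0        # reads the first blocks and guesses what they contain
fsck.ext4 -n /dev/md0   # read-only check; -n answers "no" to every repair question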

Device order

I then checked a recent backup I had of /proc/mdstat, and I found the drive order.

md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

I then remembered that this RAID had suffered a drive loss about a year ago, and recovered from it by replacing the faulty drive with a spare one. That may have scrambled the device numbering a bit: there was no device [3], only [0], [1], [2], and [4].

I tried to find the drive order with the Permute_array script (https://raid.wiki.kernel.org/index.php/Permute_array.pl), but that did not find the right order.

Questions

I now have two main questions:

  1. I screwed up all the superblocks on the drives, but only issued

    mdadm --create --assume-clean

    commands, so I should not have overwritten the data itself on /dev/sd[bde]1. Am I right that, in theory, the RAID can be restored [assuming for a moment that /dev/sde1 is OK] if I just find the right device order?

  2. Is it important that /dev/sde1 be given the device number [4] in the RAID? When I create it with

    mdadm --create /dev/md0 --assume-clean -l5 -n4 \
      /dev/sdb1 missing /dev/sdd1 /dev/sde1
    

    it is assigned the number [3]. I wonder if that is relevant to the calculation of the parity blocks. If it turns out to be important, how can I recreate the array with /dev/sdb1[0] missing[1] /dev/sdd1[2] /dev/sde1[4]? If I could get that to work I could start it in degraded mode and add the new drive /dev/sdc1 and let it resync again.

It's OK if you want to point out that this may not have been the best course of action; as you can tell, I've realized that myself. It would be great if anyone has any suggestions.

Peter Bos
  • +1 This is a very well thought out and documented question. I wish I had an answer for you. – Grant Sep 14 '13 at 13:51
  • Thank you for your comment, I guess this is a tough one. – Peter Bos Sep 16 '13 at 11:58
  • Have you given up on this, or are you still working on it? If you are working on it, my advice, scrounge up all the drives you have laying around and create a JBOD on another machine you can create DD images to, it's way better to deal with it that way since you can keep trying over and over. (Use LVM and then use snapshots once it's finished, so you can keep deleting the snapshot and not have to re-copy the whole thing). I have been in a similar boat, and I managed to recover the array with most of the data intact. – Regan Oct 27 '13 at 08:21
  • Thanks for your reaction. After a while I did give up on this, replaced two drives with new ones, recovered 98% from backup, accepted the 2% data loss and moved on. I'm now using RAID-Z and have updated my backup-strategy. So far so good. – Peter Bos Oct 27 '13 at 14:05

3 Answers


To answer your questions,

  1. Can it be restored?

    • First things first: STOP, sit back and just think a little. Yes, the algorithm, chunk size and disk order are vital to getting whatever filesystem was present to properly re-assemble. But since you've overwritten the superblocks, you're now left with trial and error.
    • Second, is there any way you can retrieve the previous disk layout? I always do an mdadm --detail > backupfile just to keep that disk layout somewhere safe. Check dmesg and /var/log for any evidence of how the disks were configured in the raid.
    • Lastly, even if you match the previous chunk size and disk order, the ext4 superblock may still be damaged. There are ways to quickly scan for its backup superblocks (see the sketch after this list), and there's a nifty program called TestDisk that scans for superblocks of existing filesystems and lets you browse them: http://www.cgsecurity.org/wiki/Main_Page
  2. Since sdc is new, I would continue to try to assemble manually via the missing clause, and yes, sde must be in the correct order for it to assemble in degraded mode. Once you find the correct layout, copy all data off the array and start again, documenting the layout (so you don't run into this issue again).
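
A sketch of the backup-superblock route from the last point above, assuming the filesystem was created with the usual 4k block size (the block numbers below are only valid in that case):

mke2fs -n /dev/md0                    # -n is a dry run: prints where backup superblocks would live, writes nothing
e2fsck -n -b 32768 /dev/md0           # read-only check using the first backup superblock
mount -o ro,sb=131072 /dev/md0 /mnt   # ext2/3/4 also accept a backup superblock at mount time (sb= is in 1k units)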

Good Luck

Litch
  • ext3/4 writes redundant superblocks. You can pass the superblock offset as an argument to mount or fsck to use the backup superblocks instead. Still, two drives down in a RAID 5 = game over. – dmourati Nov 11 '13 at 02:15

Before you do ANYTHING else, capture an 'mdadm --examine /dev/sdX1' for each of the drives that WERE in your array, and an 'mdadm --detail /dev/md0'. From those, you should be able to determine the exact layout.

I just had to do this myself to recover a Synology array in a separate question:

How to recover an mdadm array on Synology NAS with drive in "E" state?

Edit: Sorry, just saw that you said you lost the superblocks on all the drives.

Your later commands LOOK correct. The simplest option might be to run the creates with each possible ordering, and then see if you can mount and access the filesystem read-only.
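
For each attempted ordering, something along these lines keeps the test read-only (the mount point is arbitrary; noload tells ext4 not to replay the journal):

mount -o ro,noload /dev/md0 /mnt/test   # read-only mount, no journal replay
ls /mnt/test                            # does the directory tree look sane?
umount /mnt/test
mdadm --stop /dev/md0                   # stop the array before trying the next ordering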

Nathan Neulinger

This question is old and I'm sure nobody can help you now, but for others reading:

the most dangerous mistake you made is not one you numbered: it was running

mdadm --create ...

on the original disks before you were prepared and knew what to do. This overwrote the metadata, so you have no record of the drive order, data offset, chunk size, etc.

To recover from this, you need to overwrite the metadata again with the correct values. The easiest way to learn those values would be to look at the old metadata, but you destroyed that already. The next way is to guess: try different combinations of a command like this, with different values for any of the options except what you know (4 devices, level 5), and also different disk orders:

mdadm --create /dev/md0 --assume-clean --metadata=1.2 --raid-devices=4 --level=5 --layout=... --chunk=512 --data-offset=128M /dev/sdb1 missing /dev/sdd1 /dev/sde1

But since you DO NOT know the correct values, you should not run those guesses on the old disks, destroying them further and making the same fatal mistake again. Instead, use an overlay; for example, this procedure should work to keep the originals safe.
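
As an illustration only (not necessarily the exact procedure referred to above), an overlay for one member can be built from a sparse file, a loop device and a device-mapper snapshot, so that any writes made by mdadm --create land in the overlay file instead of on the original disk. The sizes, paths and names below are arbitrary:

# one overlay per surviving member; shown here for /dev/sdb1
dev=/dev/sdb1
size=$(blockdev --getsz "$dev")              # device size in 512-byte sectors
truncate -s 50G /overlay/sdb1.ovl            # sparse file that absorbs all writes
loop=$(losetup -f --show /overlay/sdb1.ovl)  # attach it to a free loop device
dmsetup create ovl-sdb1 --table "0 $size snapshot $dev $loop P 8"

# repeat for sdd1 and sde1, then experiment only on the copy-on-write devices
mdadm --create /dev/md0 --assume-clean --metadata=1.2 --level=5 --raid-devices=4 \
    --chunk=512 /dev/mapper/ovl-sdb1 missing /dev/mapper/ovl-sdd1 /dev/mapper/ovl-sde1
fsck.ext4 -n /dev/md0                        # read-only check of this guess
mdadm --stop /dev/md0                        # tear down before trying the next combination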

Once you have found arguments that produce a working array that you can fsck or mount and verify (e.g. check the checksum of a file large enough to span all the RAID members, such as an ISO you stored together with its checksum/PGP signature, or unzip -t or gunzip -t a large archive), you will know you have found the correct parameters and can safely copy your data off.

Peter
  • Thank you. Meanwhile, I have moved on to using ZFS (RAIDZ2). However, it was very interesting to read your notes. I realise now that the create command did overwrite the metadata, while at the time I assumed it wouldn't. Also, I did not know about overlay files. That is really neat! Thanks! – Peter Bos Oct 09 '15 at 13:44