
One of the disks in my NAS has failed. The NAS is running Linux and uses mdadm + LVM for its filesystems.

I do have backups for most of the contents, but not for the very latest changes, and if possible I'd like to recover those from this failing disk.

The disk (a 'green drive' WD10EARS, 1 TB in size) throws this kind of error:

Oct  3 12:00:41 kernel: [ 3625.620000] ata5.00: read unc at 9453282
Oct  3 12:00:41 kernel: [ 3625.620000] lba 9453282 start 9453280 end 1953511007 
Oct  3 12:00:41 kernel: [ 3625.620000] sde5 auto_remap 0
Oct  3 12:00:41 kernel: [ 3625.630000] ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6
Oct  3 12:00:41 kernel: [ 3625.630000] ata5.00: edma_err_cause=00000084 pp_flags=00000003, dev error, EDMA self-disable
Oct  3 12:00:41 kernel: [ 3625.640000] ata5.00: failed command: READ FPDMA QUEUED
Oct  3 12:00:41 kernel: [ 3625.650000] ata5.00: cmd 60/40:00:e0:3e:90/00:00:00:00:00/40 tag 0 ncq 32768 in
Oct  3 12:00:41 kernel: [ 3625.650000]          res 41/40:00:e2:3e:90/12:00:00:00:00/40 Emask 0x409 (media error) <F>
Oct  3 12:00:41 kernel: [ 3625.660000] ata5.00: status: { DRDY ERR }

However, while testing with 'dd', I noticed that if I skip the first 4 kB, the read seems to be OK, i.e. a command like dd if=/dev/sde5 of=/dev/null bs=4k count=1000 skip=1 doesn't return any read error.

Supposing that there is no other read failure on the rest of the disk, would I be able to recover this 900 GB partition (as I mentioned before, it's a 'linux raid autodetect' partition that contains an LVM2 volume that contains an ext4 filesystem) if I copy-clone the partition somewhere else, except for the first 4 kB?
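To make the idea concrete, a sketch of such a clone with plain dd might look like this (the destination partition is just an example; conv=noerror,sync keeps dd going and pads any further unreadable blocks with zeros instead of aborting):

    # Sketch: clone everything except the unreadable first 4 kB to a same-sized
    # healthy partition (/dev/sdd5 here, as an example). skip=1/seek=1 keep the
    # source and destination aligned, so on-disk offsets stay comparable.
    dd if=/dev/sde5 of=/dev/sdd5 bs=4k skip=1 seek=1 conv=noerror,sync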

bitwelder
  • If the RAID is not degraded already, then you really don't need to recover anything. Just replace the drive with a new one and resync. – Adam Ryczkowski Oct 04 '12 at 19:52
  • (Obviously?) I would not have posted the question if there were another mirror to use for resyncing. – bitwelder Oct 05 '12 at 22:02

2 Answers


...Otherwise (i.e. if you already have a degraded array) you should still be able to do it.

If the mdadm which created the array is older than v3.0, then the payload starts at offset 0x22000 (and even further in if it is v3.0 or later), which in either case is well past the deleted first 4 kB.

So only the Linux RAID superblock got corrupted, and that is not that difficult to restore (especially if you have the other devices in that array intact).
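A quick way to check both points on a clone is a sketch like the following (the device name is an example; 272 × 512 = 0x22000, the offset quoted above; this assumes the LVM2 label sits in its usual spot within the first few sectors of the PV):

    # Can mdadm still find and parse a RAID superblock on the clone?
    mdadm --examine /dev/sdd5

    # Peek at the claimed payload offset 0x22000 (= sector 272): if an LVM2 PV
    # starts there, its "LABELONE" label should show up in the first sectors.
    dd if=/dev/sdd5 bs=512 skip=272 count=4 2>/dev/null | hexdump -C | grep -C2 LABELONE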

Adam Ryczkowski
  • Thanks, so there is some hope. The disk in question is (or rather was) the last one in the array, although I have another disk belonging to a different array that has a similar structure. In that case, how do you suggest repairing the superblock? – bitwelder Oct 05 '12 at 22:08
  • Well, this is a good question on its own. Try the answer to [this question](http://serverfault.com/questions/427683/what-parameters-to-mdadm-to-re-create-md-device-with-payload-starting-at-0x2200). In short: if you know the array parameters (including the relative position of the device in the array) **and** you use a version of mdadm that puts the superblock at the same offset as the one used to create the array originally (and of course you use `--assume-clean`), then I believe you are safe. – Adam Ryczkowski Oct 05 '12 at 22:56
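As a sketch of what that comment describes, re-creating the superblock without touching the payload could look like this. Every parameter below (RAID level, device count, metadata version, device order) is an assumption and must be replaced with the original array's values:

    # DANGER: --create overwrites RAID metadata; run only against a clone, with
    # parameters matching the original array exactly.
    mdadm --create /dev/md0 --assume-clean \
          --level=1 --raid-devices=2 --metadata=1.2 \
          /dev/sdd5 missing           # 'missing' marks the absent mirror

    mdadm --detail /dev/md0           # verify the resulting geometry
    pvscan                            # the LVM2 PV should reappear on /dev/md0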

Considering that it was a relatively simple case (an md device acting as the single physical volume of a linearly allocated LVM volume, which in turn wraps a single ext4 fs), my solution was to go directly for the ext4 fs and mount it as a loop filesystem for the time needed for the recovery.

My steps (a consolidated sketch of the commands follows the list):

  • As mentioned above, with dd I copied the whole damaged partition to a similar-sized partition on a healthy disk (named /dev/sdd5 below).
  • I copied the first MB of /dev/sdd5 to a file and imported it to my Ubuntu desktop computer, where I opened it with the hex editor ghex.
  • I saw on the ext4 wiki https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#The_Super_Block that the ext4 superblock has the magic number 0xEF53 at offset 0x38, and I could find that signature in my hex dump.
  • Redoing the same with a different, healthy ext4 filesystem, I found that the superblock begins 1024 bytes after the beginning of the partition, so I could calculate the offset from the start of /dev/sdd5 to the beginning of the ext4 filesystem I was looking for.
  • I created the loop device with losetup -o <offset> /dev/loop0 /dev/sdd5.
  • With fsck -n /dev/loop0 I verified that the offset was right (I indeed got a valid ext fs) and that the filesystem was at least in a consistent state.
  • I mounted the loop fs read-only on a temporary directory: mount -o ro /dev/loop0 /tmp/recovery.

At this point I could begin to access the content I wanted to recover from /tmp/recovery/.
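Pulled together, a sketch of those steps in shell form (device names and paths are the ones used above; the automated magic-number search stands in for the manual ghex inspection, and may need a second look if the byte pair 0xEF53 happens to occur earlier by coincidence):

    # 1. Copy the first MiB of the clone and locate the ext4 magic 0xEF53.
    #    On disk it is little-endian (bytes 53 ef); it sits at offset 0x38 inside
    #    the superblock, which itself starts 1024 bytes into the filesystem,
    #    so the filesystem begins at (magic position) - 0x438.
    dd if=/dev/sdd5 bs=1M count=1 of=/tmp/head.img
    MAGIC_POS=$(grep -abo $'\x53\xef' /tmp/head.img | head -n1 | cut -d: -f1)
    FS_START=$(( MAGIC_POS - 0x438 ))

    # 2. Attach a loop device at the computed offset and sanity-check it.
    losetup -o "$FS_START" /dev/loop0 /dev/sdd5
    fsck -n /dev/loop0                  # read-only check: a valid, consistent ext fs?

    # 3. Mount read-only and start copying data out.
    mkdir -p /tmp/recovery
    mount -o ro /dev/loop0 /tmp/recovery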

bitwelder