
I have two drives in my desktop in RAID 1. My computer locked up on me yesterday, so I used SysRq REISUB for a "safe" restart. When the machine booted back up, I realized there was no data on my RAID drives (a few empty directories, but no actual data). Fearing the worst, I took the following actions:

  1. Reboot, and check again. Still no data.
  2. Shutdown, physically unplug one drive, and reboot. Still no data.
  3. Ran sudo fsck -y /dev/md0 (md0 is the RAID array). The output from this command is pasted below.
  4. Mounted md0

At this point, I have my data back! I have copied the critical data to an external drive. But now I want to fix the RAID (since I'm currently operating on a single drive).

What is the best way to fix my setup, and get the second drive added back to the array? I would assume that I could wipe the second drive, reformat, and add it back to the array, at which point I would expect it to rebuild the array (by copying all the data from the existing, repaired drive). I am hoping, however, that this is not necessary, and that there is a simpler, faster way to recover.

sudo fsck -y /dev/md0 output

chris@compy:/home/chris (23:14:54)
$ sudo fsck -y /dev/md0
fsck from util-linux 2.30.1
e2fsck 1.43.5 (04-Aug-2017)
/dev/md0: recovering journal
JBD2: Invalid checksum recovering block 127 in log 
JBD2: Invalid checksum recovering block 127 in log 
JBD2: Invalid checksum recovering block 127 in log 
JBD2: Invalid checksum recovering block 128 in log 
JBD2: Invalid checksum recovering block 129 in log 
JBD2: Invalid checksum recovering block 129 in log 
JBD2: Invalid checksum recovering block 129 in log 
JBD2: Invalid checksum recovering block 129 in log 
JBD2: Invalid checksum recovering block 130 in log 
JBD2: Invalid checksum recovering block 130 in log 
JBD2: Invalid checksum recovering block 130 in log 
JBD2: Invalid checksum recovering block 130 in log 
JBD2: Invalid checksum recovering block 130 in log 
JBD2: Invalid checksum recovering block 130 in log 
Journal checksum error found in /dev/md0
/dev/md0 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 215875711 extent tree (at level 1) could be narrower.  Fix? yes 

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (715836081, counted=708804782).
Fix? yes 

Free inodes count wrong (244146193, counted=244145069).
Fix? yes 


/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 42067/244187136 files (31.7% non-contiguous), 267916834/976721616 blocks

1 Answer


There's no magic bullet. Since you've started up the array with one drive missing, and written to it, there's no going back to that instant when the array was healthy with two drives. You're going to be resyncing. Simply follow any one of the numerous procedures out on the Internets for recovering an MD RAID 1 from a drive failure.

However, there's no need to wipe the failed drive. The kernel knows that you've run the array without that drive in it, and that the drive's contents are not trustworthy.
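As a rough sketch (assuming the second drive's member device is /dev/sdb1; substitute whatever it actually is on your system), re-adding it and letting md resync might look like this:

sudo mdadm /dev/md0 --add /dev/sdb1    # hand the old member back to md; it resyncs from the good drive
cat /proc/mdstat                       # watch the resync progress
sudo mdadm --detail /dev/md0           # confirm both members show as active/rebuilding

If mdadm complains that the device is still listed as a failed member, you may need to run sudo mdadm /dev/md0 --remove /dev/sdb1 first and then add it back.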

If you have a bitmap configured (mdadm --detail /dev/md0 will tell you), the kernel may be able to use it to copy only the modified ranges of blocks when it resyncs. If you've had to remove the drive and re-add it, the bitmap can't be used.
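For example (a sketch; the exact output format varies by mdadm version, and /dev/sdb1 is again a placeholder), you could check for a bitmap and then try a re-add:

sudo mdadm --detail /dev/md0 | grep -i bitmap   # an "Intent Bitmap : Internal" line means a write-intent bitmap is present
cat /proc/mdstat                                # a "bitmap: ..." line under md0 says the same thing
sudo mdadm /dev/md0 --re-add /dev/sdb1          # only resyncs the dirty ranges if the bitmap is usable

If mdadm refuses the --re-add, fall back to the plain --add shown above and accept the full resync.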
