Ok - something was bugging me about your issue, so I fired up a VM to dive into the behavior that should be expected. I'll get to what was bugging me in a minute; first let me say this:
Back up these drives before attempting anything!!
You may have already done damage beyond what the resync did; can you clarify what you meant when you said:
Per suggestions I did clean up the superblocks and re-created the array with --assume-clean option but with no luck at all.
If you ran a mdadm --misc --zero-superblock
, then you should be fine.
Anyway, scavenge up some new disks and grab exact current images of them before doing anything at all that might do any more writing to these disks.
dd if=/dev/sdd of=/path/to/store/sdd.img
That being said.. it looks like data stored on these things is shockingly resilient to wayward resyncs. Read on, there is hope, and this may be the day that I hit the answer length limit.
The Best Case Scenario
I threw together a VM to recreate your scenario. The drives are just 100 MB so I wouldn't be waiting forever on each resync, but this should be a pretty accurate representation otherwise.
Built the array as generically and default as possible - 512k chunks, left-symmetric layout, disks in letter order.. nothing special.
root@test:~# mdadm --create /dev/md0 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
So far, so good; let's make a filesystem, and put some data on it.
root@test:~# mkfs.ext4 /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=512 blocks, Stripe width=1024 blocks
51000 inodes, 203776 blocks
10188 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
25 block groups
8192 blocks per group, 8192 fragments per group
2040 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
root@test:~# mkdir /mnt/raid5
root@test:~# mount /dev/md0 /mnt/raid5
root@test:~# echo "data" > /mnt/raid5/datafile
root@test:~# dd if=/dev/urandom of=/mnt/raid5/randomdata count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.706526 s, 7.2 MB/s
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Ok. We've got a filesystem and some data ("data" in datafile
, and 5MB worth of random data with that SHA1 hash in randomdata
) on it; let's see what happens when we do a re-create.
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
The resync finished very quickly with these tiny disks, but it did occur. So here's what was bugging me from earlier; your fdisk -l
output. Having no partition table on the md
device is not a problem at all, it's expected. Your filesystem resides directly on the fake block device with no partition table.
root@test:~# fdisk -l
...
Disk /dev/md1: 208 MB, 208666624 bytes
2 heads, 4 sectors/track, 50944 cylinders, total 407552 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000
Disk /dev/md1 doesn't contain a valid partition table
Yeah, no partition table. But...
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
Perfectly valid filesystem, after a resync. So that's good; let's check on our data files:
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Solid - no data corruption at all! But this is with the exact same settings, so nothing was mapped differently between the two RAID groups. Let's drop this thing down before we try to break it.
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
Taking a Step Back
Before we try to break this, let's talk about why it's hard to break. RAID 5 works by using a parity block that protects an area the same size as the block on every other disk in the array. The parity isn't just on one specific disk, it's rotated around the disks evenly to better spread read load out across the disks in normal operation.
The XOR operation to calculate the parity looks like this:
DISK1 DISK2 DISK3 DISK4 PARITY
1 0 1 1 = 1
0 0 1 1 = 0
1 1 1 1 = 0
So, the parity is spread out among the disks.
DISK1 DISK2 DISK3 DISK4 DISK5
DATA DATA DATA DATA PARITY
PARITY DATA DATA DATA DATA
DATA PARITY DATA DATA DATA
A resync is typically done when replacing a dead or missing disk; it's also done on mdadm create
to assure that the data on the disks aligns with what the RAID's geometry is supposed to look like. In that case, the last disk in the array spec is the one that is 'synced to' - all of the existing data on the other disks is used for the sync.
So, all of the data on the 'new' disk is wiped out and rebuilt; either building fresh data blocks out of parity blocks for what should have been there, or else building fresh parity blocks.
What's cool is that the procedure for both of those things is the exact same: an XOR operation across the data from the rest of the disks. The resync process in this case may have in its layout that a certain block should be a parity block, and think it's building a new parity block, when in fact it's re-creating an old data block. So even if it thinks it's building this:
DISK1 DISK2 DISK3 DISK4 DISK5
PARITY DATA DATA DATA DATA
DATA PARITY DATA DATA DATA
DATA DATA PARITY DATA DATA
...it may just be rebuilding DISK5
from the layout above.
So, it's possible for data to stay consistent even if the array's built wrong.
Throwing a Monkey in the Works
(not a wrench; the whole monkey)
Test 1:
Let's make the array in the wrong order! sdc
, then sdd
, then sdb
..
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
Ok, that's all well and good. Do we have a filesystem?
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Nope! Why is that? Because while the data's all there, it's in the wrong order; what was once 512KB of A, then 512KB of B, A, B, and so forth, has now been shuffled to B, A, B, A. The disk now looks like jibberish to the filesystem checker, it won't run. The output of mdadm --misc -D /dev/md1
gives us more detail; It looks like this:
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
3 8 17 2 active sync /dev/sdb1
When it should look like this:
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 49 2 active sync /dev/sdd1
So, that's all well and good. We overwrote a whole bunch of data blocks with new parity blocks this time out. Re-create, with the right order now:
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
Neat, there's still a filesystem there! Still got data?
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Success!
Test 2
Ok, let's change the chunk size and see if that gets us some brokenness.
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Yeah, yeah, it's hosed when set up like this. But, can we recover?
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Success, again!
Test 3
This is the one that I thought would kill data for sure - let's do a different layout algorithm!
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --layout=right-asymmetric --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
203776 blocks super 1.2 level 5, 512k chunk, algorithm 1 [3/3] [UUU]
unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Scary and bad - it thinks it found something and wants to do some fixing! Ctrl+C!
Clear<y>? cancelled!
fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md1
Ok, crisis averted. Let's see if the data's still intact after resyncing with the wrong layout:
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Success!
Test 4
Let's also just prove that that superblock zeroing isn't harmful real quick:
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Yeah, no big deal.
Test 5
Let's just throw everything we've got at it. All 4 previous tests, combined.
- Wrong device order
- Wrong chunk size
- Wrong layout algorithm
- Zeroed superblocks (we'll do this between both creations)
Onward!
root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 --layout=right-symmetric /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
204672 blocks super 1.2 level 5, 64k chunk, algorithm 3 [3/3] [UUU]
unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
The verdict?
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 13/51000 files, 17085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064 /mnt/raid5/randomdata
Wow.
So, it looks like none of these actions corrupted data in any way. I was quite surprised by this result, frankly; I expected moderate odds of data loss on the chunk size change, and some definite loss on the layout change. I learned something today.
So .. How do I get my data??
As much information as you have about the old system would be extremely helpful to you. If you know the filesystem type, if you have any old copies of your /proc/mdstat
with information on drive order, algorithm, chunk size, and metadata version. Do you have mdadm's email alerts set up? If so, find an old one; if not, check /var/spool/mail/root
. Check your ~/.bash_history
to see if your original build is in there.
So, the list of things that you should do:
- Back up the disks with
dd
before doing anything!!
- Try to
fsck
the current, active md - you may have just happened to build in the same order as before. If you know the filesystem type, that's helpful; use that specific fsck
tool. If any of the tools offer to fix anything, don't let them unless you're sure that they've actually found the valid filesystem! If an fsck
offers to fix something for you, don't hesitate to leave a comment to ask whether it's actually helping or just about to nuke data.
- Try building the array with different parameters. If you have an old
/proc/mdstat
, then you can just mimic what it shows; if not, then you're kinda in the dark - trying all of the different drive orders is reasonable, but checking every possible chunk size with every possible order is futile. For each, fsck
it to see if you get anything promising.
So, that's that. Sorry for the novel, feel free to leave a comment if you have any questions, and good luck!
footnote: under 22 thousand characters; 8k+ shy of the length limit