A colleague and I set up a software RAID 1 with mdadm, consisting of two physical disks with two partitions on the virtual device. The setup went fine, and booting directly from one of the RAID disks yielded:
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[0] sdb1[1]
92094464 blocks super 1.2 [2/2] [UU]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[2]
4069376 blocks super 1.2 [2/2] [UU]
unused devices: <none>
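For reference, the arrays were created along these lines (a sketch consistent with the /proc/mdstat output above; the exact invocation we used isn't reproduced here):
# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2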
To test our setup, we then shut the machine down, disconnected one of the disks, and restarted. The system came up fine, naturally in a degraded state:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[1]
92094464 blocks super 1.2 [2/1] [_U]
md1 : active (auto-read-only) raid1 sda2[2]
4069376 blocks super 1.2 [2/1] [_U]
unused devices: <none>
Next, we shut the machine down again, reconnected the disconnected disk, and disconnected the other disk. Again, everything went fine, with the following expected state:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[0]
92094464 blocks super 1.2 [2/1] [U_]
md1 : active (auto-read-only) raid1 sda2[0]
4069376 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Finally, we shut down one last time, reconnected everything, and booted. But what we got was this:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active (auto-read-only) raid1 sdb2[2] sda2[0]
4069376 blocks super 1.2 [2/2] [UU]
md127 : active raid1 sdb1[1]
92094464 blocks super 1.2 [2/1] [_U]
unused devices: <none>
As you can see, the first partition (the second entry here; the two were swapped for some reason) is in a degraded state (the second partition is not, but that one is just swap). We weren't particularly worried by this. After all, it's expected that the two mirror halves are no longer identical after the simulated alternating failure of the disks. We added the missing partition like this:
# mdadm --manage /dev/md127 --add /dev/sda1
mdadm: re-added /dev/sda1
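(We did not think to compare the members' metadata before the add. In hindsight, something like the following would have shown the Events counters and Update Times of the two copies, i.e. which one was more recent:)
# mdadm --detail /dev/md127
# mdadm --examine /dev/sda1 /dev/sdb1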
We expected the partition on /dev/sda to sync with (that is, be overwritten by) the one on /dev/sdb. Instead, we ended up with a corrupt file system (numerous errors within seconds).
After this experience, I rebooted from a third disk, reinitialised the file system on /dev/md127 (with the -c option to mkfs.ext4 for good measure; see the note after the listing below), and rebooted back into the now again functioning RAID. Then, once more, we shut down, disconnected one disk, booted, shut down again, and reconnected the disk; this time we also left the other disk connected, and booted. Now we got this:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[0]
92094464 blocks super 1.2 [2/1] [U_]
md1 : active (auto-read-only) raid1 sdb2[2] sda2[0]
4069376 blocks super 1.2 [2/2] [UU]
unused devices: <none>
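(The reinitialisation mentioned above was essentially the following; the -c option makes mkfs.ext4 scan the device for bad blocks before creating the file system:)
# mkfs.ext4 -c /dev/md127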
Now we're afraid the same thing will happen again if we just use the --add option as above.
I have two questions:
- What caused the file system corruption after simulating the alternating failure? My guess is that it has something to do with both disks diverging from the state just before the first disconnection, and that this somehow tricked mdadm --add into not doing a resync. What would have been the correct sequence of commands to tell mdadm to use the mounted state as authoritative and sync the added disk to it?
- In our current situation (one simulated failure and then a reconnect, i.e. only one of the disks diverged from the state just before disconnection), what is the proper way to re-add the missing device? Can I just use the add command as above, and will it resync? Why didn't it resync automatically? (A candidate sequence is sketched below.)
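For reference, here is the sequence we are considering, based on our reading of mdadm(8) (untested by us; treat it as a sketch rather than a verified fix). Zeroing the stale superblock should rule out a metadata-based re-add and force a full resync onto the returning member:
# mdadm --zero-superblock /dev/sdb1
# mdadm --manage /dev/md127 --add /dev/sdb1
# cat /proc/mdstat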
If it helps, here is the current output from mdadm --examine:
# mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 726d9204:889a4c89:b7a1bdb9:a77d8130
Name : testhost:0 (local to host testhost)
Creation Time : Mon Feb 4 14:39:21 2019
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 184188928 (87.83 GiB 94.30 GB)
Array Size : 92094464 (87.83 GiB 94.30 GB)
Data Offset : 131072 sectors
Super Offset : 8 sectors
Unused Space : before=130984 sectors, after=0 sectors
State : clean
Device UUID : 46077734:6a094293:96f92dc3:0a09706e
Update Time : Tue Feb 5 13:36:59 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 139d1d09 - correct
Events : 974
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 726d9204:889a4c89:b7a1bdb9:a77d8130
Name : testhost:0 (local to host testhost)
Creation Time : Mon Feb 4 14:39:21 2019
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 184188928 (87.83 GiB 94.30 GB)
Array Size : 92094464 (87.83 GiB 94.30 GB)
Data Offset : 131072 sectors
Super Offset : 8 sectors
Unused Space : before=130984 sectors, after=0 sectors
State : clean
Device UUID : dcffbed3:147347dc:b64ebb8d:97ab5956
Update Time : Tue Feb 5 10:47:41 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e774af76 - correct
Events : 142
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
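Note the very different Events counters (974 vs. 142) and Update Times of the two members. To compare just those fields side by side:
# mdadm --examine /dev/sda1 /dev/sdb1 | grep -E 'Update Time|Events'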
Did you allow a significant amount of time between each test to allow the RAID to rebuild itself? – Ramhound – 2019-02-05T13:02:52.780
@Ramhound No. Between the 1st and 2nd test, there was nothing to rebuild. Should we have waited after the 2nd test before executing --add? OTOH, would a member that is no longer considered part of the array by the kernel even be rebuilt? – Alexander Klauer – 2019-02-05T13:07:42.383