Over the weekend, I got several emails from our network storage server (just a custom box running CentOS 5 with two 2 TB drives in software RAID 1) indicating that SMART had detected issues with one of the drives.
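For what it's worth, the SMART details can be pulled back up with smartctl from the smartmontools package (the device name below is just an example, not taken from my logs):

    smartctl -H /dev/sda   # overall health self-assessment
    smartctl -a /dev/sda   # full SMART attributes and error log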
I checked the status, and two of the RAID partitions were marked as failed:
[root@aapsan01 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md0 : active raid1 sdb3[1] sda3[2](F)
      4064320 blocks [2/1] [_U]

md3 : active raid1 sdb5[1] sda5[0]
      1928860160 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](F)
      20482752 blocks [2/1] [_U]
So I marked all of sda's partitions as failed, successfully removed them from their arrays (rough command sequence at the end of this post), shut down, put in an identical brand-new 2 TB drive, and booted. Now I cannot reach the login prompt because error messages keep repeating once the boot process reaches "md: autodetect raid array". At first the errors were something like:
DRDY err (UNC) -- exception emask media error
Now I get I/O errors. I tried booting with the failing drive removed, and then with it back in; same show. The write-ups I've found make this look like a simple recovery process, so what gives? Has anyone encountered anything similar? It appears the boot process is still progressing, though it's taking eons to get through each step. Has anyone ever had to wait so long to reach the prompt? Hopefully, if I can't get to the prompt, I can get somewhere with the rescue CD.
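For reference, the fail/remove steps I described above were along the lines of the standard procedure from those write-ups; the device names here are reconstructed from the mdstat output rather than copied from my shell history, and per those write-ups the sfdisk/--add steps are the part that comes after the new drive is in:

    # mark the sda members faulty and pull them out of each array
    mdadm --manage /dev/md0 --fail /dev/sda3
    mdadm --manage /dev/md0 --remove /dev/sda3
    # ...repeated for md1/sda1, md2/sda2 and md3/sda5

    # after swapping in the new disk: copy the partition table from the
    # surviving drive, then re-add the new partitions so the mirrors resync
    sfdisk -d /dev/sdb | sfdisk /dev/sda
    mdadm --manage /dev/md0 --add /dev/sda3
    # ...and likewise for the other three arrays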
Isn't it some sdb partitions that have failed? – Linker3000 – 2010-12-06T19:37:26.473
How can you tell from the stat message? The email I got from the mdadm daemon said "It could be related to component device /dev/sda3." – Flotsam N. Jetsam – 2010-12-06T21:01:49.377
Look at md2 - it has two partitions in the array listed in order [sdb2] [sda2] and the status of the pair is listed as [_U], which means that the first partition ([sdb2]) has dropped out of the pairing. Have a read here: http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array – Linker3000 – 2010-12-06T23:03:08.900
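A quick way to settle which member mdadm actually kicked out, rather than reading it off the [_U] ordering, is the detail view (the array name here just follows the mdstat output above):

    mdadm --detail /dev/md2

The device table at the bottom of that output lists each member's state and marks a failed one as faulty, so it should show unambiguously whether sda2 or sdb2 was dropped.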