
A while ago I had a raid10 config crap out on me, and I'm only now getting around to trying to salvage the array so I can rebuild and move on with my life. Basically one drive in each subset failed, which means (in theory) I can recover; had I lost two disks in the same subset, recovery would not be possible.

I removed the two bad drives and added two new drives to the system. The raid controller card in the system is a Promise FastTrak 4310. When I booted the system I jumped into the raid controller card's BIOS and saw that all 4 drives were found, but the two new ones (obviously) were not assigned to the raid configuration. Unfortunately there is no way for me to remove the two old drives and add the two new drives to the config via the BIOS. Promise does provide a WebPAM installer, but it's ancient (6 years old) and will not install on CentOS 6.4.

So I did some digging around and came across dmraid. dmraid looks promising, as it returns information about my raid config that matches what I know about it:

root@service1 ~ # -> dmraid -s -s
ERROR: pdc: wrong # of devices in RAID set "pdc_fbdbhaai-0" [1/2] on /dev/sdb
ERROR: pdc: wrong # of devices in RAID set "pdc_fbdbhaai-1" [1/2] on /dev/sde
ERROR: pdc: wrong # of devices in RAID set "pdc_fbdbhaai-0" [1/2] on /dev/sdb
ERROR: pdc: wrong # of devices in RAID set "pdc_fbdbhaai-1" [1/2] on /dev/sde
*** Superset
name   : pdc_fbdbhaai
size   : 976642080
stride : 32
type   : raid10
status : ok
subsets: 2
devs   : 2
spares : 0
--> Subset
name   : pdc_fbdbhaai-0
size   : 976642080
stride : 32
type   : stripe
status : broken
subsets: 0
devs   : 1
spares : 0
--> Subset
name   : pdc_fbdbhaai-1
size   : 976642080
stride : 32
type   : stripe
status : broken
subsets: 0
devs   : 1
spares : 0

root@service1 ~ # -> dmraid -r
/dev/sde: pdc, "pdc_fbdbhaai-1", stripe, ok, 976642080 sectors, data@ 0
/dev/sdb: pdc, "pdc_fbdbhaai-0", stripe, ok, 976642080 sectors, data@ 0
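
Since everything from here on depends on knowing which device node maps to which physical disk, it's probably worth cross-checking serial numbers before touching anything. This isn't part of the steps above, just a sanity check; smartctl comes from the smartmontools package:

root@service1 ~ # -> ls -l /dev/disk/by-id/    # persistent IDs include drive serials and point at the sdX names
root@service1 ~ # -> smartctl -i /dev/sdb      # prints model and serial for one surviving drive
root@service1 ~ # -> smartctl -i /dev/sde      # and the other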

As of now, it looks like all I need to do is update the raid metadata to disregard the old drives and add the new ones. Then (hopefully) I can issue a rebuild command and the array will rebuild itself from the two surviving drives.

I did read "man dmraid", but I wanted to be absolutely sure the commands I issue will accomplish what I am trying to do. Unfortunately I was unable to find any good docs online regarding how to add/remove drives from raid metadata using dmraid.

My proposed command set looks like this:

root@service1 ~ # -> dmraid --remove pdc_fbdbhaai-0 /dev/sda1
root@service1 ~ # -> dmraid --remove pdc_fbdbhaai-1 /dev/sda2

With the old drives removed, it's time to add the new ones:

root@service1 ~ # -> dmraid -R pdc_fbdbhaai-0 /dev/sdc
root@service1 ~ # -> dmraid -R pdc_fbdbhaai-1 /dev/sdd
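
Before running any of this, imaging the two good drives seems like cheap insurance in case I make things worse. A rough sketch, assuming there is somewhere with enough space to hold the images (the /backup paths are just placeholders):

root@service1 ~ # -> dd if=/dev/sdb of=/backup/sdb.img bs=1M conv=noerror,sync
root@service1 ~ # -> dd if=/dev/sde of=/backup/sde.img bs=1M conv=noerror,sync
root@service1 ~ # -> ddrescue /dev/sdb /backup/sdb.img /backup/sdb.map    # GNU ddrescue, if installed, is gentler with read errors and keeps a map file for retries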

Is anyone with dmraid experience able to confirm these steps? Or should I go another route?

If your setup is that badly broken, then the first thing you might want to do is make backup images of the good drives before you try anything. – Zoredache Jul 24 '13 at 23:02

1 Answer


Holy crap, I was able to figure it out. After some more research I stumbled across a few posts indicating that dmraid is no longer actively maintained and that mdadm should be used instead. So I started working with mdadm and figured out the commands to get the raid rebuilding and, hopefully, back online again. Here's what I did:

According to the mdadm docs, issuing an assemble command will create the logical volume from the two physical drives, IF they have superblock information, so let's add the two drives that didn't fail:

$ -> mdadm --assemble /dev/md0 /dev/sdb /dev/sde
mdadm: /dev/md0 assembled from 2 drives - need all 4 to start it (use --run to insist).
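
As an aside, the superblocks that assemble relies on can be inspected directly. I didn't run this at the time, but mdadm --examine prints the superblock of a member device (array UUID, raid level, and the device's role), which is a handy way to confirm a drive really belongs to the array:

$ -> mdadm --examine /dev/sdb
$ -> mdadm --examine /dev/sde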

Easy enough; now let's add the two new drives to the logical volume:

$ -> mdadm --add /dev/md0 /dev/sdc /dev/sdd
mdadm: cannot get array info for /dev/md0

At this point I did some googling to find out what this message indicates. There are a myriad of situations that can produce it, so I mulled over the assemble command again. The key, on re-examining the assemble output, was the hint it gave: "use --run to insist". So I figured, why not give it a shot:

$ -> mdadm --run /dev/md0
mdadm: started /dev/md0

OK, good so far. Now, can I add the two new drives?

$ -> mdadm --add /dev/md0 /dev/sdc
mdadm: added /dev/sdc

$ -> mdadm --add /dev/md0 /dev/sdd
mdadm: added /dev/sdd

Whoa, cool! Let's check the status:

$ -> cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdd[4](S) sdc[5] sdb[1] sde[2]
  976772992 blocks 64K chunks 2 near-copies [4/2] [_UU_]
  [>....................]  recovery =  2.2% (10762688/488386496) finish=131.5min speed=60498K/sec

unused devices: <none>

Hell yes! According to the status, the raid is rebuilding from the two drives which didn't crash and burn.
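
To keep an eye on the rebuild without re-running that by hand, either of these does the job (watch is part of procps; nothing here is specific to my setup):

$ -> watch -n 5 cat /proc/mdstat    # refresh the status every 5 seconds
$ -> mdadm --detail /dev/md0        # per-device state plus a rebuild progress percentage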

-- EDIT --

To make sure the raid configuration persists across reboots/shutdowns, I had to do the following:

$ -> mdadm --detail --scan >> /etc/mdadm.conf
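
One caveat worth noting: because that uses >> to append, running it twice leaves duplicate ARRAY lines, so it's worth reviewing the file afterwards. A quick sanity check (the exact ARRAY line, including the UUID, will differ per system):

$ -> mdadm --detail --scan    # prints the ARRAY line without writing anything
$ -> cat /etc/mdadm.conf      # confirm there is exactly one ARRAY entry for /dev/md0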