I'm using Dell Precision T3610 towers as Linux servers for various applications. These have onboard Intel "Rapid Storage Technology" controllers with rudimentary RAID functionality, commonly referred to as FakeRAID.

My ultimate goal here is to have backup drives I can just pop in if a drive fails in an existing server. I made a copy of the live server's drive (it only has one) using dd, which worked fine. Then I put two blank disks in a new T3610, configured the onboard RAID controller for RAID1, and booted from a live CD. I created partitions on the RAID array (md126) to roughly match the source drive, and connected the source drive via USB. I then used dd to copy the data from the bare drive into the array's partitions. Once that completed, I mounted the array, chrooted into it, and ran grub2-mkconfig and grub2-install.

Note: grub2-install complained about having no BIOS boot partition (these disks use GPT partition tables), but the source disk doesn't have one either, and it definitely had GRUB installed. grub2-install --force worked fine.

I pulled the USB-connected source disk, and the liveCD, and rebooted the system. It booted fine. Its drives looked like:

# ls /dev/?d*
/dev/cdrom  /dev/md125  /dev/md126  /dev/md126p1  /dev/md126p2  /dev/md127  /dev/sda  /dev/sdb

/dev/fd:
0  1  2  3

I then shut down the system (cleanly), pulled the drives, and placed them in another T3610 (all hardware identical). First I loaded the onboard RAID config utility, which automatically saw the RAID1 array (complete with the name I assigned it at create time, "ARRAY0"). Thinking this was a good sign, I exited without making any changes. The system booted.

However, Linux did not see the RAID array. It appears to be booting off of only 1 drive. The drives now look like this:

# ls /dev/?d*
/dev/cdrom  /dev/md125  /dev/sda  /dev/sda1  /dev/sda2  /dev/sdb

/dev/fd:
0  1  2  3

md125 is the imsm container, which shows the same as it did before.

I can provide pictures showing the two controller config screens seeing the array, or any command output. I'm more-or-less confident I've set this up right (as "right" as it can be when using fakeraid), but I'm running into some quirk or shortcoming involving moving an array from one system to another.

Any idea why this second system won't see the RAID array, even though the controller does?

Thank you all.

Tero Kilkanen
wes
  • Weird... As a first test: what happens if you put all back in the first system and boot it? Also did you check for the same bios/firmware revision in both systems?! – matteo nunziati Aug 18 '20 at 13:33
  • When I put the 2 drives back in the system in which I created the RAID array, the drives appear as they did in the second system. Like this: /dev/cdrom /dev/md127 /dev/sda /dev/sda1 /dev/sda2 /dev/sdb. The firmware revisions are indeed identical. – wes Aug 18 '20 at 22:01
  • Discovery! If I boot the original system from the liveCD I used to add the partitions, it sees the array again. I'm going to let it finish rebuilding and then boot to the drives again to see what happens. – wes Aug 18 '20 at 22:07
  • I would use Linux RAID system instead of vendor-specific RAID. You are locking yourself to a vendor-specific solution without any real benefits, only limiting your upgrade choices. – Tero Kilkanen Aug 18 '20 at 22:08
  • I also discovered the UUIDs in /etc/mdadm.conf do not match the output of mdadm --detail --scan when booted from the liveCD. I've updated them. I will still wait for the resync to finish before testing. @tero-kilkanen The benefit to the hardware RAID controller is that it syncs the whole drives, including the boot sector. AFAIK this is not possible with software RAID. This way, when a drive fails, I can simply replace it. If I was using software RAID, I would have to replace it and copy over the boot sector or run grub[2]-install on it. – wes Aug 18 '20 at 22:26
  • It's not a hardware RAID controller; it's fakeRAID. The hardware doesn't do anything but sit there and look pretty; the OS "syncs the whole drives" based on whatever information IMSM provides it. You're trading an extra step in the case of hardware failure, for reduced reliability and manageability all the time. I personally think that's not a good tradeoff. And in modern Linux the `/boot` partition can use software RAID anyway, but you still would need to copy the boot sector or the EFI system partition. – Michael Hampton Aug 18 '20 at 22:29
  • @MichaelHampton there is at least some additional functionality present. For one thing, the BIOS offers an option to boot to the RAID array rather than an individual drive. – wes Aug 19 '20 at 15:24
  • @wes then it seems a misconfiguration between the live and the installed system – matteo nunziati Aug 20 '20 at 07:52
  • @matteonunziati I think you're right. I may have been chasing the wrong problem, assuming it's related to the controller. It seems to me the issue is simply that the OS was originally installed on a bare drive - thus its initrd was not configured for RAID - thus I need to figure out how to confirm that and fix it. – wes Aug 20 '20 at 20:55
  • I still haven't figured out what the exact missing piece is, but I believe I'm narrowing it down to some degree. I believe the issue is that the kernel is not starting the array during the initrd phase of bootup. I am not sure why. This guy seems to have had a similar experience: https://unix.stackexchange.com/questions/266119/md-raid-not-mounted-by-dracut but in my case, I cannot start the array even at a dracut shell prompt (it says one of the disks is "busy" though I can't think of any reason it should say that). I tried the suggested additional parameter (rd.md.uuid) but no luck yet. – wes Aug 25 '20 at 09:36
  • I tried something new: I did a fresh install of CentOS onto the RAID array. The newly installed OS booted fine. So then I copied the data from the drive I'm trying to convert from bare to RAID, and that boots too. I then ran grub2-mkconfig and rebooted. It broke! So there's some magic in GRUB that makes this work. Here are the two versions of grub.cfg: https://dpaste.com/65DFE2CCJ.txt https://dpaste.com/2CULKBLH6.txt – wes Aug 27 '20 at 08:37
  • After much thrashing and gnashing, I finally got it to work. I am not sure exactly what combination of steps actually did it. Ultimately it came down to twiddling grub.cfg. I am not the only one struggling with this. https://bugzilla.redhat.com/show_bug.cgi?id=1015204 https://bugzilla.redhat.com/show_bug.cgi?id=1201962 – wes Sep 01 '20 at 05:56
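
The UUID mismatch mentioned in the comments (between /etc/mdadm.conf and the output of mdadm --detail --scan) and the rd.md.uuid kernel parameter can be checked with a few small helpers. This is a minimal sketch, not the fix that ended up working in this thread. The function names are made up for illustration, and the helpers assume the usual "ARRAY ... UUID=..." line format with an uppercase UUID= keyword.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, input formats are assumed.

# Read ARRAY lines on stdin (mdadm.conf or saved `mdadm --detail --scan`
# output) and print the UUID values, one per line, sorted and de-duplicated.
array_uuids() {
    grep '^ARRAY' | tr ' \t' '\n\n' | sed -n 's/^UUID=//p' | sort -u
}

# Read a kernel command line on stdin and print every rd.md.uuid= value.
cmdline_uuids() {
    tr ' \t' '\n\n' | sed -n 's/^rd\.md\.uuid=//p' | sort -u
}

# Print UUIDs present in the saved scan output ($2) but absent from the
# config file ($1).
missing_from_conf() {
    conf_list=$(mktemp) && scan_list=$(mktemp) || return 1
    array_uuids < "$1" > "$conf_list"
    array_uuids < "$2" > "$scan_list"
    # comm -13 keeps only the lines unique to the second (scan) list.
    comm -13 "$conf_list" "$scan_list"
    rm -f "$conf_list" "$scan_list"
}
```

One way to use it: save the scan with `mdadm --detail --scan > /tmp/scan.txt` (as root, booted from the environment that does see the array), then run `missing_from_conf /etc/mdadm.conf /tmp/scan.txt`. Any UUID it prints is one the config file does not list. `cmdline_uuids < /proc/cmdline` shows which arrays the boot command line names.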
