
I have a RAID1 array md0 for /boot, consisting of 4 partitions (sda2, sdb2, sdc2, sdd2). I'm using GPT on 2 TB HDDs, so the first partition on each disk (sda1, ...) is a 1 MiB bios_grub partition.
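
For reference, the per-disk layout can be reproduced with parted along these lines (a rough sketch reconstructed from the description; exact sizes and offsets are assumptions):

parted /dev/sda mklabel gpt
parted /dev/sda mkpart primary 1MiB 2MiB          # sda1: 1 MiB bios_grub area for core.img
parted /dev/sda set 1 bios_grub on
parted /dev/sda mkpart primary ext2 2MiB 258MiB   # sda2: /boot RAID1 member (size assumed)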

I also have RAID10 md1 for LVM (containing /) and RAID0 md2 for swap, both built from partitions on all 4 drives.
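
The arrays would have been created with something along these lines (a hypothetical reconstruction; the member partition numbers and the VG name are assumptions, only the RAID levels and 4-drive membership come from the setup above):

mdadm --create /dev/md0 --level=1  --raid-devices=4 --metadata=0.90 /dev/sd[abcd]2
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[abcd]3
mdadm --create /dev/md2 --level=0  --raid-devices=4 /dev/sd[abcd]4
pvcreate /dev/md1 && vgcreate vg0 /dev/md1   # LVM PV/VG holding / (VG name assumed)
mkswap /dev/md2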

The mdadm persistent superblock (metadata) version is 0.90.
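
(To double-check which superblock format an array and its members actually carry:)

mdadm --detail  /dev/md0  | grep -i version
mdadm --examine /dev/sda2 | grep -i version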

Grub was installed with something like grub-install --modules="mdraid lvm" '(hd0)' on all 4 drives (hd0, hd1, hd2, hd3).
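
That is, roughly:

for d in hd0 hd1 hd2 hd3; do
    grub-install --modules="mdraid lvm" "($d)"
done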

The problem.

On reboot, grub2 fails with "error: no such disk" and drops to the "grub rescue>" prompt. The ls command only shows the 4 disks and their partitions - but no md* devices. Trying insmod normal gives "error: no such disk." again. Examining 'root' and 'prefix' shows something like '(md0)/grub', which is correct. Doing set prefix=(hd0,2)/grub and then insmod normal allows the system to boot normally.
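
The rescue-prompt session looks roughly like this (exact ls output will differ):

grub rescue> ls
(hd0) (hd0,2) (hd0,1) (hd1) (hd1,2) ...    # only raw disks and partitions, no (md0)
grub rescue> set prefix=(hd0,2)/grub       # point prefix at one raw /boot member
grub rescue> insmod normal
grub rescue> normal                        # boots fine from here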

The question.

Why doesn't grub2 see md0?

So far the only solution I can see is to manually build a grub image with a hard-coded working prefix (grub-mkimage --prefix='(hd0,2)/grub'), then use grub-setup to write that image to each disk. However, this solution is ugly and error-prone (to avoid mistakes I would need to investigate exactly how grub-install calls these two commands). I would appreciate better solutions. (Note: this is a remote server, so I can't really do 'reboot debugging'.)
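
For completeness, the manual route would look roughly like this (an untested sketch; the module list and paths are assumptions, and checking them against what grub-install actually does is exactly the error-prone part):

grub-mkimage -O i386-pc -p '(hd0,2)/grub' -o /boot/grub/core.img \
    biosdisk part_gpt ext2 mdraid lvm
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    grub-setup -d /boot/grub "$disk"
done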

chronos
  • After a while (and a few forgotten manual grub installs, as described above, and also an extra HDD which changed drive letters), I've settled on booting from just a single drive's `/boot` partition - no RAID. Works beautifully. – chronos Jun 19 '12 at 08:50

1 Answer


RAID is still one of the gray areas of bootloaders IMHO.

I recently built a RAID1 system, and after a few hours of trying to get LILO/GRUB/GRUB2 to detect my RAID I gave up and just told it to use the first partition of the first detected HDD, making sure that if an HDD failed the next HDD was already lined up with the correct MBR/bootloader, etc.

So what it does is boot, grab the kernel and initramfs off the first HDD (no RAID), then boot the kernel and leave all the RAID handling to the kernel. Because GRUB/LILO do not physically write to the drives, this won't damage them.

Basically, I just ignored RAID altogether for the bootloader stage.
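
With that approach the boot entry is just an ordinary non-RAID stanza; a minimal GRUB2-style sketch (kernel version, partition and VG/LV names are made up for illustration):

menuentry "Linux (RAID/LVM assembled by the kernel)" {
    set root=(hd0,1)                       # plain partition on the first HDD, no (md0)
    linux  /vmlinuz-3.0.0 root=/dev/mapper/vg0-root ro
    initrd /initrd.img-3.0.0
}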

The kernel needs to re-assemble the RAID arrays anyway, even if GRUB does it first. There's no real reason for GRUB to be RAID-aware on a RAID1 system unless a drive fails during boot.

P.S. You don't need to RAID0 your swap; this ability is already in the kernel. Just set the priority of both swap devices to 1 in /etc/fstab:

/dev/sda2         none                    swap  sw,pri=1        0 0
/dev/sdb2         none                    swap  sw,pri=1        0 0
etc.
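
You can check that the kernel picked both up at equal priority (and will stripe across them) with:

swapon -a    # activate everything listed in fstab
swapon -s    # both devices should be listed with priority 1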

And if a single swap drive fails during normal operation, there's a very good chance your system will fail. (You can RAID1 swap, just not from fstab like above.)
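
If you do want swap that survives a disk failure, build a small RAID1 and point fstab at the md device instead (a sketch only; partition names assumed):

mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mkswap /dev/md3
swapon /dev/md3
# and in fstab:
/dev/md3          none                    swap  sw              0 0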

Silverfire
  • Thanks for the swap hint, will try that. As for GRUB2: I guess the chainloading you describe is a little messier (it needs manual preparation) than just having a hard-coded prefix for the "first HDD" - there will always be some "first HDD" on the system (assuming they all have GRUB2 installed to the bios_grub). – chronos Sep 27 '11 at 10:38
  • I wonder if removing devices from md0 and installing grub2 onto them individually (one by one) would create/write correct grub images into bios_grub. Otherwise grub2 detects that it is being installed onto a RAID component partition, and writes 'md0' into the image. – chronos Sep 27 '11 at 10:40
  • I tried with a raid10 boot and raid10 lvm root, and it would boot fine until any disk was removed. With 2 disks removed (which should be degraded), grub rescue> would say "error: no such disk.". With 1 disk removed, it would actually get past that, and get to an eternal black screen. But changing /boot to raid1 and leaving the raid10 lvm root alone, it now works with 2 disks removed. (Ubuntu 12.04, mdadm 0.90 metadata on /boot, grub2, GPT) – Peter May 09 '12 at 08:16