7

A server set up with Debian 6.0/squeeze. During the squeeze installation, I configured the two 500 GB SATA disks (/dev/sda and /dev/sdb) as a RAID1 (managed with mdadm). The RAID holds a 500 GB LVM volume group (vg0). The volume group contains a single logical volume (lv0). vg0-lv0 is formatted with ext3 and mounted as the root partition (there is no dedicated /boot partition). The system boots using GRUB2.
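For reference, the layout corresponds roughly to the following (a sketch of equivalent commands, not the exact steps the Debian installer performs; the array name /dev/md0 is an assumption):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # RAID1 over both disks
pvcreate /dev/md0                  # the array is the only LVM physical volume
vgcreate vg0 /dev/md0              # 500 GB volume group on top of the mirror
lvcreate -n lv0 -l 100%FREE vg0    # single logical volume
mkfs.ext3 /dev/vg0/lv0             # root filesystem; there is no separate /boot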

In normal use, the system boots fine.

Also, when I removed the second SATA drive (/dev/sdb) after a shutdown, the system came up without problems, and after reconnecting the drive I was able to `--re-add` /dev/sdb1 to the RAID array.
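(The array device is not named above; assuming it is /dev/md0, the re-add step was along these lines:)

cat /proc/mdstat                   # check the array state after reconnecting the disk
mdadm /dev/md0 --re-add /dev/sdb1  # put the partition back into the mirror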

But: After removing the first SATA drive (/dev/sda), the system won't boot any more! A GRUB welcome message shows up for a second, then the system reboots.

I tried to install GRUB2 manually on /dev/sdb (`grub-install /dev/sdb`), but that doesn't help.
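For context, the usual way to put GRUB2 on both MBRs and regenerate its configuration is roughly this (a sketch; whether it helps in this situation is exactly what is in question):

grub-install /dev/sda   # write GRUB2 to the MBR of the first disk
grub-install /dev/sdb   # write GRUB2 to the MBR of the second disk
update-grub             # regenerate /boot/grub/grub.cfg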

Apparently squeeze fails to set up GRUB2 to boot from the second disk when the first disk is removed, which seems like quite an essential feature when running this kind of software RAID1, doesn't it?

At the moment I can't tell whether this is a problem with GRUB2, with LVM or with the RAID setup. Any hints?

flight
  • Instead of `grub-install`, might want to try `dpkg-reconfigure grub-pc`. https://wiki.debian.org/DebianInstaller/SoftwareRaidRoot says of installing to multiple drives 'your system will still boot correctly even if you reorder your drives'. In theory. I also want to cross-reference to my question: https://serverfault.com/questions/869559/grub-hangs-before-menu-after-a-raid-upgrade-how-to-debug – Cedric Knight Aug 23 '17 at 06:46
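(A sketch of the reconfigure route suggested in the comment above; `debconf-show` merely displays which install devices are currently recorded:)

dpkg-reconfigure grub-pc   # interactively select both /dev/sda and /dev/sdb as install devices
debconf-show grub-pc       # verify the recorded selections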

4 Answers

4

You need to install GRUB to the MBR of both drives, and you need to do it in such a way that GRUB treats each disk as the first disk in the system.

GRUB uses its own enumeration for disks, which is abstracted from what the Linux kernel presents. You can change which device it treats as the first disk (hd0) by using a `device` line in the grub shell, like so:

device (hd0) /dev/sdb

This tells grub to treat /dev/sdb as the disk hd0 for all subsequent commands. From here you can complete the installation manually:

device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)

This installs GRUB to the MBR of the disk it considers to be hd0 (which you've just set to /dev/sdb), using the GRUB files on that disk's first partition.

I do the same for both /dev/sda and /dev/sdb, just to be sure.

Edited to add: I always found the Gentoo Wiki handy, until I did this often enough to commit it to memory.
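For reference, the same steps can be fed to the legacy grub shell non-interactively (a sketch; as the comment below points out, this applies to GRUB Legacy only, since GRUB2 has no `setup` command):

grub --batch <<EOF
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF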

Daniel Lawson (edited by Cedric Knight)
  • You're talking GRUB1. GRUB2 doesn't have a `setup` command in the shell. – flight Mar 09 '11 at 17:10
  • You're probably right there. Carry on :) – Daniel Lawson Mar 09 '11 at 20:30
  • A bit more explanation here would help. Is the implication that on boot without sda, BIOS presents the second drive as ata0 while GRUB legacy tries to load menu.lst etc from hd1? Is there any equivalent caveat for grub2 (now confusingly just referred to as grub)? – Cedric Knight Aug 23 '17 at 06:32
2

Have you considered installing a third drive to serve as just the boot drive? I have seen problems too with RAID1/LVM setups (on CentOS) not being able to boot from the second drive. I think the problem stems from GRUB not being able to handle native LVM partitions, although I'm not entirely sure.

Anyway, that's my answer: install a third small drive solely for the purpose of booting the system. Heck, I bet you could even get clever and do that with some sort of little flash or ssd device.
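A rough sketch of what that could look like, assuming the extra drive shows up as /dev/sdc (a hypothetical device name) and GRUB2 is in use; option names vary between GRUB versions:

parted -s /dev/sdc mklabel msdos mkpart primary ext2 1MiB 512MiB   # small dedicated boot partition
mkfs.ext2 /dev/sdc1
mount /dev/sdc1 /mnt
mkdir /mnt/boot
cp -a /boot/. /mnt/boot/                      # copy kernels and GRUB files over
grub-install --root-directory=/mnt /dev/sdc   # put GRUB on the third drive's MBR

You would also want an fstab entry mounting that partition as /boot, so that later kernel and GRUB updates land on the boot drive.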

Phil Hollenback
  • My current solution is to boot from a USB stick with GRUB2 on it (and a /boot filesystem, which is not exactly necessary, I think). – flight Mar 09 '11 at 17:13
  • Still, I refrain from accepting this answer, since this ought to work without a third drive. From what I can tell, this has to be a bug in GRUB2 (in Debian Squeeze). – flight Mar 09 '11 at 17:14
  • Sure, that's a reasonable assumption. I just wanted to point out that I've seen weird LVM/RAID/GRUB issues before, and solved them with a third drive after beating my head against weird, annoying boot-time bugs. – Phil Hollenback Mar 09 '11 at 18:02
1

Grub should be able to recognize RAID1 setups and install to all slave disks when told to install to the MD device.
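In that spirit the invocation would be something like the following (the array name is assumed; as the comments note, whether this actually works depends on the GRUB version):

cat /proc/mdstat       # confirm which md device backs the root volume group
grub-install /dev/md0  # GRUB should then write its boot code to each member disk's MBR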

Simon Richter
  • That's what I thought as well ;-), and yes, the Debconf frontend to grub-pc suggested an install to /dev/sda as well as /dev/sdb (and /dev/dm-0, where it subsequently failed to install). Still, it wouldn't boot with only the second disk. – flight Mar 09 '11 at 17:12
  • I dimly remember that one had to point it at the MD device rather than at the components, but I may be confusing that with LILO here. – Simon Richter Mar 09 '11 at 19:00
  • This appears to be how `grub-install` works with GRUB 1.99 and 2.02. In whatever way sda+sdb RAID1 holds your boot partition, the core is likely to be referenced by UUID (check my linked question to see if it is). So if you `grub-install /dev/sda; grub-install /dev/sdb`, it doesn't matter if you remove one of those drives: so long as the BIOS can load MBR from one of them, it will find the RAID UUID and LV by searching. – Cedric Knight Aug 27 '17 at 19:05
0

Indeed it should work. This appears to be how `grub-install` works with GRUB 1.99 and 2.02.

In whatever way the sda+sdb RAID1 holds your boot partition, the core image is likely to be referenced by UUID (check my linked question to see if it is). In other words, if `grub-install --debug` shows something like `--prefix=(md0)/boot/grub/`, you might get a problem if another RAID array is found first, which would probably give the `grub rescue>` prompt rather than the crash observed here. If it uses `--prefix=(mduuid/`, it should find it.
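A quick way to see which form your setup uses (a sketch; the exact debug output varies between GRUB versions):

grub-install --debug /dev/sda 2>&1 | grep prefix   # look for --prefix=(md0)... vs --prefix=(mduuid/...)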

So if you `grub-install /dev/sda; grub-install /dev/sdb`, it doesn't matter if you remove one of those drives: as long as the BIOS can load the MBR from one of them, it will find the RAID UUID and the LV by searching. The MBR, however, is not mirrored, so point the installer at all components in turn.

All that's in theory....


My interest in this ancient question is that the `Welcome to GRUB!` banner was shown and then the server rebooted: I have similar symptoms, possibly caused by the BIOS being unable to read a 4K/sector drive. I don't know if the questioner ever found a solution.

The logic here is that the welcome message lives in `kernel.img`, so at least part of the core image must be loading. However, according to the (possibly outdated) manual, the second sector read loads the rest of the image into memory using a blocklist. If the blocklist is damaged (or perhaps the LBA offset is computed wrongly because of the sector size, or for other reasons), then a crash, reboot or hang might occur.
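If the 4K-sector hypothesis is worth checking, the sector sizes the kernel sees can be read like this (a sketch; it does not tell you how the BIOS reads the drive, which has to be tested separately):

blockdev --getss /dev/sda    # logical sector size (512 or 4096)
blockdev --getpbsz /dev/sda  # physical sector size reported by the drive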

Cedric Knight