
tl;dr: I fat-fingered grub-install, then "correctly" re-issued it targeting the /boot filesystem on /dev/sda1 but it isn't reading grub/grub.conf unless I expressly tell it where to look using the grub prompt tools. How do I fix that?


I have a critical CentOS 5 system with multiple hard drives that aren't hot-swappable. That's a bad idea, by the way.

The first drive contains /boot, then two mirror mdraid partitions for the OS and data. The second drive contains just the two mdraid partitions.

The first drive is very slowly dying, so I added a third drive to prepare for the inevitable. I copied the partition layout of the first drive, added it to the mdraid mirror, then used dd to clone sda1 to sdc1.

I had a hardware maintenance window last night and needed to reboot the machine anyway, so I figured I'd take the chance to switch sdc to the boot drive. As I only copied the partition layout and the first partition, not the entire drive, I figured that sdc wasn't bootable. So after adjusting fstab, I made sdc1 bootable and used grub-install to ensure that grub could take care of things.

Only I fat-fingered the command and typed grub-install /dev/sda.

It gave me a warning about not finding the drive in the BIOS drive list, so I assumed that it didn't do anything harmful. I re-issued the command targeting /dev/sda1 instead, but got the same error. Hmm. Oh well, it probably didn't do anything, right? Yeah. No.

When the system didn't come back up after reboot (printing GRUB GRUB GRUB over and over on the console), I knew I was screwed. Apparently what I did is irritatingly common.

I booted the machine into a live CD, used dd to nuke the MBRs on both sda and sdc, mounted sda1's copy of /boot, issued the correct command (which involves asking it to probe the drive list and giving an actual filesystem location), and rebooted. What came up was the grub shell. I was able to issue root (hd0,0) and configfile grub/grub.conf to get into the boot menu, but I would have assumed that if I'd issued the command correctly to begin with then it would have seen the menu immediately.

So, my critical system is running fine. I'm only going to be able to reboot it once in the near future, so I'd like to get this taken care of correctly.

So, my questions:

  1. Is the current booting-into-grub-but-not-seeing-any-configuration fixable without re-running grub-install? I'm terrified of the thing now.
  2. If I have to invoke grub-install again, what should be the correct way? I used grub-install --recheck --root-directory=/path/to/sda1/boot /dev/sda1 to get it into its current state.
Charles

1 Answer


I have a similar configuration: I usually create /boot on a mirrored mdraid partition and then install GRUB on the MBR of every drive, so the server can still boot if any single drive fails. Everything else (i.e. everything except the MBR stage) is replicated by mdraid anyway.

Just run

grub-install --recheck /dev/sda
grub-install --recheck /dev/sdb

You need to install GRUB on the MBR, not on the first partition. It will pick up the stage 1.5 files and boot the kernel, which then switches to the root filesystem on the mdraid partitions.
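As a rough sketch of the two commands above, here is a small wrapper that loops over whole disks and refuses anything that looks like a partition. The function name install_grub_mbr and the DRY_RUN variable are my own inventions, not part of GRUB. It only prints the commands unless DRY_RUN=0, and the partition check assumes sdX-style device names.

```shell
#!/bin/sh
# Sketch only: install legacy GRUB to the MBR of each listed whole disk.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

install_grub_mbr() {
    for disk in "$@"; do
        case "$disk" in
            /dev/*[0-9])
                # Assumption: sdX naming, so a trailing digit means a partition.
                echo "skipping $disk: looks like a partition, not a whole disk" >&2
                continue
                ;;
        esac
        if [ "$DRY_RUN" = "1" ]; then
            echo "grub-install --recheck $disk"
        else
            grub-install --recheck "$disk" || return 1
        fi
    done
}

# The two drives from the commands above; adjust to your own layout.
install_grub_mbr /dev/sda /dev/sdb
```

Run it once as-is to see what would be executed, then set DRY_RUN=0 when the printed commands look right.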

Here is what my configuration looks like. There is nothing special to be done; this is CentOS 6, but the same applies:

device map

[root@main ~]# cat /boot/grub/device.map 
# this device map was generated by anaconda
(hd0)     /dev/sda
(hd1)     /dev/sdb
(hd2)     /dev/sdc
(hd3)     /dev/sdd
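If you want to check which (hdN) name a given disk maps to before touching anything, something like the following should work. It is a minimal sketch: grub_name_for is a hypothetical helper name, and the demo runs against a temporary copy of the map shown above rather than the real /boot/grub/device.map.

```shell
#!/bin/sh
# Sketch only: look up the GRUB (hdN) name for a Linux device in a device.map.
# $1 = device (e.g. /dev/sdc), $2 = path to a device.map file
grub_name_for() {
    # Skip comment lines, then print column 1 where column 2 matches the device.
    awk -v dev="$1" '!/^#/ && $2 == dev { print $1 }' "$2"
}

# Demo against a temporary copy of the device map listed above.
map="$(mktemp)"
cat > "$map" <<'EOF'
# this device map was generated by anaconda
(hd0)     /dev/sda
(hd1)     /dev/sdb
(hd2)     /dev/sdc
(hd3)     /dev/sdd
EOF
grub_name_for /dev/sdc "$map"   # prints (hd2)
rm -f "$map"
```

No output for a device means it has no entry in the map.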

grub.conf (menu.lst)

[root@main ~]# cat /etc/grub.conf 
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_main-lv_main_root
#          initrd /initrd-[generic-]version.img
#boot=/dev/md0
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title CentOS (2.6.32-358.14.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.14.1.el6.x86_64 ro root=/dev/mapper/vg_main-lv_main_root rd_NO_LUKS LANG=en_US.UTF-8 rd_MD_UUID=7d8cff6b:744c0786:023226e9:536570ed rd_LVM_LV=vg_main/lv_main_root rd_LVM_LV=vg_main/lv_main_swap SYSFONT=latarcyrheb-sun16 crashkernel=auto  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM quiet
        initrd /initramfs-2.6.32-358.14.1.el6.x86_64.img
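Before the one reboot you have available, it may be worth confirming that the files legacy GRUB needs are actually sitting on the /boot filesystem. This is only a sketch: check_grub_files and the BOOT_MNT variable are made-up names, and the file list (stage1, stage2, grub.conf under grub/) assumes a stock CentOS legacy GRUB layout.

```shell
#!/bin/sh
# Sketch only: report whether the legacy GRUB files exist under a mounted /boot.
# $1 = mount point of the /boot filesystem (defaults to /boot)
check_grub_files() {
    bootdir="${1:-/boot}"
    missing=0
    # Assumption: stock layout with the stage files and grub.conf in grub/.
    for f in grub/stage1 grub/stage2 grub/grub.conf; do
        if [ -e "$bootdir/$f" ]; then
            echo "ok      $bootdir/$f"
        else
            echo "MISSING $bootdir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# BOOT_MNT is a hypothetical override, e.g. where a live CD mounted sda1.
check_grub_files "${BOOT_MNT:-/boot}" || echo "some GRUB files are missing" >&2
```

From a live CD, point BOOT_MNT at wherever you mounted sda1.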
GioMac