2012-03-31 Debian Wheezy daily build in VirtualBox 4.1.2, 6 disk devices.

My steps to reproduce so far:

  1. Set up one partition per disk, spanning the entire disk, as a physical volume for RAID
  2. Set up a single RAID6 mdraid array out of all of those partitions
  3. Use the resulting md0 as the only physical volume for the volume group
  4. Set up your logical volumes, filesystems and mount points as you wish
  5. Install your system
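
For reference, the installer steps above correspond roughly to the commands below. This is only a sketch: the member partitions (/dev/sda1 through /dev/sdf1) and the 10G logical volume size are assumptions for illustration, while the volume group raid6 and the logical volume root are the names mentioned in the comments. The helper print_stack_commands is made up for this post and only prints the commands, because actually running them destroys data.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands for an ext4-on-LVM-on-RAID6 stack.
# Assumed for illustration: member partitions sda1..sdf1 and a 10G root LV.

print_stack_commands() {
    vg=$1
    shift
    # After the shift, the remaining arguments are the RAID member partitions.
    echo "mdadm --create /dev/md0 --level=6 --raid-devices=$# $*"
    echo "pvcreate /dev/md0"
    echo "vgcreate $vg /dev/md0"
    echo "lvcreate --name root --size 10G $vg"
    echo "mkfs.ext4 /dev/$vg/root"
}

print_stack_commands raid6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```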

Both / and /boot will be in this stack. I've chosen EXT4 as my filesystem for this setup.

I can get as far as the GRUB2 rescue console, which can see the mdraid, the volume group and the LVM logical volumes on it (all named appropriately on all levels), but I cannot ls the filesystem contents of any of those and I cannot boot from them.

As far as I can see from the documentation the version of GRUB2 shipped there should handle all of this gracefully.

http://packages.debian.org/wheezy/grub-pc (1.99-17 at the time of writing.)

According to the generated grub.cfg file, it loads the ext2, raid, raid6rec, dosmbr (listed once per disk) and lvm modules. The generated grub.cfg also defines the list of modules to load twice; from some quick searching, this seems to be normal and harmless for GRUB2.

How can I get GRUB2 to actually read the contents of the filesystems and boot the system?

Which of my assumptions about GRUB2's functionality here is wrong?

EDIT (2012-04-01) My generated grub.cfg:

http://pastie.org/3708436

It seems it first makes my /usr logical volume the root, and that might be the source of the failure? A grub-mkconfig bug? Or is it supposed to access things from /usr before / and /boot? /boot is on / for me; there is no separate boot logical volume.
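
One way to check which device is the active root at the point the kernel gets loaded is to list every line in grub.cfg that can change root, next to the linux lines. A minimal sketch: the helper name list_grub_roots is made up here, and /boot/grub/grub.cfg is assumed to be the location of the generated file.

```shell
#!/bin/sh
# list_grub_roots FILE: print, with line numbers, every line that sets root
# ("set root=..." or a "search" command) and every "linux" kernel line.
list_grub_roots() {
    grep -nE 'set root=|^[[:space:]]*(search|linux)[[:space:]]' "$1"
}

# Only touch the real config if it is readable on this machine.
if [ -r /boot/grub/grub.cfg ]; then
    list_grub_roots /boot/grub/grub.cfg || echo "no matching lines found"
fi
```

Reading the output top to bottom, the last root-setting line before each linux line is the one that applies when that kernel is loaded.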

aef
Rotonen
  • Do you get any error messages? – Allen Mar 31 '12 at 21:06
  • Negative. When running ls in the GRUB2 rescue mode, it lists, amongst other things, (raid6), the LVM volume group, and (raid6-root), the logical volume containing / and /boot for me. When I try ls (raid6-root), it outputs only a newline. So it does see the mdraid and it does see the LVM setup, but it just cannot read the filesystem, and I have no idea why. – Rotonen Mar 31 '12 at 21:51

3 Answers

In the end, this turned out to be a GRUB2 bug with degraded software RAID arrays.

GRUB2 1.9x has trouble booting from a degraded array. Booting into the system via rescue mode and letting the RAID recover fixed the issue for the original setup in question.
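
Before rebooting, it is worth checking whether the array is complete. A minimal sketch, assuming the usual /proc/mdstat layout, where the member map looks like [UUUUUU] and an underscore marks a missing member; the helper name md_needs_sync is made up here. It also treats a running or pending resync/recovery as "not ready", which is a cautious assumption rather than something confirmed for this bug.

```shell
#!/bin/sh
# md_needs_sync [FILE]: exit 0 if the mdstat-format FILE (default /proc/mdstat)
# shows a missing member ("_" in the member map) or a resync/recovery line.
md_needs_sync() {
    grep -Eq '\[[U_]*_[U_]*\]|(resync|recovery)[[:space:]]*=' "${1:-/proc/mdstat}"
}

# Only inspect the live status if this machine exposes it.
if [ -r /proc/mdstat ]; then
    if md_needs_sync; then
        echo "array degraded or still syncing: wait before rebooting"
    else
        echo "all arrays complete"
    fi
fi
```

Running `cat /proc/mdstat` by hand shows the same information along with the sync progress.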

Incidentally, as of 2012-06-26 the setup works straight out of the box on Fedora 17, Arch (stable) and Gentoo (stable + latest grub2 bzr via Portage): GRUB2 2.0+ has fixed the issue. With the Wheezy freeze approaching, I very much hope the issue gets resolved there, either by moving to 2.0 or by backporting the fix.

For me, this still affects Debian 6 and 7, and Ubuntu 8.04, 10.04 and 12.04.

Letting the RAID sync from a single-user recovery boot is an acceptable workaround for a home system, but a potential extra hitch when rebooting a production server (even a small office file server) makes one think twice.

mgorven
Rotonen
  • I have also reported this to Debian now. Let's see and hope the backport is trivial and propagates far (I'd also like to see this fixed on Ubuntu 8.04, 10.04, 12.04 and Debian 6). – Rotonen Jun 27 '12 at 08:37

Very good post, thanks a lot. This helped me quite a bit when installing LVM over RAID on Debian Wheezy. Here are the steps I took to overcome the problem.

Update GRUB2 to version 2.0 or later

Add these lines to /etc/apt/sources.list:

deb http://http.debian.net/debian unstable main
deb-src http://http.debian.net/debian unstable main

apt-get update

apt-get install grub2
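
The steps above stop at the package installation. A follow-up that may be needed, assumed here rather than taken from this answer, is to write the new boot code to every disk in the array and regenerate grub.cfg. A sketch with made-up helper names (run, reinstall_grub) and assumed disk names /dev/sda through /dev/sdf; it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run CMD...: print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reinstall_grub DISK...: install GRUB to each whole-disk device, then
# regenerate the configuration with Debian's update-grub wrapper.
reinstall_grub() {
    for disk in "$@"; do
        run grub-install "$disk" || return 1
    done
    run update-grub
}

# Assumed disk names; adjust to the actual RAID member disks.
reinstall_grub /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

To actually apply it, run the script as root with DRY_RUN=0 after checking the printed commands.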

Perhaps you made the single partition too large and did not leave enough space for the GRUB2 installation, so it has overwritten parts of the LVM space. This is something of a long shot. Try your steps to reproduce the problem again, except this time use a single disk (skip the RAID), create the single partition exactly as you did before, and then do the rest as before. If I am right, you should see the same behavior.

UPDATE: So, this answer is wrong. I was looking through the GRUB2 manual and found this section which states:

If, instead, you only get a rescue shell, this usually means that GRUB failed to load the ‘normal’ module for some reason. It may be possible to work around this temporarily: for instance, if the reason for the failure is that ‘prefix’ is wrong (perhaps it refers to the wrong device, or perhaps the path to /boot/grub was not correctly made relative to the device), then you can correct this and enter normal mode manually:

 # Inspect the current prefix (and other preset variables):
 set
 # Find out which devices are available:
 ls
 # Set to the correct value, which might be something like this:
 set prefix=(hd0,1)/grub
 set root=(hd0,1)
 insmod normal
 normal

However, any problem that leaves you in the rescue shell probably means that GRUB was not correctly installed. It may be more useful to try to reinstall it properly using grub-install device (see Invoking grub-install). When doing this, there are a few things to remember:

  1. Drive ordering in your operating system may not be the same as the boot drive ordering used by your firmware. Do not assume that your first hard drive (e.g. ‘/dev/sda’) is the one that your firmware will boot from. device.map (see Device map) can be used to override this, but it is usually better to use UUIDs or file system labels and avoid depending on drive ordering entirely.
  2. At least on BIOS systems, if you tell grub-install to install GRUB to a partition but GRUB has already been installed in the master boot record, then the GRUB installation in the partition will be ignored.
  3. If possible, it is generally best to avoid installing GRUB to a partition (unless it is a special partition for the use of GRUB alone, such as the BIOS Boot Partition used on GPT). Doing this means that GRUB may stop being able to read its core image due to a file system moving blocks around, such as while defragmenting, running checks, or even during normal operation. Installing to the whole disk device is normally more robust.
  4. Check that GRUB actually knows how to read from the device and file system containing /boot/grub. It will not be able to read from encrypted devices, nor from file systems for which support has not yet been added to GRUB.
Allen
  • Negative. The system is still bootable and operates as expected when you boot into it from the debian-installer rescue mode (or appropriately chroot into it from ~any mdraid + LVM supporting livecd for that matter). A single disk LVM setup works. An mdraid RAID1 + LVM setup works too. Managing to boot from mdraid RAID6 + LVM is the issue at hand here. I am left dumbfounded by how to debug this further - memory snapshots of the RAM of the virtual machine do not seem viable since I do not know what exactly I'm hunting for here in the status of the GRUB2 binary either. – Rotonen Mar 31 '12 at 22:55
  • I was reading the [Arch Linux GRUB2 document](https://wiki.archlinux.org/index.php/GRUB2#LVM) where they recommend that the line where root is specified be `set root=(lvm_group_name-lvm_logical_boot_partition_name)`. I see that in your config you have `set root='(raid6-usr)'`. As you noted in your question, it seems to think that /usr is your root. Try changing that, see if it helps. Anyway, shouldn't root be set to wherever /boot files are? Unless your /boot files are on (VG raid6, LV root) that is. – Allen Apr 01 '12 at 15:06
  • Also, compare the grub config from the working instances to see what diffs there are. – Allen Apr 01 '12 at 15:08
  • My /boot is indeed on VG raid6 LV root. It does initially set the /usr partition as root, but then before actually trying to load the kernel, it uses LV root again for the root. I'll try to simply edit the grub.cfg by hand and if it works, I'll consider filing a bug against the generation script. – Rotonen Apr 03 '12 at 19:06