
A fresh install of Ubuntu Server 13.10 (x64) is having problems booting from its root volume, which lives on LVM on top of md (software RAID). I've kludged a solution for now, but I'd like to understand more about what's going on and what better solutions there might be.

Since the objective of this machine is to experiment with Xen (to get a better understanding of commercial VM hosting), the machine is assembled from parts I have to hand, specifically: a Q6600 + Asus P5QL Pro, and 1 TB and 500 GB SATA discs (the 500 GB disc is still in use elsewhere and will be added later).

The 1 TB disc has three partitions: sda1 is the same size as sdb1 on the 500 GB disc, sda2 is swap, and the balance is sda3. md0 is a RAID1 volume[1] made up of sda1+sdb1, and is the one PV available to LVM.

Ubuntu is installed in two LVs (dom0_root and dom0_homes) in this VG (vg_mir), and /boot lives in dom0_root.
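For reference, the LVM layer on top of md0 would have been set up along these lines (a sketch only; the LV sizes are placeholders, not the actual ones):

pvcreate /dev/md0                      # md0 is the only PV
vgcreate vg_mir /dev/md0               # the single VG described above
lvcreate -L 20G  -n dom0_root  vg_mir  # root filesystem; /boot lives here too (size is a placeholder)
lvcreate -L 100G -n dom0_homes vg_mir  # home directories (size is a placeholder)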

The specific problem manifests with the following messages, immediately after the discs have initialised:

kernel: [    3.003506] md: bind<sda1>
kernel: [    3.007705] md/raid1:md0: active with 1 out of 1 mirrors
kernel: [    3.007768] md0: detected capacity change from 0 to 499972440064
kernel: [    3.047284]  md0: unknown partition table
kernel: [    3.124709] device-mapper: table: 252:0: linear: dm-linear: Device lookup failed
kernel: [    3.124759] device-mapper: ioctl: error adding target to table
kernel: [    3.125156] device-mapper: table: 252:1: linear: dm-linear: Device lookup failed
kernel: [    3.125196] device-mapper: ioctl: error adding target to table

After a pause, it gives up and drops to an initramfs shell. Issuing the command lvm vgchange -ay successfully initialises LVM, /dev/mapper is populated as expected, and the system boots normally after a ^D.
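For the record, that manual recovery at the busybox prompt is just:

(initramfs) lvm vgchange -ay
(initramfs) exit      # or Ctrl-D; the boot then continues normally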

By making a copy of /lib/udev/rules.d/85-lvm2.rules in /etc and inserting a sleep 1 as shown here:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
    RUN+="watershed sh -c 'sleep 1; /sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

(and rebuilding the initramfs) the system now boots unassisted, but this is a rather awful solution. I've tried fiddling with the rootwait=, lvmwait= and scsi_mod.scan=sync kernel parameters as discussed in various bug trackers and blog posts, but nothing I tried worked. A few pages suggest that evms is a problem, but it doesn't appear to be installed. Others suggest timeouts on irrelevant block devices, and I even disabled the DVD drive.
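For completeness, the copy-and-edit workaround above boils down to roughly this (assuming the copy goes into the usual /etc/udev/rules.d/ override directory):

cp /lib/udev/rules.d/85-lvm2.rules /etc/udev/rules.d/85-lvm2.rules
# edit the copy to add "sleep 1;" to the RUN command, as shown above
update-initramfs -u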

It appears that there is some sort of race condition between md and lvm: lvm is being invoked by udev before md0 is ready. Those kernel arguments seem to insert a delay after lvm has already been run, so no amount of waiting helps; vgchange has already run (and failed), and the LVs will never appear.

That is as far as I got in drilling into the problem. Can anybody suggest a better solution, or a way to dig deeper into the cause?

[1] Since sdb1 is, at the moment, missing, this RAID volume is manually configured as RAID1 with a single device, because Ubuntu doesn't like booting from a degraded volume.
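For comparison, these are the two ways the mirror could have been created; the second form (a two-device array with a member marked missing) is what counts as degraded at boot, which is why the first form was used here. mdadm should insist on --force before accepting a single-device RAID1:

# complete single-device mirror (what is described above)
mdadm --create /dev/md0 --level=1 --raid-devices=1 --force /dev/sda1

# the alternative: a two-device mirror with one member missing,
# which is treated as a degraded array at boot
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing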

strix
  • Thank you for sharing this. I had two identical machines with `/` on LVM on mdadm software RAID. One of the machines had a missing RAID1 device. After upgrading the kernel from 3.2 to 3.13 (Ubuntu 12.04 raring HWE stack to trusty HWE stack), the machine with the degraded array gave the same problem as you describe above, while the other machine booted just fine. Adding `sleep 1` to the udev rule worked like a charm! – ph0t0nix Feb 23 '15 at 16:02

2 Answers


I just had the same problem, with apparently the same type of hardware and a fresh 13.10 x64 install. Being less experienced, I spent a couple of days chasing possibilities such as missing kernel modules, but after reading your report I find that vgchange -ay at the initramfs busybox prompt does indeed render the system bootable. I have not yet tried the 1-second delay workaround you posted (I will), but I also note the following Debian bug report that may be related:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=633024

Gene Stark
  • Yes, it does indeed look like that Debian bug report is related. Sadly, it doesn't look like there's been any conclusion to it. Pity the only viable solution is to hack the initramfs. I haven't done anything more on the matter since I originally asked this question (too many other distractions), but when I get a moment I'll try with 14.04 LTS and see if it's any different. – strix Aug 25 '14 at 15:46

I had the same problem, and after some searching I found a fix that worked for me: I renamed all /dev/md/* device names to /dev/md* in /etc/mdadm/mdadm.conf and ran update-initramfs -u to rebuild the initramfs.
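In other words, assuming a typical auto-generated mdadm.conf, the change is of this form (the UUID shown is a placeholder), followed by rebuilding the initramfs:

# /etc/mdadm/mdadm.conf -- before:
ARRAY /dev/md/0 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx name=host:0
# after:
ARRAY /dev/md0 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx name=host:0

update-initramfs -u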

Mitar