A fresh install of Ubuntu Server 13.10 (x64) is having problems booting from its root volume, which lives on LVM on top of an md RAID1 array. I've kludged a solution for now, but I'd like to understand more about what's going on and what better solutions there might be.
Since the objective of this machine is to experiment with Xen (to get a better understanding of commercial VM hosting), the machine is assembled from parts I have to hand: a Q6600 on an Asus P5QL Pro, plus 1 TB and 500 GB SATA discs (the 500 GB disc is still in use elsewhere and will be added later).
The 1 TB disc has three partitions: sda1 is the same size as sdb1 on the 500 GB disc, sda2 is swap, and the balance is sda3. md0 is a RAID1 volume[1] made up of sda1+sdb1, and it is the only PV available to LVM.
Ubuntu is installed in two LVs (dom0_root and dom0_homes) in this VG (vg_mir), and /boot lives in dom0_root.
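For reference, the layout is the equivalent of the following, whether done in the installer's partitioner or by hand (a sketch only; the LV sizes shown are placeholders, not the real ones):

pvcreate /dev/md0                          # md0 is the only physical volume
vgcreate vg_mir /dev/md0
lvcreate -n dom0_root  -L 50G vg_mir       # holds / and /boot; size is illustrative
lvcreate -n dom0_homes -L 400G vg_mir      # size is illustrative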
The problem manifests as the following messages, which appear immediately after the discs have initialised:
kernel: [ 3.003506] md: bind<sda1>
kernel: [ 3.007705] md/raid1:md0: active with 1 out of 1 mirrors
kernel: [ 3.007768] md0: detected capacity change from 0 to 499972440064
kernel: [ 3.047284] md0: unknown partition table
kernel: [ 3.124709] device-mapper: table: 252:0: linear: dm-linear: Device lookup failed
kernel: [ 3.124759] device-mapper: ioctl: error adding target to table
kernel: [ 3.125156] device-mapper: table: 252:1: linear: dm-linear: Device lookup failed
kernel: [ 3.125196] device-mapper: ioctl: error adding target to table
After a pause, it gives up and drops to an initramfs shell. Issuing the command lvm vgchange -ay successfully initialises LVM, /dev/mapper is populated as expected, and the system boots normally after a ^D.
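Concretely, the recovery from the (initramfs) prompt amounts to the following (the ls is just to confirm the device nodes):

(initramfs) lvm vgchange -ay     # activate the volume group; vg_mir's LVs appear
(initramfs) ls /dev/mapper       # vg_mir-dom0_root and vg_mir-dom0_homes are now present
(initramfs) exit                 # or Ctrl-D; booting then continues normally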
By making a copy of /lib/udev/rules.d/85-lvm2.rules in /etc and inserting a sleep 1, as shown here:
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
RUN+="watershed sh -c 'sleep 1; /sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
(and rebuilding the initramfs), the system now boots unassisted, but this is a rather awful solution; the steps are sketched below. I've tried fiddling with the rootwait=, lvmwait= and scsi_mod.scan=sync kernel parameters, as discussed in various bug trackers and blog posts, but nothing I tried worked. A few pages suggest that evms is the problem, but it doesn't appear to be installed. Others suggest timeouts on irrelevant block devices; I even disabled the DVD drive.
It appears that there is some sort of race condition between md and lvm: lvm is being invoked by udev before md0 is ready. Those kernel arguments seem to insert a delay after lvm has already run, so no amount of waiting helps; the LVs will never be ready because vgchange has already been run (and failed).
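For what it's worth, these are the kinds of checks that can be run from the initramfs prompt to see what state things are in before activating anything by hand (tool availability in the busybox environment may vary):

cat /proc/mdstat     # has md0 been assembled yet?
ls -l /dev/md0       # does the array's device node exist?
lvm pvscan           # does LVM see the physical volume on md0?
lvm vgs              # is vg_mir visible at all?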
That is as far as I got in drilling into the problem. Can anybody suggest a better solution, or a way to dig further into the problem?
[1] Since sdb1 is missing at the moment, this RAID volume is manually configured as RAID1 with a single device, because Ubuntu doesn't like booting from a degraded array.
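The single-device array was created along these lines (a sketch; mdadm needs --force to accept a one-device RAID1, and the plan is to add sdb1 and grow to two devices once the 500 GB disc is free):

mdadm --create /dev/md0 --level=1 --force --raid-devices=1 /dev/sda1
# later, once the 500 GB disc is available:
#   mdadm /dev/md0 --add /dev/sdb1
#   mdadm --grow /dev/md0 --raid-devices=2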