
Basic working system details:

I used the Ubuntu 12.04 server CD to install a server.

I have 4 disks. On each disk I did the following, similar to this howto:

  • created a 2GB swap partition
  • created a 256 MB /boot partition
  • created a 64 GB RAID10 partition (for root)
  • created a big RAID10 partition taking the rest of the space

I formatted /boot as ext3. I set up RAID10 on the root and big partitions and formatted the root array as ext4. I created a logical volume on the big array and formatted it as ext4.

The resulting system works fine, and boots fine.

Problem details:

Then I decided to document a failure procedure. As the first step, I decided I would reinstall grub.

# grub-install /dev/sda
warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
error: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
# grub-install /dev/sdb
warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
error: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..

So it looks like it failed, but also seems like it gave up and didn't make changes. So I rebooted. The boot failed. It just hangs with a black screen with a blinking cursor about 4 lines down. If I boot holding down "Shift", I get the word "GRUB" to the left of the cursor, but no interactive prompt.

At this point, I used boot-repair-disk to generate this report: http://paste.ubuntu.com/966531/

Note that the above report says the bootloader does not point to the correct sector for core.img. (In that report, sda is the virtual CD; sdb is the boot disk; sdc is a mirror of sdb, except that boot is not mirrored: there is just a separate, unrelated partition there, formatted ext3; sdd and sde have space for boot, but it is not formatted.)

Then I booted from the Ubuntu server CD, started the rescue system, and issued the following commands, which completed without error (where sda is the virtual CD, and b,c,d,e are the disks which were a,b,c,d in the previous grub commands):

# parted /dev/sdb set 2 bios_grub on
# parted /dev/sdc set 2 bios_grub on
# grub-install /dev/sdb
# grub-install /dev/sdc

At this point, I used boot-repair-disk to generate this report: http://paste.ubuntu.com/966561/

Note that in the above report, the problem about core.img is gone. It seems to point to the correct sector.

Now if I try to boot, I get a grub prompt. If I run "set", I see that root is found and set. If I run "ls /" I see my root directory from the raid volume, including the vmlinuz kernel file. If I type "ls /vmlinuz" it says "error: file not found." It says the same error if I use the "linux" command to try to load the kernel. The vmlinuz file is not listed if I use "ls -l /".

Overly verbose details, in case you want to follow:

I noticed there is also no /boot/grub/grub.cfg, so I ran

# grub-mkconfig -o /boot/grub/grub.cfg

But the problem remains.

If I use the "gptsync" tool, there is no change in this behavior.

The boot-repair-disk won't repair the system, because it wants me to boot with an EFI enabled bios. I briefly looked into this, but I don't know how that works. I found a UEFI shell in my boot options, but I don't know anything about it, and don't see how to change the startup from there (eg. to boot the CD from that EFI shell).

I have also read this page, but Ubuntu doesn't come with the "grub" command, so I can't follow it exactly. I could simply install that command, but I am more curious to find out how the Ubuntu installer managed to install it rather than having a different setup. Did it use blocklists?

Here is the output of parted, while booted on the boot-repair-disk (where here the sdb is the first hard disk, sda when booted from disk, and "boot" changes to "bios_grub" in the 2nd paste link):

Model: ATA Hitachi HUA72303 (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name   Flags
1      17.4kB  2000MB  2000MB  linux-swap(v1)  swap1
2      2000MB  2256MB  256MB   ext3            boot1  boot (this says bios_grub in 2nd link)
3      2256MB  66.3GB  64.0GB                  root1  raid
4      66.3GB  3001GB  2934GB                  data1  raid

Here is an unrelated super old virtual machine for comparison (for anyone unfamiliar with boot-repair-disk): http://paste.ubuntu.com/966799/

Here is the latest paste from the problem system, after running the above grub-mkconfig, and also setting "bios_grub" back to "boot". http://paste.ubuntu.com/966808/

Comparing the two, this looks interesting:

sdb2: __________________________________________________________________________

File system:       
Boot sector type:  Grub2's core.img
Boot sector info: 
Mounting failed:   mount: unknown filesystem type ''

md/bcserver8:0: ________________________________________________________________

File system:       ext4
Boot sector type:  -
Boot sector info: 
Operating System:  Ubuntu 12.04 LTS
Boot files:        /boot/grub/grub.cfg /etc/fstab /boot/grub/core.img

It looks like the raid has the boot files, and the sdb2 is not formatted. (despite this, the system booted before running grub-install). From the rescue CD, "mount -t ext3 /dev/sdb2 /boot" fails. But it makes sense that this would confuse things, since grub uses partition 2 explicitly (the 2 in the parted command that set bios_grub on).

So I did something like this:

# mkfs.ext3 -L boot1 /dev/sdb2
# mv boot boot_on_root
# mkdir boot
# mount /dev/sdb2 boot
# rsync -avHP boot_on_root/ boot/
# parted /dev/sdb set 2 bios_grub on
# parted /dev/sdc set 2 bios_grub on
# grub-install /dev/sdb
# grub-install /dev/sdc

Then rebooted, and I have the black screen again, no prompt. http://paste.ubuntu.com/966848/

So at this point, my guess is that when bios_grub is set, grub is not installing to the MBR, and not into the ext3 file system on partition 2, but onto the raw partition itself, as if it was EFI... which would obviously mess up the ext3 file system there. And from my brief reading about EFI, it sounded like EFI assumes the first partition is the boot, but in my case the first is swap, and also it should then be FAT rather than something unmountable... so since that makes little/no sense, I'm still completely lost without a clue. [EDIT: now I have a clue... skip down a bit for update]

And now when I click repair in boot-repair-disk, it asks something else. Last time the error was hidden under the window and I had to drag the other away to see it. This time the main window is gone, and the new window says:

GPT detected.       You may want to retry after creating a
BIOS-Boot partition (>1Mo, flag). Do you want to continue?

So I clicked yes, and it said it repaired successfully, and created another paste: http://paste.ubuntu.com/966862/

But I still have a black screen with a blinking cursor.

Now my theory is that boot got overwritten by a non-fat non-EFI thing which is just grub code that would have otherwise been in sectors 0-63 before. I luckily ran into a very clear statement on this page, which probably completed my understanding of what all this means. And then after I found that, Jeremy posted an answer which if true, confirms that this is the missing key concept. http://blog.psych0tik.net/2011/08/grub-embedding-blocklists-and-bios_grub-partitions/
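As a rough check on this theory, the sector arithmetic can be worked out in shell. This is only a sketch under stated assumptions: 512-byte sectors (as parted reports for this disk), the standard GPT layout of 128 entries of 128 bytes each, and the traditional MBR layout in which the first partition starts at sector 63. None of these numbers were read from the actual disk.

```shell
#!/bin/sh
# Sketch: where could GRUB's core.img be embedded? Assumes 512-byte sectors.
SECTOR=512

# Traditional MBR layout (assumed): sector 0 holds the MBR and the first
# partition starts at sector 63, so sectors 1..62 are free for core.img.
mbr_gap_bytes=$(( (63 - 1) * SECTOR ))

# Standard GPT layout: sector 0 is the protective MBR, sector 1 the GPT
# header, followed by 128 partition entries of 128 bytes each.
gpt_entry_sectors=$(( 128 * 128 / SECTOR ))
gpt_first_usable=$(( 2 + gpt_entry_sectors ))
gpt_first_usable_bytes=$(( gpt_first_usable * SECTOR ))

echo "MBR gap available for core.img: $mbr_gap_bytes bytes"
echo "GPT first usable sector: $gpt_first_usable ($gpt_first_usable_bytes bytes)"
```

The 17408 bytes computed for GPT match the 17.4kB start of partition 1 in the parted listing above, so on this disk the first partition begins immediately after the GPT structures and there is no gap left for embedding.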

Questions:

What is going on? Why should grub fail to boot? Why does it say "file not found"?

Why doesn't grub want to install without this setting I set with parted (which was not set by the Ubuntu installer)? I thought all I needed to install it was a separate /boot that is not in LVM nor software RAID, since my root is in RAID and the partition table is GPT.

How does the Ubuntu CD installer install it without this problem, and without the bios_grub setting?

I would also consider using EFI. If this is a good idea, and there is a standard way to set it up, I am always up for learning new things.

The quickest answer that would make me happy, even without answering all my questions, would be a set of commands that I could run from the rescue CD to fix the bootloader in the same manner that the install CD did it. It would be also extra nice if I could run them with the booted system, instead of the CD.

Peter

1 Answer


The solution is to use a bios_grub partition, which is not the same as the /boot partition.

By default the bios_grub partition is 1MiB, and it must be flagged bios_grub. Mine is the first partition on my disk. If your partition 2 is actually /boot as parted suggests, that would not be correct and you should make another 1MiB partition.

With GPT and GRUB2, the minimal partition layout I know of has three partitions: bios_grub, root, swap. (I am not sure swap is required.)

Why does grub fail to boot after simply running "grub-install"?

Unknown. You'd think it wouldn't modify anything, given that it clearly says it cannot embed and so cannot work.

Why does it say "file not found"?

/vmlinuz is a symlink that uses the boot partition, and the boot partition is corrupt. The bios_grub code was written on top of its ext3 structure. This probably meant that /boot was not mounted, and the grub files seen there were actually on the root system, which didn't contain the kernel.
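A minimal sketch of how one might confirm this from the rescue CD: a symlink whose target is missing still appears in a directory listing but cannot be opened. The function name `is_dangling` and the scratch directory are hypothetical stand-ins; the demo does not touch the real root or /boot.

```shell
#!/bin/sh
# is_dangling PATH: succeeds if PATH is a symlink whose target does not exist.
is_dangling() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Self-contained demo in a scratch directory standing in for the root fs.
demo=$(mktemp -d)
mkdir "$demo/boot"
ln -s boot/vmlinuz-demo "$demo/vmlinuz"   # relative link, like /vmlinuz

if is_dangling "$demo/vmlinuz"; then
    before="dangling"
else
    before="ok"
fi

touch "$demo/boot/vmlinuz-demo"           # simulate /boot being populated
if is_dangling "$demo/vmlinuz"; then
    after="dangling"
else
    after="ok"
fi

echo "before: $before, after: $after"
rm -rf "$demo"
```

On the real system, with the root array mounted at an assumed mount point such as /mnt, the same function could be called as `is_dangling /mnt/vmlinuz && echo "kernel symlink is broken"`.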

Why doesn't grub want to install without this setting I set with parted

Unlike an MBR disk, a GPT disk has no gap after the partition table where GRUB can embed its boot code, so a dedicated partition must be created to hold it. Before running "grub-install", flag this partition with the command:

    parted /dev/sda set 1 bios_grub on
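With several disks, the same two steps can be sketched as a dry run that only prints the commands, so they can be reviewed before anything is written to disk. The function name `plan_grub_install`, the partition number and the disk names in the usage line are assumptions for illustration; `parted -s` is parted's non-interactive script mode.

```shell
#!/bin/sh
# plan_grub_install PARTNUM DISK...
# Prints (does not run) the commands that would flag partition PARTNUM as
# bios_grub on each disk and then install GRUB to that disk.
plan_grub_install() {
    partnum=$1
    shift
    for disk in "$@"; do
        echo "parted -s $disk set $partnum bios_grub on"
        echo "grub-install $disk"
    done
}

# Example: partition 1 on two disks (hypothetical device names).
plan=$(plan_grub_install 1 /dev/sda /dev/sdb)
echo "$plan"
```

Once the printed commands look right, they can be run as root, for example by piping the output to `sh`. The partition number given must be that of the dedicated 1MiB partition, never /boot, since its contents are overwritten.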

I thought all I needed was a separate /boot. How does the Ubuntu CD installer install it without the bios_grub setting?

A separate /boot seems to be all that the Ubuntu installer needs, but it creates a nonstandard system that breaks easily.

When GRUB says "This GPT partition label has no BIOS Boot Partition", it means the bios_grub partition, not /boot.
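One way to check for this before running grub-install is to look for the flag in parted's partition listing. The sketch below parses text in the format shown in the question; the function name `bios_grub_partition` is hypothetical, and the sample is the question's own table with the trailing note removed.

```shell
#!/bin/sh
# bios_grub_partition: reads `parted DISK print` output on stdin and prints
# the number of the first partition whose line mentions bios_grub.
# Exits non-zero if no such partition is listed.
# Limitation: a partition merely *named* bios_grub would also match.
bios_grub_partition() {
    awk '$1 ~ /^[0-9]+$/ && /bios_grub/ { print $1; found = 1; exit }
         END { exit !found }'
}

# Sample listing copied from the question (partition 2 is /boot, flag "boot").
sample='Number  Start   End     Size    File system     Name   Flags
1      17.4kB  2000MB  2000MB  linux-swap(v1)  swap1
2      2000MB  2256MB  256MB   ext3            boot1  boot
3      2256MB  66.3GB  64.0GB                  root1  raid
4      66.3GB  3001GB  2934GB                  data1  raid'

if part=$(printf '%s\n' "$sample" | bios_grub_partition); then
    result="bios_grub partition: $part"
else
    result="no bios_grub partition"
fi
echo "$result"
```

On a live system the listing would come from parted itself, for example `parted /dev/sda print | bios_grub_partition`, run as root.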

Peter
Jeremy
  • Thank you. This is actually very close to what I am working on now. See my "I'm still completely lost without a clue." section above. Now my theory is that boot got overwritten by a non-fat non-EFI thing which is just grub code that would have otherwise been in sectors 0-63 before. I am working on an experiment, and then will let you know how it goes. – Peter May 04 '12 at 14:35
  • Are you using Ubuntu? Is there a way the Ubuntu installer can properly install using the bios_grub partition? – Peter May 04 '12 at 14:50
  • @Peter I use Ubuntu, and if you do a guided partitioning the installer should set it up correctly. I know that it did for me with the 11.10 installer. – Jeremy May 04 '12 at 14:57
  • Thank you very much. This is the answer. Next I will try with more complex setups (raid and lvm on the boot) and then I'll edit your answer with details. – Peter May 04 '12 at 15:29