Can grub boot safely from btrfs raid10?

1

I'm using Ubuntu 16.04.1 LTS. The system was set up with a RAID1 btrfs / (two disks sda1 and sdb1). No separate /boot partition or BIOS boot partitions were created.

After adding two more disks and converting to RAID10, the system refused to boot. However, I was able to repair it by running update-grub from the Live CD, following the instructions on this page.

I don't really know much about how grub works. But in retrospect, it actually seems like a miracle that it was able to boot the system at all. If I understand it correctly, grub stores the block address of the first block of the next stage in the MBR. Am I correct that I had to run update-grub because the balance shuffled the blocks around?

Secondly, what would happen if, due to the RAID10, grub's next stage was split across multiple disks? Does it know how to handle this or am I sitting on a time bomb here?

user2323470

Posted 2016-09-06T10:43:00.163

Reputation: 11

Answers

1

GRUB loads its own filesystem drivers (NTFS, FAT32, ext*, btrfs, LUKS, LVM, RAID, etc.) if, when it is installed, it is told that those modules must be part of the boot stage; that way it knows how to access every (supported) filesystem you want. The small code in the MBR (stored in the first sector, no matter whether the disk is MBR- or GPT-partitioned, or hybrid) then loads a 'big' chunk of code from a hardcoded sector and the sectors that follow. That chunk can be around 2 MiB or more if a lot of modules are put into that stage (I have tested up to nearly 8 MiB), and it must live in a non-movable part of the disk: the gap right after the MBR (the first megabyte, up to sector 2047), a dedicated unformatted bios_grub partition used in raw mode, a file on a formatted partition that must never be moved, or a block list (on some ext* filesystems; not recommended, because those blocks can be moved and then the system will not boot until GRUB is reinstalled).

So GRUB first loads a tiny piece of code that has hardcoded in it where the 'big' code is stored, then it loads that 'big' code. The 'big' code knows how to handle every filesystem it was told about (via the modules parameter when installing GRUB, via configuration files, etc.), which lets it access LUKS-encrypted volumes (multiple levels allowed), RAID, LVM2, FAT32, NTFS, ext*, btrfs and so on, so it can reach the filesystem where its files (grub.cfg, etc.) are stored.
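A minimal sketch of what that looks like in practice, assuming a BIOS/MBR setup; the device name and module list here are only examples and must be adapted to your layout:

    # Hypothetical: embed partition, btrfs and RAID/LVM drivers into GRUB's core image.
    sudo grub-install --modules="part_msdos part_gpt btrfs mdraid1x lvm" /dev/sda
    sudo update-grub    # regenerate /boot/grub/grub.cfg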

So yes, GRUB2 can be installed on purely striped storage (RAID0, LVM, btrfs, etc.) without problems; but it is also true that if that 'big' code is moved somewhere else and the place where it was is overwritten, GRUB will not be able to boot until a GRUB reinstall updates the hardcoded position of that 'big' code.

Some filesystems have a per-file flag that tells the filesystem the file must not be moved, and since that file is never rewritten it normally does not get moved, except in some cases.

With a btrfs balance it can happen that this special GRUB file (where the 'big' code is stored) gets moved because of btrfs's copy-on-write, and the place where it was gets overwritten; then GRUB2 will not boot... I suffered exactly that when going from 'single' to 'raid1' after adding a second disk.

In that case GRUB shows a rescue command line instead of booting. Fixing it is easy: boot a live Linux that has the grub-install command (no need to chroot), mount the partition holding grub.cfg as / or /boot (depending on whether /boot is a separate partition), run grub-install with the correct modules parameter, unmount and reboot; then redo the grub-install from your own Linux so the installed GRUB matches your distribution's version (if you are paranoid or do not like to mix versions).
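A rough sketch of that live-CD fix, assuming a single btrfs root with /boot inside it; the device names and module list are placeholders:

    # Hypothetical repair from a live system, no chroot needed.
    sudo mount /dev/sda1 /mnt                        # btrfs root that contains /boot/grub
    sudo grub-install --boot-directory=/mnt/boot \
         --modules="part_msdos part_gpt btrfs" /dev/sda
    sudo umount /mnt
    # reboot, then rerun grub-install from the installed system to match versions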

But the recommended way to fix it is to mount the btrfs, chroot into it and redo the grub-install from your own Linux.
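For the chroot variant, a minimal sketch with example device names and paths:

    # Hypothetical chroot repair from a live system.
    sudo mount /dev/sda1 /mnt
    for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
    sudo chroot /mnt grub-install /dev/sda
    sudo chroot /mnt update-grub
    for d in dev proc sys; do sudo umount /mnt/$d; done
    sudo umount /mnt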

I prefer another scheme: I always have my own GRUB2 (with grub.cfg edited manually) that chainloads all the other Linux/Windows/etc. bootloaders, so each system keeps its own bootloader and does not depend on the others (multi-boot). I use the same scheme even on computers with only one system, so I can boot via ISO loop (booting Linux live distros that reside in .iso files), and I also add entries that skip the distro's bootloader entirely (in case an update damages it) plus a chainload of the first sector of the partition where the distro's bootloader was installed.
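For illustration only, entries along these lines can provide the chainload and ISO-loop options; the partition, UUID and ISO/kernel paths are placeholders, and here they are appended to /etc/grub.d/40_custom so that update-grub preserves them:

    # Hypothetical custom entries; adjust (hd0,msdos2), the UUID and the ISO path.
    cat <<'EOF' | sudo tee -a /etc/grub.d/40_custom
    menuentry "Chainload the distro bootloader on sda2" {
        set root=(hd0,msdos2)
        chainloader +1
    }
    menuentry "Boot a live distro from an ISO file" {
        search --fs-uuid --set=root 1234-ABCD
        loopback loop /isos/some-live-distro.iso
        linux (loop)/casper/vmlinuz boot=casper iso-scan/filename=/isos/some-live-distro.iso quiet
        initrd (loop)/casper/initrd
    }
    EOF
    sudo update-grub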

That way I can isolate problems: if the distro's bootloader stops booting, instead of using the chainload entry I boot with my own direct entry, then fix whatever needs fixing.

Since discovering that btrfs RAID1 lets me recover from some KingDian SSDs that, after long periods without power (more than a week, eight days or more), report some sectors as unreadable (and if left unpowered for another eight days the list of unreadable sectors changes, and the ones that were unreadable become readable again with correct data on them; a really weird malfunction of those KingDian SSDs), I only use btrfs RAID1 for my own main GRUB2 bootloader and for all my Linux installs.

And yes, I have sometimes had to fix GRUB on those Linux installs, but in the last several months I have only needed to do it once, right after adding the second disk and balancing (converting from single to raid1). So I assume it is not something to worry about; it is safe enough, and if it does fail, just boot SystemRescueCd or whatever distro you like that has the grub-install command, and you can fix it directly (as an emergency) or via a chroot (recommended).

Before I knew about btrfs I always had GRUB2 on several HDDs in RAID0 (over dm-raid long ago, more recently over LVM2) and never had any problem caused by the striping... as long as I remembered the 'modules' parameter on the grub-install command.

So do not worry about having GRUB2 on btrfs RAID0, RAID1 or RAID10; but everyone I know warns about RAID5/6: better not to use raid5 or raid6 on btrfs at all.

Laura

Posted 2016-09-06T10:43:00.163

Reputation: 11

0

You don't give much information. E.g. did GRUB load at all or not?!

I just tested this on a virtual machine. I installed Debian Stretch with kernel 4.9 and btrfs-progs V4.7 on a single disk. The bootloader is the GRUB 2.02~beta3-4 package.

After installing I added three extra disks (four disks in total, each with one partition). I installed GRUB on all the disks and ran update-grub.
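Something along these lines, where the device names are only examples from a virtual machine:

    # Hypothetical: install GRUB to every disk so any of them can boot the system.
    for disk in /dev/vda /dev/vdb /dev/vdc /dev/vdd; do
        sudo grub-install "$disk"
    done
    sudo update-grub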

I rebalanced both data and metadata to RAID10. I even removed one of the disks in the RAID10 set to see what would happen; I had to edit the kernel command line and add rootflags=degraded to boot with a missing disk. After reconnecting the disk and running another balance (to convert single chunks back to raid10), I shuffled data around, installed a bit of software, rebalanced a few more times and still booted successfully. Note that I did NOT run update-grub on any of the disks during these balances.
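The RAID10 conversion itself is a btrfs balance with convert filters, roughly as below; the mount point is an example:

    # Hypothetical: convert data and metadata chunks to RAID10, then check the result.
    sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /
    sudo btrfs filesystem usage /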

Also, the GRUB manual states that btrfs (including RAID0, RAID1, RAID10, gzip and lzo) is supported: https://www.gnu.org/software/grub/manual/grub.html

I don't know much about how GRUB works myself, but I can only assume that BTRFS must have a translation layer... e.g. a virtual block 1234 points to wherever it is really located on disk.
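If you want to see what GRUB itself detects, grub-probe (part of the GRUB tools) can report the filesystem and backing device for a path; the path below is just an example:

    # Hypothetical check of what GRUB sees behind /boot.
    sudo grub-probe --target=fs /boot        # should print "btrfs"
    sudo grub-probe --target=device /boot    # prints the underlying device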

The conclusion based on my tests is that booting from btrfs raid10 seems no scarier than booting from raid1. I can't answer for Ubuntu 16.04.1 LTS since I don't use it, but I suggest you experiment yourself, because based on my tests it seems to work fine.

Waxhead

Posted 2016-09-06T10:43:00.163

Reputation: 1 092