
I have a machine with a lot of disks and an extra SAS controller in HBA mode. This seems to make Linux pause for at least 8-10 seconds in the initramfs before the disks actually appear. The timeout for disk detection is 10 seconds, so BTRFS/MDADM/etc. fail to mount a RAID1 that I have in my system, dropping me to an emergency shell, from where I can mount the disks manually and continue booting just fine.

My question is: how do I increase this boot-time timeout from 10 seconds? Is it in systemd? Is it in udev? Somewhere else? I'm not sure where to start looking, and googling this problem mostly turns up people looking to raise the I/O timeout or some other (SCSI/LUN/etc.) timeouts, but that's not what I'm looking for.

Alex
  • IDK either, but maybe this [multipath boot delay issue](https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1467989) is related or could provide leads to your solution? – rickhg12hs Nov 04 '18 at 10:41
  • @rickhg12hs I've tried the udev parameters mentioned in the post, but it still seems to wait 10 seconds for the first device and then occasionally drops into the emergency shell when it goes past the threshold. Thanks for the tip though, I'll try fiddling some more with udev. – Alex Nov 06 '18 at 23:35
  • I still haven't figured it out. I'm guessing it's a kernel compilation parameter, but I have yet to dig into the main docs for udev, so maybe I'm overlooking something. If anyone has any ideas I'm very open to hearing them; right now I'm just never rebooting the server except when I'm onsite. – Alex Nov 23 '18 at 02:53
  • Are the drives actually spinning up during POST? Do you have enough power for them to all spin up at once? You may need to set up your HBA for a staggered spinup, if it has this option (any decent one will). – Michael Hampton Dec 28 '18 at 14:48
  • @MichaelHampton Yeah, the drives are working and showing up in the SAS config tool and BIOS. I have a 1000W EVGA power supply and the disks are spread out over 2 different rails, so it should have enough power; the second Xeon CPU I added also works fine, and I'm drawing about 200-300W from the wall when everything's up and running. It's kind of like the kernel module for the HBA hangs for 5+ seconds in the initial boot phase, causing all the other disks to show up much later as well. – Alex Dec 28 '18 at 18:04

1 Answer


I've finally found it! Of course, it's just a simple kernel parameter, documented here: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

The parameter I was looking for is rootdelay. I had already tried rootwait, but apparently that wasn't enough, as the wait was still aborted after 10 seconds. With rootdelay=30 the boot does not actually wait the full 30 seconds, only about 10-15 seconds depending on how long my disks take to show up, so setting a really high value doesn't seem to hurt. I've only set 30 for my use case, which so far seems to have completely resolved the issue!

You can add it to your kernel boot parameters in Grub or systemd-boot.

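Grub: /etc/default/grub -> GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=30 quiet" (then regenerate the Grub config, e.g. with update-grub on Debian/Ubuntu, or grub-mkconfig -o /boot/grub/grub.cfg elsewhere; the output path may differ by distribution)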

systemd-boot: /boot/loader/entries/yourentry.conf -> options rootdelay=30 [other options]
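
To confirm after a reboot that the parameter really made it onto the kernel command line, something along these lines should work. This is only a minimal sketch: `get_rootdelay` is a hypothetical helper name used for illustration, not an existing tool, and it assumes the value is written as plain digits (e.g. rootdelay=30).

```shell
# get_rootdelay: hypothetical helper (not a standard tool).
# Prints the rootdelay value found in a kernel command line string,
# or prints nothing if the parameter is absent.
get_rootdelay() {
    # Put each parameter on its own line, then keep only rootdelay=<digits>.
    # If the parameter is given more than once, report the last occurrence.
    printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^rootdelay=\([0-9][0-9]*\)$/\1/p' \
        | tail -n 1
}

# /proc/cmdline holds the command line the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    delay=$(get_rootdelay "$(cat /proc/cmdline)")
    echo "rootdelay in effect: ${delay:-not set}"
fi
```

If this reports "not set" after editing the Grub file, the most likely cause is that the Grub config was not regenerated before rebooting.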

Alex