
Since the beginning of the month, the automatic reboot after a kernel change takes... 3 hours.

It is a Lenovo ThinkSystem server in a datacenter.

journalctl -b -1 (the previous boot) ends with:

juil. 23 03:02:25 host systemd-shutdown[1]: Sending SIGTERM to remaining processes...
juil. 23 03:02:25 host systemd-journald[658]: Journal stopped

and journalctl -b (the current boot) starts with:

juil. 23 06:16:26 host kernel: microcode: microcode updated early to revision 0x2006906, date = 2020-04-24
juil. 23 06:16:26 host kernel: Linux version 4.15.0-112-generic (buildd@lcy01-amd64-027) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 (U
juil. 23 06:16:26 host kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-112-generic root=UUID=7b9c74b1-d80e-457e-957a-32be0fca891e ro

I have GRUB_TIMEOUT=0, so GRUB should not be waiting, and I have no idea what the issue could be.

Edit:

One weird thing is that lshw used to show the server model:

product: ThinkSystem SR590 -[7X99CTO1WW]- (7X99CTO1WW)
vendor: Lenovo
version: 04

And now:

    description: Rack Mount Chassis
    product: -[none]- (none)
    vendor: Lenovo
    version: none
    serial: none
    width: 64 bits
    capabilities: smbios-3.2 dmi-3.2 smp vsyscall32
    configuration: boot=normal chassis=rackmount family=ThinkSystem sku=none
  *-core
       description: Motherboard
       product: -[none]-
       vendor: Lenovo
       physical id: 0
       version: none
       serial: none
       slot: none
     *-firmware
          description: BIOS
          vendor: Lenovo
          physical id: 0
          version: -[TEE128O-1.51]-
          date: 10/30/2018
          size: 64KiB
          capacity: 15MiB
          capabilities: pci pnp upgrade shadowing cdboot bootselect edd int14serial acpi usb biosbootspecification uefi
     *-cpu:0
          description: CPU
          product: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
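To check the same identity fields without lshw, the Linux kernel exposes the SMBIOS/DMI strings as files under /sys/class/dmi/id. The following is a minimal sketch, not a Lenovo tool: the function name check_dmi is hypothetical, and treating "none" or "-[none]-" as placeholder values is an assumption based on the lshw output above. product_serial is normally readable only by root, so run it as root to see every field.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print the DMI identity fields
# the kernel exposes and flag values that look like the placeholders in the
# lshw output ("none", "-[none]-"). Assumption: those placeholders mean the
# firmware has lost the machine identity data.
# Optional argument: a directory to read instead of /sys/class/dmi/id.
check_dmi() {
    dir="${1:-/sys/class/dmi/id}"
    rc=0
    for field in sys_vendor product_name product_version product_serial board_name bios_version; do
        if [ -r "$dir/$field" ]; then
            value=$(cat "$dir/$field")
        else
            # Missing file, or product_serial read by a non-root user
            value="(unreadable)"
        fi
        case "$value" in
            ""|none|*"[none]"*)
                printf '%s=%s  <-- MISSING\n' "$field" "$value"
                rc=1
                ;;
            *)
                printf '%s=%s\n' "$field" "$value"
                ;;
        esac
    done
    return "$rc"
}
```

After sourcing the function, running check_dmi as root prints one line per field and returns non-zero if any field looks blank, which makes it easy to confirm whether the identity data has come back after a firmware fix.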

Edit 2:

It seems to be stuck on that screen for hours:

[Screenshot: boot screen stuck at "UEFI: DXE INIT"]

Laurent
  • The hardware specs of the server may be interesting. Any external USB disk drives or the like connected to the server? – Gerard H. Pille Jul 23 '20 at 12:32
  • I added the server model from lshw, from when it was still showing the correct output – Laurent Jul 23 '20 at 13:17
  • Watch the console during the reboot to see what the system is doing. – Michael Hampton Jul 23 '20 at 13:21
  • I won't have access to it for the next few days; it is in a datacenter – Laurent Jul 23 '20 at 13:24
  • Is "onecli" scheduled to run regularly? – Gerard H. Pille Jul 23 '20 at 14:38
  • I downloaded the onecli tarball, but I'm not sure how to install it; they give no instructions for Ubuntu. – Laurent Jul 23 '20 at 14:52
  • It's a command-line utility - https://lenovopress.com/lp0656-lenovo-thinksystem-firmware-and-driver-update-best-practices - so you could very well have run it automatically. But you're probably too wise to do something like that unattended. They have a new batch of updates from June 1st (https://windows-server.lenovo.com/repo/2020_06/html/SR590_7X98_7X99-Windows_Server_2016-2020_06_01.7z), so you could have installed that. I guess you didn't. The system could have lost its BIOS settings, if a system like that has those. Maybe it waits 3h for someone to hit F1? A power failure recently? – Gerard H. Pille Jul 23 '20 at 14:58
  • No, no power failure at all, ever. OneCli cannot connect to IPMI; there is no IPMI device under /dev – Laurent Jul 24 '20 at 06:29
  • Do you have a journal of the kernel changes you made when the problem started? BTW, this is not the time to start using onecli. – Gerard H. Pille Jul 25 '20 at 09:48
  • It gets stuck at "UEFI: DXE INIT" (see picture). It doesn't find the machine type, serial number, etc. – Laurent Jul 27 '20 at 00:37
  • Thanks, everyone. I fixed it by resetting the CMOS using the motherboard jumper. – Laurent Jul 27 '20 at 03:32

1 Answer


It could not be fixed remotely, and it was not possible to enter the setup or boot menu without waiting hours. It happened by itself, with no power outage or anything else. Weird for a server-class product...

I fixed it by resetting the CMOS using the motherboard jumper; the manual explains how to do it.

Now the system model, serial number, and so on are visible to the OS again, and the onecli tools work as they should.

Laurent