
My instance had been running for years and suddenly stopped responding on Jun 1st. I tried to reboot it, but it would not boot. It gave these errors in the system log (full log: https://pastebin.com/rSxr1kLs):

Linux version 2.6.32-642.11.1.el6.x86_64 (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri Nov 18 19:25:05 UTC 2016
Kernel command line: root=/dev/xvde ro LANG=en_US.UTF-8 KEYTABLE=us
VFS: Cannot open root device "xvde" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I tried to detach the EBS volume and re-attach it as /dev/sda1 according to the documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstances.html#FilesystemKernel

However, that gave the error Error attaching volume: Invalid value '/dev/sda1' for unixDevice. Attachment point /dev/sda1 is already in use, so I was unable to attach it there. I re-attached it as /dev/sda instead, but the instance still won't boot and still shows the same error in the system log.
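
For reference, the same check-and-re-attach cycle with the AWS CLI looks roughly like this (a sketch; all IDs are placeholders):

# See what is still occupying the attachment point before re-attaching:
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[0].Instances[0].BlockDeviceMappings'
# Then move the volume:
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sda1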


I was able to launch a new instance in the exact same availability zone and attach my EBS volume as /dev/sdf. It shows up inside the instance as /dev/xvdj, and I mounted it with mount /dev/xvdj /xvdj (a short sketch of this is at the end of the question). I can see the grub.conf file:

[root@ip-172-31-4-249 grub]# cat /xvdj/boot/grub/grub.conf
default=0
timeout=1

title CentOS (2.6.32-642.11.1.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-642.11.1.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
title CentOS (2.6.32-504.30.3.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-504.30.3.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-2.6.32-504.30.3.el6.x86_64.img
title CentOS (2.6.32-504.3.3.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-504.3.3.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-2.6.32-504.3.3.el6.x86_64.img
title CentOS (2.6.32-504.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-504.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-2.6.32-504.el6.x86_64.img
title CentOS (2.6.32-431.29.2.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-431.29.2.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-2.6.32-431.29.2.el6.x86_64.img
title CentOS (2.6.32-431.23.3.el6.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-431.23.3.el6.x86_64 root=/dev/xvde ro crashkernel=auto LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-2.6.32-431.23.3.el6.x86_64.img

Compare this to the grub.conf of the running instance:

[root@ip-172-31-4-249 grub]# cat /boot/grub/grub.conf
default=0
timeout=1

title CentOS-6-x86_64-20130527-03 2.6.32-358.6.2.el6.x86_64
        root (hd0)
        kernel /boot/vmlinuz-2.6.32-358.6.2.el6.x86_64 root=/dev/xvde ro
        initrd /boot/initramfs-2.6.32-358.6.2.el6.x86_64.img

Does it matter that it doesn't have an initrd line in the first option?

I tried attaching the EBS volume to the new instance as /dev/sda, but it still wouldn't boot, with the same error Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0).
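
For reference, a quick way to see where an attached volume surfaced inside the instance (a sketch; names differ per instance and kernel):

# The volume attached at /dev/sdf in the console showed up as /dev/xvdj inside:
lsblk
# Older images without lsblk can use:
cat /proc/partitions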

CentOS 6

Chloe
  • This may or may not be useful: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html – Chloe Jun 05 '17 at 00:20
  • First choice is to restore the instance from a snapshot and any data from your backups. If you don't have them then I suggest you pay the $30 for developer support (slow) or $100 for business support for a month. Amazon will be able to help you, and will have access to much more information, tools, and experience than us. If your server isn't worth that then start again. – Tim Jun 05 '17 at 00:51
  • Does it matter that it doesn't have `initrd` line in the first option? – Chloe Jun 05 '17 at 15:18

6 Answers


I created a new instance by going to Images > AMIs > Private Images, selecting the image the original instance was started from, and clicking Launch. I launched it in exactly the same availability zone: not just the same region, but the 2a/2b/2c suffix must match as well. Then:

  1. Stopped the new instance.
  2. Detached the EBS volume from the old instance.
  3. Re-attached the EBS volume to the new instance at /dev/sdf.
  4. Started the new instance. The EBS volume showed up inside it as /dev/xvdj, so I mounted it with mkdir /xvdj; mount /dev/xvdj /xvdj.
  5. Edited /xvdj/boot/grub/grub.conf and changed default=0 to default=1, so grub boots the second entry, which still has an initrd line.
  6. Saved the file, stopped the new instance, re-attached the EBS volume to the old instance, and it started.
  7. Ran yum update in the old instance, then double-checked /boot/grub/grub.conf and double-checked that it would reboot.
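
As a minimal sketch, the rescue-instance part of that procedure looks roughly like this (device names are just what appeared in my case):

# On the rescue instance, after attaching the broken volume as /dev/sdf:
mkdir /xvdj
mount /dev/xvdj /xvdj                # /dev/sdf surfaced as /dev/xvdj here
# Boot the second grub entry, which still has an initrd line:
sed -i 's/^default=0/default=1/' /xvdj/boot/grub/grub.conf
umount /xvdj
# Then stop the rescue instance and re-attach the volume to the original instance.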

I also found this regarding updates to the CentOS kernel: grub.conf missing initrd path after kernel update

I noticed that after I ran yum update I had 2 entries in grub.conf without initrd lines. Running yum reinstall kernel.x86_64 fixes that.
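
To verify the reinstall actually restored the missing initrd lines, a quick check along these lines works (a sketch against the grub.conf layout shown above):

yum reinstall kernel.x86_64
# Each kernel line should now be followed by a matching initrd line:
grep -E '^[[:space:]]+(kernel|initrd)' /boot/grub/grub.conf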

Chloe
  • This is what Jason recommended above. You should accept that answer. – EEAA Jun 05 '17 at 15:59
  • Not really, he suggested copying all the data into a new instance, without fixing the actual problem. That would have required a lot of work to set up and configure everything again, including file permissions. I do back up the configuration in /etc/ but it would still be a pain, especially with SELinux. Also I would have to set up the security groups, inbound rules, elastic IP, and possibly more stuff. Not only that, but it doesn't include enough details, like the fact that you have to use the exact same AMI that you started with, and it has to be in the exact same availability zone + subnet. – Chloe Jun 05 '17 at 16:16
  • In our case, we had a kernel panic error after migrating a VMware VM (CentOS) to AWS. The error was about a CPU mismatch etc. Detaching the volume, reattaching it to an Amazon Linux instance, making the suggested file change, and reattaching the volume to the original instance worked for us. Thank you for your answer. – Tharaka Jun 26 '19 at 03:32

I've had this same issue on several occasions and had to solve it by restoring the instance from EBS snapshot backups. Today I had the same issue and was determined to resolve it without having to restore from backups. I did the following (a consolidated command sketch follows the list):

  1. Detach the root volume (/dev/sda1) from the failed instance.
  2. Attach the volume to a working instance and mount it (e.g. mount /dev/xvdh /xvdhmount).
  3. Back up the boot folder: mv /xvdhmount/boot /xvdhmount/boot-backup
  4. From a working instance running the same OS version (in my case RHEL 7.4), copy the entire contents of the /boot folder via SCP or WinSCP into /xvdhmount/.
  5. Detach the volume from the working instance and attach back to the failed instance.
  6. Start the failed instance .... the instance did boot and I was able to log in.
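
A minimal sketch of steps 2-4, assuming the device names used above and a working instance whose /boot matches the failed instance's OS version:

# On the working instance, with the broken volume attached (it appeared as /dev/xvdh):
mkdir /xvdhmount
mount /dev/xvdh /xvdhmount
mv /xvdhmount/boot /xvdhmount/boot-backup   # keep the old boot folder as a backup
cp -a /boot /xvdhmount/boot                 # copy a known-good /boot from the same OS version
umount /xvdhmount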

I hope this helps!

YohannesM

Me Too!

The underlying cause was an interrupted yum upgrade; the junior staffer doing the work reconnected and ran yum-complete-transaction to finish everything.

However, something didn't write the /boot/initrd....newver.... file, which was probably related to the latest kernel entry in grub2.cfg missing its initrd=/.... line completely.

The quick fix was to re-attach the boot disk volume to a different instance, mount it, and edit /mountpoint/etc/grub2.cfg so that the instance starts up the older version of the kernel. Then unmount, detach, and re-attach it to /dev/sda1 of the original instance.

NOTE: lately it's been hard to attach a CentOS boot volume to a different CentOS machine, because the UUID is the same on both root volumes. The workaround is to use a different OS as your temp machine, like Debian for CentOS disk fixups.

Once you're in again, run yum reinstall kernel* to repeat the missing steps, and on completion reboot again to be sure it restarts properly this time, onto the newest kernel.
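
A sketch of that quick fix, assuming the broken volume shows up as /dev/xvdf1 on the temp machine (the device name is an assumption):

# On the temp machine (a non-CentOS distro avoids the duplicate root-UUID clash):
mount /dev/xvdf1 /mnt
# /etc/grub2.cfg is normally a relative symlink into boot/grub2/, so it should resolve inside the mount:
vi /mnt/etc/grub2.cfg        # make an older kernel entry, one with an initrd line, the default
umount /mnt
# Re-attach the volume to /dev/sda1 of the original instance and boot it, then:
yum reinstall 'kernel*'      # regenerates the missing initramfs and grub entries
reboot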

Criggie

I had a similar problem with a CentOS instance. This AWS support article gives quite a good overview. Here's how I managed to solve my problem:

  • Shut down the original EC2 instance and then detach the /dev/sda1 disk
  • Start a new, temporary EC2 instance, and attach the disk as /dev/sdp to the new EC2 instance
  • SSH into the new EC2 instance and mount /dev/sdp to /data

Then I wanted to go back to a previous kernel. The instructions on the CentOS wiki were helpful:

  • List all Grub entries with grep "^menuentry" /data/boot/grub2/grub.cfg | cut -d "'" -f2
  • Pick the 2nd one from the top; in my case this was CentOS Linux (3.10.0-957.12.1.el7.x86_64) 7 (Core)
  • Configure the boot default with grub2-set-default --boot-directory /data/boot/ 'CentOS Linux (3.10.0-957.12.1.el7.x86_64) 7 (Core)'

Then shut down the new EC2 instance, detach the volume, attach it back to the original instance (to /dev/sda1), and boot the original instance.
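
Before detaching, it's worth confirming the new default was actually recorded; a sketch (the grubenv path is inferred from the /data mount point above):

# grub2-set-default writes the choice into grubenv; list it to verify:
grub2-editenv /data/boot/grub2/grubenv list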


It looks to me like your kernel got upgraded in such a way that it doesn't understand your root filesystem anymore. Your best bet is to create a new node, mount the EBS volume from the old one as a non-root / non-boot device, and transfer the critical data over.
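
A minimal sketch of that transfer, assuming the old volume shows up as /dev/xvdf on the new node (device and paths are placeholders):

# On the new node:
mkdir -p /mnt/old
mount /dev/xvdf /mnt/old
# Copy what you need, preserving permissions, ACLs, and xattrs (example paths only):
rsync -aAX /mnt/old/var/www/ /var/www/
rsync -aAX /mnt/old/home/ /home/
umount /mnt/old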

Jason Martin
  • Can I downgrade the kernel and update again? I ran `yum update` in the new instance and it jumped to `Linux ip-172-31-4-249 2.6.32-696.3.1.el6.x86_64 #1 SMP Tue May 30 19:52:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux` and boots OK. – Chloe Jun 05 '17 at 06:15

I came across a similar problem, and it turns out that AWS EC2 defaults differ between launching an instance and creating an AMI: hardware virtualization (HVM) is the default choice in the first case, but paravirtual (PV) is the default for image creation.

I stumbled upon this when I tried to move an EC2 instance between availability zones by snapshotting its EBS volume and creating a new AMI; this discrepancy in settings (which I did not pay attention to) cost me an hour.

tl;dr: just choose HVM when launching from a customized AMI and your instance should boot fine.
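
A quick way to check which virtualization type an AMI was registered with (an AWS CLI sketch; the image ID is a placeholder):

aws ec2 describe-images --image-ids ami-0123456789abcdef0 \
    --query 'Images[0].VirtualizationType' --output text    # prints hvm or paravirtual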

Benny K