What to do with suddenly unreachable non-logging EC2 instance?

Question

I have an EC2 "micro instance" running Canonical's Ubuntu 10.04 LTS. It has been running for 6-9 months now and is rebooted infrequently (once every few weeks at most).

I just did what I thought was a routine aptitude update and aptitude full-upgrade. Noticing that some new -ec2 Linux images had been installed, I rebooted the system. Although it appeared to reboot and went back to "running" status on the console, it didn't come back with its usual ssh and http services. I've tried stopping and starting it and re-associating its elastic IP... no joy.

The strange thing is, "Get System Log" (AWS console) returns a completely blank log. Empty. Nothing. Not one character. (At least it's empty after the first start-stop; before the stop it just contained a final line about restarting).

I've tried a few stop-start cycles but no improvement.

Any advice on what to try next to bring my instance back to life?

Is this an EBS boot instance or instance-store? What is the AMI id? — Eric Hammond, Oct 31 '11 at 22:02
I've edited your question to clarify that the Ubuntu 10.04 AMIs you are running were created by Canonical, not by Alestic (me). I list Canonical's Ubuntu AMI ids at the top of http://Alestic.com — Eric Hammond, Oct 31 '11 at 22:06
The misbehaving instance was created with ami-311f2b45 back around February '11; I've just used ami-c00e3cb4 to bring up a new instance with no problem (see answer below). Both are EBS-backed. — timday, Oct 31 '11 at 23:09

score 4 · Accepted Answer · answered Oct 31 '11 at 23:32

I ran into the very same problem recently. I'm quite new to EC2 in general, but with some help from Eric's blog I managed to troubleshoot and resolve the issue, although I'm still not sure what it REALLY was. I think the cause may be a missing kernel AKI for this particular AMI and its newly updated kernel image (BTW, I'm running the same AMI).

I stopped my instance and attached its volume to a new instance (running the same AMI). I had to play a bit with e2label and fstab.
Mounted the old filesystem (including dev and proc) and chrooted into it.
Installed the kernel version one before the latest, since I couldn't find an AKI corresponding to the latest one. I had to change the AKI manually using the EC2 API tools.
Detached the volume from the new instance (fixing the partition labels first) and booted back from the old volume.
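For anyone repeating this, the mount-and-chroot step can be sketched roughly as below. This is only an illustration of the sequence described above, not the exact commands I ran: the device name `/dev/sdf` and the mount point `/mnt/old` are assumptions, so check where the attached volume actually shows up on your rescue instance. The function just builds the command lines; run them as root by hand once they look right.

```python
import shlex


def chroot_commands(device="/dev/sdf", mount_point="/mnt/old"):
    """Build the commands to mount an old root volume and chroot into it.

    Both arguments are hypothetical defaults; adjust them to your setup.
    """
    return [
        ["mkdir", "-p", mount_point],
        # Mount the old root filesystem from the attached volume.
        ["mount", device, mount_point],
        # Make the rescue instance's /dev visible inside the chroot.
        ["mount", "--bind", "/dev", f"{mount_point}/dev"],
        # Package scripts run inside the chroot generally expect /proc.
        ["mount", "-t", "proc", "proc", f"{mount_point}/proc"],
        ["chroot", mount_point],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in chroot_commands():
        print(shlex.join(argv))
```

Once inside the chroot, package tools operate on the old volume's filesystem, which is what makes it possible to change the installed kernel there.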

I'm now running 2.6.32-318-ec2.

Can someone correct me if I'm wrong in pointing to the missing AKI as the source of the problem? Anyway, it worked, and I'll be sure to test all upgrades on a test host before applying them to the production system.

Thanks; nice to know it's not just me, and that there is some rational explanation. — timday, Nov 04 '11 at 16:31

timday · Answer 2 · 2011-11-01T00:15:33.667

My solution/recovery was:

Instantiated a fresh instance with the Ubuntu 10.04 AMI ami-c00e3cb4 (promptly updated, upgraded and rebooted into linux-image-2.6.32-319-ec2 with no problem).
Reinstalled all the packages of importance.
Created a volume from a snapshot of the old non-booting instance (taken after it became non-booting) and mounted it.
Rsynced over the handful of important files under /etc, /var and /home.
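The rsync step looked roughly like the sketch below. The mount point `/mnt/snapshot` and the example paths are hypothetical (I'm not listing what I actually copied), and the helper assumes every path is a directory. It only builds the command lines, with an optional `--dry-run` so you can see what rsync would do before letting it touch the new instance.

```python
import posixpath


def rsync_commands(paths, snapshot_root="/mnt/snapshot", dry_run=False):
    """Build rsync commands copying directories from a mounted old volume.

    `paths` are directories as they appear on the live system, for
    example "/etc/apache2" (a hypothetical example path).
    """
    commands = []
    for path in paths:
        relative = path.strip("/")
        # Trailing slashes make rsync copy the directory contents.
        source = posixpath.join(snapshot_root, relative) + "/"
        destination = "/" + relative + "/"
        argv = ["rsync", "-a"]
        if dry_run:
            argv.append("--dry-run")
        commands.append(argv + [source, destination])
    return commands


if __name__ == "__main__":
    for argv in rsync_commands(["/etc/apache2", "/var/www"], dry_run=True):
        print(" ".join(argv))
```

Copying selected directories rather than the whole of /etc keeps the fresh instance's own boot and kernel configuration intact.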

and it's back as it was before (with the advantage of being a little less crufty).

I didn't bother trying to boot a fresh instance with the problem image because... well, surely all the "state" lives in the disk image (which I can only guess suffered some boot-related corruption) so I wouldn't expect any different result.

Just "one of those things" I guess?

In future I think I'll be snapshotting more regularly, and before any kernel updates.
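One way to know when a snapshot is due is to check whether a pending upgrade would install a new kernel image. The sketch below is an assumption on my part rather than something I have in place: it parses the output of a simulated upgrade from `apt-get -s dist-upgrade` (not aptitude, whose simulation output is formatted differently) and relies on that output listing each package to be installed on a line starting with `Inst`. The sample text in the demo is illustrative, not real output.

```python
import re

# Matches simulated-install lines for kernel image packages,
# e.g. "Inst linux-image-2.6.32-319-ec2 (...)".
KERNEL_INSTALL = re.compile(r"^Inst (linux-image-\S+)")


def pending_kernel_packages(simulation_output):
    """Return kernel image packages that a simulated upgrade would install."""
    found = []
    for line in simulation_output.splitlines():
        match = KERNEL_INSTALL.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Illustrative sample only; version strings are placeholders.
    sample = (
        "Inst libc6 [old-version] (new-version)\n"
        "Inst linux-image-2.6.32-319-ec2 (new-version)\n"
        "Conf linux-image-2.6.32-319-ec2 (new-version)\n"
    )
    kernels = pending_kernel_packages(sample)
    if kernels:
        print("Snapshot the volume before upgrading:", ", ".join(kernels))
```

If the list is non-empty, take the EBS snapshot first and only then run the real upgrade and reboot.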
