
Recently one of our servers hung due to an IPMI BMC failure. It is a CentOS 6.3 OpenStack compute host serving KVM virtual machines with a qcow2 backend.

One of the VMs was based on the EC2 Ubuntu cloud image (precise-server-cloudimg-amd64-disk1.img).

After the reboot I found a strange thing: the ssh host keys on the VM had been recreated (13:25 is the reboot time):

root@weather:~# ll /etc/ssh/*key
-rw------- 1 root root  668 Nov 21 13:25 /etc/ssh/ssh_host_dsa_key
-rw------- 1 root root  227 Nov 21 13:25 /etc/ssh/ssh_host_ecdsa_key
-rw------- 1 root root 1679 Nov 21 13:25 /etc/ssh/ssh_host_rsa_key

I also found that some orphan inodes were deleted during the filesystem recovery process:

Nov 21 13:25:23 weather kernel: [    0.901159] EXT4-fs (vda1): INFO: recovery required on readonly filesystem
Nov 21 13:25:23 weather kernel: [    0.902688] EXT4-fs (vda1): write access will be enabled during recovery
Nov 21 13:25:23 weather kernel: [    1.930773] EXT4-fs (vda1): ext4_orphan_cleanup: deleting unreferenced inode 1286
......
Nov 21 13:25:23 weather kernel: [    1.940810] EXT4-fs (vda1): ext4_orphan_cleanup: deleting unreferenced inode 53755
Nov 21 13:25:23 weather kernel: [    1.940815] EXT4-fs (vda1): ext4_orphan_cleanup: deleting unreferenced inode 53754
Nov 21 13:25:23 weather kernel: [    1.940819] EXT4-fs (vda1): 8 orphan inodes deleted

My question is: why could the ssh keys have been recreated? Can this be a result of data loss in the filesystem? And how can I prevent it in the future?

The qcow2 cache mode is set to writethrough in the libvirt VM configurations. The host filesystem is ZFS (zfsonlinux) on a hardware RAID controller with a BBU.
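For reference, this is roughly how I verify the cache mode from the compute host (the domain name weather and the exact XML are only illustrative):

# on the compute host; domain name is illustrative
virsh dumpxml weather | grep -i 'cache='
# the disk <driver> element should contain something like:
#   <driver name='qemu' type='qcow2' cache='writethrough'/>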

If this is the result of a filesystem inconsistency on reboot, I am mystified, since the ssh host key files are never modified in normal operation and all the relevant data should long since have been synced to stable media.

Veniamin

1 Answer


No one's stepped in to say anything intelligent, so I'll state the obvious.

Yes, it could be a result of data loss in the file system. I can't speak for Ubuntu, but the CentOS (RH-style) sshd startup scripts provide for automatic creation of the host keys if they're missing, and I presume Ubuntu does something similar.
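Purely as an illustration, the logic boils down to something like this; a simplified sketch, not the verbatim init script from either distribution:

# simplified sketch of what an sshd init script does at start-up
# (not the verbatim CentOS or Ubuntu script)
if [ ! -s /etc/ssh/ssh_host_rsa_key ]; then
    ssh-keygen -q -t rsa -f /etc/ssh/ssh_host_rsa_key -N ''
fi
if [ ! -s /etc/ssh/ssh_host_dsa_key ]; then
    ssh-keygen -q -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''
fi
if [ ! -s /etc/ssh/ssh_host_ecdsa_key ]; then
    ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N ''
fi

So if ext4's orphan cleanup removed the key files, sshd would quietly mint new ones on the next boot, which matches the timestamps you're seeing.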

If your VM's file system was corrupted as a result of the failure of the underlying host, and that corruption happened to take out the system's ssh keys, then I would expect them to be automatically regenerated, and therefore to have changed.

Is that what happened? Sadly, at this point, I don't think anyone can tell.

If your system had been tripwired, then you'd have some kind of baseline audit of the FS with which you could compare the current state in order to make a more-informed decision about what exactly had happened to the VM image. As it is, you'll have to make a business decision as to whether this machine is sensitive enough to justify a complete clean rebuild, or whether you just shrug and accept it as one of those things.
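If a full tripwire deployment is more than you want, even a crude baseline gives you something to diff against after the next unclean shutdown. A minimal sketch (paths are just an example; ideally copy the baseline off the VM afterwards):

# build a checksum baseline of /etc from inside the guest
find /etc -type f -exec sha256sum {} + > /root/etc-baseline.sha256

# after a suspect reboot, list anything that changed or vanished
sha256sum -c /root/etc-baseline.sha256 2>/dev/null | grep -v ': OK$'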

MadHatter