I'm running SmartOS 20130405T010449Z with an Ubuntu KVM guest inside. The Ubuntu VM ran without problems for months, but after a reboot of the physical hardware it no longer connects to the network on startup, so I can't ssh into it to check its health.

I can log into SmartOS and start the VM:

$ vmadm start [uuid]

verify that it's running:

$ vmadm list
UUID             TYPE   RAM     STATE     ALIAS
[uuid]           KVM    10240   running   steve

and ping it:

$ ping steve
steve is alive

but when I attempt to drop into the VM's console, the command simply hangs forever:

$ vmadm console [uuid]
[hangs forever]

I get the same result when I attempt to ssh from inside SmartOS:

$ ssh steve
[hangs forever]

I can't ssh from other machines on the network, because the Ubuntu VM's IP address never comes up on the network.

What should I try next to access this VM?

  • Can you break into the grub prompt and run single user? This kind of thing is usually caused by one of the services waiting for the network or a device. You need to find out what it's waiting for. – hookenz Oct 11 '15 at 23:58
  • @Matt Is single user synonymous with using the boot option with `noimport=true`? If so, yes, I can run single user, but I don't know what to do next; `vmadm list` reports no VMs (presumably because the configuration was not imported) and I don't know how to bootstrap SmartOS's configuration manually. – James Bailey Oct 12 '15 at 03:14
  • Re: services waiting for the network or a device, when I do `vmadm stop [uuid]` I see a warning before the successful stop: `[ID 722105 kern.warning] WARNING: ip_interface_cleanup: cannot open /devices/pseudo/udp@0:udp: error 13`. Initial Googling didn't turn up anything relevant on this warning, but now it's looking more likely as a culprit. I'll investigate. – James Bailey Oct 12 '15 at 03:37

1 Answer

Ok, I eventually recovered what I wanted from the VM, so for posterity, here is what I did:

First, I updated SmartOS. I was hesitant at first, fearing data loss, but the upgrade was totally painless: put a new version on a new USB stick, shut down, swap the sticks, and reboot.

After the update, vmadm console and ssh still hung when connecting to the VM. The key insight, which I had been unaware of, was to connect via VNC instead:

root@smartos $ vmadm info [UUID] vnc
{
  "vnc": {
    "host": "192.168.1.7",
    "port": 64762,
    "display": 58862
  }
}

me@anotherMachine $ xtightvncviewer 192.168.1.7::64762
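
The host and port can also be pulled out of that JSON by script rather than copied by hand. This is only a minimal sketch: vnc_target is a hypothetical helper name, not part of vmadm, and it assumes the output is pretty-printed with one key per line, as shown above.

```shell
# vnc_target: hypothetical helper (not part of vmadm).
# Reads the JSON printed by `vmadm info <uuid> vnc` on stdin and prints
# "host::port". Assumes one key per line, as in the output shown above.
vnc_target() {
    awk -F'"' '
        /"host"/ { host = $4 }                            # 4th quote-delimited field is the value
        /"port"/ { port = $3; gsub(/[^0-9]/, "", port) }  # keep only the digits after the colon
        END      { if (host != "" && port != "") print host "::" port; else exit 1 }
    '
}
```

On a machine that can reach the SmartOS host, the printed value can be passed straight to the viewer, for example by piping the saved vmadm info output through vnc_target. The double colon tells the viewer that the number is a TCP port rather than a display number.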

Over VNC, the problem was immediately apparent: the VM was stuck at the boot menu, waiting for a boot option to be selected. I selected the default option and hey presto, the VM came up perfectly healthy.

There was a catch, though: presumably when I updated SmartOS, I lost the "external" NIC, so the VM came up without a channel to the outside world. I had to manually edit /usbkey/config in SmartOS and add these lines, which were missing:

external_nic=[MAC address]
external0_ip=192.168.1.20
external0_netmask=255.255.255.0
external0_gateway=192.168.1.1

and then add the external NIC to the VM:

root@smartos $ cat add_nic.json
{
    "add_nics": [
        {
            "physical": "net1",
            "index": 1,
            "nic_tag": "external",
            "mac": "[MAC address]",
            "ip": "192.168.1.8",
            "netmask": "255.255.255.0",
            "gateway": "192.168.1.1"
        }
    ]
}
root@smartos $ cat add_nic.json | vmadm update [UUID]

I had to reboot SmartOS to pick up the configuration change, and then the VM came up with a network interface.
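
Before rebooting, it may be worth confirming that /usbkey/config really contains the four external NIC keys listed above. A minimal sketch, assuming the file uses plain key=value lines as in my case; missing_config_keys is a hypothetical helper, not a SmartOS tool.

```shell
# missing_config_keys: hypothetical helper (not a SmartOS tool).
# Usage: missing_config_keys <file> <key>...
# Prints every key that has no "key=" line in the file and returns 1 if
# any key is missing, 0 otherwise.
missing_config_keys() {
    file=$1
    shift
    status=0
    for key in "$@"; do
        if ! grep -q "^${key}=" "$file"; then
            echo "$key"
            status=1
        fi
    done
    return "$status"
}
```

Called as missing_config_keys /usbkey/config external_nic external0_ip external0_netmask external0_gateway, it prints nothing when all four lines are present.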

Caveat: vmadm console still doesn't work, for reasons I haven't identified; it hangs indefinitely. However, ssh steve works from inside SmartOS, and I can ssh to the VM's IP address from other machines on my network.