20

I am shipping a bunch of ESXi 5.1 servers to remote offices where they will be powered via APC UPS.

I would like to have the UPS trigger a shutdown of the connected server - I would then rely on the ESXi configuration to take care of the shutdown/suspension of the VMs hosted on it.

I can see that APC have a solution documented using their PowerChute Network Shutdown, but this involves setting up an extra server per office, and requires network cards on each UPS. We are generally using UPS without a network card (e.g. Back-UPS Pro) - they come with a USB connector, and they are readily available in the locations where our offices are.

How can I connect a UPS to an ESXi host via USB, then have ESXi detect a power outage and act accordingly? Has anyone managed to do this?

dunxd
  • 1
    Have you timed the shutdown process of the VMs through the host shutdown? Can the battery hold long enough for that period? – ewwhite Jan 04 '13 at 14:38
  • Thanks for pointing this out. Not yet - at this stage I am just shipping the ESXi servers for running a Domain Controller, but I am sure that once we have the resource in place we will add a few more servers, at which point the timing might change. – dunxd Jan 04 '13 at 14:54
  • The shutdown policy is quite long by default. But to be honest, I don't run UPS shutdown on my ESXi hosts or clusters. Seems counter-intuitive, but has never been an issue. – ewwhite Jan 04 '13 at 15:00
  • Why bother having UPS on your ESXi hosts at all then? If the power goes because of an outage or because the battery ran down you get the same result. – dunxd Jan 04 '13 at 15:21
  • To weather brief outages. But at my bigger sites, I have 2-4 hours of UPS power available for the VMWare cluster, storage and networking. – ewwhite Jan 04 '13 at 15:22
  • I have my fingers crossed that you don't experience any long power outages caused by extreme weather in those locations. – dunxd Jan 04 '13 at 15:26
  • [I have experienced the worst...](http://serverfault.com/questions/408130/assessing-equipment-damage-following-a-lightning-strike-should-i-have-planned) And it's typically fine. I haven't encountered VMDK or guest OS corruption as a result of a hard shutdown/reboot. – ewwhite Jan 04 '13 at 15:28
  • Hmm - worth evaluating as not bothering setting this up is the simplest solution... Sadly in some of my locations extended power cuts wouldn't be uncommon, and in those places the UPS batteries will end up getting killed by regularly getting drained to zero. – dunxd Jan 04 '13 at 15:34

9 Answers

25

Yes, it's possible. Here are details of my similar setup.

Hardware configuration: APC Smart-UPS 1500 connected to the ESXi 5.1 Host via USB. A Linux virtual machine running on this ESXi host. UPS is connected to this VM using ESXi USB pass through option.

Software configuration: NUT (Network UPS Tools) master running in the VM, and native ESXi NUT slave running on the ESXi host.

Shutdown logic: VM is running the UPS driver usbhid-ups which is responsible for the communication with UPS via USB. The upsd process connects to the UPS through the usbhid-ups driver and monitors the UPS state. The upsmon master process running on the same machine connects to the upsd and initiates the shutdown. ESXi host is running the 2nd instance of upsmon which also connects to the same VM upsd via internal network.
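To see the state that upsmon reacts to, any NUT client can poll upsd with `upsc`. The following is only a minimal sketch, not part of the setup described here: the UPS name `apc@localhost` and both function names are placeholders introduced for illustration, and the UPS name must match whatever is defined in your own ups.conf.

```shell
#!/bin/sh
# Classify a NUT ups.status string such as "OL", "OB", "OB LB" or "FSD OB LB".
# OL = on line power, OB = on battery, LB = low battery, FSD = forced shutdown.
classify_ups_status() {
    case " $1 " in
        *" FSD "*) echo "forced-shutdown" ;;
        *" LB "*)  echo "low-battery" ;;
        *" OB "*)  echo "on-battery" ;;
        *" OL "*)  echo "online" ;;
        *)         echo "unknown" ;;
    esac
}

# Query upsd for the current status. "apc@localhost" is a placeholder
# for <upsname>@<host>; replace it with the name from your ups.conf.
current_ups_state() {
    classify_ups_status "$(upsc "${1:-apc@localhost}" ups.status 2>/dev/null)"
}
```

Running `current_ups_state apc@<monitoring VM IP>` from another machine is a quick way to confirm that upsd is reachable over the internal network before configuring the second upsmon.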

On power failure the following sequence takes place:

  1. The UPS, via usbhid-ups, reports the power failure to upsd.
  2. (optional, useful if you want to shut down after a few minutes instead of waiting for Low Battery) upsmon on the VM starts a 5-minute upssched timer. The timer is aborted if power is restored.
  3. When the timer fires or when the UPS reports Low Battery, upsmon raises the FSD (forced shutdown) flag on upsd.
  4. In a stand-alone NUT configuration the FSD flag would shut down the machine. Here, however, the shutdown command is replaced by a simple log message like "I should shutdown now but I am waiting for the host instead", and nothing else happens.
  5. The FSD flag is also read by the ESXi upsmon, which initiates the ESXi host shutdown.
  6. The ESXi host shuts down all virtual machines one by one. The important thing is that the VM which runs upsd should be shut down last (using the ESXi startup/shutdown sequence configuration).
  7. Important: this VM must have VMware Tools installed. When it receives the guest shutdown command from the host, the vmware-tools shutdown script is started. This script checks for the /etc/killpower flag. If there is no flag, it does nothing (this means a user-initiated Linux shutdown, not a UPS event). But if the flag exists (FSD active), the script sends the UPS a delayed power-down command (say, in 3 minutes).
  8. After running the vmware-tools script, the guest VM shuts down.
  9. ESXi sees that the last VM is powered off and goes down itself (this takes around 1 minute because no other machines are running by then).
  10. When the remaining 2 minutes elapse, the UPS cuts off the power.
  11. When power is restored, ESXi starts and powers on all VMs. The UPS monitoring VM must be started first (the same configuration as for the shutdown order).
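The check performed in step 7 could look roughly like the sketch below. This is not my actual script (see the pastebin link for that), only an illustration under some assumptions: `upsdrvctl shutdown` is assumed to be the command that tells the driver to switch the UPS off, the length of the delay is assumed to come from the driver configuration (for example an `offdelay` setting in ups.conf), and the function and variable names are made up for this example.

```shell
#!/bin/sh
# Sketch of a guest power-off hook, to be called from wherever your
# VMware Tools installation runs its power-off scripts.
KILLPOWER_FLAG="${KILLPOWER_FLAG:-/etc/killpower}"
# Assumed command: asks the NUT driver to cut the UPS output after its
# configured delay. Adjust the path or command for your distribution.
UPS_OFF_CMD="${UPS_OFF_CMD:-upsdrvctl shutdown}"

on_guest_poweroff() {
    if [ -f "$KILLPOWER_FLAG" ]; then
        # Flag exists: this shutdown was caused by FSD, so power down the UPS.
        echo "FSD flag present: sending delayed power-off to UPS"
        $UPS_OFF_CMD
    else
        # No flag: ordinary shutdown, the UPS must keep supplying power.
        echo "No FSD flag: ordinary shutdown, leaving UPS output on"
    fi
}
```

Keeping the flag path and the power-off command in variables makes it possible to rehearse the hook with a temporary file and a harmless command before trusting it with the real UPS.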

Downloads:

NUT for Linux can be installed from your distribution's packages.

The native NUT client for ESXi server can be downloaded using the last link on this page: http://www.networkupstools.org/download.html

Some of my scripts and conf files are here (only changed lines are shown): http://pastebin.com/KkEeanK1

Notes:

Of course there are more details, and it took me some time to get this working as it should, but now it performs very nicely. The system handles the cases where you shut down the monitoring VM from inside (the vmware-tools script is not run), where the ESXi host initiates the VM shutdown (no /etc/killpower flag, so no UPS load-off), and where ESXi itself is shut down (the same). The only important thing is to have this VM start as early as possible after host boot and shut down last, so that the host shutdown time is predictable: as said above, it is around 1 minute for me, and I reserve 2 more minutes just in case.

My UPS monitoring Linux VM is also a Samba/NFS server for backup storage, the NAT/DHCP server for the VMs, and a host for some other lightweight services. It takes around 22MHz of ESXi CPU shares and around 10MB of active RAM when idle. Because it uses NUT, you can power more devices from the same UPS if required, and they can all be shut down gracefully. No PowerChute and/or expensive Network Management Card is required.

Oleg Semyonov
17

Super question. It is actually possible to do this quite nicely - at least on some setups. I have tried the following recipe on a number of ESXi 5.5 hosts. Basically, the solution goes like this:

  1. Enable SSH access on your ESXi host
  2. Create a Linux VM - I use Ubuntu. You only need a very minimal setup - no GUI or anything.
  3. Connect your APC device via USB to the ESXi host and pass it through to the Linux VM.
    • Make sure that the USB controller you add to the VM matches the actual, physical USB controller the APC device is connected to, i.e. only add an XHCI controller if the physical device is a USB3 device. Mismatches seem to cause odd problems in the Linux USB device driver.
    • If things aren't working out and you see errors like ctrl urb status -62 in dmesg, chances are the physical controller doesn't match the one in your VM. If they do match - well then it's a problem. I have one setup with this sort of problem and no real solution to it.
  4. Install apcupsd on the Linux VM - in Ubuntu, you can do sudo apt-get install apcupsd to install the latest version. The NUT project is also nice but I am a traditionalist.
  5. Install the plink utility by doing sudo apt-get install putty-tools
  6. Connect to your ESXI host by doing plink root@<your ESXi host IP>. You can close the connection immediately. The objective is to get the host key saved so plink won't prompt for it again when we run it via a script
  7. Edit /etc/apcupsd/apcupsd.conf and change the items below so they match:

         UPSNAME < the name you'd like your UPS to have >
         UPSCABLE usb
         UPSTYPE usb
         # DEVICE DIRECTIVE should be blank for USB
         DEVICE

     Also make sure that /etc/default/apcupsd has ISCONFIGURED=yes
  8. Edit /etc/apcupsd/apccontrol and scroll to the doshutdown case. Make it look like this:

         doshutdown)
             echo "UPS ${2} initiated Shutdown Sequence" | ${WALL}
             # Shut down indirectly by triggering the ESXi host to do the
             # shutdown via VMWare tools
             /usr/bin/plink root@< your ESXi host IP > -pw < your root pw > "/sbin/shutdown.sh && /sbin/poweroff"
         ;;
  9. Restart apcupsd using sudo service apcupsd restart and see if things are working by invoking apcaccess. If not, check logs and dmesg
  10. Make sure all VMs that need to shut down nicely in case of a power failure have VMWare Tools installed. Also make sure that they are part of the VM startup/shutdown list (in the vSphere Web Client, go to: vCenter -> <your host> -> Manage -> Settings -> VM Startup/Shutdown). Make sure that the shutdown action is to shut down the guest OS.

Once you have these things running, the doshutdown scriptlet from step 8 gets invoked on a power failure. This in turn invokes the shutdown.sh script on the ESXi host, which signals the VMWare Tools package in each VM on your host to do a clean shutdown via the guest OS. In my experience, it works better than the PowerChute software from APC.
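If you want to rehearse the trigger without actually powering off the host, one option is to wrap the remote call in a small function with a dry-run switch. This is only a sketch under assumptions: the function name, the `DRY_RUN`, `ESXI_HOST` and `ESXI_ROOT_PW` variables and the example address are all placeholders I made up here, and `-batch` is used so that plink fails rather than waiting at a prompt nobody will answer.

```shell
#!/bin/sh
# Placeholder address from the documentation range; set it to your ESXi host's IP.
ESXI_HOST="${ESXI_HOST:-192.0.2.10}"
# Same remote command as in step 8.
REMOTE_CMD="/sbin/shutdown.sh && /sbin/poweroff"

# With DRY_RUN=1 (the default in this sketch) only print what would be run.
trigger_host_shutdown() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: plink root@${ESXI_HOST} \"${REMOTE_CMD}\""
    else
        # ESXI_ROOT_PW is a placeholder for the root password used in step 8.
        /usr/bin/plink -batch -pw "${ESXI_ROOT_PW}" root@"${ESXI_HOST}" "${REMOTE_CMD}"
    fi
}
```

Calling `trigger_host_shutdown` from the doshutdown case with `DRY_RUN=1` lets you pull the UPS plug and watch the logs; setting `DRY_RUN=0` arms it for real.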

If you like to monitor things from your VMs, you can setup slave apcupsd instances on them that connect to the master UPS control Linux VM. Your slave apcupsd.conf files should have an entry like this:
    UPSTYPE net
    DEVICE < your UPS control VM IP >:3551
Entries like UPSCABLE and such do not matter in this case. This works with the Windows version of apcupsd (available here) as well. You can use the included apctray.exe to check out the current status of things.
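From any machine with apcupsd installed you can also check the master over the network with apcaccess. Here is a rough sketch, assuming the usual `NAME : value` layout of apcaccess output; the helper names and the example IP address are placeholders of mine, not something shipped with apcupsd.

```shell
#!/bin/sh
# Extract one field (e.g. STATUS or TIMELEFT) from apcaccess output on stdin.
apc_field() {
    awk -v key="$1" '
        {
            i = index($0, ":")              # first colon separates name and value
            if (i == 0) next
            k = substr($0, 1, i - 1)
            sub(/[ \t]+$/, "", k)           # strip padding after the field name
            if (k == key) {
                v = substr($0, i + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }'
}

# Ask the master VM's network information server for its status.
# 192.0.2.20 is a placeholder for the UPS control VM's IP.
master_status() {
    apcaccess status "${1:-192.0.2.20}:3551" | apc_field STATUS
}
```

For example, `master_status <your UPS control VM IP>` should print something like ONLINE while mains power is present.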

That pretty much covers it, I think.

MrMajestyk
  • +1 worked like a charm. First time! – Morten Kristensen Jun 01 '16 at 12:59
  • This answer worked perfectly, although at my client's office we had to tweak the `doshutdown` sequence a bit. We added `${APCUPSD} --killpower` right before the `/usr/bin/plink` part so that the UPS shuts down after a little while and restarts automatically when power is back. Also, it's worth noting that step 6 should be done as `root` acquired via `su` or `sudo su`, but **not** `sudo -s`. – Andrea Lazzarotto Feb 20 '17 at 14:15
5

According to APC, this is not possible and you require PowerChute Network Shutdown. We tried this a number of times with USB and found no solution.

VMWare has info here on using the APC approved solution.

I would also think a Smart-UPS would be a better choice, since you can fit it with a network card. That naturally costs more, but if your servers are at all important, the cost should be worth it. It also gives you more monitoring and alerting, which might be useful at a remote site. You also need to ensure sufficient runtime for all VMs to shut down cleanly and then for the host to shut down.

Dave M
  • 1
    This seems like the most sensible answer being supported by both vendors. Unfortunate that VMware haven't thought to build anything into ESX/ESXi that does this natively. The Network solution requires that at least one network switch is powered via UPS too. – dunxd Jan 04 '13 at 15:34
  • 2
    It would not make much sense to *not* power network switches via UPS... they consume very little current and are critical to any network operation. – Massimo Jan 04 '13 at 16:19
4

You might consider using USB device passthrough to a guest running PowerChute or other software (e.g. apcupsd) that can monitor the UPS's health and trigger a shutdown on the ESXi host. ESXi officially supports only a very limited number of USB devices for passthrough, but people have been attaching and passing through different classes of devices for a while with varying success. The APC UPS USB connection seems to work according to this walkthrough for a Windows VM or this one for a CentOS Linux VM.

the-wabbit
2

Have a look at vSphere Management Assistant (vMA) from here. We use it at my office for what you are attempting, although with a Smart-UPS connected via USB rather than a Back-UPS.

deveneyi
  • Please add more detail as this is an undocumented setup as far as APC or vmware are concerned. – dunxd Jan 07 '13 at 10:24
  • vMA is deprecated https://blogs.vmware.com/vsphere/2017/04/vsphere-management-assistant-deprecation.html – jspinella Jan 03 '21 at 18:30
1

While possible (probably/generally), I don't think an automated shutdown of a computer on battery power is a good idea. If you're going to do that, then for most practical intents and purposes, you should probably just save yourself the money of a battery-backed UPS, and let the loss of power shut down your machine for you. (Granted, a clean shutdown is always preferable to a power loss, but you seem to be missing the point of having a battery runtime longer than a couple of minutes if you automatically shut everything down when you lose the power feed.)

The way I've always handled it is to have monitoring alert the SAs when the power goes down, so the SAs can use their grey matter to decide when (or even if) to shut down the servers. If it's a brief outage, it may not be a good idea to shut down the servers at all, or you may want to leave some servers up and running as long as possible, and only shut them down just before the battery is about to die. It really seems to me like a decision-making task better suited to a human than to a simple rule.

HopelessN00b
  • You don't have to configure your UPS to trigger a shutdown immediately, but you do want it to shut off before the batteries drain completely, or else you will have to buy more batteries, particularly in some of the locations where I work, where the power goes daily. It is great to get humans involved of course, but you don't always have a System Administrator in a remote office. – dunxd Jan 04 '13 at 15:39
  • @dunxd Good point - I'm more accustomed to HA environments where at least some of the servers have to stay up, come hell or high water, so the name of the game is figuring out how to best ration the power (selectively shutting down devices) to create the least service impact possible, which won't be everyone's focus or use-case. – HopelessN00b Jan 04 '13 at 16:05
1

In the olden days of bare-metal installations, APC PowerChute Plus was an essential part of my install process. Using the simple serial signaling cable and their Red Hat-only binary, it was easy to set up rules to govern a locally-attached server. Basic email notifications for UPS battery events, line power events and shutdown actions were available:

POWERCHUTE MAIL MESSAGE
Message from PowerChute@Bonanza:

UPS on battery: Blackout 000.0 V. 

and

POWERCHUTE MAIL MESSAGE
Message from PowerChute@Bonanza:

Normal power restored: UPS on line.  

or

POWERCHUTE MAIL MESSAGE
Message from PowerChute@Bonanza:

Shutdown started.  

Plus a reasonable interface to see what was happening...


That software eventually went commercial (or was buried on the APC website). There were a few open-source approaches to provide something similar. But this all gets complicated with single VMWare ESXi hosts.

It seems like this is something that VMWare should have incorporated into the base hypervisor. It's basic and could offer a decent level of protection for users. The most common remedies I see now are USB passthrough to a dedicated VM, a network daemon approach, or doing what I do: not configuring any automatic or battery shutdown...

Granted, I typically go with a UPS that can support the system load for an hour or more, but extended outages DO happen. Maybe an alternative is to collect a few low-cost or refurbished network interface cards and plan to buy SmartUPS devices as a minimum...

ewwhite
0

Check out the following link. It is not the most elegant solution, but it is very practical and straightforward. There are possible drawbacks in terms of security (depending upon your particular network design, the guests loaded on the hosts, and the access users have to those guests), but you get to make that call.

HopelessN00b
0

I used MrMajestyk's solution and only replaced the plink password login with passwordless ssh access using an RSA public key. The RSA public key generated in the apcupsd VM must be added to /etc/ssh/keys-root/authorized_keys on the VMware host.
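Roughly, the change looks like the sketch below. Treat it as an outline rather than my exact commands: the key file name, the function names and the example address are placeholders, and the one-time setup assumes you can still log in to the host as root with a password to append the key.

```shell
#!/bin/sh
# Placeholders: adjust to your environment.
ESXI_HOST="${ESXI_HOST:-192.0.2.10}"
KEY_FILE="${KEY_FILE:-/root/.ssh/id_rsa_esxi}"

# One-time setup, run as root on the apcupsd VM.
install_key_on_host() {
    # Create a passphrase-less RSA key pair if it does not exist yet.
    [ -f "$KEY_FILE" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$KEY_FILE"
    # Append the public key to the ESXi root account's authorized_keys
    # (you will be asked for the root password this one time).
    ssh root@"$ESXI_HOST" 'cat >> /etc/ssh/keys-root/authorized_keys' < "${KEY_FILE}.pub"
}

# Print the command line that replaces the plink call in doshutdown.
shutdown_command() {
    printf 'ssh -i %s -o BatchMode=yes root@%s "/sbin/shutdown.sh && /sbin/poweroff"\n' \
        "$KEY_FILE" "$ESXI_HOST"
}
```

The `BatchMode=yes` option makes ssh fail immediately instead of prompting if the key is not accepted, which is what you want in an unattended shutdown script. The root password then no longer has to be stored in apccontrol.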