
This is my basic setup:

  • I run a server (DL380 G7; Linux 3.13 kernel) that hosts ~10 virtual machines
  • It is set for automatic power on
  • I use NUT for UPS management
  • A graceful shutdown of the host (including first shutting down the VMs) takes ~8-10 minutes
  • Total runtime of the UPSes (I have 2, each powering one PSU in the server and one PSU in the attached storage) on fully charged batteries is ~75 minutes.
  • I have set the UPS/NUT levels so that the critical level (LOWBATT), i.e. the point at which shutdown is initiated, is reached at 15 minutes remaining (I dare not go lower)

The following scenario has happened to me twice during the last 12 months:

  • Power loss, the UPSes take over just fine
  • Power remains off for about 1 hour -> shutdown initiated, as it should be
  • The server stops the VMs and begins its shutdown procedure
  • --> somewhere around here power comes back
  • Server completes shutdown and powers off
  • Server does not come back online: the UPS has power (again) and the server never actually lost power (being supplied by the UPS), so to the server it looks like an intentional graceful shutdown.
  • As soon as I become aware of it, I remotely power on the server via iLO [last time this happened was today at 03:46 am :-), which is why I am asking]

As ewwhite has pointed out, the specific UPS models would be helpful:

  • Eaton 5PX 2200VA, with +1 EBM
  • Roline Prosecure II, 1500VA RM2U, with +1 EBM

Have any of you run into the same problem? Is there an out of the box solution with some UPSes?

So far I have considered setting up a low-power Linux device (Raspberry Pi?) to take over the monitoring: it would check the UPS units for sufficient battery charge and input power status and then restart the server via iLO/IPMI.
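To make the idea a bit more concrete, here is a minimal sketch of the decision rule such a monitoring device could apply. It is only an illustration: `UpsReading`, `should_power_on`, and the thresholds (50 % charge, 10 minutes of stable input power) are hypothetical names and values, and the readings are assumed to come from polling NUT's `ups.status` and `battery.charge` for each UPS.

```python
from dataclasses import dataclass


@dataclass
class UpsReading:
    timestamp: float  # seconds, e.g. from time.time()
    status: str       # NUT ups.status, e.g. "OL CHRG" or "OB DISCHRG"
    charge: float     # NUT battery.charge, in percent


def on_line(reading):
    """True if the UPS reports that it is running on input (mains) power."""
    return "OL" in reading.status.split()


def stable_since(history):
    """Timestamp at which the latest uninterrupted on-line run began, or None."""
    start = None
    for reading in reversed(history):
        if not on_line(reading):
            break
        start = reading.timestamp
    return start


def ups_ready(history, min_charge, stable_seconds):
    """One UPS is ready if it is charged enough and mains has been stable."""
    if not history:
        return False
    latest = history[-1]
    start = stable_since(history)
    if start is None:
        return False
    return (latest.charge >= min_charge
            and latest.timestamp - start >= stable_seconds)


def should_power_on(histories, min_charge=50.0, stable_seconds=600.0):
    """histories maps a UPS name to its readings, oldest first.

    Power on only if every UPS is ready (assumed policy, thresholds are guesses).
    """
    return bool(histories) and all(
        ups_ready(h, min_charge, stable_seconds) for h in histories.values()
    )
```

The actual power-on would then be a separate iLO/IPMI call (for example something like `ipmitool chassis power on`), which is left out here. Requiring every UPS to be ready is the cautious choice for a setup where each UPS feeds one PSU.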

Is any automatic solution just too much bother (for my case and in general) and should I just go with manual intervention when and if it happens?

regards

martin

ewwhite
martin
  • You should specify the make/model of UPS involved here. – ewwhite Aug 04 '14 at 12:40
  • an excellent idea – martin Aug 04 '14 at 12:45
  • A cheap low-power device to trigger the start of the servers sounds like a sensible approach. That device wouldn't even need to be powered by a UPS, as it is only supposed to do something when utility power is available. To avoid looping power cycles, it is crucial that the servers are only started once the batteries have reached sufficient charge (as you noted); it may also be a good idea to require utility power to have been stable for some time. – kasperd Aug 04 '14 at 14:03
  • @kasperd: I was thinking along those lines as well. If for any reason I don't want to follow ewwhite's advice and stick with one UPS, I'll give it a try with a Pi. I'll just have to think of a ruleset/criteria for what actually counts as "stable power is back", "not yet", etc. – martin Aug 04 '14 at 14:31

3 Answers


This is a case where you shouldn't be using two UPS units, where each feeds a power supply. That may be a big part of your problem, as a single UPS can restore the previous power state following an outage (this is the default in the HP ProLiant BIOS as well). Having two seems to mess up this logic.

Are you connected to the UPS via serial or USB cable?

See the specific suggestions at:
How to wake a server after UPS Shuts it down when Mains power is restored?

This should be easy to test, but to be honest, I spend very little time dealing with these edge cases. Server room power is one of the easiest things to plan for, in that you can spec x-hours of battery runtime and be able to ride through power-loss scenarios like this.

If the outages are longer, I just make sure I can remote in and handle things manually.

ewwhite
  • Thank you for your answer. Your first link really is the very question I meant to ask (but I did not find it when searching); the second thread shows me that your setups are in a wholly different league from my own, but I can see where the problem is. My UPSes are both connected via USB, by the way. – martin Aug 04 '14 at 13:03
  • @martin I'd just go with one UPS. By default, it should work the way you expect. – ewwhite Aug 04 '14 at 13:04
  • I'll bring this up one more time :). Do you think it makes sense to run the second UPS as a "dummy"? That is, keep its power output connected as it is now but not the data line, and only run the bigger of the two (2200VA) with intelligence, i.e. configured and connected to NUT. I figure that would give me longer uptime, wouldn't it? The main reason I even have 2 UPS units is that at the time I was in the market for them, 2 smaller UPSes were cheaper than a single bigger one of their combined capacity; that might be different now. – martin Aug 04 '14 at 14:11
  • @martin I'm not sure. UPS devices are intended to help with gaps in utility power. If there are this many issues where you need the increased runtime, is there any value in trying to stabilize power at the facility or with the utility provider? – ewwhite Aug 04 '14 at 14:13
  • Generally, power supply stability is not too bad (in my understanding), but this is an area at least bordering on rural, with frequent thunderstorms in summer and late autumn. So tiny outages (<= 1 s) are quite frequent and were a big source of agony before I had any kind of UPS. Unfortunately there also seem to be some general weaknesses in the power grid in my area, as longer outages (>= 30 minutes) are prone to happen, especially after heavy rainfall (flooding) and storms (usually a falling tree chopping off a line). Not necessarily an ideal region for running servers, I guess. – martin Aug 04 '14 at 14:28

An alternative solution that requires no hardware change is to set up the shutdown process to reboot instead of powering off if the UPS has mains power again once all the VMs have shut down. This involves figuring out where in the shutdown sequence to put your init script, and you need to make sure that NUT isn't stopped before that point, since you need it to communicate with your UPS.
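As a rough sketch of that check (not a drop-in init script), the decision could look like the following. It assumes the script has captured the text output of `upsc` for each UPS, which prints one `name: value` pair per line; `parse_upsc` and `final_action` are hypothetical helper names.

```python
def parse_upsc(text):
    """Parse `upsc` output lines of the form 'name: value' into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def final_action(upsc_outputs):
    """Return 'reboot' if every UPS reports on-line power, else 'poweroff'.

    upsc_outputs is a list with the captured `upsc` text of each UPS.
    With no data at all, fall back to powering off (assumed safe default).
    """
    if not upsc_outputs:
        return "poweroff"
    for text in upsc_outputs:
        status = parse_upsc(text).get("ups.status", "")
        # "OL" is the NUT status flag for running on line (mains) power.
        if "OL" not in status.split():
            return "poweroff"
    return "reboot"
```

Whether to also require a minimum battery charge before rebooting is a policy decision; this sketch only looks at the input power status.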

Are you sending a shutdown command to the UPS at the end of the server shutdown? If not, consider doing so. You can then set two delays: one before the UPS cuts its output, long enough for the server to really finish shutting down, and one between mains power returning and the UPS switching its output back on. If power comes back before the shutdown has completed, the shutdown command will still power your server down completely, but it will be brought back up after the second delay.

Check the NUT upscmd shutdown.return and the associated timeouts.
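For illustration, a small sketch of how those calls could be assembled. The helper names, the UPS name `eaton@localhost`, the credentials, and the delay values are made up. `ups.delay.shutdown` and `ups.delay.start` are standard NUT variable names, but whether your driver and UPS model expose them as writable, and support `shutdown.return` at all, is an assumption you would need to verify with `upsrw` and `upscmd -l`.

```python
import subprocess


def delay_commands(ups, user, password, shutdown_delay, start_delay):
    """Build the upsrw/upscmd invocations as argument lists (nothing is run)."""
    auth = ["-u", user, "-p", password]
    return [
        # Seconds the UPS waits after the command before cutting its output.
        ["upsrw", "-s", f"ups.delay.shutdown={shutdown_delay}", *auth, ups],
        # Seconds the UPS waits before switching its output back on.
        ["upsrw", "-s", f"ups.delay.start={start_delay}", *auth, ups],
        # Power the load off, then return it once mains power is available.
        ["upscmd", *auth, ups, "shutdown.return"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage would be along the lines of `run_all(delay_commands("eaton@localhost", "admin", "secret", 120, 300))`, issued late in the host's shutdown sequence.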

Baruch Even

A ghetto solution, but it works. Get a small MikroTik router or a Linux board and set it up to send Wake-on-LAN packets. Connect the device to power that is not backed by the UPS, and configure it to send a Wake-on-LAN packet every 30 seconds or every minute. While mains power is out the device is off and sends nothing; as soon as it has power again it sends WOL at that interval. As a result, your server never stays powered off while mains power is on.
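On a Linux board, the sender could be as small as the sketch below. It builds the standard Wake-on-LAN magic packet (6 bytes of 0xFF followed by the target MAC address repeated 16 times) and broadcasts it over UDP. The function names, the MAC address in the usage note, and the choice of port 9 are illustrative assumptions, and the server's NIC and BIOS must have Wake-on-LAN enabled for this to do anything.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    # 6 x 0xFF, then the 6-byte MAC repeated 16 times (102 bytes in total).
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network via UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Calling `send_wol("00:11:22:33:44:55")` from a cron job or a simple loop every 30-60 seconds gives the behaviour described above.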

Cory Knutson