This is my basic setup:
- I run a server (DL380 G7; Linux 3.13 kernel) that hosts ~10 virtual machines
- It is set for automatic power on
- I use NUT for UPS management
- A graceful shutdown of the host (including shutting down the VMs first) takes ~8-10 minutes
- Total runtime on the UPSes (I have 2, each powering one PSU in the server and one PSU in the attached storage) with fully charged batteries is ~75 minutes
- I have set the UPS/NUT levels so that the critical level (LOWBATT), i.e. the point at which shutdown is initiated, is reached at 15 minutes of remaining runtime (I dare not go lower)
The following scenario has happened to me twice in the last 12 months:
- Power loss, the UPSes take over just fine
- Power remains off for about 1 hour -> shutdown initiated, as it should be
- The server stops the VMs and begins its shutdown procedure
- --> at some point during this, power comes back
- Server completes shutdown and powers off
- Server does not come back online: the UPSes have input power again, and the server itself never lost power (it was supplied by the UPSes throughout), so to the server it looks like an intentional graceful shutdown.
- As soon as I become aware of it, I remotely power on the server via iLO [the last time this happened was today at 03:46 am :-), which is why I am asking]
As ewwhite has pointed out, the specific UPS models would be helpful:
- Eaton 5PX 2200VA, with +1 EBM
- Roline Prosecure II, 1500VA RM2U, with +1 EBM
Have any of you run into the same problem? Is there an out-of-the-box solution for this with some UPSes?
So far I have considered setting up a low-power Linux device (Raspberry Pi?) to take over the monitoring; it would check the UPS units for input power and sufficient battery charge, and then restart the server via iLO/IPMI.
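To make the idea concrete, here is a minimal sketch of what such a monitor could look like. It assumes the NUT client tool `upsc` and `ipmitool` are installed on the monitoring device, and that the iLO accepts IPMI over LAN. The UPS names, the iLO address, the user name, the password file path and the 50% charge threshold are all hypothetical placeholders, not values from my setup.

```python
import subprocess

# Hypothetical placeholders: adjust to the real NUT UPS names and iLO details.
UPS_NAMES = ["eaton@localhost", "roline@localhost"]
ILO_HOST = "ilo.example.lan"
ILO_USER = "monitor"
ILO_PASSWORD_FILE = "/etc/ups-watch/ilo.pass"
MIN_CHARGE = 50.0  # assumed threshold in percent, not a recommendation


def parse_upsc(output):
    """Parse `upsc` output (one "key: value" pair per line) into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def ups_is_healthy(values, min_charge):
    """True if the UPS is on line power, not low, and charged enough."""
    flags = values.get("ups.status", "").split()
    try:
        charge = float(values.get("battery.charge", "0"))
    except ValueError:
        return False
    return "OL" in flags and "LB" not in flags and charge >= min_charge


def should_power_on(ups_values, server_is_on, min_charge=MIN_CHARGE):
    """Power on only if the server is off and every UPS is healthy."""
    if server_is_on or not ups_values:
        return False
    return all(ups_is_healthy(v, min_charge) for v in ups_values)


def read_ups(name):
    """Query one UPS through the NUT client `upsc`."""
    result = subprocess.run(["upsc", name], capture_output=True,
                            text=True, check=True)
    return parse_upsc(result.stdout)


def ipmi(*args):
    """Run an ipmitool command against the iLO and return its output."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", ILO_HOST, "-U", ILO_USER,
           "-f", ILO_PASSWORD_FILE, *args]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def check_once():
    """One monitoring pass: power the server on if it is safe to do so."""
    server_is_on = "is on" in ipmi("chassis", "power", "status").lower()
    readings = [read_ups(name) for name in UPS_NAMES]
    if should_power_on(readings, server_is_on):
        ipmi("chassis", "power", "on")
```

Calling `check_once()` every few minutes (from cron, for example) would be enough. The decision logic is kept in `should_power_on` so it can be tested without any hardware, and requiring both UPSes to be back on line power with a reasonable charge avoids restarting the server into another immediate shutdown.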
Is any automatic solution just too much bother (for my case and in general) and should I just go with manual intervention when and if it happens?
regards
martin