2

I am testing and diagnosing my UPSs as well as the power supply units (PSUs) of my server. To do that, I power the server down "the hard way", by unplugging it from the wall to simulate a power loss.

Doing this has helped me find which UPSs are not working properly and which PSUs need replacing (if the server shuts down, something needs replacing; otherwise everything is OK). However, I am starting to worry that constantly unplugging my server and "killing" it the hard way may damage it or my data.

This leads me to my question: is there an alternative way of performing these tests that minimizes the chance of damaging the server or its parts? Or is there no problem with what I am currently doing?

Again, I am trying to determine which power supplies are defective (that is, the UPS is OK but the server dies anyway when unplugged). I can test the UPSs on their own, without the server, but I can't figure out how to test whether my PSUs can handle fluctuations and spikes without actually trying them on a live server. Any guidance is greatly appreciated.


The server in question is an HP ProLiant DL380 G7 with an Intel Xeon. Its HDDs are in RAID 1, and it runs Ubuntu 16.04.3 LTS on its SSDs.

DarkCygnus
  • *Some background:* I noticed that when I got power surges or blackouts my server died despite being behind a UPS. This led me to this testing and diagnosis, as I suspected something was wrong with my UPSs or my power supply units (turns out the problem seems to be with the latter). – DarkCygnus Jan 31 '18 at 01:32
  • Boot the server into BIOS, and leave it there while playing around with the power. Optionally remove the drives first, although idle disks shouldn't have any problems with unexpected power loss. – wurtel Jan 31 '18 at 08:09
  • For anyone who is interested: I traced the problem to the PSU. It seems that "newer" PSUs need a UPS with 0 ms transfer time (a.k.a. an "on-line UPS"). Older PSUs don't need such a precise transfer time. We got a proper UPS and the server no longer dies. [This answer](https://serverfault.com/a/199883/405302) and its post helped me figure it out :) Thanks to everybody – DarkCygnus Aug 23 '19 at 23:56

3 Answers

2

You have an HP ProLiant DL380 G7. Look at the following:

The Systems Insight Display (SID) shows the health of the internal components.

If you have an amber light on either of the power supplies, shown on the SID or on the physical units themselves, there's a problem.

You can also log into the server's iLO to check the Integrated Management Log. If you lose power suddenly, there may be entries in the log indicating something like:

- Server reset.
- Server power removed. 
- Server power restored.

You have the option of not connecting both power supplies to the same UPS. Connect one to the power mains and observe the behavior.

Check the firmware on your system. G7 servers are old now, but by running Ubuntu, you're probably missing the HP reporting and management agents (they're optimized for RHEL/CentOS/VMware/Windows). You can download the full set of firmware for this model using this HP bootable DVD.

ewwhite
  • Many thanks for your answer. Indeed, I have been checking the LEDs, and when the power drops the PSU LEDs blink amber. Also, when reviving the server, they blink amber for a second before going green (one PSU specifically). Could the problem lie beyond the PSUs, in the connectors the server has for them? I am still waiting for the new PSUs I ordered to validate this hypothesis. Any other suggestions? I really appreciate your help – DarkCygnus Jan 31 '18 at 14:00
1

Do not unplug your UPS from the wall. I asked a similar question 9 years ago on this site and got the following answer from Evan Anderson:

The UPS is losing its electrical ground when you unplug it from the wall. While it's unlikely that anything would go wrong, the UPS designers "expect" that path to ground to remain available at all times, and if something did short during your test you might see sparks (smoke, flame, etc) when the electricity takes another path to ground. I've unplugged UPSs from the wall for testing before, but seeing a flash of "lightning" and hearing a loud "bang" coming out of a UPS during one such test gave me "religion" about not doing that again.

So if you are on a switched outlet, switch it off. If you're not on a switched outlet, consider switching the power off at the breaker so that the ground circuit stays connected.

As for disconnecting your servers by pulling the plugs, you shouldn't be doing any physical damage to the machines by doing that. You may corrupt non-battery-backed RAID arrays, or disrupt in-flight writes which can cause messy file systems and data loss - but your physical servers should be fine.

As for your actual problem, which is that you still lose servers behind your UPS during brownouts, blackouts, and surges, there are a few things that might cause this:

  1. If you have dual power supplies in your servers and one of them is on the UPS and one is not (which is common enough), you may have a fault in the PSU switching inside the server.
  2. Again, if you have dual power supplies, perhaps one of them is overloading and the server is shutting down for safety reasons.
  3. Depending on the type of UPS you have, it may no longer be functioning correctly. I once had a site with constant brownouts: 20 to 30 times a day the power would drop below 200 V (normally 230 V) and the UPS would go into boost mode, and sometimes the voltage would spike to 250 V and the UPS would go into buck mode. This shortened the life of the traditional UPS dramatically (I typically got around a year out of each one). We switched to a double-conversion UPS (also called an online UPS), which solved the issue.
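
To make the boost/buck behaviour in point 3 concrete, here is a minimal Python sketch. The 200 V and 250 V thresholds are taken from the anecdote above and are not a specification for any particular UPS; the function names and mode labels are hypothetical and only for illustration.

```python
# Thresholds from the anecdote (230 V nominal site). Real UPS models
# use their own transfer points, so treat these values as assumptions.
BOOST_BELOW_V = 200.0
BUCK_AT_OR_ABOVE_V = 250.0


def line_interactive_mode(input_voltage: float) -> str:
    """Classify which mode a line-interactive UPS would be in (hypothetical model)."""
    if input_voltage < BOOST_BELOW_V:
        return "boost"   # sagging input: the UPS steps the voltage up
    if input_voltage >= BUCK_AT_OR_ABOVE_V:
        return "buck"    # high input: the UPS steps the voltage down
    return "normal"


def count_regulation_events(samples) -> int:
    """Count how many times the UPS enters boost or buck mode."""
    events = 0
    previous = "normal"
    for voltage in samples:
        mode = line_interactive_mode(voltage)
        # Count only a change into boost or buck, not each sample spent there.
        if mode != "normal" and mode != previous:
            events += 1
        previous = mode
    return events
```

Feeding it a day of logged input voltages gives a rough idea of how often a line-interactive unit would be switching modes, which is the wear pattern described above. A double-conversion UPS has no such mode changes to count.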
Mark Henderson
  • Thanks a lot for this insight. After my tests I think that the problem is option 1 (the switching), as I have tried several combinations (with and without the UPS, one power supply with it and the other without, etc.) and every time I experience a shutdown. Also, as ewwhite indicated, I have watched the diagnostic LEDs, and when this happens the PSU LEDs blink amber for a brief moment (just as they die, and again for a moment when I press the power button). Any new suggestions based on this? Many thanks again – DarkCygnus Jan 31 '18 at 13:57
  • 1
    @DarkCygnus Take ewwhite's advice on board. I've known him for 5+ years and when it comes to HP stuff he's usually spot on. – Mark Henderson Jan 31 '18 at 14:32
  • Will do :) Thanks all for your support. I will try these and, if problems persist, ping you or turn it into another question. Cheers – DarkCygnus Jan 31 '18 at 15:01
  • 1
    After some testing, it seems that the problem is the UPSs: they don't seem to supply as much power as the PSUs require. I will try a UPS that can handle those requirements and then report back here. – DarkCygnus Mar 02 '18 at 03:33
  • So, we bought new UPSs and double-checked that they were appropriate for server usage, but the problem still persists. However, I now suspect it may have to do with the cable connecting one PSU to one UPS; it seems that it is not supplying enough current (the rear LED of one of the PSUs shines less brightly than the other). Again, I will update when I have tried it – DarkCygnus Apr 06 '18 at 22:30
1

Two notes:

One is that the best way to connect the UPS is through the dual power supply of your servers. If either the power or the UPS (battery) fails, everything stays up.

Second: except for what was said about losing ground, it's not bad to unplug a server (if you don't care about data corruption), except for the SSDs. Depending on which SSDs you have, they may have a supercapacitor to deal with power loss. Without one, losing power could damage blocks that are being erased or written.

Edit about the dual power supply: the correct way is one power supply in the wall and the other power supply through the UPS. It would be wrong to connect only one power supply, or to connect them both through the UPS. If you do, a failed UPS self-test will interrupt power, and you can't turn the UPS off to replace its battery.

Of course, one doesn't have that luxury with servers without dual PSU.
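
The wiring options above can be compared with a small Python sketch. It is a simplified model under stated assumptions: a healthy UPS keeps its output live on battery for the length of the outage, a failed UPS drops its output (no bypass), and the server stays up as long as one PSU has power. All names are hypothetical.

```python
from enum import Enum


class Feed(Enum):
    WALL = "wall"   # PSU plugged straight into mains
    UPS = "ups"     # PSU plugged into the UPS output


def server_stays_up(psu_feeds, mains_ok: bool, ups_ok: bool) -> bool:
    """Return True if at least one connected PSU still receives power."""
    def feed_is_live(feed: Feed) -> bool:
        if feed is Feed.WALL:
            return mains_ok
        # Assumption: a healthy UPS carries the load whether or not mains
        # is present; a failed UPS delivers nothing.
        return ups_ok

    return any(feed_is_live(feed) for feed in psu_feeds)


recommended = [Feed.WALL, Feed.UPS]   # one PSU on the wall, one on the UPS
both_on_ups = [Feed.UPS, Feed.UPS]
single_psu = [Feed.UPS]
```

In this model only the `recommended` layout survives both a mains outage and a UPS failure; the other two go down as soon as the UPS does, which is the point of the edit above.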

Halfgaar
  • Hey Halfgaar, yes, I do have SSDs, as indicated in my post (at the very bottom); that is where my Ubuntu OS lives. I have 2 SSDs (for the OS) and 6 HDDs (in RAID), and yes, I usually remove the disks before attempting these tests. Is that OK too? I only conduct these tests when the server is not in use. Now, about connecting the UPSs the way you suggest, would you mind explaining how? Or rather, what is the alternative way of connecting them? Thanks a lot for your help – DarkCygnus Jan 31 '18 at 14:06
  • @DarkCygnus Sorry, I missed that you have SSDs. But yes, if you remove them (including disconnecting power), that's 100% safe. I edited my answer to describe the PSU wiring. – Halfgaar Jan 31 '18 at 14:17
  • Thanks, much clearer now :) And yes, this server does have dual PSUs, so I'll try what you suggest. Thanks again. – DarkCygnus Jan 31 '18 at 14:20