My client's HP ProCurve 5412zl chassis switch reboots on occasion, despite being powered through four redundant power supplies and being under UPS protection.
The reboots usually coincide with a real power outage, a brown-out, or some other low-voltage event. All of the equipment attached to the UPS stays up except for the switch.
The UPS for the rack is an APC SmartUPS SUA3000XL 208V with a step-down transformer. This switch provides PoE for phones and access points throughout the facility. The battery cells are healthy, were replaced recently, and hold a full charge.
Each blip reboots every phone in the facility and disconnects users from their sessions. It's disruptive.
In the switch logs:
Keys: W=Warning I=Information
M=Major D=Debug E=Error
---- Event Log listing: Events Since Boot ----
I 02/17/16 22:26:31 03802 chassis: System Self test started on Master
I 02/17/16 22:26:31 03803 chassis: System Self test completed on Master
I 02/17/16 22:26:35 00061 system: -----------------------------------------
I 02/17/16 22:26:35 00062 system: Mgmt Module 1 went down without saving crash information
M 02/17/16 22:26:35 03001 system: System reboot due to Power Failure
And version information:
valley-core# sh version
Image stamp: /ws/swbuildm/rel_orlando_qaoff/code/build/btm(swbuildm_rel_orlando_qaoff_rel_orlando)
Nov 19 2014 15:17:26
K.15.16.0005
335
Boot Image: Secondary
For years, I didn't realize that you have to modify the power supply settings on this switch model, but this unit is configured properly to take advantage of the multiple PSUs.
valley-core# sh power-over-ethernet
Status and Counters - System Power Status
System Power Status : Full redundancy
PoE Power Status : Full redundancy
Chassis power-over-ethernet:
Total Available Power : 600 W
Total Failover Power : 600 W
Total Redundancy Power : 600 W
Total Used Power : 359 W +/- 6W
Total Remaining Power : 241 W
Internal Power
Main Power
PS (Watts) Status
----- ------------- ---------------------
1 300 POE+ Connected
2 300 POE+ Connected
3 300 POE+ Connected
4 300 POE+ Connected
External Power
EPS1 /Not Connected.
EPS2 /Not Connected.
Additional PSU information:
valley-core# sh system power-consumption
Slot Power Usage:
Slot Module Description Current Power
----- ----------------------------------------- ---------------
A HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
B HP J9536A 20p GT PoE+/2p SFP+ v2 zl Mod 23 W
C HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
D HP J9534A 24p Gig-T PoE+ v2 zl Module 19 W
E HP J9534A 24p Gig-T PoE+ v2 zl Module 17 W
F HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
G HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
H HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
K HP J9534A 24p Gig-T PoE+ v2 zl Module 18 W
L HP J9534A 24p Gig-T PoE+ v2 zl Module 19 W
valley-core# sh system power-supply
Power Supply Status:
PS# Model State AC/DC + V Wattage
---- --------- ------------- ----------------- ---------
1 Unknwn Powered AC 120V 875
2 Unknwn Powered AC 120V 875
3 Unknwn Powered AC 120V 875
4 Unknwn Powered AC 120V 875
4 / 4 supply bays delivering power.
Total power: 3500 W
What's odd is that the switch is the only device losing power. None of the connected servers has power issues, despite being on the same battery or PDU.
I'll admit that the utility power at this location is poor, with voltage dips and the occasional spike. But the UPS didn't even log a fault during the most recent warm boot.
I have another 5412zl at an unrelated customer site that has done the same thing multiple times in the past.
Any thoughts on what I can do about this? Should I try moving two of the PSUs to utility power instead of keeping all four on the UPS?
Edit:
Boot history shows:
valley-core# sh boot-history
Mgmt Module 1 -- Saved Crash Information (most recent first):
=============================================================
ID: 29008d6a
Active system went down: 02/01/16 09:23:54 K.15.16.0005 335
Switch rebooting due to temporary loss of power or low voltage
ID: 994a405a
Active system went down: 12/14/15 11:31:15 K.15.16.0005 335
switch rebooting due to temporary loss of power or low voltage
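To line these reboots up against the UPS event log, it helps to pull the timestamps out of the boot history. Below is a minimal Python sketch that does this for the output pasted above. The function name `power_loss_reboots` and the regular expressions are my own, and it assumes the switch prints dates as MM/DD/YY, which matches the 12/14/15 entry.

```python
import re
from datetime import datetime

# Sample copied from the `sh boot-history` output above.
BOOT_HISTORY = """\
ID: 29008d6a
Active system went down: 02/01/16 09:23:54 K.15.16.0005 335
Switch rebooting due to temporary loss of power or low voltage
ID: 994a405a
Active system went down: 12/14/15 11:31:15 K.15.16.0005 335
switch rebooting due to temporary loss of power or low voltage
"""

# Assumption: timestamps are MM/DD/YY HH:MM:SS in the switch's local time.
DOWN_RE = re.compile(
    r"Active system went down:\s+(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)
# The reason line varies in capitalization, so match case-insensitively.
POWER_RE = re.compile(r"loss of power or low voltage", re.IGNORECASE)


def power_loss_reboots(text):
    """Return the datetimes of crash records whose reason mentions power loss."""
    events = []
    pending = None  # timestamp of the record currently being read
    for line in text.splitlines():
        match = DOWN_RE.search(line)
        if match:
            pending = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        elif pending is not None and POWER_RE.search(line):
            events.append(pending)
            pending = None
    return events


if __name__ == "__main__":
    for ts in power_loss_reboots(BOOT_HISTORY):
        print(ts.isoformat(sep=" "))
```

The resulting list can then be compared by hand, or with a similar parser, against whatever the UPS logged around the same times.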
An HP change note on a previous firmware revision says:
Power (CR_0000112424) - When the switch is exposed to AC power fluctuations and the voltage drops too low, the switch reboots and generates an incorrect error message saying the switch crashed. With this fix, the error message is changed to "Switch rebooting due to temporary loss of power or low voltage".
This is consistent with this tech note.