I have a "server" that recently started crashing about once a week. I have yet to find the cause, but the nonrecoverable nature of the crash is unusual. The machine is built from standard desktop parts, with only the essentials:
- AMD Phenom II X6 1045T CPU
- MSI 785GM-P45 Motherboard
- 4 x G.SKILL Ripjaws 4GB DDR3 SDRAM
- Antec PSU
- APC SmartUPS 850VA UPS
- 2 x Western Digital WD10EARS Caviar Green 1TB HDDs
The uninterruptible power supply is connected via USB. The server is configured to power down in the event of a power outage and to wake on a USB trigger from the UPS or on WOL.
When the server crashes it locks up and becomes completely unresponsive, with no bluescreen or anything else, and the power LED flashes slowly on and off at 1-second intervals. Physically disconnecting power, waiting 10 seconds, and then reconnecting power is the only way to bring it back online.
Does anyone have any insight into what the blinking power LED indicates?
It's not going into standby mode, is it? That's the only circumstance under which I've seen a flashing power light on commodity hardware. – Flup – 2013-03-22T15:59:24.550
@Flup - I thought the same thing, but pressing or holding down the power switch has no effect. The OS reports an unexpected shutdown, with nothing indicating that it went to sleep or otherwise. In the past, when there have been power failures, the logs have indicated as much. – MyItchyChin – 2013-03-22T16:22:36.190
I suppose it is hardware related. Flashing LEDs and beeping are among the few ways that hardware has to let you know what is going on. Did you check the temperature? – Luis Siquot – 2013-03-22T16:45:22.280
@LuisSiquot - I don't suspect a thermal issue. The temperatures are always low whenever I have checked. The case has 4 x 120 mm cooling fans, the CPU has the stock AMD heatsink and fan, and the PSU has 2 internal fans. The server isn't under any real load when it crashes, and it's configured in the BIOS to cool aggressively. It's kept in a closet with floor and ceiling ventilation in a climate-controlled space. – MyItchyChin – 2013-03-22T16:57:04.573
Did you open the case and check that the CPU fan was working at the moment of the crash? The rest of the cooling is really marginal if this fan fails for a while. What about the power supply: how many watts is it? – Luis Siquot – 2013-03-22T17:09:30.710
@LuisSiquot - The CPU fan and all the other fans are working fine. There is enough ventilation and cooling in the case that the CPU would not overheat even if its fan were to fail. That said, the BIOS is configured to alarm when the CPU or system fans fail, and to throttle the CPU when a thermal issue occurs. I'm confident the issue is not related to the CPU overheating. – MyItchyChin – 2013-03-22T17:57:00.967
What's the condition of all the capacitors on the board? Images of the board all show solid capacitors, and the manual is quite mute on the subject of diagnostic flashing. – Bon Gart – 2013-03-22T19:51:38.480
@BonGart - I combed through the manual and it made no mention of diagnostic flashing. I ended up replacing the PSU and so far it seems stable. – MyItchyChin – 2013-03-25T19:09:13.523