
I am experiencing a problem with one of my servers, a DELL PowerEdge 2850. It reboots unexpectedly, logging Event ID 6008 ("Unexpected Shutdown") to the event log. The issue started yesterday: the server rebooted about 10 times yesterday and has continued to do so today.

There are no other events or errors logged in the event log just before the 6008 event. We haven't changed anything in the hardware. The only software change is that we turned on a .NET service that we developed; it also runs on an identical server, where it has run without any issues for 2 years. Other than that, the software has remained the same. I have the server set to write a kernel memory dump whenever there is a system failure, but it isn't even doing that. I called tech support and we still don't have a solution. I have reseated the power supplies, switched the PDU that the server's power supplies are on, and run the full DELL Diagnostics tests (not the quick tests), and everything passed. I asked tech support whether it could be the power supplies, but they said no: the server has two power supplies and one acts as a backup, so it can't be that.

Other notes: the servers are not on a UPS, which is not ideal, but that's the setup. I have two other servers in the room running alongside it, and neither is experiencing the same issue.

Has anyone experienced similar issues? Any insight or suggestions would be greatly appreciated!

Thanks!

sysadmin1138
GavinWoods
  • UPDATE: Fortunately our servers are covered by a warranty from our dealer. I worked with them on Friday and today, and they suggested we just replace the server. Hopefully that will solve the problem. It likely won't be a completely new server: we have had one replaced before and they sent us everything except new hard drives, so it is still possible that our hard drives are failing. – GavinWoods Jun 28 '10 at 16:01

2 Answers


Reverse the last thing that you did (turning on your .NET service), regardless of how it ran in the past or how it runs on another server.

joeqwerty
  • In total agreement: after turning off the last change, see if the event happens again. I would also try to analyse the dump file; see if you can download the following utility: http://www.nirsoft.net/utils/blue_screen_view.html. This might help shed some light on your problem. – redknight Jun 26 '10 at 05:45
  • I turned them off; I'll see if it still occurs. It might be a while until I get back to you, since the problem only occurred twice over the weekend. @redKnight - The problem with the dump file analyzer is that the server isn't recording dump files, which points to a power issue, although the other servers aren't having similar issues. – GavinWoods Jun 28 '10 at 13:55

It could be a whole load of things causing it to reboot. Since you said you enabled kernel dumping and you aren't getting one, that would suggest either power is being immediately removed from the system and not giving it a chance, or the kernel deems it unsafe to write the dump to disk. Either way, it sounds hardware related to me.
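Before reading too much into the missing dump, it may be worth confirming that the dump setting is really in effect. The sketch below is only an illustration: it assumes the usual `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`, and the helper names `describe_dump_setting` and `read_crash_dump_enabled` are made up for this example.

```python
import sys

# Commonly documented meanings of CrashDumpEnabled (stated here as an assumption;
# newer Windows versions define additional values that are not listed).
DUMP_TYPES = {
    0: "no dump",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled number into a readable label."""
    return DUMP_TYPES.get(value, "unrecognised value %r" % (value,))


def read_crash_dump_enabled():
    """Read CrashDumpEnabled from the registry (works on Windows only)."""
    import winreg  # imported here so the file still loads on other platforms

    path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if sys.platform == "win32":
    print(describe_dump_setting(read_crash_dump_enabled()))
```

If this reports a kernel memory dump and there is still no dump file after a reboot, that fits the reading above: the machine is probably going down before Windows gets the chance to write anything.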

You might want to turn it off and reseat the RAM, and if the hard disks are hot-swappable, just give them a little push in too. Clutching at straws, but you never know.

Going through the event log, what are the events that happen prior to the shutdown - anything unusual, or oddly consistent immediately before it powers itself off? The System log would probably be most helpful, but the Application log can sometimes show up some interesting things.
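If the log is large, that check can be scripted. The following is a minimal sketch, not a finished tool: it assumes the System log has already been exported into `(timestamp, source, event_id)` tuples, that the crash times have been noted separately (as far as I know, the 6008 entry is written at the next boot and its message text carries the time of the failed shutdown), and the name `events_before` and all sample rows are invented for illustration.

```python
from datetime import datetime, timedelta


def events_before(events, crash_times, window_minutes=30):
    """For each crash time, return the events logged in the window before it.

    events: iterable of (timestamp, source, event_id) tuples (hypothetical shape).
    crash_times: iterable of datetime values, one per unexpected shutdown.
    """
    window = timedelta(minutes=window_minutes)
    ordered = sorted(events, key=lambda e: e[0])
    return {
        crash: [e for e in ordered if crash - window <= e[0] < crash]
        for crash in crash_times
    }


# Made-up sample rows, for illustration only; not taken from the real server.
sample = [
    (datetime(2010, 6, 25, 6, 0), "ExamplePrinter", 1001),
    (datetime(2010, 6, 25, 10, 10), "ExampleDisk", 2002),
    (datetime(2010, 6, 25, 10, 14), "ExampleDisk", 2002),
]
crashes = [datetime(2010, 6, 25, 10, 15)]

for crash, nearby in events_before(sample, crashes).items():
    print(crash, "->", [(source, event_id) for _, source, event_id in nearby])
```

An event source that keeps turning up in the window before several different crashes is worth a closer look, while something that only appears hours earlier falls outside the window and is less likely to be related.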

I'm not familiar with Dell servers, but if they're anything like HP ones they'll have some sort of iLO which may give you some indication of a hardware fault. I've had something similar - Windows reported an unexpected shutdown, but the HP integrated logs reported a hard disk died immediately before the reboot, and I can only assume the RAID controller threw a wobbler which Windows wasn't happy with and crashed.

Ben Pilbrow
  • Thank you for the feedback. I reseated the RAM and hard drives. Yes, DELL has a similar tool to iLO, OpenManage; I'm going to install it now and see if it can give any more details. As for other issues, I have been seeing printer errors in the System log, but they occur hours before the reboot, so I doubt they're the cause, though it's still a possibility. – GavinWoods Jun 28 '10 at 14:00