I have a server that had been running for well over 5 months when it suddenly stopped responding. I couldn't SSH into it or reach it any other way, so I decided to reboot it, and the reboot fixed it.

I'm trying to figure out what happened, but I'm not sure where to look. I started in /var/log, but there are tons of files in there and I don't know which ones I should pay attention to. I'm slowly going through each of them, but if anyone can point me in the right direction, that would be great.



3 Answers


I'd start with /var/log/messages, which is where most generic output goes by default. It includes boot messages and any kernel warnings. Depending on the type of issue, there may be no forensic data left. For example, failing RAM may not log any errors, whereas disk errors will usually show up in the logs.
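As a rough way to skim a large log for kernel warnings, something like the following Python sketch could help. The keyword list is my own assumption and purely illustrative; actual kernel and driver messages vary by distribution and version, so an empty result does not prove nothing went wrong.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive or official list).
SUSPECT = re.compile(
    r"oom-killer|out of memory|kernel panic|i/o error|call trace|segfault|soft lockup",
    re.IGNORECASE,
)

def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPECT.search(line)
    ]

def scan_log(path="/var/log/messages"):
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_log()` (typically as root, since the file is often not world-readable) returns the matching lines with their line numbers, which you can then read in context.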

SSH might simply have broken. Without knowing the status at the console, it's difficult to say definitively. Typically, an otherwise stable Linux box that hasn't been changed and suddenly locks up suggests a hardware issue. Most hardware issues require further troubleshooting and diagnostics.

If you can provide more details, I will likely be able to give you further recommendations.

  • Hi Warner, I looked in /var/log/messages at the entries right before I rebooted the machine, and there is nothing that would indicate something went wrong. I am running the server on Amazon EC2, so it might be possible that something broke there and my server was affected. I checked the free disk space and I am barely using 20%. Let me know what kind of details you need and I will do my best to provide them :) Thanks for your help! – Cerim Apr 24 '10 at 04:12
  • Amazon EC2 eliminates most potential hardware scenarios. I'd start looking at the daemons that run on the system: Apache logs, et cetera. It helps to run historical graphing; you might look at sar. Look for any anomalies, anything that seems out of place. Chances are it will be nearly impossible to isolate unless it recurs or you find evidence now. – Warner Apr 25 '10 at 02:17
  • It is possible, though, that an instance hangs if the machine my instance is running on has a hardware failure. There is nothing in the log, and the fact that it never happened before and hasn't happened again (yet) leads me to believe it's more likely a hardware failure ... I'll monitor the instance closely and wait to see if it happens again – Cerim Apr 25 '10 at 13:21

Maybe only sshd went down? Was the server responding to ping? Use "monit" if you want to keep your services (ssh, ftp, apache, etc.) always up.
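To tell "only sshd died" apart from "the whole box is gone" when ping is unavailable, you can probe several TCP ports from another machine. This is only a minimal sketch of that idea in Python, not a replacement for monit; the host name and the port map in the usage note are hypothetical placeholders.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_services(host, services):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

For example, `check_services("server.example.com", {"ssh": 22, "http": 80})` returns a dict of booleans. If every port fails at once, the problem is probably not limited to sshd.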

  • I have something like monit installed. All the things I was monitoring stopped. I block ping, so I wasn't able to ping it, but I also wasn't able to access SSH or any other services running on the machine. The more I think about it, the more I think it could be a hardware issue... – Cerim Apr 24 '10 at 06:50

Can you paste the contents of /var/log/messages from just before the server was rebooted?
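If the file is large, a small Python sketch like this can pull out the lines leading up to the reboot. It assumes the manual reboot is logged with the text "shutting down for system reboot"; that wording may differ on your system, so treat the `marker` default as an assumption and adjust it.

```python
from collections import deque

def lines_before_marker(lines, marker="shutting down for system reboot", context=50):
    """Return up to `context` lines preceding the last line that contains
    `marker`, followed by that line. Returns [] if the marker never appears."""
    window = deque(maxlen=context)  # rolling buffer of the most recent lines
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            # Overwritten on every match, so the last occurrence wins.
            result = list(window) + [line]
        window.append(line)
    return result
```

You could run it as `lines_before_marker(open("/var/log/messages", errors="replace"))` and paste the result.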

It is not possible to find out the reason for the lockup without checking the log files.

Also, is the lockup recurring, or was it a one-off event?

  • It happened once. I have multiple servers running and only this one was affected; I've never had this issue before. The only thing in /var/log/messages before the reboot was syslog-ng entries: syslog-ng[1774]: Log statistics ... syslog-ng[1774]: Log statistics ... shutdown[801]: shutting down for system reboot – Cerim Apr 25 '10 at 13:17