5

My Linux server has hung several times and I don't know the exact reason. Which file or log should I look at? The server responds to ping, but I cannot ssh to it, and when I plug a monitor into it, there is no response. Any ideas?

mattdm
snow8261
  • Start with the contents of /var/log. – user9517 Aug 22 '14 at 06:25
  • 1
    Sometimes you see such symptoms if the system is running low on RAM. If you are lucky, the system managed to log something in `/var/log` that will tell you whether it is running low on RAM. – kasperd Aug 22 '14 at 07:57

1 Answer

7

Kernel hangs are difficult to debug because, unlike a crash, no oops message is displayed on screen. Only if you are really lucky will you see something in /var/log/messages: during a hang the entire system stalls, including the syslog daemon, so nothing gets written to these files.

That said, a hang can be as simple as a temporary performance problem caused by memory or CPU contention or by an inefficient algorithm, or as complicated as a deadlock. So, as mentioned above, if you are really lucky:

1: Check /var/log/messages, or run dmesg, to get some pointers.
2: If your system hangs on a regular basis, configure kdump along with the SysRq keys to find the exact problem.

For more info please refer to http://people.redhat.com/anderson/crash_whitepaper/
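As a rough illustration of the log check in step 1, here is a minimal Python sketch that scans log lines for kernel messages that often show up around a hang. The pattern list and the function names (`find_suspect_lines`, `scan_log`) are my own assumptions for illustration, not part of any tool, and the exact wording of kernel messages varies between kernel versions, so treat the patterns as a starting point only.

```python
import re

# Kernel message fragments that often precede or accompany a hang.
# Assumption: wording differs across kernel versions, so extend as needed.
SUSPECT_PATTERNS = [
    re.compile(r"invoked oom-killer", re.IGNORECASE),
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"blocked for more than \d+ seconds"),
    re.compile(r"soft lockup", re.IGNORECASE),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a log file on disk; the default path is not used by every distribution."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_log()` as root, or `scan_log("/var/log/syslog")` on distributions that do not use /var/log/messages, returns the matching lines with their line numbers. An empty result does not rule anything out, since the system may not have managed to log before it hung.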

Prashant Lakhera
  • I checked /var/log/messages; it did not log anything during the period when the server hung. – snow8261 Aug 22 '14 at 06:49
  • 1
    Not all Linux variants use /var/log/messages. – user9517 Aug 22 '14 at 06:52
  • Yes, as I mentioned above, that is expected. The other thing you can check, if you have the sysstat package installed, is /var/log/sa/sa, to see whether there was any performance issue during that period. I doubt this will help either, as I believe sar also stops logging whenever the system hangs. If you still cannot find anything there, the best way is to configure kdump along with SysRq and reproduce the issue. – Prashant Lakhera Aug 22 '14 at 06:53
  • There is no sa, just yesterday's. – snow8261 Aug 22 '14 at 07:11
  • Maybe I overcomplicated things, but the directory should look like this: `-rw-r--r-- 1 root root 251376 Aug 21 23:50 sa21`, `-rw-r--r-- 1 root root 275239 Aug 21 23:53 sar21`, `-rw-r--r-- 1 root root 45584 Aug 22 04:10 sa22`. To read today's file, run `sar -A -f sa22 >> /tmp/sarlog` and then check whether you see any performance issue around the time of the hang, such as load average, memory utilization, iowait, or user-space vs. kernel-space utilization. For more info you can refer to http://www.thegeekstuff.com/2011/03/sar-examples/ – Prashant Lakhera Aug 22 '14 at 08:12
  • @PrashantLakhera I have read the sar report. It stops at 2:20 in the morning, and before 2:20 I did not find anything strange. – snow8261 Aug 22 '14 at 08:50
  • The point is: when the issue occurred, did sar log data during that period, and if so, did you see anything suspicious? By default sar collects data every 10 minutes and writes a daily summary at the end of the day (11:53 pm). From `cat /etc/cron.d/sysstat`: `# run system activity accounting tool every 10 minutes` `*/10 * * * * root /usr/lib64/sa/sa1 1 1` `# generate a daily summary of process accounting at 23:53` `53 23 * * * root /usr/lib64/sa/sa2 -A`. Before we go further, let me clarify that sar can only tell you that there is some issue; it is a limitation of sar that it cannot tell you which process is causing it. – Prashant Lakhera Aug 22 '14 at 08:56
  • I created a video about sar that might be helpful for you, as you mentioned you are new to sysstat: https://www.youtube.com/watch?v=4oIM1Yc0m7w&index=8&list=PLckUzKjgYDgajJVYOjNztS6Q4SOho0RKY – Prashant Lakhera Aug 22 '14 at 16:12
  • @PrashantLakhera Thanks. The same error occurred again today; it is because there is no memory left on the Linux server. – snow8261 Aug 23 '14 at 01:17
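
Since the cause here turned out to be memory exhaustion, one way to spot it before the next hang is to record how much memory is still usable at regular intervals. The following is only a sketch under assumptions: the function names `parse_meminfo` and `free_memory_percent` are hypothetical, the input is text in the format of /proc/meminfo, and on kernels older than 3.14 (which lack the `MemAvailable` field) it falls back to a rough estimate of free plus buffers plus cached memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def free_memory_percent(meminfo):
    """Estimate the share of RAM that is still usable, as a percentage."""
    total = meminfo["MemTotal"]
    # MemAvailable exists on kernel 3.14 and later; otherwise approximate it.
    available = meminfo.get("MemAvailable")
    if available is None:
        available = (
            meminfo.get("MemFree", 0)
            + meminfo.get("Buffers", 0)
            + meminfo.get("Cached", 0)
        )
    return 100.0 * available / total
```

For example, `free_memory_percent(parse_meminfo(open("/proc/meminfo").read()))` run from cron and appended to a file would show whether the percentage drops steadily in the hours before a hang.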