Diagnosing a Kernel Panic

5

I have a PC running Ubuntu 9.04 with the KDE desktop installed. I use it as a file/printer/IMAP server. Usually, I switch it on and then work on some other computer. After a few hours of sitting idle with only the login prompt on screen, the system panics - the scroll lock and caps lock keyboard lights flash.

I'd like to fix this.

However, being a Linux noob, I have no idea where to start.

So, the question is, what's the best way to diagnose the cause of the panic?

I have googled a bit but often the solutions, if any, are vague. Ideally, the answer would end up being a flow chart of the steps needed to narrow down the cause of the panic.

See my response below for further details and questions.

Skizz

Posted 2010-01-11T14:48:32.463

Reputation: 992

Yes, I know I've accepted my own answer, but I did mark it CW, and it is the whole process as I went through. – Skizz – 2010-01-14T09:40:11.513

Answers

2

Here's what I've done (feel free to make corrections to any suggestions below):

Update Software

I noticed in the console view mentioned below that there was a call to bitmap_weight just before the panic message. I looked on nVidia's website and found a new version of the video driver, so I downloaded and installed that. I also ran the update manager to update all the software on the machine. I'm still getting the panics, but the time between them seems to be longer. I guess it is always advisable to make sure you've got all the latest updates. UPDATE: No, the panics haven't changed; even the Ctrl-Alt-F7 console displays the same messages (how do I get this written to a file?!).

Memtest86+

If it doesn't already appear, pressing 'esc' during the boot-up sequence displays the grub menu. On this menu is an option to run Memtest86+. On Ubuntu 9.04 this is V2.11. There is a V4.00 available on their website; to use it, download the ISO CD image, burn it to a CD, restart the computer and boot from the CD. For my problem, the default tests didn't highlight any problems. Pressing 'c' displays a configuration menu, and here there is an additional test that can be performed - the bit-fade test. This one takes a long time to run (it is currently running as I type this). If this does highlight a problem, try replacing the memory modules and repeat the test. If it still fails then you probably need a new motherboard.

Testing results: I checked the PC this morning and memtest was still running - 9 hours with no errors. It did confuse me at first: it was doing a bit-fade test and the timer said 20 minutes, so I thought it had rebooted. In fact, the timer just shows the time since the start of the bit-fade test. Swapping back to the default tests causes the displayed time to show the total up-time. So it seems the memory is OK.

kexec-tools and console view

I have now installed kexec-tools, although it is a complex beast so I don't think I'll get anything useful from it for now. In doing so, however, I came across a page that lists some useful keyboard shortcuts. At the login, I pressed Ctrl-Alt-F8 to display the console output. I left the machine running and it panicked - the console did display some messages, one of which was a panic message. Now, it would be really useful if this output had been saved to a file, as only about 25 lines of messages were visible. Does anyone know where this file is or how to get it to save the output?

So, in the X console (the Ctrl-Alt-F8 screen) one of the functions in the log was bitmap_weight, so I thought it might be the video card.
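If some of that console output does end up in a file (for example a syslog file, or text captured some other way), the part worth reading is the panic line and the few lines leading up to it. Here is a minimal Python sketch of that step. It assumes the log is already available as text; the function name find_panic_context and the default of 25 lines are my own choices for illustration, and a panic may well prevent the kernel from writing anything to disk at all.

```python
def find_panic_context(log_text, before=25, marker="Kernel panic"):
    """Return the first line containing `marker` plus up to `before` lines above it.

    Returns an empty list if the marker does not appear in the text.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            # Keep the lines leading up to the panic; they usually hold the call trace.
            start = max(0, index - before)
            return lines[start:index + 1]
    return []
```

To use it, read the file with open(path).read() and print the returned lines; whichever log file you point it at is up to you.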

Turning Off the Video

In the /etc folder is a set of folders called rc0.d, rc1.d, etc., and these contain the scripts used to set the system up. Normally, the scripts in rc2.d are executed when you turn the system on. In here, the script that starts the graphical login (the display manager) is called S30gdm for Gnome and S30kdm for KDE. Rename these to K70gdm / K70kdm and reboot the system. You now have a text login prompt and the GUI is disabled. With this change the system was far more stable: it stayed alive all night, something it hadn't previously done. I have an nVidia video card, and checking their web site I saw that there has been an update to the video driver recently. I have now installed this and will see if the problem persists. I should point out here that I do keep the system fairly up to date, and I think there was a kernel update which might have affected the old video driver.

Finally, to restore the GUI, rename K70gdm and/or K70kdm back to S30gdm / S30kdm and reboot.
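The rename in both directions can be sketched as a small Python helper. This is only an illustration: the function name toggle_display_manager is made up, the S30/K70 names are the ones from my system and may differ on other releases, and on a real machine it would have to run as root against /etc/rc2.d.

```python
import os


def toggle_display_manager(rc_dir, enable):
    """Rename S30gdm/S30kdm <-> K70gdm/K70kdm inside rc_dir.

    enable=False turns the GUI login off (S30* -> K70*);
    enable=True restores it (K70* -> S30*).
    Returns the list of (old_name, new_name) pairs actually renamed.
    """
    pairs = [("S30gdm", "K70gdm"), ("S30kdm", "K70kdm")]
    renamed = []
    for start_name, kill_name in pairs:
        src, dst = (kill_name, start_name) if enable else (start_name, kill_name)
        src_path = os.path.join(rc_dir, src)
        dst_path = os.path.join(rc_dir, dst)
        # lexists is used because the rc entries are normally symlinks.
        if os.path.lexists(src_path) and not os.path.lexists(dst_path):
            os.rename(src_path, dst_path)
            renamed.append((src, dst))
    return renamed
```

Calling toggle_display_manager("/etc/rc2.d", False) and rebooting should give the text login; calling it again with True puts things back. Entries that don't exist are simply skipped, so it doesn't matter whether only gdm or only kdm is installed.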

Skizz

Posted 2010-01-11T14:48:32.463

Reputation: 992

It seems that the video card was to blame. After replacing it with an older one I had, I've been able to run the system for days without any problems. Of course, now I've written this, it'll all go to pot. – Skizz – 2010-06-07T09:13:39.577

0

My approach would be to get the full output of the kernel panic (as output to the console) and Google the driver/subsystem that caused the panic. This would be found in the output near the bottom.
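To get search terms out of that output, one option is to pull the function names from the call trace. The Python sketch below assumes the trace lines use the usual symbol+0xoffset/0xlength form; the helper name trace_symbols is hypothetical, and the trace in the usage note is made up for illustration rather than taken from a real panic.

```python
import re

# Matches entries such as "bitmap_weight+0x1a/0x40" and captures the symbol name.
SYMBOL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_symbols(trace_text):
    """Return the distinct function names in a call trace, in order of first appearance."""
    seen = []
    for name in SYMBOL_RE.findall(trace_text):
        if name not in seen:
            seen.append(name)
    return seen
```

For example, passing it a line like " [<c0123456>] ? bitmap_weight+0x1a/0x40" yields a list containing bitmap_weight, which can then be searched for together with the kernel version.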

Launchpad would be a good place to search for Ubuntu specific problems.

Also, a hardware failure cannot be ruled out but at the same time, it could be a bad driver.

Sometimes these problems are difficult to diagnose unless it's something that has been seen by others.

EmmEff

Posted 2010-01-11T14:48:32.463

Reputation: 1 277

OK, so where do I get the output? I can't remember if I have a frozen login or a blank screen. Can I get this output after the machine has been rebooted? – Skizz – 2010-01-11T15:53:17.863

Hi Skizz, first you need to establish whether your kernel is configured to 'capture' a crash dump. For Ubuntu, research "dumputils". Also, the 'panic string' might be in the syslog (/var/log/messages) - there should be a 'stack trace', which will help narrow things down. – Aaron – 2010-01-11T17:16:42.797

I've looked in the /var/log/messages file and there wasn't much help there. However, I have since discovered the buffered log writing option (I forget exactly which file that's in at the moment), so there might be something there next time. – Skizz – 2010-01-11T17:49:56.163

0

It'll panic just idling? Try memtest (it should be in the escape menu in grub).

Successful memtest uptime is measured in hours.

Broam

Posted 2010-01-11T14:48:32.463

Reputation: 3 831

This is certainly something to do early on in the diagnostic process and I'll give it a go. – Skizz – 2010-01-11T15:52:07.697