Unable to debug/reproduce Ubuntu crash

2

1

I am genuinely stumped by this Ubuntu crash. The machine freezes and the mouse does not move. I can't escape to a terminal or quit X, and Apport reports nothing.

I have been trying for weeks to reliably reproduce the crash, but I can't determine any pattern. Sometimes it will work for days. But usually when it crashes, it crashes again on each of the next several boots, within about 5-10 minutes.

This is a brand new ThinkPad T410. It came with Win 7, which I kept, and it now dual boots with Ubuntu 10.04.

I have tried running Win 7 to see if it crashes there, because I thought it might be a hardware defect, but it has not crashed there. I am not ruling this possibility out though since the crash happens so sporadically.

In Ubuntu I have tried several kernels, 2.6.32.(21-23), including in recovery mode, and they all suffer from this problem.

I have been googling and checking all the system logs for something suspicious and have found nothing. The logs are somewhat polluted with info about my wifi card, but there are no errors.

Looking for ideas of what to try next.

Jono

Posted 2010-07-06T18:31:07.507

Reputation: 121

Answers

0

I seem to have fixed the problem. Since installing the new Ubuntu (10.10), I have not experienced a crash. It has been 2 weeks. It is using kernel 2.6.35-22, 64-bit.

I realize this does not answer my general question about how to debug very deep crashes, but it solves my issue.

Jono

Posted 2010-07-06T18:31:07.507

Reputation: 121

Sigh. And so it goes. Someday I'll learn how to debug such crashes efficiently. I'm convinced kernel hackers don't do the crazy things regular users do. – mlissner – 2011-10-20T01:48:44.343

2

Have you tried Memtest86? It could be a RAM issue if your system randomly locks up. Run it overnight and see if you get any errors.

Is the laptop still under warranty? If so, then just ask for a replacement, as annoying issues now will become big problems later.

TheLQ

Posted 2010-07-06T18:31:07.507

Reputation: 2 738

Yes. I have tried memtest. Nothing to report there. It comes with the default 1 year warranty, but I am worried it is going to come back with the same problem since I can't really explain it to them. – Jono – 2010-07-06T19:06:07.490

@Jono and they may be less than willing to help because it is an Ubuntu issue. – Jarvin – 2010-07-06T19:41:56.577

Well, you should try the other answers here first, then try to return it. It's worth a try – TheLQ – 2010-07-07T19:44:05.970

2

Along the same lines as MSW, if you have sshd running, you can try to ssh into your computer from a different computer (if you have one available). It might be that X is just frozen up, but the computer is still responding.

In all my years of using Linux and helping others with their Linux problems, there are only a handful of times where I've seen a computer running Linux actually freeze up (could not SSH into it). These situations were almost always hardware related (except the time I ran a forkbomb for fun... There was another which was a driver issue, but I don't remember if that was fully frozen or not). Hopefully, if you can SSH in, you will have more tools to diagnose the problem. If you can't, it's probably hardware... though it looks like you've already checked memory, which would be my first guess.
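A minimal sketch of that SSH check, assuming sshd is already running on the laptop. The function name `probe_frozen_host` and the `user@laptop` target are placeholders made up for illustration. Run it from the second computer while the laptop appears frozen:

```shell
# Try to reach the frozen machine and grab a little state if it answers.
# "$1" is a placeholder target such as user@laptop; use the real account/address.
probe_frozen_host() {
    target="$1"
    # ConnectTimeout makes ssh give up after 5 seconds instead of hanging
    # when the host is truly dead.
    ssh -o ConnectTimeout=5 "$target" 'uptime; dmesg | tail -n 30'
}
```

If `probe_frozen_host user@laptop` prints an uptime and some kernel messages, only X is stuck. If it times out, the whole machine is likely down.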

Another suggestion: Try disabling Compiz (this can be done in the appearances menu by setting special effects to none).

EDIT: Alright, so it sounds like it is more than just X crashing. The fact that Windows works is interesting though. Try disconnecting as much hardware as possible (printers, wifi card, etc) and then disabling your wifi drivers. Turn off as many components of Ubuntu as possible. Perhaps unload unused kernel modules. Disable X. Turn off unneeded services. If the problem persists, at least you've ruled some things out... If not, try enabling/reconnecting one thing at a time.
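As a rough starting point for the kernel module part, the sketch below lists loaded modules whose use count is 0. The function name `unused_modules` is made up for illustration, and it assumes the usual `/proc/modules` layout (name, size, use count, ...). A count of 0 only means nothing else currently holds the module, not that the hardware is idle, so treat the output as candidates rather than a safe-to-remove list.

```shell
# Print the names of loaded kernel modules with a use count of 0.
# Reads /proc/modules by default; pass another file as $1 to inspect a saved copy.
unused_modules() {
    # Field 1 is the module name, field 3 is the use count.
    awk '$3 == 0 { print $1 }' "${1:-/proc/modules}"
}
```

Each candidate can then be unloaded one at a time with `sudo modprobe -r <name>` to see whether the freezes stop.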

This could take a bit of effort... an easier thing to try which may solve the problem is a fresh install of Ubuntu (if you back up /home/, /etc/, and a list of your installed apps, you won't know the difference). This may not solve it, given that (it sounds like) you currently have a pretty fresh install, but if it does, it has the promise of pretty minimal effort.
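A sketch of that backup step, under the assumption that an external disk or similar is mounted somewhere writable. The function name, the `files.tar.gz` and `packages.txt` file names, and the destination path are all made up for illustration.

```shell
# Save the installed-package list and archive the given directories.
# Usage: backup_before_reinstall DEST DIR...
backup_before_reinstall() {
    dest="$1"; shift
    mkdir -p "$dest"
    # One line per package with its selection state (install, deinstall, ...).
    dpkg --get-selections > "$dest/packages.txt"
    # Archive every remaining argument into a single compressed tarball.
    tar -czf "$dest/files.tar.gz" "$@"
}
```

For example, `backup_before_reinstall /media/backup /home /etc` (run as root so that all of /etc is readable). After the reinstall, the package list can be fed back with `sudo dpkg --set-selections < packages.txt` followed by `sudo apt-get dselect-upgrade`.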

Jarvin

Posted 2010-07-06T18:31:07.507

Reputation: 6 712

I have disabled compiz and I have tried the SSH test. The server stops/cannot be reached when the machine is frozen. – Jono – 2010-07-06T19:45:31.970

1

Turn on the X escape hatch:

$ gnome-keyboard-properties

and then Layouts ➤ Options... ➤ Key sequence to kill the X server ➤ enable. If X is seizing your machine (probably through a video driver) this might get you out to check on more state.

/var/log/Xorg.0.log and ~/.xsession-errors may have useful information for you. Finally, if your logs are getting full of wireless card messages, they shouldn't be, and unfortunately the two could be related. Post an example of the WiFi errors and their frequency.
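To skim the X log quickly, something like the following may help. It is only a sketch: the function name `xorg_problems` is made up, and it relies on the X server tagging errors with (EE) and warnings with (WW). The legend line near the top of the log will also match, which is harmless.

```shell
# Print only the error (EE) and warning (WW) lines from an X server log.
# Defaults to /var/log/Xorg.0.log; pass another file as $1.
xorg_problems() {
    log="${1:-/var/log/Xorg.0.log}"
    grep -E '\((EE|WW)\)' "$log"
}
```

Running `xorg_problems` right after a reboot shows the current session; `xorg_problems /var/log/Xorg.0.log.old` shows the session that froze, assuming the previous log was kept under that name.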

update:

I'm leaning to the bad hardware, bad video driver side now with a strong preference for driver as Win7 works. If you are trying to use the Nouveau experimental drivers for your nVidia Quadro 3100M, don't. Canonical really pushed that before prime-time.

$ sudo apt-get remove xserver-xorg-video-nouveau libdrm-nouveau1
$ sudo apt-get install jockey-gtk nvidia-current nvidia-settings \
                       xserver-xorg-video-nv

Don't worry if the apt-get remove complains that stuff isn't there as you don't want it to be. I'd still like to see your Xorg.0.log file.

update 2:

Thanks for the Xorg.log. That NVS 3100M graphics chip in your T410 is quite literally schizophrenic, as it will behave like an nVidia Quadro or Intel 8xx depending on... something. Your X server is treating it as an Intel chipset.

(II) intel: Driver for Intel Integrated Graphics Chipsets: i810,

Which should work, but who knows. Please confirm whether your Windows system thinks your graphics chip is nVidia or Intel. I am now convinced this is a driver bug.
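On the Linux side, one way to see which kernel driver is actually bound to the graphics device is `lspci -k`. The sketch below just filters its output; the function name `gpu_driver` is made up for illustration, and it assumes the device is listed as a "VGA compatible controller".

```shell
# Show each VGA device line and the kernel driver bound to it.
gpu_driver() {
    lspci -k | awk '
        # Device header lines start in column 1; remember whether this one is a GPU.
        /^[^ \t]/ { gpu = ($0 ~ /VGA compatible controller/) }
        # Print the GPU header and its indented "Kernel driver in use" line.
        gpu && (/^[^ \t]/ || /Kernel driver in use/) { print }
    '
}
```

If the driver line names i915, the Intel kernel driver is the one in use regardless of which other driver packages are installed.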

msw

Posted 2010-07-06T18:31:07.507

Reputation: 3 287

I have these keys enabled, and they do not get me out of X in a freeze which makes me think it is not X related.

Here is my .xsession-errors file: http://pastebin.com/X6rH6spV

and my syslog: http://pastebin.com/raw.php?i=1VGN9aJK

– Jono – 2010-07-06T19:35:20.687

Both of those files look clean, and the rtl8192 logs are just informational and not too frequent, so that's a cold trail. – msw – 2010-07-06T21:50:41.177

Here is my Xorg.log http://pastebin.com/raw.php?i=SXPz1itm

– Jono – 2010-07-07T16:08:34.107

I don't believe I have an NVS 3100M. This is an i5 processor with Intel integrated graphics, and it did not come with an additional graphics card. It is using the i915 driver in Linux and the "Intel Graphics Media Accelerator HD" in Windows. nouveau is installed, however. – Jono – 2010-07-07T19:13:50.927

€0'05 says it's nouveau, I told you what you could do, oh well. – msw – 2010-07-08T03:13:08.350

@msw. I am not sure what your last comment means. But yes, I will try switching to nv from nouveau once I figure out how to uninstall it without uninstalling all of X. Thanks, and I will post results soon. – Jono – 2010-07-08T15:18:49.450

Wait. Even though I have nouveau installed, I am using i915. Even if I installed nv, it would still be using i915. Could there still be a conflict? – Jono – 2010-07-08T20:59:58.770