how to diagnose a hard system seizure? Dell+Ubuntu

0

1

I've got Ubuntu 9.10 on a Dell Vostro 420 desktop, a little over a year old, which I use for plain vanilla work stuff (email, web, terminal, text editor). Every now and then, at totally random times, it completely freezes on me. Hard. Mouse and keyboard stop working, cursor stops blinking, clock stops moving. All I can do is hold down the power button on the front of the box to shut it off.

Sometimes it happens after several months of continuous uptime; sometimes it happens a few minutes after a reboot, when all I've done is open a terminal to look at log files, or maybe Firefox to do a Google search. Each time, there is nothing at all in /var/log/messages at the time of the crash. This makes it seem like a hardware problem, and indeed a few months ago I opened the box and wiggled everything and the problem went away for a while. But now it's back. I went in and checked everything, and took out each RAM module and reseated it. No luck. I ran all the system diagnostics (the long version) and everything passed with flying colors. Something is messed up in this box, but without any useful logs or failed tests, how in the world am I going to find it? And of course, Dell's not gonna help me because I went and replaced Windows with Ubuntu.

What steps would you take next to track down this problem?

rob

Posted 2010-06-18T00:38:11.143

Reputation: 1

Answers

1

Here's a checklist I always follow in situations similar to yours:

  • Keep an eye on the temperature. Last time I had this kind of problem, I put a temperature graph on my KDE 4.x desktop and quickly saw that the slowdowns/hangs were strictly related to temperature. After I opened up the laptop and cleaned out the dust, everything started to work again.

  • Are the fans working OK? Check the fan rotation speed.

  • Is some application suddenly and very rapidly eating up all the available RAM? Watch the HD activity and memory usage with your favourite tool: sar, GNOME System Monitor, mrtg, whatever.

  • If you have desktop effects enabled, try disabling them to see whether the problem is related to 3D acceleration. And if you have 3D enabled, you might try to trigger the crash with some 3D torture, for example by installing and playing tuxracer (or ppracer, whatever it's called today).

  • If the hangs are completely random, suspect the power supply/battery. My Dell Latitude D830, which I got back in late 2007, has already had one battery replaced. In my case the battery just died one night: it did not recharge at all and the laptop was blinking some strange lights. I would not be surprised if a malfunctioning battery caused sudden lockups.

And as mentioned, flaky HDs can cause all kinds of funny side effects. Try smartctl -a /dev/sda (or whatever your HD is).
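The temperature and memory checks in the list above only help if the last readings before a freeze survive the power-off. Here is a minimal Python 3 sketch of that idea, not a finished tool: it appends one line per interval to a log file and forces it to disk. The function names, the log file name, and the 10-second interval are made up for illustration, and it assumes your kernel exposes thermal zones under /sys/class/thermal, which not every board does.

```python
import glob
import os
import time


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone unreadable or not a number; skip it
    return temps


def read_mem_free_kb(meminfo="/proc/meminfo"):
    """Return the MemFree value in kB, or None if it cannot be read."""
    try:
        with open(meminfo) as f:
            for line in f:
                if line.startswith("MemFree:"):
                    return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def format_sample(now, temps, mem_free_kb):
    """Build one log line: timestamp, free memory, then each zone's temperature."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    temp_part = " ".join("%s=%.1fC" % (z, t) for z, t in sorted(temps.items()))
    return ("%s mem_free_kb=%s %s" % (stamp, mem_free_kb, temp_part)).rstrip()


def log_forever(log_path="freeze-watch.log", interval=10):
    """Append a sample every `interval` seconds (hypothetical defaults)."""
    with open(log_path, "a") as log:
        while True:
            line = format_sample(time.time(), read_temps(), read_mem_free_kb())
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the kernel to write the line to disk now
            time.sleep(interval)
```

Calling `log_forever()` in a spare terminal and reading the tail of the log after the next forced power-off shows whether the temperature was climbing or free memory was collapsing just before the hang. If the last lines look normal, that points away from heat and memory exhaustion.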

Janne Pikkarainen

Posted 2010-06-18T00:38:11.143

Reputation: 6 717

0

Bad hard drives can cause freezes. Check your S.M.A.R.T. status and post it here. Be aware that many drives become flaky and fail without any sign in the S.M.A.R.T. status. Is the hard drive light on solid when it freezes? You can try running from a live CD for a while to see if you can reproduce the freeze. If it is not reproducible from a live CD, you're probably looking at a flaky hard drive. Keeping an eye on the system temperatures might also provide some clues. Does it crash more when the weather is warm? Since you don't see anything in the messages log, it does not sound like a software issue.
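If reading the S.M.A.R.T. table by eye is tedious, the check can be sketched in Python 3. This is only an illustration: it assumes the usual ten-column attribute table that `smartctl -A` prints for ATA drives, and the function names and the list of watched attributes are my own choice, not anything smartctl defines. A clean result does not clear the drive, for the reason given above.

```python
import subprocess

# Attributes whose raw value is commonly expected to stay at zero on a healthy drive
# (an assumption for this sketch; attribute names vary between vendors).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> raw value (int) from `smartctl -A` table text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the start of RAW_VALUE.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                continue  # raw value is not a plain integer; ignore it here
    return attrs


def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes that have a non-zero raw value."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def read_smart(device="/dev/sda"):
    """Run smartctl -A on the device (usually needs root) and return its output."""
    result = subprocess.run(["smartctl", "-A", device],
                            stdout=subprocess.PIPE, universal_newlines=True)
    # smartctl's exit status is a bit mask, so a non-zero code is not treated as fatal.
    return result.stdout
```

Running `suspicious(parse_attributes(read_smart("/dev/sda")))` as root returns an empty dict when none of the watched counters has moved. Anything it does return is worth posting along with the full smartctl output.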

James T

Posted 2010-06-18T00:38:11.143

Reputation: 8 515

0

You can set up a persistent Ubuntu install on a USB flash drive (8-16 GB will do fine).
Then use that for a while and access your data from the hard drive.
Change your BIOS boot settings to try the USB drive first and then the hard disk
(and remember to avoid keeping any other USB drives plugged in. Alternatively, with a few trials you can find which of your USB ports is probed first; if you keep the Ubuntu USB drive plugged in there, I think the other USB drives will not be tried at boot time).

You can use micro-USB flash drives (like this Transcend T3 model) if form factor is a problem.

While you continue your normal work from this USB-booted Ubuntu,
watch for the problem to reproduce.
Since the hard disk is not in the path, any problems related to it will be bypassed.

nik

Posted 2010-06-18T00:38:11.143

Reputation: 50 788