How to troubleshoot a hardware problem on Linux?



Just to note I am not having a problem at the moment, but have had previously so it sparked my curiosity...

When a computer suddenly locks up so that Caps Lock flashes incessantly and the only option is a hard reset, how do you troubleshoot what is causing it? On Windows there would be some errors in the event log... on Linux it seems there is no opportunity for anything to be written to the log, making it hard to troubleshoot...

In this case, how would you troubleshoot the problem on Linux?


Posted 2010-05-04T12:49:08.157

Reputation: 1 621

Sudden H/W lockups rarely get logged by any operating system. – kmarsh – 2010-05-04T13:01:23.813

Well, they do on windows, even if it is vague.... – Jack – 2010-05-04T14:20:30.510

not always. depends on the problem; if it's a true hardware freeze, the first indication Windows will give (in the error logs) is that it's rebooting. (BSoDs are not true hardware lockups in this sense.) – quack quixote – 2010-05-04T20:37:15.900

I've had Windows report CPU problems, memory problems, etc. It seems Linux has no such capability for reporting, and I must rather use other diagnostic tools. Correct? – Jack – 2010-05-05T01:36:12.487

Flashing caps lock indicates a kernel panic (which is more or less the same as the BSoD on Windows). It's not necessarily a hardware problem; it could be a bug in the kernel/drivers. – Marius Gedminas – 2010-07-30T11:27:15.027



Try booting memtest86+ from bootable media and see what it says about your memory and memory subsystem integrity.

Also, the last job started by cron might get logged to /var/log/syslog or /var/log/messages.

If not, and you are debugging this issue on an ongoing basis, you could set up auditd and a cron job with ps to continuously log system activity and which jobs are running.
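As a rough illustration of the "log what is running" idea, here is a minimal Python sketch that does what a cron job with ps would do: it reads process names from /proc and appends them to a log. The function names and the log path in the usage note are my own assumptions, not part of any standard tool. It calls fsync after each snapshot, since a lockup may hit before the kernel writes the data back on its own.

```python
import os
import time


def snapshot_processes(proc_root="/proc"):
    """Return a sorted list of (pid, command name) read from a /proc-style tree."""
    entries = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # only numeric directory names are processes
        try:
            with open(os.path.join(proc_root, name, "comm")) as fh:
                entries.append((int(name), fh.read().strip()))
        except OSError:
            continue  # the process exited while we were reading
    return sorted(entries)


def append_snapshot(log_path, entries, now=None):
    """Append one timestamped line per process and force it to disk."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    with open(log_path, "a") as fh:
        for pid, comm in entries:
            fh.write(f"{stamp} pid={pid} comm={comm}\n")
        fh.flush()
        os.fsync(fh.fileno())  # do not rely on normal writeback before a lockup
```

Calling `append_snapshot("/var/tmp/ps-snapshots.log", snapshot_processes())` once a minute (the path is hypothetical) would leave a trail showing what was running shortly before the freeze.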


Posted 2010-05-04T12:49:08.157

Reputation: 4 632

No, I said I am not having any problem at the moment; I just want to know the equivalent way to see hardware problems on Linux, as I can on Windows. – Jack – 2010-05-04T13:20:02.083


Kernel device drivers will report problems to dmesg, which may be logged separately as well, or in kern.log.
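To make that concrete, here is a small Python sketch that scans a saved kernel log (for example the output of `dmesg > dmesg.txt`, or a copy of kern.log) for lines that often accompany hardware trouble. The pattern list is only my assumption of what is worth flagging; real kernel messages vary by driver and kernel version, so treat a hit as a lead, not a diagnosis.

```python
import re

# Illustrative patterns only; extend or trim them for your hardware.
SUSPECT_PATTERNS = [
    r"hardware error",
    r"machine check",
    r"i/o error",
    r"kernel panic",
    r"\boops\b",
]
_SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) for every line matching a suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if _SUSPECT_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a saved kernel log file and return the suspect lines."""
    with open(path, errors="replace") as fh:
        return find_suspect_lines(fh)
```

Running `scan_log("dmesg.txt")` after a reboot will only show what made it to disk, so it helps most with problems that warn before they kill the machine.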

For serious problems, a POST diagnostics board may be used.


Posted 2010-05-04T12:49:08.157

Reputation: 1 290


Logs are the first place to look, as kmarsh says, but if the logs don't tell much in the case of a serious HW failure, then it doesn't matter what OS you use, it just takes some old school trial and error.

Determine if it is a hardware issue by running a live CD, otherwise it could be a driver issue misdiagnosed as hardware failure.

HW lockups are random, but frequent. I'd start with removing graphics cards (use on-board or backup cards), network cards or (gasp) modems if you have any, one at a time until you pinpoint the culprit. Run with one memory stick at a time (if you have x2) or swap out for other sticks while testing.

Your PSU could also be failing. Sometimes adding a new card eats your watts and, if your PSU isn't powerful enough, starves the CPU, causing random failures.

If nothing else gives a lead, it could be your main board (usually corrosion if it's 2+yrs depending on the humidity where you live) or CPU.

Use software to monitor CPU temperature; overheating can cause lockups too.
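For the temperature part, a minimal Python sketch that reads the kernel's thermal zones from sysfs, where the `temp` files hold millidegrees Celsius. It assumes your kernel exposes /sys/class/thermal; the function names and the 85 °C default limit are arbitrary choices of mine, so check the rated limits for your own CPU.

```python
import glob
import os


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {"zone_name:zone_type": degrees_celsius} from a sysfs-style tree."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as fh:
                millidegrees = int(fh.read().strip())
            with open(os.path.join(zone, "type")) as fh:
                label = fh.read().strip()
        except (OSError, ValueError):
            continue  # zone is unreadable or reports garbage; skip it
        readings[f"{os.path.basename(zone)}:{label}"] = millidegrees / 1000.0
    return readings


def zones_over_limit(readings, limit_c=85.0):
    """Return the names of zones hotter than limit_c (an assumed threshold)."""
    return sorted(name for name, temp in readings.items() if temp > limit_c)
```

Logging `read_thermal_zones()` every few seconds while the machine is under load shows whether the temperature climbs right before a lockup.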

After trying everything under the sun, with no luck, it might be time for a new PC ;)


Posted 2010-05-04T12:49:08.157

Reputation: 4 918

Driver errors can happen on a live CD just as on a fully-installed system. It all depends on what drivers the system is using. If you use only generic drivers and it still happens, THEN it would be a HW issue. – Kevin M – 2010-05-05T03:45:30.173


On most Linux distributions today, you should be able to have an MCE (Machine Check Exception) log, which may be decoded to find the actual hardware errors. Also, you may set up a kernel crash dump: a second, capture kernel kept alongside the Linux kernel you're using daily, which lets you capture the incident and debug the cause.
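Before relying on a crash dump, it is worth checking that the machine is actually set up to take one. The Python sketch below looks at two things, assuming a kernel built with kexec/kdump support: whether the running kernel was booted with a `crashkernel=` parameter, and whether /sys/kernel/kexec_crash_loaded reports a loaded capture kernel. The function names are hypothetical, and this does not configure anything.

```python
def crashkernel_reserved(cmdline_text):
    """True if the kernel command line carries a crashkernel= parameter."""
    return any(tok.startswith("crashkernel=") for tok in cmdline_text.split())


def crash_kernel_loaded(flag_text):
    """True if the content of kexec_crash_loaded is '1'."""
    return flag_text.strip() == "1"


def _read_text(path):
    """Return the file content, or an empty string if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


def kdump_status(cmdline_path="/proc/cmdline",
                 loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Summarise whether a crash dump could be captured on this boot."""
    return {
        "crashkernel_param": crashkernel_reserved(_read_text(cmdline_path)),
        "capture_kernel_loaded": crash_kernel_loaded(_read_text(loaded_path)),
    }
```

If both values from `kdump_status()` are true, a panic should boot into the capture kernel instead of just flashing the keyboard LEDs; where the dump ends up depends on your distribution's kdump configuration.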

Sverre Marvik

Posted 2010-05-04T12:49:08.157

Reputation: 361


Nowadays whenever a previously working setup starts to misbehave, I don't even bother to read logs or anything like that first. Driver quality etc. is so good today that most of the sudden-death bugs have been ironed out, and a hardware fault is more likely than a software bug. And even the most perfect code cannot fight against physical problems.

Some time ago my laptop started to act strangely. While watching a movie or compiling code or doing anything even relatively CPU intensive, everything suddenly got a lot slower. Moving windows took anything between 1-15 seconds. CPU frequency dropped from 2 GHz to 800 MHz and decided to stay there. Even the idle temperature was around +60C. Every now and then the whole thing did lock up.

After cleaning up the dust inside the laptop things got back to normal. Idle temperature +35-40C, no slowdowns.

OK, that one was quite straightforward to trace due to the heat and the excessive amount of dust inside the laptop. :-)
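The frequency drop described above (2 GHz down to 800 MHz) can be spotted from the cpufreq files in sysfs, which report values in kHz. This is a minimal Python sketch, assuming the cpufreq interface is present under /sys/devices/system/cpu; the function names and the 0.5 ratio are my own arbitrary choices. A low frequency on an idle machine is normal power saving, so the check only means something while the CPU is busy.

```python
import glob
import os


def read_cpu_freqs(root="/sys/devices/system/cpu"):
    """Return {cpu_name: (current_khz, max_khz)} from a sysfs-style tree."""
    freqs = {}
    for cpu_dir in sorted(glob.glob(os.path.join(root, "cpu[0-9]*"))):
        base = os.path.join(cpu_dir, "cpufreq")
        try:
            with open(os.path.join(base, "scaling_cur_freq")) as fh:
                current = int(fh.read().strip())
            with open(os.path.join(base, "cpuinfo_max_freq")) as fh:
                maximum = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # no cpufreq support for this CPU, or unreadable value
        freqs[os.path.basename(cpu_dir)] = (current, maximum)
    return freqs


def throttled_cpus(freqs, ratio=0.5):
    """Return CPUs running below ratio * max frequency (assumed threshold)."""
    return sorted(name for name, (cur, top) in freqs.items() if cur < ratio * top)
```

If `throttled_cpus(read_cpu_freqs())` keeps listing CPUs during a compile or a video, the machine is probably protecting itself from heat, as in the dusty laptop case.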

If something trickier pops up, I usually first let memtest86 run overnight and see if that gives me any results. If it doesn't, I fire up cpuburn or some similar program and see if that makes my computer crash. If that doesn't help, I move on to torturing the hard disk with bonnie++ or iozone and see if that crashes something. Then I move on to 3D tests, such as playing PPRacer.

If I'm unable to get a controlled crash after all those tests, I move on to examining for more obscure stuff. Perhaps USB autosuspend is to blame? Or something even more odd.

In one case the computer locked up every time webcam software was started. After spending way too much time configuring kernel parameters and so on, lsusb revealed something embarrassing. The webcam was connected to a USB 1.1 port instead of a USB 2.0 port. After connecting the cam to a USB 2.0 port it started to work.

Janne Pikkarainen

Posted 2010-05-04T12:49:08.157

Reputation: 6 717