Nowadays, whenever a previously working setup starts to misbehave, I don't even bother reading the logs first. Driver quality is good enough these days that most sudden-death bugs have been ironed out, so a hardware fault is more likely than a software bug. And even the most perfect code cannot fight physical problems.
Some time ago my laptop started to act strangely. While watching a movie, compiling code, or doing anything even moderately CPU-intensive, everything suddenly got a lot slower. Moving a window took anywhere from 1 to 15 seconds. The CPU frequency dropped from 2 GHz to 800 MHz and stayed there. Even the idle temperature was around +60 °C. Every now and then the whole machine locked up.
After I cleaned the dust out of the laptop, things went back to normal: idle temperature +35-40 °C, no slowdowns.
OK, that one was quite straightforward to trace, thanks to the heat and the excessive amount of dust inside the laptop. :-)
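The two symptoms in that story (a clock stuck at a low frequency and a high idle temperature) can be checked quickly on Linux by reading sysfs. The following is only a rough sketch: the two file paths are assumptions that happen to be common but differ between machines and kernels, and the thresholds `hot_c` and `slow_mhz` are illustrative values taken from the anecdote, not recommended limits.

```python
from pathlib import Path

# Assumed sysfs locations; the thermal zone index and the presence of
# cpufreq vary between machines, so adjust these for your hardware.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def parse_millidegrees(text):
    """Convert a sysfs temperature string (millidegrees C) to degrees C."""
    return int(text.strip()) / 1000.0


def parse_khz(text):
    """Convert a sysfs frequency string (kHz) to MHz."""
    return int(text.strip()) / 1000.0


def read_value(path, parser):
    """Return the parsed file content, or None if it is missing or malformed."""
    try:
        return parser(Path(path).read_text())
    except (OSError, ValueError):
        return None


def looks_throttled(temp_c, freq_mhz, hot_c=60.0, slow_mhz=1000.0):
    """Heuristic only: hot and slow at the same time hints at a cooling problem."""
    if temp_c is None or freq_mhz is None:
        return False
    return temp_c >= hot_c and freq_mhz <= slow_mhz


if __name__ == "__main__":
    temp = read_value(TEMP_PATH, parse_millidegrees)
    freq = read_value(FREQ_PATH, parse_khz)
    print("temperature (C):", temp)
    print("cpu0 frequency (MHz):", freq)
    if looks_throttled(temp, freq):
        print("Hot and slow at once: check fans, vents and dust first.")
```

Run it once while the machine is idle and once under load; a reading that is hot and slow even at idle points toward cooling rather than software.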
If something trickier pops up, I usually first let memtest86 run overnight and see whether it reports any errors. If it doesn't, I fire up cpuburn or a similar program and see whether that makes the computer crash. If that doesn't do it, I move on to torturing the hard disk with bonnie++ or iozone and see whether that crashes something. Then I move on to 3D tests, such as playing PPRacer.
If I'm unable to get a controlled crash after all those tests, I move on to examining more obscure stuff. Perhaps USB autosuspend is to blame? Or something even odder.
In one case the computer locked up every time the webcam software was started. After I had spent way too much time configuring kernel parameters and so on, lsusb revealed something embarrassing: the webcam was connected to a USB 1.1 port instead of a USB 2.0 port. After I connected the cam to a USB 2.0 port, it started to work.
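The same check can be made without lsusb by reading the negotiated link speed that Linux exposes for each USB device in sysfs. This is a minimal sketch, not what was actually run in that case: it assumes the usual `/sys/bus/usb/devices` layout with a `speed` file (in Mbit/s) and an optional `product` file per device, and the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs root for USB devices on Linux.
USB_ROOT = Path("/sys/bus/usb/devices")

# Speeds reported in the sysfs "speed" attribute, in Mbit/s.
SPEED_NAMES = {
    "1.5": "low speed (USB 1.x)",
    "12": "full speed (USB 1.1)",
    "480": "high speed (USB 2.0)",
}


def describe_speed(mbps):
    """Map a raw sysfs speed string to a readable label."""
    value = mbps.strip()
    return SPEED_NAMES.get(value, f"{value} Mbit/s")


def list_usb_speeds(root=USB_ROOT):
    """Return (device, product, speed) tuples for devices that report a speed."""
    results = []
    for speed_file in sorted(Path(root).glob("*/speed")):
        device_dir = speed_file.parent
        try:
            speed = speed_file.read_text().strip()
        except OSError:
            continue  # device vanished or is unreadable; skip it
        try:
            product = (device_dir / "product").read_text().strip()
        except OSError:
            product = "(unknown)"  # not every device exposes a product string
        results.append((device_dir.name, product, speed))
    return results


if __name__ == "__main__":
    for name, product, speed in list_usb_speeds():
        print(f"{name}: {product}: {describe_speed(speed)}")
```

A webcam that shows up as "full speed (USB 1.1)" here is sitting on a slow port or hub, which is the situation described above.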
Sudden H/W lockups rarely get logged by any operating system. – kmarsh – 2010-05-04T13:01:23.813
Well, they do on Windows, even if it is vague... – Jack – 2010-05-04T14:20:30.510
Not always; it depends on the problem. If it's a true hardware freeze, the first indication Windows will give (in the error logs) is that it's rebooting. (BSoDs are not true hardware lockups in this sense.) – quack quixote – 2010-05-04T20:37:15.900
I've had Windows report CPU problems, memory problems, etc. It seems Linux has no such reporting capability, and I must use other diagnostic tools instead. Correct? – Jack – 2010-05-05T01:36:12.487
Flashing caps lock indicates a kernel panic (which is more or less the same as the BSoD on Windows). It's not necessarily a hardware problem, it could be a bug in the kernel/drivers. – Marius Gedminas – 2010-07-30T11:27:15.027