How can I troubleshoot a "Hardware Malfunction" blue screen?

6

2

Update 3/10/2011 (2): Switching hard drive slots once again allows Windows to load, but crashes are much more frequent than they were previously. It's clear that the issue has to do with the hard drive controller... or a loose connection anywhere else in the system :(

Update 3/10/2011: After switching components back and forth, the system began to refuse to boot (no beeps, no video). I removed it from the case, and was able to get it booting again by switching the hard disks to a different slot. Windows asked to restart due to hardware changes, and I allowed it. After that, my system failed to load Windows (it reboots after the "loading Windows" screen).

At this point, my guess is that either the RAID controller or the SATA slots on the Motherboard have become damaged. I'm hoping the failure to boot isn't due to subsequent damage caused while I was troubleshooting. However, I have yet to have an actual test on the system fail, so I'm still somewhat mystified.

Update 3/9/2011: After moving my video card to a different PCIe x16 slot, I was able to run my test case without errors. Moving it back resulted in immediate errors again. I'll be spending tonight and tomorrow getting my PC sufficiently back together to run more strenuous tests on it, and hopefully make myself more certain.

Naturally, as soon as I posted the above, the test case failed with the same error message. Apparently the test case isn't as reliable as I thought it was.

My computer has suddenly started crashing to a blue screen with the following text:

hardware malfunction

call your hardware vendor for support

*the system has halted*

The crash occurs randomly during normal use. I have thus far always been able to reproduce it by transferring the contents of a large folder... But I'm not sure if this is caused by the file transfer, or simply because the transfer takes long enough for something else to trigger it.
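A minimal sketch of how the folder transfer could be scripted into a repeatable test case. Python is an assumption here (the post does not say how the transfer was run), and `sha256_of`, `copy_and_verify`, and all paths are hypothetical names. It copies a folder several times, compares checksums, and prints progress after each pass so the last line on screen shows roughly how far it got before a halt. The read-back may be served from the OS cache, so a clean result does not prove the disks are fine.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, scratch_root: Path, passes: int = 3) -> List[str]:
    """Copy `source` into `scratch_root` `passes` times; return any checksum mismatches.

    `scratch_root` must be outside `source`, otherwise the copy would recurse.
    """
    mismatches: List[str] = []
    for attempt in range(passes):
        target = scratch_root / f"pass_{attempt}"
        shutil.copytree(source, target)
        for original in source.rglob("*"):
            if original.is_file():
                copied = target / original.relative_to(source)
                if sha256_of(original) != sha256_of(copied):
                    mismatches.append(f"pass {attempt}: {original}")
        shutil.rmtree(target)  # free the space before the next pass
        print(f"pass {attempt} done, {len(mismatches)} mismatches so far", flush=True)
    return mismatches
```

For example, `copy_and_verify(Path(r"D:\big_folder"), Path(r"E:\scratch"), passes=10)`, where both paths are placeholders. Running it with different pass counts may also help tell whether the crash tracks the amount of disk I/O or just elapsed time.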

A bit about my hardware

I have a dual-core Intel CPU and an Asus motherboard. The video card is by nVidia and connects via PCIe. My hard drives are in pairs and connect via SATA to a RAID controller on the motherboard. They are set up as RAID0.

What I've tried so far

There is nothing in the Windows Event Log.

WhoCrashed was unable to find any crash records.

ScanDisk runs to completion (it launches prior to Windows load) and reports no errors.

MemTest reports no errors (to 200% coverage).

System temperatures are in the range of 40 to 50 degrees Celsius, with video card temperatures in the range of 60 to 80 degrees Celsius.

I have stripped the system down to a minimal configuration (hard drive, video card, one memory module, motherboard, CPU, power supply). The problem still occurs.

I've reset the CMOS by removing the motherboard's battery for an extended period of time.

However, this has allowed me to rule out a few components:

It is not the video card because the problem still occurred after replacing the video card with another one I had on hand.

It is not the hard drive or anything software related because the problem occurred after a fresh installation of Windows on a replacement hard drive.

It is not the hard drive cables because I replaced those with new ones and still had the problem.

It is not the power supply because the problem still occurred after replacing the power supply with another one I had on hand.

It is probably not the memory because I've tried three different memory modules in three different memory slots and was still able to replicate the issue.

It is probably not an issue of the motherboard grounding against the case because I've completely removed the motherboard from the case, and still encounter the issue.

Is there anything I can do to confirm what's causing the issue? At the moment it seems as though it must be either the motherboard or CPU, but those are both difficult components to replace... In addition, both components are relatively new (two to three years old).

I will gladly edit in any additional information I can get my hands on, and/or focus the question as I can find more details...

AaronSieb

Posted 2011-03-08T23:18:53.647

Reputation: 303

It could still be the hard drive controller on the motherboard. Did you try using a different controller? It could still be the PCIe or AGP slot. Do you have on-board video you can test with? – MaQleod – 2011-03-08T23:33:47.753

@MaQleod I only have one SATA controller, and no IDE drives. I don't have onboard video, but I'll check to see if I can use the secondary video slot. – AaronSieb – 2011-03-09T00:24:17.747

Answers

7

Since you have eliminated quite a few hardware items and your system doesn't allow a dump to be written, it's most likely the processor, memory or motherboard, as mentioned by @MattJenkins. I suggest going through my post, and especially this handy flowchart (link contains more information):

It seems that cleaning dust and resetting your BIOS might help too...

Tamara Wijsman

Posted 2011-03-08T23:18:53.647

Reputation: 54 163

Love the flow chart... did you make that? – Supercereal – 2011-03-09T14:30:47.333

Which part is the Motherboard Performance chart? I'm guessing the diamond labeled "Default motherboard settings"? From that point the chart only lists "runs on bench" and "CPU swap." Hmm... I might be able to swing a bench test, but a CPU swap isn't possible at the moment. – AaronSieb – 2011-03-09T14:46:46.900

@Kyle: No, I didn't. AaronSieb: Just go through it, it's accompanied by slides... – Tamara Wijsman – 2011-03-09T15:30:01.607

Just got done reading over the slides. I've gone through pretty much everything on the chart up until the Runs on Bench node, except check the CPU seating... Which is kind of difficult in my current case (and hasn't been an issue for the past two years or so). I guess if some of the other tests I'm running don't pan out, I'll have to try to clear an area to pull the mobo out of the case. – AaronSieb – 2011-03-09T16:51:51.207

3

My usual answer for any problems relating to hardware:

Download Hiren's Boot CD: http://www.hirensbootcd.org/download/

Burn the ISO to disc, then boot off it. It contains many tools for diagnosing hardware problems: CPU testing, memory testing, motherboard testing.

By the sounds of your problem you may have the best results (or worst?) using one of the tools that offers a "Burn-In" test (exercises the whole computer to stress test it).
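To illustrate what a burn-in test does in principle, here is a toy sketch in Python. It is not one of the tools on the CD, and `burn_in` and its parameters are made-up names. It keeps the CPU and memory busy by filling a buffer with random data, copying it, and checking that both copies hash to the same value. On healthy hardware the two hashes always match. A real burn-in tool loads far more of the system than this does.

```python
import hashlib
import os
import time


def burn_in(seconds: float = 60.0, buffer_mib: int = 64) -> int:
    """Run copy-and-hash rounds for roughly `seconds`; return the number of mismatches."""
    mismatches = 0
    rounds = 0
    deadline = time.monotonic() + seconds
    while True:
        block = os.urandom(buffer_mib * 1024 * 1024)
        duplicate = bytearray(block)  # forces a second copy in memory
        if hashlib.sha256(block).digest() != hashlib.sha256(duplicate).digest():
            mismatches += 1
        rounds += 1
        print(f"round {rounds}: {mismatches} mismatches", flush=True)
        if time.monotonic() >= deadline:
            return mismatches
```

Calling `burn_in(seconds=600)` would run it for about ten minutes. Any non-zero result, or a halt partway through, would point toward CPU, memory, or motherboard rather than the disks.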

Majenko

Posted 2011-03-08T23:18:53.647

Reputation: 29 007

Important note: Don't rely solely on the tests. Benchmarks are there to check that the system remains stable after you install a new component or overclock. To troubleshoot the issue you would have to replace the individual parts, because such tests exercise multiple components at once. Put another way, you can't test a specific component without the others attached (and being used)... – Tamara Wijsman – 2011-03-08T23:44:33.420

Running through some of the tests now, as I find them (the menu organization seems kind of weird). Any stand-out tests I should be looking for, particularly focusing on CPU, motherboard, and the RAID controller? – AaronSieb – 2011-03-09T17:40:25.330

"S&M Stress Test 1.9.1: CPU/HDD/Memory benchmarking and information tool, including temperatures/fan speeds/voltages (Windows Freeware)." That looks a likely candidate for finding your problem. – Majenko – 2011-03-09T18:09:59.003

Okay, I think I've found the main menu (at any rate, under the Windows XP option there's a list of tests and similar). The S&M test passed. Weird. I still have yet to see anything fail outside of a Windows 7 environment. – AaronSieb – 2011-03-09T21:32:56.693