I have a machine with a Core i7 CPU, 12 GiB of memory, 4 hard drives and a graphics card/sound card (both add-in PCI-E). This machine is somehow unstable, and I'm wondering how to troubleshoot the remaining issues.
Originally, the machine had an ASUS P6T SE mainboard and an 8800GT, running off a 700 W PSU, plus an LG DVD drive and 3 hard drives. Right when I built it, the RAM turned out to be faulty, so it got RMA'd. The sound card is a Creative X-Fi UAA. The first problem was when the 8800GT broke down, but that was easily solved by buying a new card. However, the machine would sometimes BSOD, usually not under system load but while idle, although it BSODed once under load as well. Suspecting the RAM, I ran memcheck overnight and no issues were found. Everything was working fine most of the time.
Some months later (it would BSOD about once a month) the hard drive broke down. Classic head crash; I replaced the hard drive and restored the OS and data from backup. I then switched the disk configuration to a single system drive, 2 disks in RAID0 and one disk for backup.
A few months later, the system started to BSOD more often (three times a day during near idle, i.e. web-browsing, RDP.) Interestingly, the machine has a WLAN USB stick and it would sometimes BSOD when I started many downloads simultaneously. Once the machine started BSOD'ing, I assumed that the mainboard might be faulty as the disk drives didn't report any problems, the graphics card just broke down and was replaced, and an additional memcheck showed no error. The original BSOD all had some message and not just a STOP ERROR CODE (for instance, I got 0x00000116 (0xfffffa800a546010, 0xfffff8801020907c, 0x0000000000000000, 0x000000000000000d) or 0x0000003b (0x00000000c0000005, 0xfffff8800138e4c7, 0xfffff8800b96c550, 0x0000000000000000).)
I replaced the mainboard with a different one, and the machine would now suddenly turn off. This led me to the conclusion that the PSU might be faulty, so I tested with a different one. The different PSU had a cable which was too short to reach the DVD drive, so the DVD drive was left disconnected. With the different PSU (500 W), things were working rock-solid. I then put the original 700 W PSU back in, connected it to the DVD drive, and the machine would turn off again. I removed the DVD drive and tested it in a different machine, and indeed, the DVD drive was faulty. With the DVD drive removed, the machine was running stable again.
A few weeks later, during gaming, the machine BSODed with Stop Error 1E without any further information. After rebooting, everything worked fine. On the same day, I wanted to run the backup, and the backup failed with error 0x80070570 (files corrupted.) I ran chkdsk, and indeed, on my primary system drive some index ($SSI?) or so was broken, 9 files got deleted and everything was backed up. In order to check the drives, I ran three instances of HD Tune concurrently, and the machine BSODed again with 1E (0x0000001e (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000)). Hoping that one of the drives was faulty, I ran HD Tune sequentially overnight, and no error occurred. The machine didn't BSOD, and is running fine again. sfc /scannow also indicated that no system files are broken.
As this machine has had nearly everything replaced (hard drive, graphics card, memory, motherboard, PSU) or removed (DVD drive), do you have any ideas how to troubleshoot what the heck is going on? The weirdest thing is that it now works fine under extreme load for hours straight, but I still had those two failures over the weekend (both under load, interestingly). Each part in isolation seems to work fine, but the combination somehow causes problems. I'm totally lost as to where to troubleshoot, as every time I try to check something, the pesky thing just works fine.
Update: Just got another BSOD (1E), while reading a web site. I got the screen where a memory dump was created, progress bar going up to 100%, but after the reboot, Windows is not aware that the machine crashed. The reliability log does not show a crash. However, looking into the Minidump folder I dug out the minidump from the weekend, and the call stack has a HIDPARSE in it. Can a USB keyboard (or USB mouse) produce a bluescreen?
Update2: I replaced all hard-drive cables and reinstalled Windows. Reinstall worked fine, installing applications for 6 hours straight as well. When turning off, I got a stop error 24. I'm suspecting the primary hard drive to be unreliable (Samsung HD103SJ), as I don't see what else could be causing the problems. HDTune and chkdsk however report that the drive is OK.
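For reference, the stop codes quoted above (0x116, 0x3B, 0x1E and 0x24) can be mapped to the symbolic names used in Microsoft's bug check code reference. The following is only a minimal sketch in Python: the table covers just the four codes from this post, and the helper name `describe_stop_code` is a hypothetical one chosen for illustration. The names alone do not identify the faulty component.

```python
# Symbolic names for the four stop codes quoted in this post, taken from
# Microsoft's bug check code reference. The table is deliberately incomplete.
BUG_CHECK_NAMES = {
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x24: "NTFS_FILE_SYSTEM",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x116: "VIDEO_TDR_FAILURE",
}


def describe_stop_code(text):
    """Return the symbolic name for a hex stop code such as '0x00000116'.

    Hypothetical helper for illustration; codes that are not in the table
    above are reported as unknown rather than guessed.
    """
    code = int(text, 16)  # accepts the code with or without the '0x' prefix
    return BUG_CHECK_NAMES.get(code, "unknown (0x%X)" % code)


if __name__ == "__main__":
    for raw in ("0x00000116", "0x0000003b", "0x0000001e", "24"):
        print(raw, "->", describe_stop_code(raw))
```

Read this way, 0x116 points toward the display driver or graphics hardware and 0x24 toward the NTFS file system driver, while 0x1E and 0x3B are generic kernel-mode exceptions that can have many causes.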
You are experiencing a highly unusual number of failures. Do you have the machine running from a UPS or power line conditioner? It's possible the electricity running into your residence is unstable and power surges/spikes are causing damage to your electronics. – BBlake – 2011-04-19T14:45:39.450
@Anteru from @BBlake's suggestion, it appears that the one constant in your problems is the power coming into the computer. (If you haven't already) try a UPS. If that doesn't solve the problem, I would take out everything but the bare essentials: 1 RAM stick, just the video card, 1 hard drive. If it crashes, swap out the pieces with one of the other RAM sticks/hard drives/etc. until you have a stable system. Then add components very slowly (e.g. one a week), and when you start having crashes you know where to look. – Patrick – 2011-04-19T15:01:05.093
No, I haven't, and I wonder how those would be related (other electronics at home work just fine, i.e. TV and stuff.) I also have the PC connected through a fuse-protected plugbar. I haven't tried a UPS though. Any idea how to figure out whether the power line is the source? – Anteru – 2011-04-19T15:03:23.730
@Anteru by getting a UPS :-) There are also power conditioners that don't have the UPS functionality in it so it is cheaper. I think the most direct way to check the quality of the power coming in would be an oscilloscope, though that's an expensive toy to have unless you are hardcore. – Patrick – 2011-04-19T15:15:33.303
Do you run chkdsk on a regular basis for maintenance? You should. 0x1E can be caused by a bad driver, a virus, or a hard disk error: http://msdn.microsoft.com/en-us/library/ff557408(v=VS.85).aspx – Moab – 2011-04-19T15:42:11.237
Any recommendations which power conditioner to use? The power supply to the house should be stable, at least nobody here or in the area ever reported issues with unstable voltage/spikes (oh and the nearest power plant is actually not far.) – Anteru – 2011-04-19T18:11:51.403