Debugging a "Somewhat" Faulty PC Component

6

2

TL;DR: I have a component that fails very infrequently, preventing my computer from POSTing (I suspect this may be a motherboard issue). Since I cannot reliably reproduce the error, what is the best way to track down the cause?


I recently (1.5 weeks ago) assembled my first computer after purchasing all the components (PCPartPicker markup below). Everything seemed to be working perfectly until this weekend: once every 20 or so "power-ons", my PC will physically turn on (all the hardware lights up except the USB devices), but it never reaches a POST beep.

The Build

Please find the PCPartPicker Parts List here, and the PCPartPicker Benchmarking List here. Further, I have provided a detailed breakdown of my components below:

  • CPU: Intel Core i7-4930K 3.4GHz 6-Core Processor
  • CPU Cooler: Corsair H100i 77.0 CFM Liquid CPU Cooler
  • Motherboard: Gigabyte GA-X79-UP4 ATX LGA2011 Motherboard
  • Memory: Corsair Vengeance 16GB (2 x 8GB) DDR3-1600 Memory
  • Storage: Samsung 840 EVO 120GB 2.5" Solid State Disk
  • Storage: Western Digital Caviar Black 1TB 3.5" 7200RPM Internal Hard Drive
  • Video Card: EVGA GeForce GTX 780 3GB Video Card
  • Case: NZXT Switch 810 (Black) ATX Full Tower Case
  • Power Supply: Corsair 760W 80+ Platinum Certified Fully-Modular ATX Power Supply
  • Optical Drive: Asus DRW-24B1ST/BLK/B/AS DVD/CD Writer
  • Operating System: Microsoft Windows 8.1 - OEM (64-bit)

The Problem

I have a few symptoms that may or may not be related:

  • Occasionally (once every 20 boots) my computer will power on, but not POST. All the hardware turns on with the exception of my USB devices, but I never even reach the screen where I can access the BIOS.
  • When I render videos, there are sometimes green blocks (glitches) that appear on the screen. Could the issue lie with the graphics card even though it performed well during stress tests?
  • Also, occasionally, some of my USB devices will glitch and not work for a few moments.
  • Some of the most basic OS functions will occasionally "stop responding". For example, the "Sound" function that lets you control the USB microphone input levels will always stop responding after I make a change to the settings.
  • Finally, when I first plugged in my video card, it did not provide any signal to my second monitor through the alternate DVI port. I had to move the card to the top PCI-E slot before being able to use both ports. [Resolved - #3]
  • When I booted most recently, I received an error that read "The Main BIOS is corrupted. The system will be recovered from the backup BIOS [...]" (see image below). Does this confirm a MOBO issue?

BIOS Issue

My Suspicions

I believe that, given all the problems I am having, this is a motherboard issue. However, before dismantling everything and RMA'ing the board, I'd like to confirm my suspicions.

What I Have Tried

I have stress tested each of the major components other than the motherboard:

They all passed with flying colors! However, I haven't manually tested the PSU. Could the PSU be the issue?

Further, I have done the following debugging on my own:

  • Double-checked the connections throughout my setup;
  • Reassembled the setup twice;
  • Checked the PSU voltages: they look fine (see below). All other cables also passed the test.
  • I replaced the motherboard very recently, and about 4 days later the same boot issue occurred [Added - #6].

PSU Tester Results

My Question

Essentially, I am wondering how I can appropriately debug this issue. Usually when I ask this question, I get told "take everything apart, and start putting it back together until it stops working". I can't do this because it takes about 20 - 25 power-ons for the error to occur - how will I know if it is working or if it just hasn't faulted yet?

Squagem

Posted 2014-02-10T17:25:36.977

Reputation: 169

Some motherboards can have RAM compatibility issues (some red G.Skill sticks with some AMD 790 NB), sometimes causing cold-boot issues. Also, I have a Gigabyte motherboard which stopped working for no apparent reason: I did all the tests I could and replaced every part, but nothing helped. I ended up buying another one 2 years ago. Some months ago I tried the old board again (with the same CPU/RAM/PSU); at the first boot it told me the main BIOS was corrupted and restored it, and since then it has worked without issue as a network gateway/server. Those Gigabyte boards are nice but can encounter weird issues… – piernov – 2014-07-21T17:25:59.833

1

Unfortunately, for this kind of sporadic failure, isolation testing is generally the easiest option. If you have access to a POST card, you could try using that to see if the POST code on failure is the same each time and if it provides any hints as to why it's failing.

– ernie – 2014-02-10T17:45:35.280

I'll swing by my local MicroCenter and grab a POST card and PSU tester today - this sounds like a great next step. – Squagem – 2014-02-10T18:01:46.367

OUCH, holy macaroni. I never do all these software tests. Maybe they have their benefits sometimes, but swapping hardware is what will give you a good indication. You need spare hardware that works to do the tests, and no techie would even attempt proper troubleshooting without it. I'd be wary of having a computer without spare parts (even outdated parts) for it! They are needed to troubleshoot. I don't know what the fault is or what would cause it. Perhaps try changing the CMOS battery, though I'd have thought the CMOS battery only needs changing if the clock slows down. – barlop – 2014-02-21T00:27:23.173

1

But as for some kind of BIOS corruption issue: there's the ROM aspect of the BIOS, and there are the BIOS settings in CMOS. Maybe the motherboard has an issue. The only hardware you can change in relation to the BIOS is the CMOS battery. I doubt it'll help, but it's perhaps worth a try, and it's perhaps worth flashing the BIOS too. – barlop – 2014-02-21T00:30:15.730

Thanks for your input Barlop. I believe I am going to RMA the MOBO and see what happens after a new clean install. Unfortunately, I have no spare parts to test this in a more structured way. – Squagem – 2014-02-21T00:32:34.877

That's an idea. Also, Googling this - dualbios "the main bios is corrupted" - gives loads of results. It's worth pursuing those, as that may also help find the trigger and diagnose the problem. – barlop – 2014-02-21T00:33:21.337

Answers

2

In my experience, a corrupt BIOS can be either a power supply issue or a motherboard issue. I noticed in your picture of the PSU tester that you don't have the CPU power connector plugged in (either 8 or 4 pins). Plug that in and re-run the test.

The tester you have is a cheapo, but will find IMMEDIATE problems with your power supply. Do not trust it to be the end-all test. Double check that everything is in correctly (i.e. Power cables, the motherboard is on the stand offs in the case, all the cards plugged in properly, etc.). Try to run the system with just keyboard and mouse for the 20 or so reboots. Is it a standard keyboard/mouse, or a super-fancy style? It may be drawing more power than the system expects as well if it's on the same USB Host. Try spreading the connections on USB out across two sets. Barring that, "borrow" a cheap $20 keyboard/mouse combo, and try with those. Again, just the keyboard, mouse and monitor should be hooked up at this point.
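As a rough guide to how many clean boots a trimmed-down configuration needs before it can be trusted, here is a minimal sketch. It assumes each boot fails independently with the roughly 1-in-20 rate described in the question; both the independence and the exact rate are assumptions, and the function names are made up for illustration.

```python
import math


def boots_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest n such that a still-faulty system would survive n boots
    in a row with probability at most (1 - confidence).

    Assumes every boot fails independently with probability p_fail.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve (1 - p_fail) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def chance_of_clean_run(p_fail: float, n_boots: int) -> float:
    """Probability that a faulty system boots cleanly n_boots times in a row."""
    return (1.0 - p_fail) ** n_boots


if __name__ == "__main__":
    p = 1 / 20  # assumed failure rate, taken from "once every 20 or so power-ons"
    print(boots_needed(p))             # clean boots needed for 95% confidence
    print(boots_needed(p, 0.99))       # clean boots needed for 99% confidence
    print(chance_of_clean_run(p, 20))  # odds that 20 clean boots are just luck
```

Under these assumptions, 20 clean boots prove very little: a system that still has the fault would get through them about a third of the time. Something closer to 60 consecutive clean boots per configuration is a more meaningful bar.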

Does it boot up properly every time? If not, there is more troubleshooting ahead. Unfortunately, there is no built-in video, so I can't recommend you remove your graphics card yet. While the system is plugged into the wall, powered down, remove the CMOS battery. Check it with a multimeter. It should read around 3.0V, +/- about 10%. If it's OK, put it back in.

Have you done a BIOS upgrade? Read the motherboard's documentation about how to do so. Watch for things such as formatting the flash drive with the correct file system. I mention this because some Asus motherboards will read an NTFS flash drive, but not do a BIOS upgrade from it; in my experience, Asus motherboards seem to require exFAT or FAT32.

But honestly, the next thing to try is to seriously get spare components and swap out until it works PROPERLY. I realize it may cost some money, and you'll be delayed getting the system running. Maybe take advantage of your warranty period, and send back the motherboard, PSU and video card. See what they'll say about it.

Canadian Luke

Posted 2014-02-10T17:25:36.977

Reputation: 22 162

Luke, thank you very much for your detailed answer! Looks like there are quite a few next steps for me to tackle here. As you can see from my edits above, I did recently replace my motherboard, but the errors persisted. What's the probability of the second motherboard failing too? I'm thinking it's low, and that this recent test has highlighted my PSU as the main issue. Finally, I have updated my BIOS to the most recent version, and to clarify, I'm not using an ASUS motherboard.

– Squagem – 2014-03-14T12:28:13.300

I understand you might not be, but I mention it because other motherboards may have a similar limitation – Canadian Luke – 2014-03-14T14:33:02.617