3

I just built a new system and I'm getting very frequent STOP errors. The STOP errors I've had are 0x00000101 and 0x00000124, which are both Machine Check Exceptions.

I know that I have a hardware error, but I'm not sure if it's the CPU or the motherboard that's the problem. I don't have another CPU of the same socket handy to swap and test.

I suspect the CPU because in the event log I see a lot of these:

A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Cache Hierarchy Error
Processor ID: 1

The details view of this entry contains further information.

From what I've read, that could be a BIOS issue, but it certainly sounds like a CPU issue.

The motherboard is a GIGABYTE with AMD SB700 southbridge. The CPU is an Athlon X2 7750.

My understanding is that the CPU is a re-branded X4 with two cores disabled, probably because they are faulty.

I want to RMA the faulty part, but I'm not sure which it is. Anyone care to guess?

UPDATE: RESOLVED - After replacing the motherboard, the problem turns out to be "Cool 'n Quiet". Disabling "Cool 'n Quiet" in the BIOS suddenly resolved the issue. No more BSODs.

Chris Thompson

4 Answers

2

Two ways to look at it:

1) The CPU almost never goes bad. We've replaced 36 desktop motherboards this year. We've only replaced 1 bad CPU. The CPU failing is very, very rare.

2) In my career/experience, I have never had a bad CPU that allowed the computer to get as far as a blue screen. When a CPU goes bad, the machine does not boot.

I can't cite anything specific for those two points beyond general experience, as I have not seen that specific error message before, but my career says bad motherboard.

Happy Hamster
  • That's what I was thinking myself, but the "Processor Core" error from Windows is giving me pause. – Chris Thompson May 16 '09 at 22:26
  • CPUs can break, but most commonly because someone improperly applied thermal paste and they overheated. Most of the time an inherently faulty CPU will be pulled long before it gets a chance to ship. – username May 16 '09 at 22:42
  • 1
    An application cannot tell a CPU fault apart from a motherboard fault (e.g., the socket could be bad). There is no way to tell definitively from a Windows error message where the problem lies. Do what you can to eliminate problems you can fix (update the BIOS and drivers, reinstall the OS), then replace the most likely problem component. I agree with Happy Hamster that this is /probably/ a motherboard problem - but there's no way to be sure aside from replacing one of them. – sh-beta May 16 '09 at 23:31
  • 1
    RE - Chris -- One of my favorite error stories is a hard drive error I got once. Error log was filled with errors about the inability to write to a sector. Obvious hard drive fault, right? Ran Spinrite to confirm, drive was bad. Replaced drive. Same problem. Ran Spinrite, drive confirmed bad again, so I assumed I got a DOA replacement. Replaced THAT drive, same problem, realized something was up. All three drives tested fine on another machine. Problem? Bad hard drive controller on the motherboard! So yeah, errors can be very misleading sometimes... – Happy Hamster May 17 '09 at 01:06
1

I've seen similar issues caused by CPU overheating; check the fan, or install one if the CPU is passively cooled. Also try hardware monitoring tools (such as PC Probe for Asus desktop motherboards) that can indicate CPU overheating.

Dani
0

Wikipedia has a section on interpreting MCEs, which has a link to a tool from AMD called mcat, which apparently will help you figure out exactly what the numbers mean.
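To give a rough idea of what such a decoder does, here is a minimal Python sketch that unpacks the architectural bits of a single machine-check status register (MCi_STATUS). It assumes the commonly documented layout: flag bits 63 to 57, and a 16-bit error code in the low bits that takes the form 000F 0001 RRRR TTLL for cache hierarchy errors. The function name `decode_mci_status` and the sample value are made up for illustration; this is not how mcat itself works, and vendor-specific fields are not decoded.

```python
# Sketch only: decodes the architectural fields of one MCi_STATUS value.
# Vendor- and model-specific fields are deliberately left undecoded.

FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}
LEVELS = ["L0", "L1", "L2", "generic"]
TYPES = ["instruction", "data", "generic", "reserved"]
REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]


def decode_mci_status(status):
    """Return the flag names and, for cache errors, the decoded error code."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    cache = None
    # Cache hierarchy codes have the form 000F 0001 RRRR TTLL (F is ignored here).
    if (code & 0xEF00) == 0x0100:
        request = (code >> 4) & 0xF
        cache = {
            "level": LEVELS[code & 0x3],
            "type": TYPES[(code >> 2) & 0x3],
            "request": REQUESTS[request] if request < len(REQUESTS) else "reserved",
        }
    return {
        "flags": flags,
        # Valid and not flagged uncorrected: the hardware fixed the error itself.
        "corrected": "VAL" in flags and "UC" not in flags,
        "mca_error_code": code,
        "cache": cache,
    }


# Made-up sample value, not taken from the event log in the question.
sample = 0x9000000000000135
print(decode_mci_status(sample))
```

A corrected cache error like the one in the question would be expected to show VAL set and UC clear, but even a fully decoded status only tells you what the processor observed, not whether the root cause is the CPU, the board, or the power delivery.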

Blorgbeard
0

How hard would it be to return or replace the CPU? Could you afford the downtime? If you replace it, you'll know very quickly whether it's a CPU or motherboard issue.

Chopper3
  • I think I'm going to go that route. Downtime isn't an issue since it's a new computer. I think I'm going to RMA the motherboard since it seems that's the most likely culprit. – Chris Thompson May 17 '09 at 00:34