BSODs and Prime95 failures

My computer is notoriously unstable. It blue screens all the time. I'm running Windows 7. Here's what's in the box:

  • Intel Core i7 920 (Stock cooler, not overclocked)
  • Gigabyte EX58-UD3R motherboard
  • 6GB (3x2GB) OCZ Gold memory (set to 1333MHz; it has problems booting if I leave it at 1066MHz)
  • GeForce 9500 GT
  • Antec 650W power supply

When idle, it seems to run between roughly 40 and 50 degrees Celsius, according to SpeedFan. I've run many memory tests, and none of them have found any problems.

I've seen several different messages when it blue screens:

  • IRQL_NOT_LESS_OR_EQUAL
  • Page fault when not paging (or something like that; possibly PAGE_FAULT_IN_NONPAGED_AREA)
  • Random addresses/registers

Unfortunately, they go by too quickly for me to take much from them.

I just ran Microsoft's hotfix for the first one (though I'm not positive that my error is exactly the same as theirs, since I don't know whether I'm getting the 0x0000000A part), so I don't know yet whether it will help. If Prime95 is any indication, it won't, for the following reason:

When I run Prime95, 8 threads start up, and they all stop very quickly. I get the following errors in the results.txt file:

[Tue Feb 16 15:44:35 2010] 
FATAL ERROR: Rounding was 0.5, expected less than 0.4 
Hardware failure detected, consult stress.txt file. 
FATAL ERROR: Rounding was 0.5, expected less than 0.4 
Hardware failure detected, consult stress.txt file. 
FATAL ERROR: Rounding was 0.5, expected less than 0.4 
Hardware failure detected, consult stress.txt file. 
FATAL ERROR: Resulting sum was 4050964008042496, expected: 2785959515376393 
Hardware failure detected, consult stress.txt file. 
FATAL ERROR: Resulting sum was 4.042840052791945e+056, expected: 3.789462128888016e+016 
Hardware failure detected, consult stress.txt file. 
FATAL ERROR: Resulting sum was 5.593535921577141e+247, expected: 1.208964328863723e+017 
Hardware failure detected, consult stress.txt file.
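For anyone comparing runs before and after a BIOS change, the log lines above can be tallied with a short script. This is only a sketch: the regular expressions are derived from the lines quoted in this question rather than from any Prime95 documentation, and the function name `summarize_results` is made up for illustration.

```python
import re
from collections import Counter

# Patterns derived from the results.txt lines quoted above (an assumption,
# not an official description of the Prime95 log format).
ROUNDING = re.compile(
    r"FATAL ERROR: Rounding was ([0-9.]+), expected less than ([0-9.]+)"
)
SUM_MISMATCH = re.compile(
    r"FATAL ERROR: Resulting sum was ([^,\s]+), expected: (\S+)"
)


def summarize_results(text):
    """Count each kind of error line found in the text of a results.txt file."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if ROUNDING.search(line):
            counts["rounding"] += 1
        elif SUM_MISMATCH.search(line):
            counts["sum_mismatch"] += 1
        elif line.startswith("Hardware failure detected"):
            counts["hardware_failure"] += 1
    return counts
```

To use it, read the file and pass its contents in, for example `summarize_results(open("results.txt").read())`, where the path depends on where Prime95 is installed. A falling error count after a voltage or timing change would suggest the change is helping, though any error at all still means the system is not stable.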

When I looked at the stress.txt file, it suggested memory might be my problem, but as I said, I've run multiple memory tests (MemTest86, I think? It was a while ago), and no problems have been detected.

After running the hotfix, the test threads managed to stay running a little longer, and while my temperatures definitely rose, they never really got above 60C.

So, basically I see three problems:

  1. I'm running pretty hot. With the stock cooler, I idle close to 50 on some cores with the side of my case off. Putting my hand in front of the CPU fan, I don't really feel much of a breeze. Is this normal for the 920 stock cooler?
  2. I blue screen all the time (like 1-4 times per day).
  3. I can't seem to run Prime95 for more than a few seconds.

Can anyone point me in the direction of what might be going wrong here, and perhaps what to do to confirm/fix the problem?

Thank you.

Mike Pateras

Posted 2010-02-16T21:22:22.580

Reputation: 870

Remember, memtest reporting no failures just means the tests it ran succeeded. It doesn't guarantee good memory. – quack quixote – 2010-02-16T21:28:54.877

Thanks for the edit. How did you format those lines properly?

The thing is, I have those same memory kits in two other systems. One identical to this system, and one that's slightly different. The system that's identical to this suffers from similar problems. The one that's different has no problems at all. This leads me to question the motherboard (which is different), or one of the other pieces of hardware, more than the memory kits.

Still, how can I verify that I have good memory? – Mike Pateras – 2010-02-16T21:35:22.710

My personal experience is that memtest86 can detect only very serious memory problems. I would suggest running Prime95 or mprime to test the CPU and memory instead. I once had broken memory that would cause incorrect SHA-1 sums for huge files (too big to fit fully in RAM, with several hashed in parallel). My best guess is that the memory was very sensitive to slight voltage changes caused by full CPU load and heavy I/O on multiple HDDs. The problem was fixed by using a pair of RAM modules. – Mikko Rantalainen – 2013-10-17T07:09:30.307
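The SHA-1 symptom described in the comment above can be checked with a small script that hashes the same file several times and compares the digests. This is a minimal sketch, not a replacement for a proper stress test: the function names are hypothetical, and a mismatch only hints at unstable hardware (RAM, disk, or controller) without saying which part is at fault.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_are_stable(path, runs=3):
    """Hash the file `runs` times; True if every run gave the same digest."""
    digests = {sha1_of_file(path) for _ in range(runs)}
    return len(digests) == 1
```

On healthy hardware `hashes_are_stable` should always return `True` for an unchanged file. To get closer to the scenario in the comment, use a file larger than installed RAM and run it while the CPU is under load.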

Answers

4

First things first: go to Control Panel > System (Windows Key+Pause/Break), then open the advanced system settings. On the Advanced tab you should see "Startup and Recovery"; click Settings and you can disable "Automatically restart" under System failure.

Next time a BSOD occurs, you can see what the cause is.

Also, you may want to look at BlueScreenView, a very good tool for reviewing previous blue screen errors.

Now, as ~quack said, just because memory passes some tests doesn't mean it is good. If you only ran the test for a few hours, swapping the modules around and re-running it may make it touch areas it didn't reach before. Really, though, unless you run Memtest86+ for around (or ideally over) 48 hours, you will not have a trustworthy result.

Next, the errors you listed are most commonly down to faulty or corrupt memory, but they can really be caused by anything; another very likely cause is bad device drivers.

If you are getting this every time you run Prime95, I would highly recommend unplugging EVERYTHING from your machine other than power, video and keyboard (and mouse, unless you are confident using the machine without one). Then boot into safe mode and try running Prime95 again. This is the best way of testing whether it is a driver issue, short of actually reinstalling Windows from scratch and not installing any drivers!

If you are still seeing random problems and Memtest86+ really is not showing errors, it is most likely a problem with the motherboard or even the CPU. However, this can be very hard to diagnose.

As for temperature - the lower the temperature, the slower the fan speed - your CPU is very cool and there is nothing to worry about.

William Hilsum

Posted 2010-02-16T21:22:22.580

Reputation: 111 572

Great suggestions, especially the BSOD stuff. Thanks!

If I end up blue screening again, I'll try the minimal safe-mode thing. – Mike Pateras – 2010-02-17T18:07:30.670

1

I had a similar problem recently: system had shown some instabilities, Prime95 returned hardware failure, etc. I ran memtests until the cows came home, all runs came up clear, really drove me crackers ... in the end it turned out the memory voltage was too low.

Molly7244

Posted 2010-02-16T21:22:22.580

Reputation:

1

Googling for BSODs with the Gigabyte EX58-UD3R and OCZ Gold gives me several results. What they have in common is that most of the BSODs vanish after changing the memory timings and voltage settings.

Have a look:

  1. Tweak Town
  2. Overclockers
  3. Tom's Hardware

Sathyajith Bhat

Posted 2010-02-16T21:22:22.580

Reputation: 58 436

I tried following all of the settings I could find in those links. I set my memory timings manually, I set my QPI/VTT Voltage to 1.31, and I tried setting my memory frequency to 1600MHz, but it won't boot if I do. I'm stuck at 1333MHz. I haven't crashed in a while (not since I applied the Windows hotfix yesterday), but I still can't get Prime95 to run more than a minute or so. – Mike Pateras – 2010-02-17T18:06:28.490

@Mike Pateras Have you updated your BIOS? A BIOS update could also eliminate the BSODs. – Sathyajith Bhat – 2010-02-17T21:20:08.110

1

I have a similar setup, and had similar problems. As mentioned above, it was solved with a voltage change. I have the exact same RAM kit, and the problem is that it will automatically set itself at 1.5V, whereas it's made to run at 1.65V. Change this in your BIOS, and you should be golden! (Ha.)

ryantmer

Posted 2010-02-16T21:22:22.580

Reputation: 121

Sadly, I was already at 1.64V (the next step up is 1.66V). Thanks for the suggestion, though. – Mike Pateras – 2010-02-17T18:04:45.327