Can CPU overheating cause data corruption?

3

Background:

I realized recently thanks to dmesg and syslog that my CPU has been overheating since May. No overclocking. I have an Intel® Core™ i5-2500K, and the stock cooler was simply defective as far as I can tell. I know I should have kept an eye on dmesg for hardware errors, but I only saw:

CPU2: Core temperature above threshold, cpu clock throttled

Every once in a while, so I didn’t think too much of it. I’ll obviously be more careful in the future. As far as I can tell, CPU temps were hovering around 90 °C.

Here’s my question:

I’ve copied a lot of data since May, my pictures, band recordings, etc…

Is it possible that my data has been silently corrupted by this issue? I’m guessing that CPUs have ECC or checksums built in, but I’m not sure. The last thing I want of course is to have been using my computer this whole time and now suddenly all my data has been silently corrupted. Any thoughts are greatly appreciated.

cat pants

Posted 2015-09-13T01:04:32.220

Reputation: 635

Answers

2

You state this:

Is it possible that my data has been silently corrupted by this issue? I’m guessing that CPUs have ECC or checksums built in, but I’m not sure.

Yes, core computer systems have ECC capability, but from what I understand of modern CPU architecture, the CPUs themselves can run at up to 90 °C as long as that temperature is not sustained all of the time.

Meaning if your system is peaking at 90 °C and then dropping down again to something less than that, there should be no issues. And even then, the worst aspect of running a CPU at 90 °C is you will literally burn out the CPU faster; data corruption might happen but that will be obvious quite quickly.

Forget about your data files, your core OS system will just start collapsing before your eyes. This page on “Impact of Temperature on Intel CPU Performance” explains it quite nicely:

Sensitive electronics like CPUs have a finite lifespan and running them at higher temperatures shortens it. So unless you want to have an excuse to upgrade your system often, higher temperatures are counter-productive.

With PC hardware, higher temperatures make both minor and major hardware faults much more likely. These hardware faults can result in anything from reduced performance due to minor errors needing to be corrected to data corruption or bluescreens due to more dramatic errors.

Also of note are their three key conclusions:

  1. Modern Intel CPUs run at full speed (including the full Turbo Boost allowed based on the number of cores and workload) all the way up to 100 °C
  2. Even after the CPU hits 100 °C, the performance is not greatly affected until the CPU spends about 20% of the time > 99 °C
  3. While stock cooling only causes around a 2.5% drop in performance, even a budget after market cooler will dramatically improve CPU temperatures

So that seems to be some confirmation that the worst thing that can happen because of running a modern CPU at 90 °C or higher is a shortened lifespan of the CPU itself; not much else.

That said, and this is mostly anecdotal, as far as this part of your question goes:

I’ve copied a lot of data since May, my pictures, band recordings, etc…

In my experience, files with corrupted data either cannot be copied at all, with the system itself stopping the copy due to some data read error, or show a modification date that seems off. For example, if you had a picture from 2012 that you never touched but it suddenly has a modification date of 2015 that matches the date you copied the file, then I would worry.

But don’t let panic overwhelm you. I am fairly confident that if the 90 °C temperatures were fairly sporadic and not sustained, your data should be fine. If you were able to actually copy the files, then I believe they are fine.
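If you still have the originals that the copies were made from, you can go beyond a gut check and compare checksums of each pair. Below is a minimal Python sketch of that idea, not a definitive tool: the function names `sha256_of` and `find_mismatches` are made up for illustration, and it assumes the copy keeps the same relative folder layout as the source. It can only tell you that two files differ, not which of the two is the good one.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so large recordings do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, copy_dir):
    """List (relative_path, reason) for files that are missing or differ.

    Assumes copy_dir mirrors the layout of source_dir (an assumption of
    this sketch, not something the thread states).
    """
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

Calling `find_mismatches("/path/to/originals", "/path/to/copies")` (both paths are placeholders) returns an empty list when every file in the source has a byte-identical counterpart in the copy.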

That said, I looked through a PDF of the official Intel datasheet for the “Desktop 5th Generation Intel® Core™ Processor Family” and found what I believe is some pertinent info on page 71, where the TCC (Thermal Control Circuit) activation temperature is said to be 96 °C for that family of CPUs. When the TCC is tripped, the CPU throttles down CPU cycles in an attempt to cool itself down.

So if you are noticing 90 °C CPU temperatures and occasional “CPU2: Core temperature above threshold, cpu clock throttled” messages, that means that occasionally the CPU temp is going above 96 °C, throttling down CPU cycles to cool off and then you are back to below 96 °C.
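To get a feel for whether the throttling really was occasional, you could count how often that message shows up per core in a saved log. The following is a rough Python sketch under the assumption that the log lines contain the message text exactly as quoted in the question; the function name `count_throttle_events` is hypothetical, and the log path in the usage note is only an example.

```python
import re
from collections import Counter

# Matches the message quoted in the question and captures the core number.
# Anything after the matched text on the same line is ignored.
THROTTLE_RE = re.compile(
    r"CPU(\d+): Core temperature above threshold, cpu clock throttled"
)


def count_throttle_events(lines):
    """Count throttle messages per CPU core in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

For example, `count_throttle_events(open("/var/log/syslog", errors="replace"))` would tally one log file, assuming that is where your system keeps its syslog. A handful of hits spread over months points to occasional spikes, while thousands would suggest the temperature was sustained.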

Which means, if you ask me, that you should see about getting the cooler/fan on that CPU fixed. But as far as it damaging data on your system? I wouldn’t think that occasional 96 °C+ spikes that are successfully throttled down by the system would do anything to your system other than shorten the life of the CPU itself.

JakeGould

Posted 2015-09-13T01:04:32.220

Reputation: 38 217