Laptop overheating after doing the usual cleaning routine

8

I have a Vaio VGN-CR353 laptop that was given to me around September or October 2012, and I installed Ubuntu on it. I have already made it into a very personal laptop and installed games under Wine (SC2, Frozen Throne) and several IDEs (Sublime Text 2, Eclipse, NetBeans) with no hitch... until last November.

Just so you know, I never touched the internals until the last week of November, when I determined that it was not software that was causing this problem.

Ubuntu reports that the laptop frequently hits the 95C or 105C critical marks, and it then shuts down automatically. So far I have tried to address the issue as follows:

  • Dusted off the internals. Amazingly, they were very clean to begin with.
  • Removed very minor accumulations in the fan and sinks.
  • Reapplied thermal compound several times already, just in case I applied it wrong. Currently testing different application techniques. Also chose a nano-diamond compound to rule out shorting due to the compound.
  • Reseated the sinks tightly. Even bent the arms that hold the sinks up a bit to ensure that the sinks are as tight as possible.
  • Made sure vents were clear
  • Bought a cooler
  • Elevated the laptop by buying larger "rubber feet". The laptop now sits at least 1 cm from a flat surface
  • Reinstalled different versions of Ubuntu since Linux kernels from 2.6 to 3.2 suffer an overheat issue. Currently on a 3.5 kernel (Lubuntu 12.10).

Even after all of this, the overheating persists. It happens when:

  • I surf the net in any browser (Firefox, Chromium), even when the Flash plugin is not installed (so Flash is not to blame)
  • I copy 39GB worth of files to an external hard disk via the terminal. Unusually, it does not overheat when I copy them using the GUI.
  • Using NetBeans, even when just writing code, not even compiling yet.
  • Randomly!
  • Even when I'm in the school computer lab which is crazy cold.
  • After a clean install of Windows

Limitations:

  • No BIOS settings for fan nor frequency settings for processors (It's Sony, what do you expect?)
  • lm-sensors doesn't detect fan sensors or any other sensors besides the CPU cores and motherboard, because Vaio laptops notoriously don't implement them.

I already installed lm-sensors and GKrellM to monitor the temperatures. I currently have a view of both CPU cores and the ACPI temps. Unusually, I never saw them go beyond 60C. The latest readings range from 32C on a fresh boot and 43C at room temperature to 49C under moderate load (multi-tab surfing) and 53C when using NetBeans. It's quite weird that the temperatures differ so much from one use to the next.

Also, sometimes the system reports having reached the critical temps even when the laptop does not feel hot at all, like a while ago in the lab.
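Since the shutdowns are tied to the 95C and 105C marks, it can help to see which trip points the kernel has actually registered for the zone. The following is only a minimal sketch: it assumes the kernel exposes `trip_point_N_type` and `trip_point_N_temp` files (in millidegrees Celsius) under the thermal zone directory, which may not hold on every machine, and `read_trip_points` is a hypothetical helper name.

```python
from pathlib import Path


def read_trip_points(zone_dir):
    """Return a list of (type, degrees_celsius) pairs for one thermal zone directory."""
    zone = Path(zone_dir)
    trips = []
    for type_file in sorted(zone.glob("trip_point_*_type")):
        # The matching temperature file shares the same index.
        temp_file = type_file.with_name(type_file.name.replace("_type", "_temp"))
        if not temp_file.exists():
            continue
        kind = type_file.read_text().strip()
        # Assumption: sysfs reports the value in millidegrees Celsius.
        degrees = int(temp_file.read_text().strip()) / 1000.0
        trips.append((kind, degrees))
    return trips
```

Calling `read_trip_points("/sys/class/thermal/thermal_zone0")` on the affected machine would then list the thresholds that the thermal protection reacts to.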

Until now, I am still waging this war with the laptop. Am I missing a vital routine that could turn the tables around and once and for all fix this issue? I am running out of ideas.

Update1:

Currently downloading drivers for another laptop via Firefox. CPU usage is 80% and 21% on the two cores, with temps of 58C and 51C respectively. ACPI temperature is at 60C and disk usage (writes due to the download) is up to 205KB/s. RAM usage is approx. 500MB. No overheating just yet.

Update2:

Just before running Prime95, I already tested installing and using Windows for a couple of days. Same thing happens on Windows. The only difference is that unlike Linux which shuts down the machine semi-properly, on Windows, it just turns off! It's like pulling the plug suddenly.

Therefore it's not a Linux issue.

Update3:

Managed to get hold of and run Prime95 on Linux. Amazingly, I could push the laptop to 100% load on both cores and 100% memory use, and it held at a stable ~90C (tested for about 10-15 minutes) without overheating. I just wonder why the machine suddenly reports 95C and 105C.

Update4:

Dismantled the laptop for a thorough clean and then reassembled it. Nothing out of the ordinary, just a minor dust layer. After that, I ran Prime95 for 30 minutes to prove that the laptop can't overheat. It topped out at 91*C at most and averaged 85*C. It must be a faulty sensor.

Update5:

Finally ran a script that logs temperatures for graphing, rather than just watching the current temps go up. I modified the script on this post to record the ACPI (as GKrellM labels it), core and HDD temps on my rig once per second. Then I used the laptop in different scenarios, like surfing, compiling code, and the low power, balanced and high performance modes.

Then an amazing discovery: the ACPI sensor skyrockets to critical in a split second! This event trips the OS thermal protection, which shuts down the PC. I have a log of the temps (ACPI, Core1, Core2, HDD) and the critical warning from /var/log/syslog. I also made a graph of the log. You can see that in this per-second log, the reading pops to a whopping 111 Celsius, far out of its usual range of 40-50. Not only that, there is virtually nothing causing it. As you can see in the log and graph, the HDD and cores are acting just fine. It's the ACPI sensor that's gone wild.
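A jump like this can be picked out of a per-second log automatically. The sketch below is illustrative only: `find_spikes` and the 20-degree `max_jump` default are assumptions, and the sample readings are made up to resemble the pattern described above, not copied from the actual log.

```python
def find_spikes(samples, max_jump=20.0):
    """Return (index, previous, current) for each reading that rises by more
    than max_jump degrees compared with the reading just before it."""
    spikes = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if cur - prev > max_jump:
            spikes.append((i, prev, cur))
    return spikes


# Hypothetical per-second ACPI readings in degrees Celsius.
readings = [45.0, 46.0, 111.0, 47.0]
print(find_spikes(readings))  # [(2, 46.0, 111.0)]
```

A real component cannot heat up by tens of degrees within one second and cool back down in the next, so a hit from a check like this points at the sensor or its readout rather than at actual heat.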

By the way, the "ACPI" temps come from this path: /sys/class/thermal/thermal_zone0/temp
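For anyone who wants to reproduce this kind of log, here is a minimal sketch of a per-second logger for that path. It is not the script I modified; the function names, the CSV layout and the defaults are assumptions, and it relies on the kernel reporting the value in millidegrees Celsius.

```python
import time
from datetime import datetime

ACPI_ZONE = "/sys/class/thermal/thermal_zone0/temp"


def read_celsius(path=ACPI_ZONE):
    """Read one sysfs thermal file and convert millidegrees to degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def log_temperature(out_path, source=ACPI_ZONE, samples=60, interval=1.0):
    """Append `samples` lines of 'timestamp,temperature' to out_path."""
    with open(out_path, "a") as log:
        for _ in range(samples):
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp},{read_celsius(source):.1f}\n")
            # Push each line out of Python's buffer right away so that
            # little is lost if the machine shuts itself down mid-run.
            log.flush()
            time.sleep(interval)
```

For example, `log_temperature("acpi.csv", samples=3600)` would record roughly an hour of readings, which can then be graphed with any spreadsheet or plotting tool.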

terminal check

graph check

Joseph

Posted 2012-12-19T12:49:06.800

Reputation: 249

Can you hear the fans spinning well? Do they speed up and turn at max RPM just before the computer crashes? – terdon – 2012-12-19T13:03:45.820

@terdon I don't know about a max speed. I tried watching the fan while the bottom cover is off and the fan runs momentarily on BIOS (maybe a check), then it turns off when the OS loads until it loads a fresh desktop. It only runs again when I start using applications, just when it starts to rise to around 40*C+ and runs constantly throughout use. It's blowing hot air, so it means the heatsinks are doing their job. – Joseph – 2012-12-19T13:10:54.680

Are you sure the fans are even working? In the end you might have just reached the end of the laptop's lifespan. – Ramhound – 2012-12-19T13:26:06.443

Repave it with Windows and see whether you're still having trouble. If so, it's a hardware issue; if not, it's a totally unprecedented problem with the Linux ACPI driver. – Aaron Miller – 2012-12-19T13:40:25.610

@Ramhound yup, pretty sure the fans are working. They spin constantly rather than the on-off behavior of other laptops. They still spin when the OS shuts itself down until power-off. – Joseph – 2012-12-19T13:43:57.217

@AaronMiller Will try Windows in a few moments. – Joseph – 2012-12-19T13:44:42.460

@AaronMiller still having problems even with Windows on it. While I was on Windows, I installed Speccy which determined that it was the motherboard that was hot. – Joseph – 2012-12-19T15:30:53.457

Probably time to look for a new laptop, then. – Aaron Miller – 2012-12-19T15:36:34.287

Was it new when you got it? Sony had some problems with the main fan about 3 years back and replaced a bunch of fans on warranty. – Daniel R Hicks – 2012-12-25T13:41:48.543

@DanielRHicks It's been there for a while before I got it. – Joseph – 2012-12-25T14:41:59.710

You might check this out. I think eventually the problem fans would stop entirely. – Daniel R Hicks – 2012-12-26T04:39:23.017

Answers

3

It's been 3 months and I have finally pinpointed the problem. It's a hardware problem, and that spammy-looking, ad-filled Indian site was right (I won't post it here as it's a commercial entity): it's chip-level damage that's common to a number of Vaio laptops.

So the best and probably the only solution is to turn it over to your nearest service center for repairs. If it's under warranty, you're fine. If not, well, expect to shell out a few bucks for it. You might be better off buying a new notebook.


Anyway, I found another workaround, and it's highly dangerous. I am only sharing it to show that there is a way around the shutdowns, but it has its tradeoffs. This is not sound advice, just a statement that it's possible.

This dangerous move involves disabling the ACPI critical trip point in Linux. To do this, edit the GRUB defaults file:

gksudo leafpad /etc/default/grub

And add thermal.nocrt=1 to GRUB_CMDLINE_LINUX_DEFAULT as shown:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash thermal.nocrt=1"

Then update grub:

sudo update-grub

Then reboot.

This disables the ACPI critical trip point but not the thermal sensor, so we can still monitor it afterwards.
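After the reboot, one way to confirm that the parameter was actually picked up is to look at the command line the kernel booted with, which Linux exposes in /proc/cmdline. This is just a sketch; `kernel_param` and `nocrt_enabled` are hypothetical helper names.

```python
def kernel_param(cmdline, name):
    """Return the value of a name=value token from a kernel command line.

    Returns "" for a bare flag without a value and None if the name is absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


def nocrt_enabled(cmdline_path="/proc/cmdline"):
    """True if the running kernel was booted with thermal.nocrt=1."""
    with open(cmdline_path) as f:
        return kernel_param(f.read(), "thermal.nocrt") == "1"
```

If `nocrt_enabled()` returns False after rebooting, the edit to /etc/default/grub did not make it into the boot entry, most likely because update-grub was not run.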

After doing so, I ran my logger script. To compensate for the lack of a natural trip point handler, I set GKrellM to fire an action when the temperature goes over the trip point. Since GKrellM's readings are usually delayed, an alert from it means the temperature has stayed above the trip point for a significant amount of time by the time the action fires.

Then I went on with my usual routine, and the sensor hit the trip point again. However, the spike was so sudden that it did not even register in GKrellM, although my logger recorded it. It was a very abrupt spike, and that was it.
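The difference between a one-sample spike and real overheating can also be expressed in code. The sketch below only reports a problem when the reading stays at or above a limit for several samples in a row; `sustained_over`, the 95-degree limit and the sample values are assumptions for illustration, not what GKrellM does internally.

```python
def sustained_over(samples, limit, min_consecutive=5):
    """True if at least min_consecutive readings in a row are at or above limit."""
    run = 0
    for value in samples:
        # Extend the current run, or reset it when a reading drops below the limit.
        run = run + 1 if value >= limit else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-second readings in degrees Celsius.
glitch = [45, 111, 46, 47]     # a single bogus sample
real_heat = [96, 97, 98, 99]   # several hot samples in a row
print(sustained_over(glitch, 95, min_consecutive=3))     # False
print(sustained_over(real_heat, 95, min_consecutive=3))  # True
```

A handler built this way ignores the sensor glitch but would still react to genuine sustained heat, which is the protection that is lost once the kernel's own trip point is disabled.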

Joseph

Posted 2012-12-19T12:49:06.800

Reputation: 249

0

I have a similar problem with an HP laptop, and the answer for me is simply that the custom power-saving options or commands to the BIOS are not working under Linux.

So basically the problem is with Sony and them not wanting it to be capable of running anything other than Windows.

Gunnish

Posted 2012-12-19T12:49:06.800

Reputation: 219

It's not a Linux issue. Already tried running Windows on it and the same thing happens. – Joseph – 2012-12-25T13:37:33.193

Oh, I'm sorry, seems like a harder problem than I was aware of then. – Gunnish – 2012-12-25T15:19:28.260