Background
I recently bought the Asus ZenBook Pro. I use it for running deep learning experiments locally. These experiments are often quite compute-intensive on both the CPU and GPU. I've recently experienced huge performance drops during heavy computations.
I have Ubuntu 16.04 installed.
Problem
The problem arises when I, for example, start a training job using TensorFlow or Keras, or run a CPU- and GPU-heavy job in ROS or Python. After about 30-60 seconds of the expected (i.e. good) performance, performance suddenly collapses and the entire computer becomes almost unresponsive. A complete reboot is needed to recover functionality.
Using top, nvidia-smi or the system monitor, I see no sudden spike in any process's CPU or memory usage. No other process starts using the CPU or GPU.
While the machine is in the unresponsive state, I also see no process using any noticeable amount of processing power.
I suspect that Ubuntu's power management is causing the problem, since my fan also behaves erratically from time to time, but I'm no Linux expert. If it helps, when I installed Ubuntu I had to do the initial boot with acpi=off.
EDIT: I have tested the same code on other computers with Ubuntu 16.04 installed and see no issues there.
I appreciate any help in locating the problem or guiding me to somewhere I can research myself.
I suggest tracking the temperatures of the CPU & GPU - the problem might occur if they spike. This particular laptop could have ineffective thermal paste on the CPU or a similar fault. I don't use Ubuntu, but under Windows one can have them displayed continuously on the taskbar. – harrymc – 2018-09-07T06:10:09.513
I can confirm what @harrymc said - a fan went dead in my Thinkpad. I had it replaced but got a cheap one with just 3 speeds not reported back to system, so now CPU slows down when overheated while Thinkpad thinks that the fan runs on top speed which is not the case. – Pawel Debski – 2018-09-08T10:21:28.020
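The temperature tracking suggested in the comments could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested diagnosis tool: it assumes the kernel exposes CPU readings under /sys/class/thermal (which may not be the case on a machine booted with acpi=off) and that nvidia-smi is on the PATH. The function names parse_millidegrees, read_cpu_temps, read_gpu_temps and log_temps are made up for illustration.

```python
import glob
import subprocess
import time


def parse_millidegrees(raw):
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temps(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {zone_name: degrees C} for every readable thermal zone.

    Assumption: the kernel exposes thermal zones at this sysfs path.
    Unreadable or missing zones are skipped, so the result may be empty.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path.split("/")[-2]] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            continue
    return temps


def read_gpu_temps():
    """Return a list of GPU temperatures in degrees C, or [] if nvidia-smi fails."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=temperature.gpu",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
        return [float(line) for line in out.splitlines() if line.strip()]
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []


def log_temps(interval=2.0, samples=5):
    """Print a timestamped CPU/GPU temperature sample every `interval` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_cpu_temps(), read_gpu_temps(), flush=True)
        time.sleep(interval)
```

Calling log_temps(interval=2.0, samples=300) in a second terminal just before starting the training job would leave a record of the last readings before the slowdown; redirecting the output to a file keeps it across the reboot.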