I have a Python program that uses multiprocessing.Pool, which means it spawns multiple processes. The program reads 70k text files, does some processing, and saves the results to another directory. I want to make use of all my CPU cores when running it.
And for a while, I've been able to. My program took ~10 seconds to run every single time, which is good.
real 0m12.430s
user 0m13.072s
sys 0m9.704s
But for the past few days it has been inconsistent. Sometimes it uses 100% CPU for all cores. Other times, for no reason, the program only uses 0-1 percent, and it takes as long as 5 minutes to run.
real 5m6.186s
user 0m4.844s
sys 0m4.968s
Note that real >> user + sys, although I have no idea what this implies. Also note that I am not changing any parameters of the code or any other settings between runs. In this case, I have also noticed that RAM usage of each process never goes above 10 MB, which might affect CPU usage.
My question is, how can I even begin to diagnose this problem? Some of my hypotheses are:
- OS has detected my laptop is 2 years old and is trying to prolong its life by throttling CPU/RAM usage
- Maybe OS throttles a process's CPU/RAM usage if it's been using up a lot for quite a while.
- Someone has gained remote access to my laptop and is trolling me
I am running Linux Mint 17.3 on a laptop with Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz, 4 cores, 8 GB of RAM, HDD
Thanks in advance.
EDIT: Here's the vmstat output while the program is running:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 3  1      0 2773588 242912 2522744    0    0   114    54  229  812 11  2 83  5  0
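To make the single vmstat row above easier to read, here is a minimal sketch that pairs each column name with its value. The helper name `parse_vmstat_row` is hypothetical, and the header and row are copied from the output above. One caveat: when vmstat is run without an interval, the only report it prints is an average since boot, so a single row may not reflect what is happening during a slow run.

```python
# Header and sample row copied from the vmstat output in the question.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
ROW = "3 1 0 2773588 242912 2522744 0 0 114 54 229 812 11 2 83 5 0"


def parse_vmstat_row(header, row):
    """Pair each vmstat column name with its integer value."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


sample = parse_vmstat_row(HEADER, ROW)

# Columns most relevant when suspecting an I/O-bound workload:
# b = processes blocked on I/O, wa = % of CPU time waiting for I/O,
# bi/bo = blocks read/written, si/so = swap in/out.
for key in ("b", "wa", "bi", "bo", "si", "so"):
    print(key, sample[key])
```

Running something like `vmstat 1 10` during a slow run and watching `wa`, `b`, and `bi` over several rows would likely be more telling than one sample.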
My first thought is that your processes have become I/O bound. In your process monitor, is any process in disk-wait state (or is that disk-sleep)? – xenoid – 2018-03-30T15:10:04.840
I ran
watch -n 1 "(ps aux | awk '\$8 ~ /D/ { print \$0 }')"
and sure enough, all of the python subprocesses are in D+ state. 90% of the time, no other processes are in that state. I am guessing this doesn't happen when the subprocesses are utilizing 100% CPU. – Carl Araya – 2018-03-30T15:24:01.160
Run vmstat and check the iowait column (wa); a high percentage means the CPU is waiting for I/O (thrashing the disk). Try posting about 10 lines of vmstat output in your question. HDD or SSD? Check the bi column (blocks in, which vmstat counts in 1 kB blocks); this will show whether it's reading your files. si/so (swap in/swap out) show whether it's swapping. – peufeu – 2018-03-30T16:03:04.650
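As an alternative to the ps/awk one-liner, the same check for uninterruptible-sleep (D) processes can be sketched in Python by reading /proc directly. This is only an illustration and assumes a Linux /proc filesystem; the function names `parse_stat_line` and `processes_in_state` are made up for this example.

```python
import glob


def parse_stat_line(line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    command = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, command, state


def processes_in_state(wanted="D"):
    """List (pid, command) for processes currently in the given state."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                pid, command, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line was unreadable
        if state == wanted:
            found.append((pid, command))
    return found


if __name__ == "__main__":
    # D = uninterruptible sleep, which usually means waiting on disk I/O.
    for pid, command in processes_in_state("D"):
        print(pid, command)
```

Calling this in a loop once per second during a slow run would show whether the pool workers sit in D state the whole time.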
Note that the first "time" output shows user+sys=22.7s and real=12.4s, so it did use about 2 cores but not more (I don't know how many you have). – peufeu – 2018-03-30T16:08:06.663
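The arithmetic in this comment can be applied to both runs: dividing user + sys by real gives a rough average of how many cores were busy. The sketch below uses the two `time` outputs from the question; the helper names `parse_time` and `busy_cores` are hypothetical.

```python
def parse_time(text):
    """Convert a `time`-style duration such as '5m6.186s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def busy_cores(real, user, sys_time):
    """Average number of cores kept busy: (user + sys) / real."""
    return (parse_time(user) + parse_time(sys_time)) / parse_time(real)


# Values copied from the two `time` outputs in the question.
fast = busy_cores("0m12.430s", "0m13.072s", "0m9.704s")
slow = busy_cores("5m6.186s", "0m4.844s", "0m4.968s")

print(f"fast run: {fast:.2f} cores busy on average")  # about 1.83
print(f"slow run: {slow:.2f} cores busy on average")  # about 0.03
```

A ratio far below 1, as in the slow run, means the processes spent almost all of the wall-clock time waiting rather than computing, which is consistent with the I/O-bound theory above.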
Done editing my question. – Carl Araya – 2018-03-31T02:17:55.780