GNU parallel not utilizing all the CPU

Question

I have a simple Python script that reads from stdin (a single line), does some processing (string parsing, no I/O involved), and writes the result to stdout.

e.g. python parse.py < in.txt > out.txt

I have an in.txt which is around 200GB in size, and I use parallel to speed it up (I have 8 CPU cores).

cat in.txt | parallel -j8 -N1 --pipe python parse.py

What I observe is that the CPUs are not fully utilized, e.g.

%Cpu0  :  9.1 us, 22.7 sy,  0.0 ni, 68.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 27.3 us, 13.6 sy,  0.0 ni, 59.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 14.3 us, 71.4 sy,  0.0 ni, 14.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 14.3 us, 28.6 sy,  0.0 ni, 57.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 14.3 us, 38.1 sy,  0.0 ni, 47.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  4.8 us, 23.8 sy,  0.0 ni, 71.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 15.0 us, 20.0 sy,  0.0 ni, 65.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 23.8 us, 19.0 sy,  0.0 ni, 57.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

And

ps ax | grep python

I got

12450 ?        S      0:00 /bin/bash -c               sh -c 'dd bs=1 count=1 of=/tmp/2NQLo8j4qy.chr 2>/dev/null';              test ! -s "/tmp/2NQLo8j4qy.chr" && rm -f "/tmp/2NQLo8j4qy.chr" && exec true;              (cat /tmp/2NQLo8j4qy.chr; rm /tmp/2NQLo8j4qy.chr; cat - ) | (python parse.py);
12453 ?        S      0:00 /bin/bash -c               sh -c 'dd bs=1 count=1 of=/tmp/zYnfr4Ss8H.chr 2>/dev/null';              test ! -s "/tmp/zYnfr4Ss8H.chr" && rm -f "/tmp/zYnfr4Ss8H.chr" && exec true;              (cat /tmp/zYnfr4Ss8H.chr; rm /tmp/zYnfr4Ss8H.chr; cat - ) | (python parse.py);
12456 ?        S      0:00 /bin/bash -c               sh -c 'dd bs=1 count=1 of=/tmp/wlrI14juYz.chr 2>/dev/null';              test ! -s "/tmp/wlrI14juYz.chr" && rm -f "/tmp/wlrI14juYz.chr" && exec true;              (cat /tmp/wlrI14juYz.chr; rm /tmp/wlrI14juYz.chr; cat - ) | (python parse.py);
12459 ?        S      0:00 /bin/bash -c               sh -c 'dd bs=1 count=1 of=/tmp/cyArLNBTTm.chr 2>/dev/null';              test ! -s "/tmp/cyArLNBTTm.chr" && rm -f "/tmp/cyArLNBTTm.chr" && exec true;              (cat /tmp/cyArLNBTTm.chr; rm /tmp/cyArLNBTTm.chr; cat - ) | (python parse.py);
12461 pts/0    S+     0:00 grep --color=auto python
15211 ?        S    144:22 perl /usr/bin/parallel -j8 -N1 --pipe python parse.py

Every time I run ps ax | grep python I see different temp files. I assume CPU time is being wasted on handling these temp files? Or am I doing something wrong?

What are the specifications of the CPU in use? `cat /proc/cpuinfo` — ewwhite, May 14 '14 at 17:34

score 3 · Answer 1 · answered May 14 '14 at 17:47


The -N1 option causes one process to be created per line, so what you are seeing is the overhead of parallel setting up each job. You should alter the Python script to handle more than one line. Then cat in.txt | parallel --pipe python parse.py should make full use of the CPUs.
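As a rough illustration of handling more than one line, here is a minimal sketch of what parse.py could look like. The original script is not shown in the question, so `parse_line` is a hypothetical stand-in for the real string parsing, and the `main` signature is an assumption made only to keep the example testable.

```python
import sys


def parse_line(line):
    # Hypothetical placeholder for the real parsing logic in parse.py.
    return line.upper()


def main(infile=sys.stdin, outfile=sys.stdout):
    # Process every line of the block this process receives,
    # rather than reading a single line and exiting.
    for line in infile:
        outfile.write(parse_line(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    main()
```

With a loop like this, each Python process started by parallel works through a whole block of lines, so the interpreter start-up cost is paid once per block instead of once per line.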

answered May 14 '14 at 17:47

Mark Wagner


score 3 · Accepted Answer · answered May 14 '14 at 21:57


While Mark's answer is correct and fully supported, you might want to give a new feature a spin.

cat file | parallel --pipe ...

maxes out at around 100 MB/s.

The new experimental option --pipepart delivers > 2 GB/s, but requires in.txt to be a real (seekable) file:

parallel -a in.txt --block 100M --pipepart python parse.py
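To illustrate why a seekable file matters here, the following sketch computes byte ranges of roughly equal size that each end on a line boundary. It is only a conceptual illustration and not how GNU parallel is implemented; `block_ranges` and its parameters are hypothetical names chosen for this example.

```python
import os


def block_ranges(path, block_size):
    """Return (start, end) byte ranges of about block_size bytes each,
    every range ending just after a newline (or at end of file)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + block_size, size)
            if end < size:
                # Jump directly to the tentative block end; a pipe
                # cannot do this, it has to be read sequentially.
                f.seek(end)
                f.readline()  # extend the block to the end of the line
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges
```

Because the ranges can be found by seeking, no single process has to read and forward all 200GB; each worker can read its own range straight from the file.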

answered May 14 '14 at 21:57

Ole Tange


Yes, Mark's answer helps, but your command is a faster fix, thanks – Howard May 16 '14 at 03:48
