This is a follow-up to my previous question [1].
I have a compressed file that I stream into a Python program, e.g.
bzcat data.bz2 | parallel --no-notice -j16 --pipe python parse.py > result.txt
parse.py reads from stdin continuously and prints to stdout.
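For context, parse.py has roughly the following shape. This is only a minimal sketch, not the real script: `process_line` and `parse_stream` are hypothetical names, and the uppercase transformation is a placeholder for the actual parsing logic.

```python
import sys


def process_line(line):
    # Hypothetical placeholder: the real parsing logic is not shown in this post.
    return line.rstrip("\n").upper()


def parse_stream(source, sink):
    # Iterate lazily so each line is handled as it arrives on the stream.
    for line in source:
        sink.write(process_line(line) + "\n")


if __name__ == "__main__":
    parse_stream(sys.stdin, sys.stdout)
```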
My EC2 instance has 16 cores, but top shows a load average of only 3 to 4.
In the ps output, I see many processes like:
sh -c 'dd bs=1 count=1 of=/tmp/7D_YxccfY7.chr 2>/dev/null';
I know I could use -a in.txt to improve performance, but in my case I am streaming from a bz2 file (I cannot extract it because I don't have enough disk space).
How can I improve the efficiency in my case?