
I'm running a command like this on my 36-core server (EC2 c4.8xlarge, Amazon Linux).

find . -type f | parallel -j 36 mycommand

There are about 1,000,000 files to process, and the run takes dozens of minutes. The command should keep 36 processes running simultaneously, but according to top there are only about 10 processes at most and the CPU is around 70% idle. ps shows more processes, but most of them are defunct.

I guessed that each mycommand finished so quickly that parallel could not keep up spawning new processes, so I tried parallel --nice 20 to allocate more CPU time to parallel itself, but it didn't help.

Does anyone have an idea how to improve this?

$ parallel --version
GNU parallel 20151022

aosho235
  • Do you understand that `parallel` will not make `find` run in parallel? Even if it could, it would certainly be stuck waiting on I/O, as a CPU is always faster than the hard drive. – Julie Pelletier May 31 '16 at 05:22
  • @JuliePelletier I tried making a list of files beforehand with `find . -type f > files`, but it didn't improve. – aosho235 May 31 '16 at 05:35
  • The speed of the `find` command will not improve unless you change the path you ask it to search. It is dependent on the hard drive's speed. – Julie Pelletier May 31 '16 at 05:38
  • @JuliePelletier I thought I could remove the bottleneck from the hard drive by making the list beforehand and feeding it to `parallel` (`parallel -j36 mycommand < files`), but it didn't improve. – aosho235 May 31 '16 at 06:06
  • I may have misunderstood that. Yes, the list being done beforehand does remove that part of the strain on the hard drive. Then the actual question depends on what happens in your `mycommand`. – Julie Pelletier May 31 '16 at 06:12
  • Just a quick guess... did you try `find | xargs -n 1 -P 36 mycommand`, if your xargs supports the `-P` argument? Maybe you could use a larger `-n` if mycommand supports processing more files in one run; that would save some overhead (a sketch follows the comment thread below). – Fox May 31 '16 at 09:20
  • 1
    Do you really need the -j36? I thought parallel used the number of cores as the default. – Zoredache May 31 '16 at 14:07
  • @Fox indeed, xargs with the -P option was much faster than parallel when I faced a similar situation, where `mycommand` was not worth the overhead of parallel. I timed running mogrify on 1000 images (resizing) using 4 cores, and parallel took 17 seconds (with only about 90% CPU utilization, 15% of which was from a perl child process of parallel), compared to 4 seconds using xargs -P 4 (full CPU utilization). – Yibo Yang Aug 27 '18 at 00:29
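
For reference, a minimal sketch of the xargs route suggested in the comments (untested here; it assumes GNU xargs and that mycommand accepts several file names per invocation):

find . -type f -print0 | xargs -0 -n 100 -P 36 mycommand

-print0 and -0 keep file names with spaces intact, -n 100 hands up to 100 files to each invocation to cut down on per-process overhead, and -P 36 keeps 36 processes running at once.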

3 Answers


The number of files to process is ~1,000,000, and it takes dozens of minutes.

So you are running around 600 jobs per second. The overhead for a single GNU Parallel job is on the order of 2-5 ms, so once you need more than roughly 200 jobs per second, GNU Parallel will not perform better without tweaking.

The tweak is to have more instances of GNU Parallel spawning jobs in parallel. From https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround

cat myinput | parallel --pipe -N 100 --round-robin -j50 parallel -j100 your_prg

This way you will have 50 instances of GNU Parallel that can each spawn 100 jobs per second.
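
Adapting that example to the command from the question might look roughly like this; this is only a sketch, not benchmarked, and the -j values are guesses for a 36-core box rather than anything from the manual:

find . -type f | parallel --pipe -N100 --round-robin -j4 parallel -j9 mycommand

Here the outer parallel deals the file names out in blocks of 100 to 4 inner parallels, and each inner parallel keeps 9 jobs running, so the job-spawning work is spread over several processes while the total stays at 36.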

Ole Tange

Eh, if I understood your question correctly, you want to process all the files simultaneously? parallel will launch multiple instances of mycommand, not multiple find instances.

  • Yes I do. I want to process all the files, one file per process, restricting the maximum number of processes to 36. I expect `parallel` to launch a new process when one finishes, just like this: ```seq 1000 | parallel -j4 --bar '(echo {};sleep 5)'``` – aosho235 May 31 '16 at 05:51

You are trying to open a million files, 36 at a time. Even if your command could run at full power on one CPU, you would still incur the overhead of opening those files in the first place, and I/O is one of the most expensive operations a computer does. Your best bet would be to load as many of those files as possible into your machine's RAM beforehand and work in RAM as much as possible. Depending on how much RAM you have, this may improve performance significantly, because once a read has started, subsequent reads tend to benefit from caching if they happen immediately one after the other. You may also want to make sure your filesystem lays files out in a cache-efficient way and that it handles many consecutive reads well.
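
As a minimal sketch of that pre-loading idea, assuming the data set fits in RAM (untested on this workload), you could warm the page cache with something like:

find . -type f -print0 | xargs -0 cat > /dev/null

Reading every file once this way means later reads by mycommand can come from the kernel's page cache instead of the disk.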

I don't think parallel is going to help you much with this refactoring.

Morpheu5