
I'm running a command like this on my 36-core server (EC2 c4.8xlarge, Amazon Linux).

find . -type f | parallel -j 36 mycommand

There are about 1,000,000 files to process, and the run takes dozens of minutes. The command should keep 36 processes running simultaneously, but according to top there are only about 10 processes at most and the CPU is around 70% idle. ps shows more processes, but most of them are defunct.

I guessed that each mycommand finished so quickly that parallel could not keep up spawning new processes, so I tried parallel --nice 20 to allocate more CPU time to parallel itself, but it didn't help.

Does anyone have an idea how to improve this?

$ parallel --version
GNU parallel 20151022

aosho235
  • Do you understand that `parallel` will not make `find` run in parallel? Even if it could, it would certainly be stuck waiting on I/O, as a CPU is always faster than the hard drive. – Julie Pelletier May 31 '16 at 05:22
  • @JuliePelletier I tried making a list of files beforehand with `find . -type f > files`, but it didn't improve. – aosho235 May 31 '16 at 05:35
  • The speed of the `find` command will not improve unless you change the path you ask it to search. It is dependent on the hard drive's speed. – Julie Pelletier May 31 '16 at 05:38
  • @JuliePelletier I thought I could remove the bottleneck from the hard drive by making the list beforehand and feeding it to `parallel` (`parallel -j36 mycommand < files`), but it didn't improve. – aosho235 May 31 '16 at 06:06
  • I may have misunderstood that. Yes, the list being done beforehand does remove that part of the strain on the hard drive. Then the actual question depends on what happens in your `mycommand`. – Julie Pelletier May 31 '16 at 06:12
  • Just a quick guess... did you try `find | xargs -n 1 -P 36 mycommand`, if your xargs supports the `-P` argument? Maybe you could use a larger `-n` if mycommand supports processing more files in one run; that would save some overhead (a sketch follows the comment thread below). – Fox May 31 '16 at 09:20
  • 1
    Do you really need the -j36? I thought parallel used the number of cores as the default. – Zoredache May 31 '16 at 14:07
  • @Fox indeed, xargs with the -P option was much faster than parallel when I faced a similar situation, where `mycommand` was not worth the overhead of parallel. I timed running mogrify on 1000 images (resizing) using 4 cores, and parallel took 17 seconds (with only about 90% CPU utilization, 15% of which was from a perl child process of parallel), compared to 4 seconds using xargs -P 4 (full CPU utilization). – Yibo Yang Aug 27 '18 at 00:29
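
For reference, a minimal sketch of the xargs route suggested in the comments (untested here; it assumes GNU xargs and that mycommand accepts several file names per invocation):

find . -type f -print0 | xargs -0 -n 100 -P 36 mycommand

-print0 and -0 keep file names with spaces intact, -n 100 hands up to 100 files to each invocation to cut down on per-process overhead, and -P 36 keeps 36 processes running at once.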

3 Answers


The number of files to process is ~1,000,000, and it takes dozens of minutes.

So you are running around 600 jobs per second. The overhead for a single GNU Parallel job is on the order of 2-5 ms, so once you need more than roughly 200 jobs per second, GNU Parallel will not perform better without tweaking.

The tweak is to have more instances of GNU Parallel spawning jobs in parallel. From https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround

cat myinput | parallel --pipe -N 100 --round-robin -j50 parallel -j100 your_prg

This way you will have 50 instances of GNU Parallel that can each spawn 100 jobs per second.
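
Adapting that example to the command from the question might look roughly like this; this is only a sketch, not benchmarked, and the -j values are guesses for a 36-core box rather than anything from the manual:

find . -type f | parallel --pipe -N100 --round-robin -j4 parallel -j9 mycommand

Here the outer parallel deals the file names out in blocks of 100 to 4 inner parallels, and each inner parallel keeps 9 jobs running, so the job-spawning work is spread over several processes while the total stays at 36.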

Ole Tange

Eh, if I understood your question correctly, you want to process all the files simultaneously? parallel will launch multiple instances of mycommand, not multiple find instances.

  • Yes I do. I want to process all the files, one file per process, restricting the maximum number of processes to 36. I expect `parallel` to launch a new process when one finishes, just like this: ```seq 1000 | parallel -j4 --bar '(echo {};sleep 5)'``` – aosho235 May 31 '16 at 05:51

You are trying to open a million files, 36 at a time. Even if your command could run at full power on one CPU, you would still incur the overhead of opening those files in the first place, and I/O is one of the most expensive operations a computer does. Your best bet would be to load as many of those files as possible into your machine's RAM beforehand and work in RAM as much as possible. Depending on how much RAM you have, this may improve performance significantly, because once a read has started, subsequent reads tend to benefit from caching if they happen immediately one after the other. You may also want to make sure your filesystem lays files out in a cache-efficient way and that it handles many consecutive reads well.
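
As a minimal sketch of that pre-loading idea, assuming the data set fits in RAM (untested on this workload), you could warm the page cache with something like:

find . -type f -print0 | xargs -0 cat > /dev/null

Reading every file once this way means later reads by mycommand can come from the kernel's page cache instead of the disk.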

I don't think parallel is going to help you much with this refactoring.

Morpheu5