I'm running a command like this on my 36-core server (EC2 c4.8xlarge, Amazon Linux):
find . -type f | parallel -j 36 mycommand
There are ~1,000,000 files to process, and the run takes dozens of minutes. It should run 36 processes simultaneously. However, top shows about 10 processes at most, and the CPU is 70% idle. ps shows more processes, but most of them are defunct.
I guessed this was because each mycommand finishes so quickly that parallel cannot keep up with spawning new processes. So I tried parallel --nice 20 to allocate more CPU time to parallel itself, but this didn't work.
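One rough way to check this guess, assuming GNU parallel is on the PATH, would be to time a no-op job in place of mycommand and see how many jobs per second parallel can start on its own. The helper name spawn_rate and the sample size are made up for illustration. This is only a sketch and not a precise benchmark, since date +%s has one-second resolution.

```shell
#!/bin/sh
# Sketch: estimate how many no-op jobs per second GNU parallel can start.
# Assumes GNU parallel is on PATH; spawn_rate is a hypothetical helper name.
spawn_rate() {
    sr_n=$1                              # number of no-op jobs to run
    sr_start=$(date +%s)
    seq "$sr_n" | parallel -j 36 true    # each job is just `true <number>`
    sr_end=$(date +%s)
    sr_elapsed=$((sr_end - sr_start))
    # date +%s has one-second resolution; avoid dividing by zero on short runs
    [ "$sr_elapsed" -gt 0 ] || sr_elapsed=1
    echo $((sr_n / sr_elapsed))
}
```

Calling spawn_rate 2000 prints an approximate jobs-per-second figure. If that figure is close to the rate I get with mycommand (about 1,000,000 files divided by the run time in seconds), the limit would be the per-job overhead of parallel rather than mycommand itself.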
Does anyone have an idea how to improve this?
$ parallel --version
GNU parallel 20151022