Running commands in parallel with a limit on the number of simultaneous commands


Sequential: `for i in {1..1000}; do do_something $i; done` - too slow.

Parallel: `for i in {1..1000}; do do_something $i & done` - too much load.

How can I run commands in parallel, but with no more than, for example, 20 instances running at any moment?

Currently I usually use a hack like `for i in {1..1000}; do do_something $i & sleep 5; done`, but this is not a good solution.

Update 2: Converted the accepted answer into a script: http://vi-server.org/vi/parallel

#!/bin/bash

NUM=$1; shift

if [ -z "$NUM" ]; then
    echo "Usage: parallel <number_of_tasks> command"
    echo "    Sets environment variable i from 1 to number_of_tasks"
    echo "    Defaults to 20 processes at a time, use like \"MAKEOPTS='-j5' parallel ...\" to override."
    echo "Example: parallel 100 'echo \$i; sleep \`echo \$RANDOM/6553 | bc -l\`'"
    exit 1
fi

export CMD="$@";

true ${MAKEOPTS:="-j20"}

cat << EOF | make -f - -s $MAKEOPTS
jobs=\$(shell echo {1..$NUM})
.PHONY: all \${jobs}

all: \${jobs}

\${jobs}:
	i=\$@ sh -c "\$\$CMD"
EOF

Note that the line before "i=" must be indented with a tab character (make requires tabs, not spaces, in recipe lines).

Vi.

Posted 2010-06-17T11:47:04.917


Answers


GNU Parallel is made for this.

seq 1 1000 | parallel -j20 do_something

It can even run jobs on remote computers. Here's an example of re-encoding MP3s to OGG using server2 and the local computer, running one job per CPU core:

parallel --trc {.}.ogg -j+0 -S server2,: \
     'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3

Watch an intro video to GNU Parallel here:

http://www.youtube.com/watch?v=OpaiGYxkSuQ

Ole Tange


One more option: `xargs --max-procs=20`. – Vi. – 2015-12-28T10:35:10.267

I hadn't known about "moreutils" and that there's already a tool for the job. Looking and comparing. – Vi. – 2010-07-27T16:26:15.037

The parallel in moreutils is not GNU Parallel and is quite limited in its options. The command above will not run with the parallel from moreutils. – Ole Tange – 2010-09-28T22:49:41.827
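The `xargs --max-procs` route from the comment above can be sketched as follows; this is a minimal illustration in which plain `echo` stands in for the real `do_something`:

```shell
# GNU xargs: -P (--max-procs) caps how many commands run at once,
# -n 1 passes one argument per invocation.
# `echo` is a placeholder; substitute your real do_something.
seq 1 1000 | xargs -P 20 -n 1 echo > /dev/null
```

Unlike the sleep-based hack in the question, xargs starts a replacement job as soon as one finishes, so the load stays near the limit.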


Not a bash solution, but you should use a Makefile, possibly with `-l` to avoid exceeding a maximum load.

NJOBS=1000

jobs = $(shell echo {1..$(NJOBS)})
.PHONY: all $(jobs)

all: $(jobs)

$(jobs):
	do_something $@

Then to start 20 jobs at a time do

$ make -j20

or to start as many jobs as possible without exceeding a load of 5

$ make -j -l5

Benjamin Bannier


Looks like the non-hacky solution for now. – Vi. – 2010-06-17T14:01:50.203

`echo -e 'PHONY=jobs\njobs=$(shell echo {1..100000})\n\nall: ${jobs}\n\n${jobs}:\n\t\techo $@; sleep \`echo $$RANDOM/6553 | bc -l\`' | make -f - -j20` Now it looks more hacky again. – Vi. – 2010-06-17T14:11:22.700

@vi: oh my .... – Benjamin Bannier – 2010-06-17T14:12:47.753

Converted your solution to a script. Now it can be used with ease. – Vi. – 2010-06-17T15:09:44.220


Posting the script from the question, with formatting:

#!/bin/bash

NUM=$1; shift

if [ -z "$NUM" ]; then
    echo "Usage: parallel <number_of_tasks> command"
    echo "    Sets environment variable i from 1 to number_of_tasks"
    echo "    Defaults to 20 processes at a time, use like \"MAKEOPTS='-j5' parallel ...\" to override."
    echo "Example: parallel 100 'echo \$i; sleep \`echo \$RANDOM/6553 | bc -l\`'"
    exit 1
fi

export CMD="$@";

true ${MAKEOPTS:="-j20"}

cat << EOF | make -f - -s $MAKEOPTS
jobs=\$(shell echo {1..$NUM})
.PHONY: all \${jobs}

all: \${jobs}

\${jobs}:
	i=\$@ sh -c "\$\$CMD"
EOF

Note that the line before "i=" must be indented with a tab character (make requires tabs, not spaces, in recipe lines).

warren



One simple idea:

Check whether i is a multiple of 20 and, if so, execute the shell builtin `wait` before starting the next do_something.
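A minimal sketch of this idea; the `do_something` function here is a hypothetical placeholder that just echoes its argument:

```shell
#!/bin/bash
# Every 20th iteration, the shell builtin `wait` blocks until the whole
# current batch of background jobs has finished, so at most 20 run at once.
do_something() { echo "task $1"; }   # placeholder for the real command

for i in {1..100}; do
    do_something "$i" &
    if (( i % 20 == 0 )); then
        wait    # batch boundary: wait for all jobs started so far
    fi
done
wait            # also collect the final partial batch
```

Note that a bare `wait` waits for the whole batch, so the number of running jobs dips to zero at each batch boundary.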

harrymc


It will either wait for all current tasks to complete (creating sags in the plot of the number of running tasks), or wait on one specific task that may stall for a long time (again creating sags). – Vi. – 2010-06-17T17:19:40.453

@Vi: Shell wait is for all background tasks that belong to this shell. – harrymc – 2010-06-17T18:57:27.710


for i in {1..1000}; do 
     (echo $i ; sleep `expr $RANDOM % 5` ) &
     while [ `jobs | wc -l` -ge 20 ] ; do 
         sleep 1 
     done
done

msw


Maybe `while [ \`jobs | wc -l\` -ge 20 ]; do`? – Vi. – 2010-06-17T13:38:06.443

sure, but in my sample, I'd then have to compute njobs twice, and performance is quite important in shell scripts that run sleep tasks ;) – msw – 2010-06-17T13:42:02.867

I mean your version doesn't work as expected. I changed sleep 1 to sleep 0.1 and it starts averaging njobs at 40-50 instead of 20. If there are more than 20 jobs, we need to wait until any job finishes, not just wait 1 second. – Vi. – 2010-06-17T13:57:10.633
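What this comment asks for, waiting until any one job finishes, exists as a builtin in newer shells: `wait -n` (bash 4.3 and later) returns as soon as any single background job exits. A sketch, with a hypothetical echo-and-sleep placeholder job:

```shell
#!/bin/bash
# Keep at most 20 jobs running; when the limit is reached, `wait -n`
# blocks only until *some* job finishes, instead of sleeping 1 second.
limit=20
for i in {1..100}; do
    while (( $(jobs -rp | wc -l) >= limit )); do
        wait -n                      # resumes when any background job exits
    done
    ( echo "$i"; sleep 0.1 ) &       # placeholder job
done
wait                                 # collect the remaining jobs
```

Because `wait -n` resumes the loop the moment a slot frees up, the running-job count stays pinned at the limit instead of oscillating around it.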


You could use ps to count how many processes you have running, and whenever this drops below a certain threshold you start another process.

Pseudo code:

i = 1
MAX_PROCESSES=20
NUM_TASKS=1000
do
  get num_processes using ps
  if num_processes < MAX_PROCESSES
    start process $i
    $i = $i + 1
  endif
  sleep 1 # add this to prevent thrashing with ps
until $i > NUM_TASKS
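The pseudocode above can be rendered as runnable bash, under the assumption that do_something is an external command that ps/pgrep can see; the worker script written below is a hypothetical placeholder that just records its argument:

```shell
#!/bin/bash
# Hypothetical external worker so the process count is visible to pgrep;
# substitute your real do_something command.
results=$(mktemp)
worker=$(mktemp /tmp/do_something.XXXXXX)
cat > "$worker" <<EOF
#!/bin/sh
echo "\$1" >> "$results"
sleep 0.2
EOF
chmod +x "$worker"

MAX_PROCESSES=20
NUM_TASKS=50
i=1
while (( i <= NUM_TASKS )); do
    # "get num_processes using ps": count running workers
    # (pgrep here instead of parsing ps output by hand)
    num_processes=$(pgrep -cf "$worker" || true)
    if (( num_processes < MAX_PROCESSES )); then
        "$worker" "$i" &
        (( i++ ))
    else
        sleep 1                      # prevent thrashing ps, as noted above
    fi
done
wait
rm -f "$worker"
```

The `sleep 1` keeps the loop from hammering ps; the trade-off, as with the other polling approaches here, is that free slots can sit idle for up to a second.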

Paul R



You can do it like this:

threads=20
tempfifo=$PMS_HOME/$$.fifo

trap "exec 1000>&-;exec 1000<&-;exit 0" 2
mkfifo $tempfifo
exec 1000<>$tempfifo
rm -rf $tempfifo

for ((i=1; i<=$threads; i++))
do
    echo >&1000
done

for ((j=1; j<=1000; j++))
do
    read -u1000
    {
        echo $j
        echo >&1000
    } &
done

wait
echo "done!!!!!!!!!!"

Using a named pipe as a counting semaphore, this keeps 20 subshells running in parallel at all times.

Hope it helps :)

ouyangyewei
