Efficient way to use all cores in bash or zsh script

If I want to process a large number of files with a command "do_something" that can only use one core, what is the best way to use all available cores, assuming each file can be processed independently?

At the moment I do something like this:

#!/bin/zsh
TASK_LIMIT=8
TASKS=0
# *(.) is a zsh glob qualifier that matches regular files only
for i in *(.) {
  do_something "$i" &
  TASKS=$(($TASKS + 1))
  # Once the limit is reached, wait for ALL running tasks to finish
  if [[ $TASKS -ge $TASK_LIMIT ]]; then
    wait
    TASKS=0
  fi
}
wait

Obviously, this is not efficient, because after reaching $TASK_LIMIT it waits until all running "do_something" instances finish. For example, my real script uses only about 500% of my 8-core CPU instead of the >700% it could.

Running without $TASK_LIMIT is not an option because "do_something" may consume a lot of memory.

Ideally, the script should try to keep the number of parallel tasks at $TASK_LIMIT: for example, if task 1 of 8 finishes and there is at least one more file to process, the script should start the next "do_something" instead of waiting for the remaining 7 tasks to finish. Is there a way to achieve this in zsh or bash?

Lissanro Rayen

Posted 2012-10-26T15:19:22.130

Reputation: 278

hint: use trap to catch SIGCHLD in monitor mode. – Keith – 2012-10-26T16:22:43.527

Answers

I strongly suggest having a look at GNU parallel. It does exactly what you want and doesn't depend on any particular shell.
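For example, a minimal sketch (assuming "do_something" is your command; the *(.) glob is the same zsh qualifier used in the question, and -j8 caps the number of simultaneous jobs at 8 to match $TASK_LIMIT):

parallel -j8 do_something ::: *(.)

By default GNU parallel runs one job per CPU core and starts a new job as soon as one finishes, which is exactly the scheduling behaviour asked for.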

Lars Kotthoff

Posted 2012-10-26T15:19:22.130

Reputation: 1 447

Remember how many processes you started. When a process ends, decrease the count. When the count is lower than the maximum, start a new process.

The only problem is how to signal the end of a process. You can, for example, create an empty flag file with a known name in /tmp (composed of $$ and $BASHPID).
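A rough sketch of this idea (assuming bash; "do_something" is hypothetical, and a mktemp directory is used instead of bare /tmp names to avoid collisions):

#!/bin/bash
TASK_LIMIT=8
flagdir=$(mktemp -d)
running=0

worker() {
  do_something "$1"
  # Signal completion: $$ is the script's PID, $BASHPID the worker subshell's PID
  touch "$flagdir/$$.$BASHPID"
}

for f in *; do
  # While at the limit, poll for flag files left behind by finished workers
  while (( running >= TASK_LIMIT )); do
    for flag in "$flagdir"/*; do
      [[ -e $flag ]] || continue
      rm -- "$flag"
      running=$((running - 1))
    done
    sleep 0.2
  done
  worker "$f" &
  running=$((running + 1))
done
wait
rm -rf "$flagdir"

This keeps the number of running tasks near $TASK_LIMIT because a new worker is started for every flag that is reaped, instead of waiting for the whole batch to finish.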

choroba

Posted 2012-10-26T15:19:22.130

Reputation: 14 741