
We have more than 100 git repos, and sometimes I want to grep across all of them.

To update the repos I use this:

for repo in *; do (cd "$repo" && git checkout master && git pull); done

This is quite slow.

How can I speed it up?

Running all updates at once would spawn too many processes.

I need a way to limit the load to N workers.

Does anyone have a solution for this?

guettli

3 Answers


You can use GNU parallel to do this task. From GNU parallel's home page,

"A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel."

There is an excellent tutorial, and one section of it addresses exactly what you have asked.

Edit: Here is the command you can use. (Slightly modified from Ole Tange's answer)

parallel -j<number of jobs to run> 'cd {} && git checkout master && git pull' ::: */

This runs up to the number of jobs you specified at the same time, executing the quoted command once for each directory matched by */.

HTH

Nehal Dattani

You can use xargs to do the job, for example

(for repo in *
    do
    [ -d "${repo}" ] && echo "${repo}"
    done ) | xargs -I{} -P4 ./gitActions.sh {}

The flag -P4 tells xargs to run up to 4 processes simultaneously, so you can adjust the number to whatever you want or need.

Then your gitActions.sh file should contain:

#!/bin/bash
repo=$1
cd "$repo" && git checkout master && git pull
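If you would rather not keep a separate script file, the same idea can be sketched as a single shell function. This is only a sketch: run_in_dirs is a hypothetical helper name, and it assumes an xargs that supports -0 and -P (GNU and BSD xargs both do). Directory names are passed NUL-separated, so names containing spaces survive.

```shell
#!/bin/sh
# run_in_dirs is a hypothetical helper, not part of xargs or git.
# Usage: run_in_dirs MAX_JOBS 'shell command'
# Runs the command inside every subdirectory of the current directory,
# with at most MAX_JOBS instances running at the same time.
run_in_dirs() {
    max_jobs=$1
    cmd=$2
    for dir in */; do
        # Skip the literal "*/" that the glob leaves behind when nothing matches.
        [ -d "$dir" ] || continue
        # Emit NUL-terminated names so spaces in names are kept intact.
        printf '%s\0' "$dir"
    done |
        # The directory is handed to sh as $1 and the command as $2,
        # so the directory name is never spliced into the command text.
        xargs -0 -P "$max_jobs" -I{} sh -c 'cd "$1" && eval "$2"' sh {} "$cmd"
}
```

For the repos in the question, the call would be something like: run_in_dirs 4 'git checkout master && git pull'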
alphamikevictor

Using GNU Parallel it looks like this:

parallel -j77 'cd {} && git checkout master && git pull' ::: */ 

This runs up to 77 jobs at the same time.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Image: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Image: GNU Parallel scheduling]
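To put rough numbers on the difference, here is a toy model in shell and awk. It is not how GNU Parallel is implemented. makespan is a hypothetical helper, the job durations are invented for illustration, and "simple scheduling" is modeled as splitting the job list into consecutive chunks before anything runs.

```shell
#!/bin/sh
# makespan is a hypothetical helper that models two scheduling strategies.
# Usage: makespan WORKERS DURATION...
# Prints "static=<t> dynamic=<t>": the total wall-clock time of each strategy.
makespan() {
    workers=$1
    shift
    printf '%s\n' "$@" | awk -v w="$workers" '
        { d[NR] = $1 }
        END {
            n = NR
            smax = 0; dmax = 0
            for (k = 0; k < w; k++) { s[k] = 0; f[k] = 0 }
            # Static: cut the list into w consecutive chunks up front;
            # each worker runs its own chunk from start to finish.
            per = int((n + w - 1) / w)
            for (i = 1; i <= n; i++) s[int((i - 1) / per)] += d[i]
            # Dynamic: the next job goes to whichever worker is free first.
            for (i = 1; i <= n; i++) {
                best = 0
                for (k = 1; k < w; k++) if (f[k] < f[best]) best = k
                f[best] += d[i]
            }
            # The slowest worker determines the total time.
            for (k = 0; k < w; k++) {
                if (s[k] > smax) smax = s[k]
                if (f[k] > dmax) dmax = f[k]
            }
            printf "static=%d dynamic=%d\n", smax, dmax
        }'
}

# Two slow jobs followed by six fast ones, on 2 workers.
makespan 2 8 8 1 1 1 1 1 1    # prints: static=18 dynamic=11
```

In this made-up case the static split puts both slow jobs on the same worker while the other worker sits idle after 4 time units. When all jobs take the same time, both strategies finish together.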

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Ole Tange
  • Only script is not enough, explain what it does. – peterh Jul 09 '15 at 22:33
  • The images are nice. But I don't understand what the first image is meant to show. You say "GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time". This sounds like a comparison. One tool is `GNU Parallel`. What is the second tool in your comparison? – guettli Aug 04 '15 at 09:51
  • The first method (and picture) divides the jobs before running them. The second is GNU Parallel that divides the jobs while they are being run. – Ole Tange Aug 04 '15 at 11:35