
I have a script that takes multiple arguments, and I need to run it on multiple AWS instances in parallel. For simplicity, suppose I have three instances and want to run the following:

On instance-a: script.sh a b
On instance-b: script.sh s t
On instance-c: script.sh y z

I will be spawning the instances using an AMI which will have the runtime (MATLAB) and the program (using the runtime) installed as part of the image.

I was checking this link and I saw Capistrano mentioned. Would that work in my case? Are there any other lightweight alternatives worth exploring? Note that I will need the return status and the output (a CSV file) generated on each instance.

Technext
    GNU parallel is a lightweight option that can use SSH connections to run jobs in parallel over multiple machines and collect the output "grouped" by job (as opposed to interleaved, though it can do that too, with `--ungroup`). Not sure if this is a solution or a hack, but it's easy to get started, should be a tool included in your distro. – Michael - sqlbot Dec 12 '17 at 12:24

1 Answer


If you only have three instances, this will work (GNU Parallel version 20161222 or later is needed for --results my.csv):

parallel --results my.csv ssh {1} script.sh {2} {3} ::: instance-a instance-b instance-c :::+ a s y :::+ b t z
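To preview how the linked input sources (:::+) pair up before touching any instance, a dry run can help. This is only a sketch: --dry-run and -k are standard GNU Parallel options that are not part of the command above, and planned.txt is a made-up file name.

```shell
# --dry-run prints each command instead of running it, so no SSH
# connection is attempted; -k keeps the output in input order.
parallel --dry-run -k ssh {1} script.sh {2} {3} \
  ::: instance-a instance-b instance-c :::+ a s y :::+ b t z > planned.txt

# Each line should pair one instance with its own two arguments.
cat planned.txt
```

Once the printed commands look right, dropping --dry-run runs them for real.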

But let me guess: You have many more instances listed in a file called hosts.txt:

instance-a
instance-b
instance-c

You do not care which instance runs which job; they are just workers. You have a tab-separated file such as input.tsv, where [tab] stands for a literal tab character:

a[tab]b
s[tab]t
y[tab]z

Then you would run:

parallel --slf hosts.txt --results my.csv -a input.tsv --colsep '\t' script.sh 
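Before involving any remote hosts, it may be worth checking locally that the rows of input.tsv are split into the arguments you expect. The sketch below builds the file with printf so the separators are real tabs, then does a dry run without --slf. The explicit {1} {2} placeholders and the file name rows-plan.txt are illustrative choices, not part of the command above.

```shell
# Build input.tsv with literal tab characters between the columns.
printf 'a\tb\ns\tt\ny\tz\n' > input.tsv

# Print the commands that would run, one per input row, in input order.
# {1} and {2} are the first and second column of each row.
parallel --dry-run -k -a input.tsv --colsep '\t' script.sh {1} {2} > rows-plan.txt
cat rows-plan.txt
```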

If your command returns 0 on success, you can even run on cheap spot-market servers: with --retries 5, GNU Parallel will re-run a job on another server if it fails on one (i.e. returns a non-zero exit status), for example because that server broke down.
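Since retries depend on the exit status, it can be useful to see that status per job. One option, not mentioned in the answer itself, is GNU Parallel's --joblog, which writes a tab-separated log whose seventh column is Exitval. The sketch below runs locally with stand-in jobs that just exit with a given status; jobs.log and failed.txt are hypothetical file names.

```shell
# Stand-in jobs: each one simply exits with the status passed as its argument.
parallel --joblog jobs.log 'exit {}' ::: 0 3 0

# The job log has a header row; column 1 is the job sequence number
# and column 7 is the exit value. List the jobs that did not return 0.
awk -F'\t' 'NR > 1 && $7 != 0 { print "job " $1 " exited with status " $7 }' \
  jobs.log > failed.txt
cat failed.txt
```

The same --joblog option can be added to the --slf command above, in which case the Host column shows which server ran each job.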

Ole Tange
    This looks quite powerful. :) You're right that there will be several instances and that I do not care which instance runs which job. I was just curious: if it _does_ matter which instance runs which job, is that scenario also possible with `parallel`? Could I put the corresponding instance name in each row of the `input.tsv` file? – Technext Dec 12 '17 at 16:20
    Yep. That will work. Or you can link the arguments as in the first example. – Ole Tange Dec 12 '17 at 17:38
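A sketch of the per-row variant confirmed in the last comment, assuming a hypothetical file jobs.tsv whose first column is the instance name and whose remaining columns are the script arguments. The dry run only prints the commands; jobs-plan.txt is likewise a made-up name.

```shell
# Column 1: instance to run on; columns 2 and 3: arguments for script.sh.
printf 'instance-a\ta\tb\ninstance-b\ts\tt\ninstance-c\ty\tz\n' > jobs.tsv

# Each row becomes one ssh command aimed at the instance named in that row.
parallel --dry-run -k -a jobs.tsv --colsep '\t' ssh {1} script.sh {2} {3} > jobs-plan.txt
cat jobs-plan.txt
```

Note that this form calls ssh directly instead of using --slf, so GNU Parallel does not pick the server and cannot move a failed job to a different one.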