
Possible Duplicate:
Linux - Running The Same Command on Many Machines at Once

Here's the situation:

  • We have a lab that consists of fifteen quad-core machines, each running Ubuntu Linux.
  • I need to run video encoding software, but one job (i.e. one video with one configuration) takes a couple of hours.
  • There are about as many configurations as machines (about 15) and around 20 videos.

So I thought about making the videos accessible through centralized storage and letting each machine run the encoding process.

In its most basic form, the command looks something like this:

./encode -d default.conf -f local.conf -i inputFile.yuv
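For a sense of scale, every configuration paired with every video gives roughly 15 × 20 = 300 independent jobs. Here is a minimal Python sketch of that job list. It assumes that `-f` is the option that takes the per-job configuration, and the file names (`conf01.conf`, `input01.yuv`, and so on) are hypothetical placeholders.

```python
from itertools import product


def build_jobs(configs, videos, encoder="./encode", default_conf="default.conf"):
    """Return one command (as an argument list) per (config, video) pair."""
    return [
        [encoder, "-d", default_conf, "-f", conf, "-i", video]
        for conf, video in product(configs, videos)
    ]


# Hypothetical file names, for illustration only.
configs = [f"conf{i:02d}.conf" for i in range(1, 16)]  # 15 configurations
videos = [f"input{i:02d}.yuv" for i in range(1, 21)]   # 20 videos

jobs = build_jobs(configs, videos)
print(len(jobs))          # 300
print(" ".join(jobs[0]))  # ./encode -d default.conf -f conf01.conf -i input01.yuv
```

Each entry is an argument list rather than a single string, so it can be passed to whatever ends up launching the job without extra quoting.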

Now, the question is: Is there any software that I could use to easily deploy those tasks on the lab machines? I was thinking about:

  • Having one master that dispatches jobs, e.g. "Tell machine 1 to run /home/user/encode -i input1.yuv, then /home/user/encode -i input2.yuv, et cetera"
  • Being able to see which node is currently working on which task and for how long
  • Being able to stop a task or retry upon failure
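To make the three points above concrete, this is roughly the bookkeeping I have in mind, as a toy Python sketch. The `Dispatcher` class and its method names are made up for illustration. It only tracks state in memory and does not launch anything on a remote machine, which is the part I am hoping existing software handles.

```python
import time
from collections import deque


class Dispatcher:
    """Toy master: hands queued tasks to idle nodes and tracks what runs where."""

    def __init__(self, tasks, max_retries=2):
        self.queue = deque(tasks)
        self.running = {}    # node -> (task, start time)
        self.attempts = {}   # task -> number of times it has been started
        self.done = []
        self.failed = []
        self.max_retries = max_retries

    def assign(self, node, now=None):
        """Give the next queued task to an idle node; return it (or None)."""
        if node in self.running or not self.queue:
            return None
        task = self.queue.popleft()
        self.attempts[task] = self.attempts.get(task, 0) + 1
        self.running[node] = (task, time.time() if now is None else now)
        return task

    def finish(self, node, ok):
        """Record a node's result; requeue a failed task if retries remain."""
        task, _ = self.running.pop(node)
        if ok:
            self.done.append(task)
        elif self.attempts[task] <= self.max_retries:
            self.queue.append(task)
        else:
            self.failed.append(task)

    def status(self, now=None):
        """Map each busy node to (task, seconds elapsed)."""
        now = time.time() if now is None else now
        return {n: (t, now - start) for n, (t, start) in self.running.items()}


d = Dispatcher(["input1.yuv", "input2.yuv"], max_retries=1)
d.assign("machine1", now=0)
d.assign("machine2", now=0)
d.finish("machine1", ok=False)  # input1.yuv goes back on the queue
print(d.status(now=60))         # {'machine2': ('input2.yuv', 60)}
```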

I am not limited to CLI, could also be a GUI application. Any ideas?

slhck
  • I'm all for closing this myself -- some very good suggestions in there. Also related: [What is a good modern parallel SSH tool?](http://serverfault.com/questions/17931/what-is-a-good-modern-parallel-ssh-tool) – slhck Aug 01 '11 at 09:55

2 Answers


Consider installing TORQUE. Its scheduler isn't the best out there, but it's more than sufficient for this kind of usage. You can replace the scheduler with Maui if you need the extra features later.

The only feature on your list that TORQUE lacks is automatically retrying a job on failure. You should be able to script that yourself on the TORQUE server, though, by checking the output of its qstat command to see which jobs are running and the contents of your output folder to see what has finished.
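The retry check could look something like the Python sketch below. It is only an outline under a few assumptions: each job has a unique name, a finished job leaves a file named after the job in the output folder (the `.bin` suffix is a hypothetical placeholder), and the names of queued or running jobs have already been extracted from qstat. Parsing qstat's output is left out on purpose, since its layout depends on the options used.

```python
from pathlib import Path


def finished_jobs(output_dir, suffix=".bin"):
    """Names of jobs that left an output file (suffix is an assumption)."""
    return {p.stem for p in Path(output_dir).glob(f"*{suffix}")}


def jobs_to_resubmit(all_jobs, active, finished):
    """Jobs that are neither queued/running nor finished presumably failed."""
    active = set(active)
    finished = set(finished)
    return [job for job in all_jobs if job not in active and job not in finished]


# Hypothetical job names, for illustration only.
all_jobs = ["conf01_input01", "conf01_input02", "conf02_input01"]
active = ["conf01_input02"]      # would come from qstat
finished = {"conf01_input01"}    # would come from finished_jobs(...)
print(jobs_to_resubmit(all_jobs, active, finished))  # ['conf02_input01']
```

Anything the function returns can then be submitted again with qsub, ideally with a cap on the number of attempts so a job that always fails does not loop forever.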

Mike Renfro

If you need a bigger solution with more options and features, consider Sun Grid Engine (SGE), now known as Oracle Grid Engine.

http://en.wikipedia.org/wiki/Oracle_Grid_Engine

mailq