4

I'm trying to build a grid cluster based on CentOS. All the machines will have a broadly similar setup (some with more processors than others), and I just need to push jobs to a queue and have them run on the available nodes: one job per CPU, with the rest waiting in line.

John T has been exceptionally helpful in pointing me to GNU Queue, which seems to be a good fit for what I intend (the jobs will essentially be batch scripts). I'm still studying the issue before accepting his answer, but I'm asking the community for feedback because the GNU Queue site seems to indicate the project has been dead for several years now.

I've also taken a look at Sun Grid Engine, and it also seems like a candidate for the job. Unfortunately, Oracle is now killing the project and Univa has yet to release their port.

I just want to start off on the right foot, so my question is: have you had any practical experience with this sort of clustering (grid computing)? What would your recommendations be?

Thank you in advance.

Frankie

2 Answers

4

We use Condor for job queuing, etc.

Philip Durbin
3

If you don't require a real queuing system, GNU parallel may suffice to start jobs on each system simultaneously. If you do need a real scheduler, then TORQUE Resource Manager and optionally a scheduler like Maui may be needed.
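To illustrate the "one job per CPU, the rest wait in line" behaviour on a single machine, here is a minimal Python sketch. It is not GNU Parallel or TORQUE, just a stand-in that shows the idea; the names `run_job` and `run_queue` and the sample `echo` jobs are made up for the example, and it does nothing to distribute work across nodes.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one shell command and return (command, exit code)."""
    completed = subprocess.run(command, shell=True)
    return command, completed.returncode


def run_queue(commands, slots=None):
    """Run commands with at most `slots` at once; the rest wait in line.

    `slots` defaults to the number of CPUs reported by the OS.
    Results come back in submission order.
    """
    slots = slots or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Hypothetical jobs; replace these with your own batch scripts.
    jobs = [f"echo job {i}" for i in range(8)]
    for command, code in run_queue(jobs):
        print(f"{command!r} exited with {code}")
```

Threads are enough here because each worker only waits on an external process. A real scheduler adds what this lacks: multiple nodes, persistence, priorities, and accounting.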

You might also be just as well off abandoning CentOS in favor of a live CD like PelicanHPC. At least then the configuration would be simpler (for what it can do, anyway). Assuming you're at a university of some sort, is there really nobody there who does HPC and/or clustering? And no faculty with contacts at a national lab or similar facility that could offer CPU time to your project?

This question may also be a candidate for migration to Server Fault.

Mike Renfro
  • GNU Parallel as queue system: http://www.gnu.org/software/parallel/man.html#example__gnu_parallel_as_queue_system_batch_manager – Ole Tange May 12 '11 at 11:49
  • Thank you for the very detailed answer. Even though I eventually ended up with Condor, your answer put me on the right track. – Frankie May 17 '11 at 20:42