
Right now I have 10 servers for HPC, oriented toward compute-intensive work. My users need to launch several processes using qmake. The users are used to working with Ubuntu 9.10, and the software from its repositories is suitable for them.

I've deployed Ubuntu 9.10 to all 10 servers (PXE rocks).

Currently we work with parallel-ssh and cluster-ssh, which allow us to launch the same process on all servers. With these tools the servers remain independent, but with the same software and the same command launched on each.
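For contrast with the single-system goal, here is a minimal Python sketch of what tools like parallel-ssh do conceptually: run one command over `ssh` on every node concurrently and collect each node's result separately. It assumes passwordless SSH keys are already set up; the `run_on_all` and `ssh_argv` helpers are hypothetical, and this is not parallel-ssh's own code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so one misconfigured node cannot hang the whole run.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_on_all(hosts, command, make_argv=ssh_argv, max_workers=10):
    """Run `command` on every host at once; return {host: (returncode, stdout)}."""
    def run_one(host):
        proc = subprocess.run(make_argv(host, command),
                              capture_output=True, text=True)
        return host, (proc.returncode, proc.stdout)

    # One thread per remote command; the real work happens on the remote nodes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))
```

For example, `run_on_all([f"node{i:02d}" for i in range(1, 11)], "uptime")` (hypothetical host names) returns ten separate results, one per server: the machines are still ten independent systems that merely run the same command, which is the limitation described here.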

Now we would like to go a step further and see all the servers as a single one, with the resources of the other 9 available as if they were its own.

This would make a substantial difference, both in processing time and in the time needed to design the commands we launch.

Any advice on which software to use would be very useful.

Thanks

John Gardeniers
Marc Riera
  • To implement the kind of cluster you're alluding to will require a rewrite of the software so that it's architected to execute on such a platform. Is a rewrite of the software within scope? If not, I'm not aware of a solution which will work as you've described. – Chris Thorpe Jan 26 '10 at 23:41
  • http://en.wikipedia.org/wiki/PVM – a sandwhich Oct 10 '11 at 14:38

4 Answers


What you're talking about is called Single System Image (SSI). The most common variant of this scheme for Linux is implemented by MOSIX. While it does provide some advantages in terms of system management, in general processes cannot span multiple nodes without using some form of MPI. Basically, whether you use a "standard" cluster running Grid Engine or you form your systems into a single image, you will still need to modify the software so that it can span multiple nodes.
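To illustrate what "modify the software" means, here is a minimal Python sketch of the restructuring a program needs before it can span nodes: split the data, compute partial results independently, then combine them. It runs serially on one machine as a stand-in; in a real MPI program (for example with mpi4py's `comm.scatter` and `comm.reduce`) each chunk would be processed by a different rank on a different node. The sum-of-squares workload and the function names are illustrative assumptions.

```python
def partition(data, n_parts):
    """Split `data` into n_parts contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)  # the first r chunks get one extra item
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    # The work one node would do on its own slice, with no shared memory.
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_nodes):
    chunks = partition(data, n_nodes)                        # "scatter" step
    partials = [partial_sum_of_squares(c) for c in chunks]   # one result per node
    return sum(partials)                                     # "reduce" step
```

The point is that the program itself has to know how to divide its work and merge the pieces; neither SSI nor a batch scheduler does that part automatically.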

Kamil Kisiel

A cluster isn't a single machine performance-wise: filesystem and memory locality matter a great deal for performance.

Doing things at the application level, while less general, is more resource-efficient. Your qmake example can be sped up significantly by setting up distcc.
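A minimal Python sketch of the distcc setup, assuming the build runs GNU make on a Makefile generated by qmake and that every listed node is running `distccd` and allows connections from the build machine. The host names and the `distcc_make` helper are hypothetical; `DISTCC_HOSTS` and the `host/N` job-limit syntax are distcc's own.

```python
import os


def distcc_make(hosts, jobs_per_host=4):
    """Build the environment and argv for running make through distcc."""
    env = dict(os.environ)
    # distcc reads its list of compile servers from DISTCC_HOSTS;
    # "host/N" limits how many jobs are sent to that host at once.
    env["DISTCC_HOSTS"] = " ".join(f"{h}/{jobs_per_host}" for h in hosts)
    total_jobs = jobs_per_host * len(hosts)
    # Variables set on the make command line override the values in the
    # generated Makefile, so every compile goes through distcc.
    argv = ["make", f"-j{total_jobs}", "CC=distcc gcc", "CXX=distcc g++"]
    return env, argv
```

Usage would be something like `env, argv = distcc_make(["node01", "node02"])` followed by `subprocess.run(argv, env=env, check=True)`. Only compilation is distributed; preprocessing and linking still happen on the machine running make.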

Tobu

In the end I used Sun Grid Engine.
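As a hedged illustration of how Grid Engine covers the "launch the same command on all servers" case, here is a minimal Python sketch of the worker side of an array job. Submitted with something like `qsub -t 1-10 job.sh` (where `job.sh` is a hypothetical wrapper that runs this script), Grid Engine schedules ten tasks on whichever nodes have free slots and sets `SGE_TASK_ID` in each one, so every task can pick its own share of the inputs. The file names and helper functions are hypothetical and not taken from the write-up mentioned in this answer.

```python
import os


def task_id_from_env(default=1):
    """Return this task's 1-based index in a Grid Engine array job."""
    # Grid Engine sets SGE_TASK_ID for each task of an array job (qsub -t 1-N);
    # outside an array job it is unset or the string "undefined".
    raw = os.environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else default


def my_share(items, n_tasks, task_id):
    """Round-robin split: task k (1-based) takes items k-1, k-1+n_tasks, ..."""
    return items[task_id - 1::n_tasks]


if __name__ == "__main__":
    inputs = [f"input_{i:03d}.dat" for i in range(30)]  # hypothetical file names
    for name in my_share(inputs, n_tasks=10, task_id=task_id_from_env()):
        print("processing", name)
```

The scheduler decides where each task runs, so users write one script and one `qsub` line instead of addressing each server by hand.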

I have documented it in a private wiki and cut and pasted it onto my blog. I think it can be useful even without translation. ;)

Blog Entry : http://suportrecerca.barcelonamedia.org/blog/?p=240

If somebody wants the wiki code, just ask for it here.

Thanks.

Marc Riera

I've never implemented one before, but it sounds like a Beowulf cluster would work for what you're trying to do. I've done a lot of reading on this in the past, and for some simpler processes little recoding may be needed, depending on what you're trying to achieve.

einstiien