14

How does one take multiple computers and make them act as one, so that all their processors and memory are combined and any application runs as if it were running on a single very fast computer? For example, such a machine could be used to run virtual machines (on software like VMware).

What operating system(s) can this be accomplished with? Or what software is needed?

Giacomo1968
  • 3,522
  • 25
  • 38
user2387
  • 141
  • 1
  • 1
  • 3

3 Answers

15

The type of cluster that presents as a single operating system with lots of memory and multiple CPUs, and can run whatever would normally run on the non-clustered version of that OS, is called a Single System Image (SSI). This takes multiple cluster nodes and does just what you described: it merges them into a single OS instance.

This is not commonly done because such a system is extremely hard to engineer correctly, and systems that cluster at the application level instead of the OS level are a lot easier to set up and often perform much better.

The reason for the performance difference comes down to assumptions. A process running on a single OS assumes all of its available resources are local. A cluster-ready process (such as a render farm) assumes that some resources are local and some are remote. Because of that difference in assumptions, how resources are allocated is very different.

Taking a general-purpose single-node operating system like Linux and converting it into an SSI-style cluster takes a lot of reworking of kernel internals. Concepts such as memory locality are extremely important on such a system, and the cost of switching a process to a different CPU can be a lot higher. Second, CPU locality, a concept not really present in stock Linux, is also very important: if you have a multi-threaded process, having two threads running on one node and two on another can perform a lot slower than all four running on the same node. It is up to the operating system to make local-versus-remote choices for processes that are likely blind to such distinctions.
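To make the cost of a bad placement concrete, here is a back-of-the-envelope model of the four-thread example. The latency figures are illustrative assumptions (roughly 100 ns for a local memory access, tens of microseconds for a network round trip), not measurements of any real SSI system:

```python
# Toy cost model for the locality argument above: four threads
# that frequently access data shared with every other thread.
def run_time(threads_local, threads_remote,
             work_per_thread=10.0,       # seconds of pure computation
             shared_accesses=1_000_000,  # cross-thread data accesses
             local_ns=100, remote_us=50):
    """Estimate wall time when some threads sit on a remote node.

    Accesses between threads on the same node cost ~100 ns (memory);
    accesses that cross node boundaries cost ~50 us (network hop).
    """
    total = threads_local + threads_remote
    # Fraction of accesses that cross the node boundary, assuming
    # uniform sharing between every pair of threads.
    cross = (2 * threads_local * threads_remote) / (total * (total - 1))
    comm = shared_accesses * (cross * remote_us * 1e-6
                              + (1 - cross) * local_ns * 1e-9)
    return work_per_thread + comm

all_local = run_time(4, 0)   # all four threads on one node
split     = run_time(2, 2)   # two threads on each of two nodes
```

With these assumed latencies, the split placement comes out several times slower than keeping all four threads on one node, purely because a large fraction of shared accesses now cross the (much slower) node boundary.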

However, if you have a cluster-ready application (such as those listed by Chopper) the application itself will make local/remote decisions. The application is fully aware of the local vs remote implications of operations and will act accordingly.
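A sketch of that scatter-gather pattern, with a thread pool standing in for remote workers (a real cluster-ready application would use something like MPI or a job queue; `render_frame` and `render_scene` are hypothetical names for illustration):

```python
# The application, not the OS, decides how to partition the work:
# each frame is an independent unit shipped to whichever worker is free.
from concurrent.futures import ThreadPoolExecutor

def render_frame(frame_no):
    # Placeholder for an expensive, independent unit of work
    # (e.g. one frame in a render farm).
    return frame_no * frame_no

def render_scene(frames, workers=4):
    # Explicit partitioning: the app knows the work is embarrassingly
    # parallel, so remote placement carries no shared-state penalty.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, frames))
```

Because the application chose units of work that share nothing, it never pays the cross-node penalty an SSI kernel would have to guess its way around.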

sysadmin1138
  • 131,083
  • 18
  • 173
  • 296
  • It certainly disappoints me to read this. I'm no expert but surely this problem can be seen as specialist-specific responsibilities? Complicating all existing software seems like a bad workaround to solve the problem of having a good performance server. I understand there are lots of issues with latency and load balancing but I believe the solution will be cleaner if solved by system administrators rather than software developers. – ThreaT Feb 02 '15 at 13:17
  • 1
@ThreaT If it's done at the OS level the answers have to be generalizable and performant for most use-cases. To provide a performant, generalizable solution the OS will have to analyse running processes to discover their locality requirements and respond accordingly. Generalist high-latency parallelism solutions by an OS usually perform less well than app-specific solutions. See MongoDB (relies on OS) vs. Postgres (self-optimization) for an example of single-box scaling issues. – sysadmin1138 Feb 02 '15 at 15:34
  • Pity because it just means that embedded solutions can never really be explored fully since this approach will make spring-boot/embedded tomcat applications impossible to synchronize efficiently. – ThreaT Feb 02 '15 at 15:46
  • GNU Parallel solves the kernel configuration problem. – Yokai Jan 22 '17 at 08:26
3

Note: I'm not an expert in this topic.

The way I understand you, you're interested in high-performance computing clusters (as opposed to other cluster approaches like high availability or load balancing). What you probably want is super-, grid-, or distributed computing.

How does one take multiple computers and make them act as one, so that all their processors and memory are combined and any application runs as if it were running on a single very fast computer?

Without specialized hardware (see for example Torus interconnect or InfiniBand) you're limited to connecting the computers over Ethernet (meaning you can do either distributed or grid computing). But you should not forget or underestimate the speed difference between a local high-speed computer bus and Ethernet!
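To put that gap in numbers, here is a quick comparison; the bandwidth figures are rough, typical spec values, not benchmarks of any particular machine:

```python
# Order-of-magnitude bandwidth comparison: local buses vs. Ethernet.
links_gb_per_s = {
    "DDR4 memory bus (one channel)": 19.2,   # ~19 GB/s at DDR4-2400
    "PCIe 3.0 x16":                  15.8,   # ~16 GB/s
    "10 Gb Ethernet":                 1.25,  # 10 Gb/s = 1.25 GB/s
    "1 Gb Ethernet":                  0.125, # 1 Gb/s = 0.125 GB/s
}

def transfer_seconds(gigabytes, gb_per_s):
    """Time to move `gigabytes` of data over a link, ignoring latency."""
    return gigabytes / gb_per_s

for name, bw in links_gb_per_s.items():
    print(f"{name}: 1 GB in {transfer_seconds(1, bw):.3f} s")
```

Moving 1 GB takes well under a tenth of a second over a memory bus but around eight seconds over gigabit Ethernet, a difference of more than two orders of magnitude before latency is even considered.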

Now, whether grid or distributed computing is something you should strive for depends heavily on the tasks you want to accomplish. With a bottleneck like Ethernet, grid or distributed computing only makes sense for tasks/applications that don't need to be very responsive and do very computation-intensive work, which (broadly speaking) disqualifies any application that isn't of a scientific nature. The application should also be programmed in a way that lets it fully take advantage of the distributed nature of its host.

If you're still interested, here is a list of compatible operating systems: Single system image

jlliagre
  • 8,691
  • 16
  • 36
Marco
  • 415
  • 1
  • 3
  • 16
2

There's not one big super-common way of 'clustering'; it's a term used for making multiple servers perform one function, but there are thousands of different functions you might want to perform.

For example, there are database clusters (say Oracle RAC or MSSQL clusters) that, just for database load, can be configured to act as one - usually for performance and/or resilience purposes.

The same is true of other types of clusters, say CGI render-farms, they work together to render frames for the next Pixar blockbuster or whatever. The same is true for clusters used for scientific computing (gene study, particle physics, even nuclear decay).

So when we talk about a 'cluster', what we really mean is 'a cluster doing xxxxx'.

So if you have a function you'd like to spread out over a bunch of servers, let us know and we'll try to suggest some options for that use case.

Chopper3
  • 100,240
  • 9
  • 106
  • 238