
My company has a (relatively) big computer farm, say 100 physical servers (dual-CPU hexa-core E5 Xeons with 160 GB RAM), leased from a hardware provider (say Leaseweb or OVM) on a monthly basis. That means that on 1st January I pay for all 100 servers to use during 1st-29th February.

The servers serve partners X, Y, and Z. The partners pay our company for the servers on a usage basis: if they stop using the servers, they stop paying my company.

Suppose partner X completely stops using our servers on 2nd January, and I now have, say, 30% of my servers returning zero revenue: I lose 30% of the money invested.

Given this scenario:

  • Are there any existing cluster management or provisioning tools that would allow me to quickly configure these systems as an HPC or cloud compute resource?
  • What are the existing scheduling and resource management tools that could be used to allow clients to submit compute workloads to the aforementioned cluster?
  • Do any of the previously mentioned resource managers integrate quickly with billing or client account management solutions?
Matt
rlib
  • Find someone to buy the spare capacity? What sort of answer do you expect us to be able to provide? – ceejayoz Jan 11 '17 at 21:35
  • I'm voting to close this question as off-topic because it is just completely off-topic! – alexus Jan 11 '17 at 21:37
  • Something like this: http://www.cpusage.com/ but on a bigger scale. – rlib Jan 11 '17 at 21:37
  • @rlib I suspect you'd spend more money implementing such a solution than you'd save. Why not hold your partners to a monthly billing cycle? – ceejayoz Jan 11 '17 at 21:43
  • @ceejayoz: the real business model is different from the one described above; I simplified it for the question. The fact is that the model cannot be changed, and I'm always at risk of leaving purchased servers unused. So I'm looking for ways to recover some money from those servers. – rlib Jan 11 '17 at 21:49

1 Answer


This sort of thing is possible, but logistically it would likely involve several much more specific questions about setting up the infrastructure, and the answers to those would be specific to your installation.

This sort of thing has been tried in the High Performance Computing (HPC) community several times, mostly without success. Here are some observations which may help you be successful:

  1. The systems you mentioned are below the compute requirements of many of the institutions that have enough compute load to need on-demand resources beyond their dedicated systems.
  2. Without a high-speed interconnect (by which I mean InfiniBand) between all of the nodes, there is no practical HPC use for a 100-node cluster of the systems you described. 100 Raspberry Pis would likely be as effective for the communication-intensive real-world workloads you would likely be targeting.
  3. Almost all HPC, cloud, and high-throughput computing workloads (of the type that could use this sort of cluster) are data intensive, so you would likely need resources to create at least one additional storage cluster, as well as backup infrastructure, policies for hosting other people's data, and a substantial internet connection for clients to upload and download data through.
  4. If you do want to pursue this, target a specific workload your hardware is good at, find potential clients who run that type of workload, and see if they would like to use a cluster of your old or underutilized hardware. If you choose to do this, set up the cluster in advance with the tools and applications the client will likely use; make sure everything has the most recent security patches and features available, as well as legacy versions of the software. Test the resource manager and scheduling services rigorously and document how to use them, ideally with benchmarks or tests similar to real-world workloads. Make this documentation and some example workloads available to the client in several formats. Also be prepared to have someone on call to deal with client issues per your SLA.
  5. HPC is not easy. Be prepared to hire at least one wizard to set up and maintain your cluster, expect it to take a long time to find someone to fill this position, and be prepared to do whatever that person says.
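
To make observation 2 a bit more concrete, here is a minimal sketch of the usual latency-bandwidth ("alpha-beta") estimate for message transfer time. The link figures (roughly 50 µs and 1 Gb/s for Ethernet, roughly 2 µs and 56 Gb/s for InfiniBand) and the message pattern are illustrative assumptions of mine, not measurements of the hardware in the question, and the function names are hypothetical.

```python
# Latency-bandwidth ("alpha-beta") estimate of message transfer time.
# All network figures below are illustrative assumptions, not measurements
# of the hardware described in the question.


def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def exchange_time(n_messages, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time for n sequential messages of equal size over one link."""
    return n_messages * transfer_time(
        message_bytes, latency_s, bandwidth_bytes_per_s
    )


# Hypothetical links: (latency in seconds, bandwidth in bytes per second).
GIGABIT_ETHERNET = (50e-6, 1e9 / 8)   # assumed ~50 us latency, 1 Gb/s
INFINIBAND = (2e-6, 56e9 / 8)         # assumed ~2 us latency, 56 Gb/s

if __name__ == "__main__":
    # A communication-heavy step: 10,000 small (8 KiB) messages per node.
    links = [("ethernet", GIGABIT_ETHERNET), ("infiniband", INFINIBAND)]
    for name, (latency, bandwidth) in links:
        t = exchange_time(10_000, 8 * 1024, latency, bandwidth)
        print(f"{name}: {t:.3f} s per step")
```

Under these assumed figures the Ethernet step comes out more than an order of magnitude slower, and for small messages most of that gap is latency rather than bandwidth. Real numbers depend on the provider's network, so measure before drawing conclusions.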
Matt