
This server is intended for deep learning workloads. I plan to assemble a server with 20 GPUs (Nvidia GTX 1080 Ti) and 2 CPUs (Intel Xeon Phi), and then have many users draw processing power from it.

Relevant to this question: If a user wants, say, 8 GPUs, then can 8 random free GPUs be allocated to them?

Also, in general, will this work? I still don't understand how the OS will function in this scenario. Is there some special server OS (such as Ubuntu server) which I can use to make this happen?

Thanks

Rushat Rai
  • One server with 20 GPUs? Or a cluster of machines with GPUs? – Sven May 24 '17 at 09:52
  • @Sven one server with 20 GPUs, not a cluster. This is the rack: https://www.supermicro.com.tw/products/superblade/module/SBI-7128RG-F2.cfm – Rushat Rai May 24 '17 at 09:55
  • You got that wrong. This is a [blade system](https://en.wikipedia.org/wiki/Blade_server). Each of the blades is a separate computer; they just share infrastructure like power supply and network. – Sven May 24 '17 at 10:15
  • @Sven ah. That explains most of my confusion. I've actually scrapped this idea and have planned out a rack with 6 different machines each complete with 8 GPUs, and they are all linked through Infiniband network cards. – Rushat Rai May 24 '17 at 10:19
  • My suggestion would be to hire an expert consultant who can help you analyze your workload and come up with a good solution. HPC computing is tricky and it's easy to invest into the wrong kind of setup for your use case. – Sven May 24 '17 at 10:22
  • @Sven hmm, that is worth considering. Also, insidehpc.com says: "A high performance computer appropriate for most small and medium-sized businesses today is built from what are basically many ordinary computers connected together with a network and centrally coordinated by some **special software.**" Is this 'special software' Slurm? – Rushat Rai May 24 '17 at 10:32
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/59224/discussion-between-mythic-cocoa-and-sven). – Rushat Rai May 24 '17 at 10:46
  • Sorry, I don't do chat. And yes, Slurm would be one example of such software. – Sven May 24 '17 at 10:50

1 Answer


This is not one computer but several separate ones (possibly connected by a fast, low-latency InfiniBand network). You need a classic HPC cluster environment with a job scheduler/batch system, e.g. Slurm.
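To make the scheduler idea concrete, here is a minimal sketch of how a request for 8 GPUs might be expressed to Slurm. It only builds the `sbatch` command line and submits nothing. The helper `build_sbatch_command`, the job name `dl-train`, and the script `train.sh` are hypothetical names chosen for illustration, and the sketch assumes the cluster has GPUs configured as a generic resource (GRES) named `gpu`.

```python
import shlex


def build_sbatch_command(script, gpus_per_node=8, nodes=1, job_name="dl-train"):
    """Return the argv list for submitting `script` to Slurm with a GPU request.

    Hypothetical helper: it assembles the command but does not run it.
    """
    if gpus_per_node < 1 or nodes < 1:
        raise ValueError("gpus_per_node and nodes must be positive")
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--nodes={nodes}",
        # --gres=gpu:N asks for N GPUs on each allocated node; the scheduler,
        # not the user, decides which free GPUs satisfy the request.
        f"--gres=gpu:{gpus_per_node}",
        script,
    ]


if __name__ == "__main__":
    # train.sh is an assumed batch script name; nothing is submitted here.
    print(shlex.join(build_sbatch_command("train.sh")))
```

With a request like this, the user never names specific GPUs: the job waits in the queue until a node with enough free GPUs is available, which is how the "any 8 free GPUs" part of the question is usually handled.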

Sven