Can GPUs be dynamically allocated in a 20 GPU server?

Question

This server is intended for deep learning workloads. I am planning to assemble a server with 20 GPUs (Nvidia GTX 1080 Ti) and 2 CPUs (Intel Xeon Phi), and then have many users draw processing power from it.

Relevant to this question: If a user wants, say, 8 GPUs, then can 8 random free GPUs be allocated to them?

Also, in general, will this work? I still don't understand how the OS will function in this scenario. Is there some special server OS (such as Ubuntu server) which I can use to make this happen?

Thanks

One server with 20 GPUs? Or a cluster of machines with GPUs? — Sven, May 24 '17 at 09:52
@Sven one server with 20 GPUs, not a cluster. This is the rack: https://www.supermicro.com.tw/products/superblade/module/SBI-7128RG-F2.cfm — Rushat Rai, May 24 '17 at 09:55
You got that wrong. This is a [blade system](https://en.wikipedia.org/wiki/Blade_server). Each of the blades is a separate computer; they just share infrastructure such as power supply and network. — Sven, May 24 '17 at 10:15
@Sven ah. That explains most of my confusion. I've actually scrapped this idea and have planned out a rack with 6 different machines each complete with 8 GPUs, and they are all linked through Infiniband network cards. — Rushat Rai, May 24 '17 at 10:19
My suggestion would be to hire an expert consultant who can help you analyze your workload and come up with a good solution. HPC computing is tricky and it's easy to invest into the wrong kind of setup for your use case. — Sven, May 24 '17 at 10:22
@Sven hmm, that is worth considering. Also, insidehpc.com says: "A high performance computer appropriate for most small and medium-sized businesses today is built from what are basically many ordinary computers connected together with a network and centrally coordinated by some **special software.**" Is this 'special software' Slurm? — Rushat Rai, May 24 '17 at 10:32
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/59224/discussion-between-mythic-cocoa-and-sven). — Rushat Rai, May 24 '17 at 10:46
Sorry, I don't do chat. And yes, Slurm would be one example of such software. — Sven, May 24 '17 at 10:50

score 1 · Accepted Answer · answered May 24 '17 at 10:16

This is not one computer but several separate ones (possibly connected by a fast, low-latency InfiniBand network). You need a classic HPC cluster environment with a job scheduler/batch system, e.g. Slurm.
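
As a rough sketch of what this looks like from the job's side: when Slurm is set up with GPUs as generic resources (GRES), a job that requests GPUs (for example with `--gres=gpu:8`) normally gets `CUDA_VISIBLE_DEVICES` set to whichever free GPUs the scheduler assigned on that node. The helper name `allocated_gpus` below is made up for illustration, and the snippet assumes such a GRES configuration; it only reads the environment and does not talk to Slurm itself.

```python
import os


def allocated_gpus(env=None):
    """Return the GPU identifiers exposed to this job.

    Hypothetical helper: it assumes the scheduler communicates the
    allocation through the CUDA_VISIBLE_DEVICES environment variable.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list; empty or unset means no GPUs.
    return [token.strip() for token in raw.split(",") if token.strip()]


if __name__ == "__main__":
    gpus = allocated_gpus()
    print(f"{len(gpus)} GPU(s) visible to this job: {gpus}")
```

The user never picks specific devices; the job just uses whatever the scheduler exposes, which is what makes the allocation "dynamic".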

Sven
