I'm trying to set up a cluster of 8 computers plus a main file server. Ideally, I'd like to set this up as a PXE-boot, quasi-diskless/quasi-stateless environment (i.e. the only local storage is /var, where things like torque configuration will go). Each of the 8 compute nodes has 4 NVIDIA Tesla K40m GPUs, but the root file server has no GPU.
Ideally, I'd like to be able to create the complete installation on the file server (at /node) and then PXE-boot the compute nodes from it, but I haven't found a way to install the NVIDIA drivers without an NVIDIA GPU on board. I found one question on NVIDIA's forums where someone attempted this unsuccessfully.
Alternatively, I could install the NVIDIA drivers on one of the compute nodes (one is currently running CentOS on its local disks) to, for example, /usr/local/nvidia, keep track of which files the installer creates, and make a tarball of those to copy to the file server installation.
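For that second option, here is a rough sketch of how I imagine tracking the files: take a sorted file listing before and after the install and diff the two. I haven't tested this against the actual NVIDIA installer; the function names, the directories to scan, and the /tmp paths are placeholders I made up, and the `tar -T` step assumes GNU tar.

```shell
#!/bin/sh
# Hypothetical helpers for recording which files an installer adds.

# snapshot DIR...: print a sorted list of every regular file and symlink
# under the given directories, staying on each directory's own filesystem.
snapshot() {
    find "$@" -xdev \( -type f -o -type l \) 2>/dev/null | LC_ALL=C sort
}

# new_files BEFORE AFTER: print the paths listed in AFTER but not in BEFORE.
# Both inputs must be sorted, which snapshot guarantees.
new_files() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Intended use on the compute node (the directories here are guesses):
#   snapshot /usr /etc /lib > /tmp/before.txt
#   ... run the NVIDIA installer ...
#   snapshot /usr /etc /lib > /tmp/after.txt
#   new_files /tmp/before.txt /tmp/after.txt > /tmp/nvidia-files.txt
#   tar -czf /tmp/nvidia-files.tar.gz -T /tmp/nvidia-files.txt
```

This only catches newly created paths; files the installer modifies in place would not show up in the list.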
Lastly, I could just maintain eight separate installations, but I don't like this from a long-term maintenance perspective (each compute node will be running torque jobs, so I'd like the nodes to look more or less identical).
In summary, what I'm asking for is this:
- Can I install the NVIDIA drivers without an NVIDIA GPU on board?
- Is there some other way I should be going about this?
For reference, we're running CentOS 7.
[root@compute-3 /]# uname -a
Linux compute-3 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux