I am trying to reset the GPU of my Azure virtual machine (NVIDIA GPU Cloud Image on a Standard NV6 instance running Ubuntu 16.04.1) to get reproducible results from a deep learning algorithm. I found this NVIDIA help page, which explains that individual GPUs of a DGX-1 server cannot be reset:

In the case of the DGX-1 and DGX-1V platforms, individual GPU's can not be reset because they are linked via nvlink, so all the GPU's have to be reset simultaneously.

How can I find out whether the GPU on my Azure machine belongs to a DGX-1 server?

yagmoth555
miguelmorin
  • Please update with the OS you are using and the Azure VM series to better assist the community in answering your question. – Ken W MSFT Feb 15 '19 at 13:32
  • Have you installed NVIDIA GPU Driver Extension for Linux? You can validate the extension is installed https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux#troubleshoot-and-support – Ken W MSFT Feb 15 '19 at 15:55
  • I am able to run that command but the display of the commands in your answer does not change. – miguelmorin Feb 15 '19 at 19:03
  • Looking at the NVIDIA documentation, you need to be running on an NC VM, not a NV. https://www.nvidia.com/en-us/data-center/gpu-cloud-computing/microsoft-azure/ – Ken W MSFT Feb 15 '19 at 19:56
  • Yes, I had seen that page and I am unsure which Azure lines use which NVIDIA groups of GPUs. – miguelmorin Feb 15 '19 at 20:04

1 Answer

You should be able to query the OS to find out which device you have. You didn't list the OS in your question, so I will assume it's Ubuntu. Here are a couple of commands you could try:

lspci -vnn | grep VGA -A 12

lshw -numeric -C display

GPU info on the N-Series can be found here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu
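
The two commands above report the GPU model but not the kind of server it sits in. A rough sketch of a further check, assuming the NVIDIA driver and `nvidia-smi` are installed: DGX-1 and DGX-1V systems are built around Tesla P100-SXM2 and V100-SXM2 modules, so a model name without that suffix (a Tesla M60, for example, is a PCIe board without NVLink) rules a DGX-1 out. The function name `is_dgx1_candidate` is made up for illustration, and the name match is only a heuristic, since SXM2 modules also ship in other servers.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): prints "yes" if the GPU model name
# looks like one of the SXM2 parts used in DGX-1 / DGX-1V systems, else "no".
is_dgx1_candidate() {
    case "$1" in
        *P100-SXM2*|*V100-SXM2*) echo "yes" ;;
        *) echo "no" ;;
    esac
}

# Only query the driver if nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    # One model name per line, e.g. "Tesla M60".
    nvidia-smi --query-gpu=name --format=csv,noheader |
    while IFS= read -r name; do
        echo "$name: possible DGX-1 part? $(is_dgx1_candidate "$name")"
    done

    # Interconnect matrix between GPUs; NVLink connections show up as NV# entries.
    nvidia-smi topo -m
fi
```

If the topology matrix shows no NVLink entries between GPUs, the NVLink restriction on resetting individual GPUs should not apply.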

Ken W MSFT
  • The two commands work and show, for example, `Subsystem: NVIDIA Corporation GM204GL [Tesla M60]` and `product: GM204GL [Tesla M60]`. But they don't show if the devices are part of a DGX-1 server. – miguelmorin Feb 15 '19 at 14:45