
On a newly built Ubuntu 16.04 machine, running nvidia-smi as a regular user fails:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Running it as root works:

$ sudo nvidia-smi
[sudo] password for hanxue: 
Fri Jul 19 10:05:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   38C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   31C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   31C    P0    28W / 250W |      0MiB / 16276MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

Subsequently, running it as a regular user works as well:

$ nvidia-smi
Fri Jul 19 10:09:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   40C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   35C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   33C    P0    27W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Is there a misconfiguration that requires nvidia-smi to be run as root first, and is there a solution for it, e.g. manually loading the NVIDIA kernel modules?
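
For reference, the manual approach I have in mind would look roughly like the boot-time script sketched below (run as root, e.g. from /etc/rc.local). This is only a sketch along the lines of the device-node creation recipe in NVIDIA's Linux driver documentation, not something taken from this machine; it assumes the 384-series driver shown above, and character major 195 is the conventional value for the nvidia devices.

#!/bin/bash
# Sketch: load the NVIDIA kernel modules and create the /dev/nvidia* device
# nodes at boot, so nvidia-smi does not need to be run as root first.

/sbin/modprobe nvidia || exit 1

# Count the NVIDIA GPUs on the PCI bus (3D and VGA class devices).
NVDEVS=$(lspci | grep -i NVIDIA)
N3D=$(echo "$NVDEVS" | grep -c "3D controller")
NVGA=$(echo "$NVDEVS" | grep -c "VGA compatible controller")
N=$((N3D + NVGA - 1))

# Major 195 is the conventional character-device major for the nvidia driver.
for i in $(seq 0 "$N"); do
  mknod -m 666 /dev/nvidia"$i" c 195 "$i"
done
mknod -m 666 /dev/nvidiactl c 195 255

# nvidia-uvm is needed by CUDA; its major number is assigned dynamically,
# so read it from /proc/devices instead of hard-coding it.
/sbin/modprobe nvidia-uvm || exit 1
D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
mknod -m 666 /dev/nvidia-uvm c "$D" 0

The other option usually suggested for Tesla boards is enabling persistence mode (running nvidia-persistenced as a service), which keeps the driver initialized without a script like this, but I am not sure whether either is the intended fix here.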

hanxue
  • Hm. If you were able to run compute jobs then the correct modules should have been loaded already. But you don't have CUDA installed yet? The CUDA version should be showing. – Michael Hampton Jul 19 '19 at 04:27
  • CUDA is already installed. I wonder if it is a bug - on a newly built machine, `nvidia-smi` must be run as root first – hanxue Jul 21 '19 at 07:13

0 Answers