Highest Voted 'nvidia' Questions - Server Fault Stack Exchange

1 vote, 1 answer

Ubuntu Server 20.04 LTS - Installing NVIDIA & CUDA installs GNOME as well

I have a GPU server that requires CUDA, for example for machine learning tasks. Unfortunately, as soon as I install the NVIDIA drivers and CUDA, a variant of GNOME apparently gets installed as well. This GNOME variant can do almost nothing, the shell…

asked Sep 10 '21 at 17:19

Julian Bechtold

1 vote, 1 answer

GPU server freezes during GPU idling

We have a new Supermicro Server AS-4124GS-TNR equipped with eight NVIDIA RTX A6000. The OS is Ubuntu 20.04.2, the NVIDIA driver version is 460.73.01 (no Nouveau driver used), the CUDA Version is 11.2. We ran a few long-lasting tests on the GPUs and…

ubuntu server-crashes nvidia freeze

asked Jul 14 '21 at 07:39

user776206

1 vote, 1 answer

slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES

I have a problem running nvidia-docker containers on a Slurm cluster. Inside the container all GPUs are visible, so it effectively ignores the CUDA_VISIBLE_DEVICES environment variable set by Slurm. Outside the container the visible GPUs are correct. Is there a…

docker nvidia slurm

asked Mar 21 '21 at 18:26

JohnA.Zoidberg

1 vote, 0 answers

GCP VM: nvidia-container-cli: initialization error: driver error: timed out: unknown

Lately my GCP VM of multiple GPUs throws the following error when I try to run my container: docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container…

docker google-cloud-platform google-compute-engine nvidia

asked Jan 19 '21 at 15:33

ben0it8

1 vote, 1 answer

GKE can't schedule newly created pods that demand GPU on newly added nodes with GPUs

When I add new node pools with GPUs, Google Kubernetes Engine can't schedule newly created pods that request GPUs on these new nodes. This should be automatic, but apparently not for GPU resources: new pods stay in the 'pending' state forever. How to fix…

google-cloud-platform kubernetes google-kubernetes-engine graphics-processing-unit nvidia

asked Jul 17 '20 at 08:19

Elras

1 vote, 1 answer

Misbehaving NVLink with 2080 Ti cards?

I am running into problems with NVLink'd RTX video cards, and I wonder if someone more experienced with this tech could kindly look at the output below and tell me whether there is a problem. I'm using a pair of MSI 2080 Ti cards and an RTX NVLink bridge by…

linux networking nvidia gpu nvlink

asked May 08 '20 at 14:34

Eric M

0 votes, 0 answers

How can I get kernel / early boot output over my NVIDIA GPU using CentOS 7?

I recently installed CentOS 7.7 with KDE on a machine with both onboard graphics and an NVIDIA GTX 1080 Ti. I got the proprietary NVIDIA drivers installed, but it was quite difficult as I couldn't see what was happening during boot up past a certain…

linux centos boot grub nvidia

asked Sep 26 '19 at 17:46

josePhoenix

0 votes, 1 answer

PCI option for NVIDIA Tesla P100 in shared passthrough mode is disabled

I have successfully completed the NVIDIA Tesla P100 GRID setup on the vSphere host server with VMware ESXi 6.7. While trying to add PCI devices in the VM, the option to choose PCI devices is shown as in the “Add Other Hardware” setting in “Virtual…

vmware-esxi vmware-vsphere vmware-esx pci nvidia

asked Mar 14 '19 at 11:29

Sarath Zacharia

0 votes, 1 answer

How can I find out if my Azure VM is running on DGX-1?

I am trying to reset the GPU of my Azure virtual machine (NVIDIA GPU Cloud Image running on Standard NV6 running Ubuntu 16.04.1) to get reproducible results on a deep learning algorithm. I found this NVIDIA help page, which explains that I cannot…

azure nvidia nvlink

asked Feb 15 '19 at 11:53

miguelmorin

0 votes, 1 answer

Install Nvidia Drivers 9.0 for TensorFlow pip (Debian 9.7)

I installed NVIDIA drivers 9.1 on my Debian 9.7 (Dataproc) machine. When I try to run TensorFlow 1.9 via this test script, it fails. I used this guide to install the GPU drivers: https://cloud.google.com/dataproc/docs/concepts/compute/gpus I used pip install…

debian google-cloud-platform hadoop nvidia

asked Feb 11 '19 at 23:11

gogasca

0 votes, 0 answers

Checking GPU firmware

In a cloud GPU solution (with OpenStack) where the VMs access the graphics cards via PCI passthrough, we want to be sure that no malicious user has changed the GPU firmware from inside a VM. A potential solution we came up with was to use…

linux security openstack firmware nvidia

asked Aug 08 '18 at 13:53

J. Chorin

0 votes, 2 answers

"Too many levels of symbolic links" in NFS via automount resolved by restarting Docker

This is bizarre and while I have a workaround, I'd prefer a permanent fix. I have a small group of GPU machines running Ubuntu 14.04 which I am using as workers for a cloud service that's effected via Docker images. I have nvidia-docker installed on…

docker nfs automount autofs nvidia

asked Dec 21 '17 at 21:50

krivard

0 votes, 1 answer

yum install kmod-nvidia - kernel issue

I can't install the NVIDIA driver on CentOS Linux release 7.3.1611 (Core); the kmod-nvidia package gives errors and kernel incompatibilities. It is usually installed with yum install kmod-nvidia -y Current output: sudo yum install…

centos yum nvidia

asked Aug 24 '17 at 08:24

Kevin Lemaire

0 votes, 1 answer

Reverting yum update

I needed to update the NVIDIA driver on CentOS 6.9 and decided to update a bit more, so I ran sudo yum update and rebooted. Unfortunately, that caused NVIDIA problems that were worse than before. I am now able to log in only remotely, and…

centos yum nvidia

asked May 17 '17 at 00:04

Michael

0 votes, 0 answers

Can't kill a process on GPU

I have a process running on a K80 GPU. Is there a way to stop it with the NVIDIA tools? I tried kill -9 etc., but nothing kills it. $ uname -a Linux slurm10 3.16.0-33-generic #44~14.04.1-Ubuntu SMP Fri Mar 13 10:33:29 UTC 2015 x86_64…

ubuntu-14.04 kill-process nvidia

asked Nov 14 '16 at 14:50

PlagTag

Questions tagged [nvidia]