Questions tagged [nvidia]

An American global technology company based in Santa Clara, California, best known for its graphics processors (GPUs).

62 questions
1
vote
1 answer

Ubuntu Server 20.04 LTS - Installing NVIDIA & CUDA installs GNOME as well

I have a GPU server that requires CUDA, for example for machine learning tasks. Unfortunately, as soon as I install the NVIDIA drivers and CUDA, a variant of GNOME is apparently installed as well. This GNOME variant can do almost nothing, the shell…
1
vote
1 answer

GPU server freezes during GPU idling

We have a new Supermicro AS-4124GS-TNR server equipped with eight NVIDIA RTX A6000 GPUs. The OS is Ubuntu 20.04.2, the NVIDIA driver version is 460.73.01 (no Nouveau driver used), and the CUDA version is 11.2. We ran a few long-lasting tests on the GPUs and…
user776206
1
vote
1 answer

Slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES

I have a problem running nvidia-docker containers on a Slurm cluster. Inside the container all GPUs are visible, so it basically ignores the CUDA_VISIBLE_DEVICES environment variable set by Slurm. Outside the container, the visible GPUs are correct. Is there a…
1
vote
0 answers

GCP VM: nvidia-container-cli: initialization error: driver error: timed out: unknown

Lately my GCP VM with multiple GPUs throws the following error when I try to run my container: docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container…
1
vote
1 answer

GKE can't schedule newly created pods that demand GPU on newly added nodes with GPUs

When I add new pool nodes with GPUs, Google Kubernetes Engine can't schedule newly created pods that demand a GPU on these new nodes. Scheduling should be automatic, but apparently not for GPU resources: new pods stay in the 'pending' state forever. How to fix…
1
vote
1 answer

Misbehaving NVLINK with 2080 Ti cards?

I am running into problems with NVLINK'd RTX video cards, and I wonder if someone more experienced with this tech could kindly look at the output below and tell me if there is a problem? Using a pair of MSI 2080 Ti cards and an RTX NVLINK bridge by…
Eric M
0
votes
0 answers

How can I get kernel / early boot output over my NVIDIA GPU using CentOS 7?

I recently installed CentOS 7.7 with KDE on a machine with both onboard graphics and an NVIDIA GTX 1080 Ti. I got the proprietary NVIDIA drivers installed, but it was quite difficult as I couldn't see what was happening during boot up past a certain…
josePhoenix
0
votes
1 answer

PCI at NVIDIA Tesla P100 in shared passthrough mode is disabled

I have successfully completed the NVIDIA Tesla P100 GRID setup on the vSphere host server with VMware ESXi 6.7. While trying to add PCI devices in the VM, the option to choose PCI devices is shown as in the “Add Other Hardware” setting in “Virtual…
0
votes
1 answer

How can I find out if my Azure VM is running on DGX-1?

I am trying to reset the GPU of my Azure virtual machine (NVIDIA GPU Cloud Image running on Standard NV6 running Ubuntu 16.04.1) to get reproducible results on a deep learning algorithm. I found this NVIDIA help page, which explains that I cannot…
miguelmorin
0
votes
1 answer

Install Nvidia Drivers 9.0 for TensorFlow pip (Debian 9.7)

I installed NVIDIA drivers 9.1 on my Debian 9.7 (Dataproc). When I try to run TensorFlow 1.9 via this test script, it fails. I used this guide to install the GPU drivers: https://cloud.google.com/dataproc/docs/concepts/compute/gpus Used pip install…
gogasca
0
votes
0 answers

Checking GPU firmware

In a GPU cloud solution (with OpenStack) where the VMs can access the graphics cards via PCI passthrough, we want to be sure that no malicious person has changed the GPU firmware from inside a VM. A potential solution we came up with was to use…
J. Chorin
0
votes
2 answers

"Too many levels of symbolic links" in NFS via automount resolved by restarting Docker

This is bizarre and while I have a workaround, I'd prefer a permanent fix. I have a small group of GPU machines running Ubuntu 14.04 which I am using as workers for a cloud service that's effected via Docker images. I have nvidia-docker installed on…
krivard
0
votes
1 answer

yum install kmod-nvidia - kernel issue

It is impossible to install the NVIDIA driver on CentOS Linux release 7.3.1611 (Core): the kmod-nvidia package gives errors and kernel incompatibilities. It is usually installed with yum install kmod-nvidia -y. Current output: sudo yum install…
Kevin Lemaire
0
votes
1 answer

Reverting yum update

I needed to update the NVIDIA driver on a CentOS 6.9 machine and decided to update a bit more. So I ran sudo yum update and rebooted. Unfortunately, that caused problems with NVIDIA that were worse than before. I am able to log in only remotely now, and…
Michael
0
votes
0 answers

Can't kill a process on GPU

I have a process running on a K80 GPU. Is there a way to stop it with the NVIDIA tools? I tried kill -9 and so on, but nothing kills it. $ uname -a Linux slurm10 3.16.0-33-generic #44~14.04.1-Ubuntu SMP Fri Mar 13 10:33:29 UTC 2015 x86_64…
PlagTag