Questions tagged [nvidia]

An American global technology company based in Santa Clara, California, best known for its graphics processing units (GPUs).

62 questions
14 votes, 1 answer

What are actual Tesla M60 models used by AWS?

Wikipedia says that the Tesla M60 has 2x8 GB RAM (whatever that means) and a TDP of 225–300 W. I use an EC2 instance (g3s.xlarge) which is supposed to have a Tesla M60. But the nvidia-smi command reports 8 GB RAM and a max power limit of 150 W: > sudo…
hans
  • 242
  • 2
  • 8
7 votes, 1 answer

Google Kubernetes Engine node pool does not autoscale from 0 nodes

I am trying to run a machine learning job on GKE, and need to use a GPU. I created a node pool with Tesla K80, as described in this walkthrough. I set the minimum node size to 0, and hoped that the autoscaler would automatically determine how many…
5 votes, 1 answer

Why is my CUDA GPU-Util ~70% when there are "No running processes found"?

After configuring a system with 2 Tesla K80 cards, I noticed when running nvidia-smi that one of the 4 GPUs was under heavy load despite there being "No running processes found". Why is this happening and how do I correct this? Here is the output…
4 votes, 0 answers

Erase GPU memory

We have Nvidia GPU cards that can be used by different users in an OpenStack environment. A first user creates a VM with access to a GPU card, then deletes the VM when done. Another user then creates a VM which is given access to the same card.…
4 votes, 2 answers

8 GPU machine freezes

We have a SuperMicro GPU server with: 2x Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz; 512GB memory; more than enough disk space; X10DRG-O+-CPU (BIOS Version: 2.0a [current]); X9DRG-O-PCIE PCI-E expander card; 8x GTX 1080. It is set up with Ubuntu 16.04.1…
pks
  • 41
  • 3
3 votes, 2 answers

NVIDIA-SMI can't communicate with NVIDIA driver

Problem description: I am trying to set up a centos-7 GPU (Nvidia Tesla K80) instance on Google Cloud to execute CUDA work. Unfortunately, I can't seem to properly install/configure drivers. Indeed, here is what happens when trying to interact with…
2 votes, 0 answers

"Getting devices ready" on Windows 10 while booting VM/iSCSI on a different machine than the one initially set up

TL;DR version: a virtual Windows instance reinstalls GPU drivers when switching to other hosts, even though it gets the same hardware every time. I'm trying to avoid this or shorten the time it takes. Full version: I've got an iSCSI server (Windows…
Domel
  • 21
  • 4
2 votes, 1 answer

Access Denied on NVIDIA GRID 7.2 Driver

I am trying to set up an NVIDIA Tesla T4 GPU and use its RTX functionality in a raytracing application (Bakery for Unity3D). But every time I launch the app, Bakery tells me it could not find the OptiX library. I believe I have tracked it down to…
2 votes, 1 answer

Failed to initialize NVML: Unknown Error - Not able to complete NVIDIA Tesla P100 GRID setup on the vSphere host server with VMware ESXi 6.7

I am unable to complete the NVIDIA Tesla P100 GRID setup on the vSphere host server with VMware ESXi 6.7 on a Dell EMC PowerEdge R740. When I try to run the nvidia-smi command, I get the following error: Failed to initialize NVML: Unknown…
2 votes, 0 answers

Specify a GPU to use at launch

I am currently working with an Azure GPU VM (NV6, using an Nvidia M60 graphics card). I'm running my benchmark on this VM without any issues so far. Now I'm running the same benchmark on an NV12, which has 2 GPUs (or at least Windows Server sees it as 2…
Turgal
  • 121
  • 1
2 votes, 4 answers

Nvidia driver breaks vncserver on CentOS 7.4, is there a workaround?

CentOS Linux release 7.4.1708 (Core); uname -r output: 3.10.0-693.2.2.el7.x86_64; Nvidia driver: NVIDIA-Linux-x86_64-375.66.run. When using the Nvidia graphics card driver with the Nvidia GeForce GT 720 graphics card on CentOS 7.4, it works fine for…
Edward_178118
  • 895
  • 4
  • 14
  • 30
2 votes, 1 answer

Installing NVIDIA Drivers for Diskless Environment

I'm trying to set up a cluster of 8 computers plus a main file server. Ideally, I'd like to set this up in a PXE-boot, quasi-diskless/quasi-stateless environment (i.e. the only local storage is /var, where things like torque configuration will go).…
2 votes, 2 answers

Install Display Card In ProLiant DL580 Gen8 Server

We have a ProLiant DL580 Gen8 server and want to install a Gigabyte GeForce GTX 980 Ti display card in a PCIe slot. When we connect the 8-pin power connector, the server will not turn on, and when the power connector is not connected, the server starts but the graphics card…
MTSS
  • 123
  • 5
2 votes, 1 answer

Executing CUDA script in LXC container results in "cuda error: no CUDA-capable device is detected"

I followed these instructions to set up CUDA inside an LXC container. When I try to execute the sample ./deviceQuery script inside the container, the following error is returned: $ ./deviceQuery ./deviceQuery Starting... CUDA Device…
Greg
  • 1,557
  • 5
  • 24
  • 35
2 votes, 0 answers

libGL error: dlopen /usr/lib64/dri/nouveau_dri.so failed on CentOS 6.6

I'm having problems using the nouveau driver for my Nvidia GeForce 9100. Xorg starts up and works fine, I am able to use everything, although in /var/log/Xorg.0.log I have: $ cat /var/log/Xorg.0.log | grep EE [ 36.166] (EE) AIGLX error: dlopen…
Leo
  • 121
  • 2