
I followed these instructions to set up CUDA inside an LXC container.

When I run the sample ./deviceQuery program inside the container, the following error is returned:

$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

CUDA is installed and recognised inside the container:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

The NVIDIA device nodes are present on both the host and inside the LXC container:

$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Dec 20 23:31 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec 20 23:31 /dev/nvidiactl
crw-rw-rw- 1 root root 246,   0 Dec 20 23:31 /dev/nvidia-uvm

When I run sudo nvidia-smi inside the container I get the following error:

Failed to initialize NVML: Unknown Error

How can I run CUDA programs inside an LXC container?

Greg

1 Answer


It looks like this question has already been asked on SuperUser, but I can only flag it as duplicate if it already exists in ServerFault. I'll copy my answer here in hopes that it helps someone who stumbles on this question first.

I had this very same issue, which I wrote about at length here.

The issue you are having may be caused by using an LXC template that doesn't match your host. I am using Proxmox 4.4, which is based on Debian 8.6. My container was based on Ubuntu 16.04. Just like you, I saw the passed-through device nodes in the container with root as the owner and group, not nobody:nogroup as expected.

A forum post I stumbled on inspired me to build a new container based on a template that matched my host, Debian 8.6. Once I did that the /dev nodes were owned by nobody:nogroup and nvidia-smi correctly identified my GPU.

If your container's distribution doesn't match your host's, I strongly recommend making them match. The only way I am aware of is to rebuild the container from a matching template.
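A rough way to check both things at once is to print the distribution and the owner of each NVIDIA device node, then run the same script on the host and inside the container and compare the output. This is only a sketch: the helper names os_id and node_owner are made up for illustration, and it assumes GNU coreutils stat and a standard /etc/os-release file.

```shell
#!/bin/sh
# Sketch: report the distribution and the ownership of the NVIDIA device nodes.
# os_id and node_owner are hypothetical helper names, not part of any tool.

# Print "<ID> <VERSION_ID>" from an os-release file (default: /etc/os-release).
os_id() {
    f="${1:-/etc/os-release}"
    [ -r "$f" ] || { echo "unknown"; return 0; }
    # os-release is a list of shell-compatible assignments, so source it
    # in a subshell to keep its variables out of the current shell.
    ( . "$f"; printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-}" )
}

# Print "owner:group" for one path (assumes GNU coreutils stat).
node_owner() {
    stat -c '%U:%G' "$1"
}

echo "distribution: $(os_id)"
for node in /dev/nvidia*; do
    # The glob stays literal when nothing matches, so skip missing paths.
    [ -e "$node" ] || continue
    echo "$node $(node_owner "$node")"
done
```

In my case the failing Ubuntu container showed root:root for these nodes, while the working Debian container showed nobody:nogroup.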

datu-puti