Freeing kernel memory leaked by the nvidia driver


I'm running the Caffe deep learning library on a graphics card. The library does some fancy things, such as mmap'ing huge files into memory and passing buffers back and forth between RAM and the graphics card. After some time I noticed significant memory usage even when nothing heavy is running on the system (no X server, only about 10 processes: getty, sshd, syslog-ng, bash, ...):

MemTotal:       24688288 kB
MemFree:        19112788 kB
MemAvailable:   19102240 kB
Buffers:            6632 kB
Cached:            14892 kB
SwapCached:            0 kB

Note that I drop caches using echo 3 > /proc/sys/vm/drop_caches, so roughly 5 GiB is still being used by something. A very close number is accounted for here:

Active:          4658852 kB        <-- here
Inactive:           2312 kB 
Active(anon):    4644112 kB        <--- and here
Inactive(anon):      760 kB
Active(file):      14740 kB
Inactive(file):     1552 kB
Unevictable:        6352 kB
Mlocked:        17111149713616 kB  <-- that is also strange
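
As a rough cross-check of the "roughly 5 GiB" figure, the fields quoted above can be subtracted in a few lines of Python. This is only a sketch: the helper names parse_meminfo and unaccounted_kb are made up for illustration, and "MemTotal minus MemFree, Buffers and Cached" is a simplification, not the kernel's own accounting.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def unaccounted_kb(info):
    """Memory that is neither free nor buffers/page cache (simplified estimate)."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            info = parse_meminfo(f.read())
        kb = unaccounted_kb(info)
        print("in use, not buffers/cache: %d kB (%.2f GiB)" % (kb, kb / 1024.0 ** 2))
```

With the numbers quoted in the question this gives 5553976 kB, about 5.3 GiB, which is somewhat more than the Active(anon) figure.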

While checking the idea that the nvidia driver leaks memory in kernel space, I found lines like this in /proc/vmallocinfo:

0xffffc90005562000-0xffffc900055af000  315392    os_alloc_mem+0xc2/0xf0               [nvidia]       pages=76   vmalloc  N0=76

That confirms the leak comes from the nvidia driver. Is it possible to clean up those allocations somehow? And how can I release the mlocked memory above?

Alexander Sergeev

Posted 2016-04-08T08:22:45.250

Reputation: 131

You'd probably have to rmmod it then insmod it back. – LawrenceC – 2016-04-08T13:48:38.597

How were you able to investigate this issue? I think I'm experiencing a similar problem, so I would like to confirm its root cause. – luka5z – 2016-05-20T21:44:34.933

@luka5z Check /proc/vmallocinfo. You might see an excessive number of memory allocations made by the nvidia module (see the question text). Note that I'm not saying there should be none of them, but you can sum up the allocation sizes (for the nvidia module) to see if the total usage is reasonable. – Alexander Sergeev – 2016-05-21T07:35:18.233
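
The summing suggested in the comment above could be sketched as follows. It assumes the line layout shown in the question (address range, size in bytes, caller, then the module name in square brackets). The function name sum_module_vmalloc is hypothetical, and reading /proc/vmallocinfo normally requires root.

```python
def sum_module_vmalloc(lines, module="nvidia"):
    """Return (total_bytes, allocation_count) for vmalloc areas tagged [module]."""
    tag = "[" + module + "]"
    total = 0
    count = 0
    for line in lines:
        fields = line.split()
        # Expected layout: <range> <size> <caller> [module] ...
        if len(fields) < 3 or tag not in fields:
            continue
        try:
            size = int(fields[1])
        except ValueError:
            continue  # skip lines whose second column is not a byte count
        total += size
        count += 1
    return total, count


if __name__ == "__main__":
    try:
        with open("/proc/vmallocinfo") as f:
            total, count = sum_module_vmalloc(f)
        print("%d nvidia allocations, %.1f MiB in total" % (count, total / 1024.0 ** 2))
    except OSError as exc:
        # Typically a permission error when not run as root.
        print("cannot read /proc/vmallocinfo: %s" % exc)
```

Whether the resulting total counts as a leak is a judgment call, since some allocations by the module are expected. Comparing the total over time while the workload runs is probably more telling than a single reading.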

Answers

1

Apparently, that was a bug in the nvidia driver. After updating the driver from 361.18-r4 to 364.15, I can no longer reproduce the problem, so I assume the update fixed the memory leak.

Alexander Sergeev

Posted 2016-04-08T08:22:45.250

Reputation: 131

How were you able to install the 364.15 driver on Linux? The latest version I can get for Linux from the NVIDIA website is still 361.45.11, released on 2016.5.24. – ypx – 2016-05-31T16:37:26.587

@ypx On Gentoo it is much easier (see https://packages.gentoo.org/packages/x11-drivers/nvidia-drivers). The latest driver available (for now) can be found here: http://us.download.nvidia.com/XFree86/Linux-x86_64/367.18/NVIDIA-Linux-x86_64-367.18.run

– Alexander Sergeev – 2016-05-31T16:45:56.230

Thanks, I found this list of downloads http://www.nvidia.com/object/unix.html and got driver 364.19 from their short-lived branch. I think the leak is gone now... or at least not as noticeable as before... I will see if it holds up after longer runs. But much happier now :D

– ypx – 2016-05-31T16:54:37.687