This is bizarre and while I have a workaround, I'd prefer a permanent fix.

I have a small group of GPU machines running Ubuntu 14.04 which I am using as workers for a cloud service that's effected via Docker images. I have nvidia-docker installed on all the worker machines, so that docker has access to the GPUs. The worker machines also function as individual servers which lab members can do experiments on directly (academic environment, the cloud service is experimental, etc). For the latter purpose, all the machines automount individual user shares over NFS. We recently switched to automount from a static fstab configuration, and I'm still getting used to it -- it's entirely possible there's some obvious issue at play here I'm not seeing because I'm an automount n00b. Finally, I haven't set anything up for docker images to be able to access the NFS shares, so in theory there should be no connection... in theory.

This week one of our lab members reported the "Too many levels of symbolic links" error when attempting to access their share drive from one of the GPU machines. They're not using docker at all (to their knowledge). There are no questionable symbolic links in their tree (checked via find -type l), so it has to be something else getting into a weird state. The mount point looks like this under ls -l from the parent directory:

dr-xr-xr-x 2 root root 0 Dec 5 18:38 labmember1

which seems... bad? root:root with mode 555, really? And when you try to browse it you indeed get:

$ cd /path/to/labmember1/
-bash: cd: /path/to/labmember1/: Too many levels of symbolic links

The share doesn't seem to actually be mounted -- it does not appear in /etc/mtab, and (predictably) attempts to unmount it manually report:

$ sudo umount /path/to/labmember1/
umount: /path/to/labmember1/: not mounted
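On systems where /etc/mtab is a regular file maintained by the mount tools, it can disagree with the kernel's own mount table, so a cross-check against /proc/self/mounts may be worth doing. The following is a minimal sketch assuming a Linux /proc; the function name mounts_under is made up for illustration, and the path is the placeholder used above.

```shell
# Hypothetical helper: list kernel-visible mounts at or under a path.
# Usage: mounts_under PATH [MOUNTS_FILE]
mounts_under() {
    path="${1%/}"                      # drop one trailing slash, if present
    table="${2:-/proc/self/mounts}"    # kernel mount table as seen by this process
    # In this table, field 2 is the mount point and field 3 the filesystem type.
    awk -v p="$path" '$2 == p || index($2, p "/") == 1 { print $2, $3 }' "$table"
}

# Example: mounts_under /path/to/labmember1/
```

An entry of type autofs with no NFS entry beneath it would suggest that the automount trigger exists but the share itself was never mounted.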

Restarting autofs (service autofs restart) did nothing.

What I thought was unrelated at the time: docker had been spewing veth interfaces everywhere. This was a machine being actively used as a cloud worker, so I figured it was our cloud software. Now I'm not so sure.

Today the Too many levels of symbolic links failure occurred on another GPU machine, which has docker/nvidia-docker installed but does not run the cloud worker software. Lo and behold, veth interfaces everywhere, though in far fewer numbers than on the cloud worker machine.
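To put a number on "veth interfaces everywhere", the interfaces can be counted from the one-line-per-interface output of ip. This is a rough sketch that assumes the interfaces keep the default veth name prefix; count_veth is a made-up helper name.

```shell
# Hypothetical helper: count veth interfaces in `ip -o link show` output on stdin.
count_veth() {
    # Field 2 is the interface name, e.g. "veth1a2b3c@if4:".
    awk '$2 ~ /^veth/ { n++ } END { print n + 0 }'
}

# Example: ip -o link show | count_veth
```

Comparing the count before and after a docker restart would show whether the interfaces are left over from containers that no longer exist.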

On a whim, I stopped the docker service (service docker stop). Magic! The share mounts normally and our lab member can use their stuff again. The share remains in working condition after starting docker back up again.
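The workaround can be written down as a small script so it is repeatable. This is only a sketch of the steps described above: the helper names run and remount_via_docker_restart are invented, and setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop docker, touch the share so autofs mounts it, then start docker again.
remount_via_docker_restart() {
    share="$1"
    run sudo service docker stop
    run ls "$share"                 # accessing the path triggers the automount
    run sudo service docker start   # runs even if the ls above failed
}

# Example: DRY_RUN=1 remount_via_docker_restart /path/to/labmember1/
```

Note that stopping the docker service also stops any running containers, so on a cloud worker this interrupts whatever jobs are in flight.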

So I can clearly fix this issue by restarting docker if (when) it happens again, but I'd like to know:

  1. what is causing this in the first place? or, how can I find out?
  2. is there a way to prevent this from happening again, or am I stuck just fixing it every time it breaks?
krivard
  • Was any explanation found for this behavior? I'm seeing the same things: I installed nvidia-docker on an Ubuntu 16.04 machine and restarting docker fixes the "Too many levels of symbolic links" problem. – Randall Radmer Feb 26 '19 at 23:15

2 Answers

-1

How have you defined the mount options for autofs in /etc/auto.master? Are you using direct or indirect automounts?

Also, did you run autofs entirely within a Docker container, with the --privileged option added to the docker run command? Using this approach you should be able to perform NFS mounts without any issues.

Please note that bind mounting an autofs mount into a container with an independent autofs daemon running can't be done because it may conflict with the autofs daemon running in the originating namespace.
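One way to check whether another mount namespace is involved is to look at which processes have the mount point in their own mount table. The sketch below reads /proc/PID/mountinfo, where field 5 is the mount point. The function name pids_with_mount is made up, and whether a docker process actually holds a copy of the autofs mount in this case is an assumption to be tested, not something established here.

```shell
# Hypothetical helper: print PIDs whose mount table contains PATH.
# Usage: pids_with_mount PATH [PROC_ROOT]   (run as root to see all processes)
pids_with_mount() {
    path="${1%/}"          # drop one trailing slash, if present
    proc="${2:-/proc}"
    for f in "$proc"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # Field 5 of mountinfo is the mount point within that process's namespace.
        if awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$f" 2>/dev/null; then
            pid="${f%/mountinfo}"
            echo "${pid##*/}"
        fi
    done
}

# Example: pids_with_mount /path/to/labmember1/
```

If the PIDs printed include docker or container processes while the share is broken, that would point toward the namespace conflict described above.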

For indirect mounts, providing automounting to Docker containers by running autofs in the root namespace and binding the autofs top-level mounts into containers with the Docker volume option should function mostly as expected.

Mika Wolf
  • "into containers with the Docker volume option should function mostly as expected" No, it does not. – sgohl May 17 '19 at 12:55
-1
root@slave2:~# cd /home/client/

-bash: cd: /home/client/: Too many levels of symbolic links

solution:

root@slave2:~# cd /home/

root@slave2:/home# umount client

and then remount the file path.

J. Scott Elblein
  • If you read the question, you'll find that /home/client is not actually mounted in this case, and umount fails. – krivard Jul 10 '19 at 20:17