I need to create namespaces inside a Docker container, and as part of this I need to mount a /proc private to the inner namespace. I realize that I will have to run the container with certain privileges to make this happen, but I would prefer to enable only the minimal set.
This works:
$ sudo docker run --privileged --security-opt=seccomp=unconfined \
-it fedora:rawhide /usr/bin/unshare -Ufmp -r \
/bin/sh -c 'mount -t proc proc /proc'
This doesn't:
$ sudo docker run --cap-add=sys_admin --security-opt=seccomp=unconfined \
-it fedora:rawhide /usr/bin/unshare -Ufmp -r \
/bin/sh -c 'mount -t proc proc /proc'
mount: /proc: cannot mount proc read-only.
So, just turning off seccomp filters and adding CAP_SYS_ADMIN isn't enough. What is enough?
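One thing worth ruling out first is whether CAP_SYS_ADMIN is actually effective in the process that calls mount. Below is a minimal sketch of such a check, assuming a POSIX shell and awk are available in the image. The has_cap helper is a hypothetical name of my own, not part of any tool; it only tests a bit in the CapEff mask from /proc/self/status, where CAP_SYS_ADMIN is capability number 21.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>.
CAP_SYS_ADMIN=21

# has_cap MASK BIT (hypothetical helper): succeed if bit BIT is set
# in the hexadecimal capability mask MASK (given without a 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capability mask of the current process; fall back to 0
# if /proc/self/status is unavailable.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
cap_eff=${cap_eff:-0}

if has_cap "$cap_eff" "$CAP_SYS_ADMIN"; then
    echo "CAP_SYS_ADMIN is effective (CapEff=$cap_eff)"
else
    echo "CAP_SYS_ADMIN is NOT effective (CapEff=$cap_eff)"
fi
```

Running this in place of the mount command, inside the same unshare invocation, shows whether the capability is present in the inner namespace. If it is, the failure has to come from some other check.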
Update: SELinux is part of the problem. If you turn off SELinux enforcement globally, it works. But you can also turn off enforcement for a particular container with --security-opt label:disable, which is documented in the security configuration section of the online Docker manual:
$ sudo docker run --cap-add=sys_admin --security-opt label:disable \
-it fedora:rawhide /usr/bin/unshare -fmp /bin/sh -c \
'mount --make-private / ; mount -t proc proc /proc'
But that fails if the -U and -r flags are added back to unshare. And, of course, adding --privileged to the docker run command works just fine even with the -U and -r flags.
I'm currently trying to use kernel tracing to figure out what, exactly, is giving me an EPERM. It's an unhelpfully unspecific error to get.