I am doing some security research on Kubernetes and I found something about capabilities that I still don't understand.

Example of a simple pod:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod-httpd
spec:
  containers:
  - name: my-pod-httpd-c1
    image: httpd:2.4
    command: ["/bin/sh"]
    args: ["-c", 'sleep 60m']
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

By default, in the container (running as UID 0):

cat /proc/1/status | grep Cap

CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 00000000a80425fb

With this second pod, I try to add a capability (SETUID) while also running with a specific UID/GID:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod-httpd-2
spec:
  containers:
  - name: my-pod-httpd-2-c1
    image: httpd:2.4
    command: ["/bin/sh"]
    args: ["-c", 'sleep 60m']
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 33
      runAsGroup: 33
      capabilities:
        add: ["SETUID"]
  restartPolicy: Always

But when I check the capabilities, here is what I get:

cat /proc/1/status | grep Cap

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

Why is my added capability missing, and why has everything else been dropped? Any idea?

UndercoverDog
cactuschibre

1 Answer

There are a couple of things at play here. First, CAP_SETUID is in the default capability set for Docker/containerd environments, so in Kubernetes you'll generally have it already, and adding it won't change the set you've got. We can demonstrate this using amicontained, which is a bit easier to read than the output of /proc/1/status :)

In the capabilities list you'll see setuid:

kubectl run -it test2 --image=raesene/alpine-containertools /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.1# amicontained
Container Runtime: kube
Has Namespaces:
        pid: true
        user: false
AppArmor Profile: docker-default (enforce)
Capabilities:
        BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: disabled
Blocked Syscalls (23):
        MSGRCV SYSLOG SETSID VHANGUP PIVOT_ROOT ACCT SETTIMEOFDAY UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME INIT_MODULE DELETE_MODULE LOOKUP_DCOOKIE KEXEC_LOAD PERF_EVENT_OPEN FANOTIFY_INIT OPEN_BY_HANDLE_AT FINIT_MODULE KEXEC_FILE_LOAD BPF

Then there are the different capability sets, which are a bit... complex. You'll notice that in your second example CapBnd is the same as it was when you ran as root. That's because the container still has those capabilities (including SETUID) available, but since your process isn't running as root it doesn't hold them: they're in your bounding set but not in your permitted or effective sets.
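
To make the masks in the question easier to compare, here is a minimal Python sketch that decodes the hex values from /proc/1/status into capability names. The helper name `decode_caps` is hypothetical; the bit numbering is taken from the kernel's `<linux/capability.h>`, and the two masks are the CapBnd and CapEff values from the second pod.

```python
# Capability names indexed by bit number, following <linux/capability.h>.
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid",
    "kill", "setgid", "setuid", "setpcap", "linux_immutable",
    "net_bind_service", "net_broadcast", "net_admin", "net_raw", "ipc_lock",
    "ipc_owner", "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace",
    "sys_pacct", "sys_admin", "sys_boot", "sys_nice", "sys_resource",
    "sys_time", "sys_tty_config", "mknod", "lease", "audit_write",
    "audit_control", "setfcap", "mac_override", "mac_admin", "syslog",
    "wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf",
    "checkpoint_restore",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities whose bits are set in hex_mask.

    hex_mask is a string such as the value after "CapBnd:" in /proc/<pid>/status.
    (decode_caps is a hypothetical helper, not part of any library.)
    """
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Values copied from the second pod in the question (runAsUser: 33).
    print("CapBnd:", " ".join(decode_caps("00000000a80425fb")))
    print("CapEff:", " ".join(decode_caps("0000000000000000")) or "(none)")
```

The bounding mask decodes to the same 14 capabilities that amicontained lists, setuid included, while the effective mask decodes to nothing. On a system with libcap installed, `capsh --decode=00000000a80425fb` should give an equivalent list.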

If you had a program in that container which had the capabilities assigned to the file, they'd work just fine (within the confines of the container, of course).
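
Why file capabilities still work for a non-root user follows from the execve rules documented in capabilities(7). The sketch below models those rules in Python as plain bit arithmetic. It is a simplified illustration, not the kernel's implementation: the function name `execve_caps` is hypothetical, it assumes a non-root process, and it treats a file as "privileged" only when it carries file capabilities (set-user-ID and set-group-ID bits are ignored).

```python
CAP_SETUID = 1 << 7        # bit numbers from <linux/capability.h>
CAP_SYS_ADMIN = 1 << 21
DEFAULT_BND = 0xA80425FB   # CapBnd from the question's pods


def execve_caps(p_inh, p_bnd, p_amb, f_inh=0, f_prm=0, f_eff=False):
    """Model the capability sets of a non-root process after execve().

    p_* are the process sets before execve; f_* are the file's sets
    (f_eff is the single "effective" flag stored on the file).
    Assumption: set-user-ID/set-group-ID bits are not modelled.
    """
    privileged_file = bool(f_inh or f_prm or f_eff)
    # Ambient capabilities are cleared when executing a privileged file.
    new_amb = 0 if privileged_file else p_amb
    # File permitted caps are masked by the bounding set.
    from_file = (p_inh & f_inh) | (f_prm & p_bnd)
    # If the file's effective flag is set but not all of its permitted
    # capabilities could be granted, execve fails with EPERM.
    if f_eff and (f_prm & ~from_file):
        raise PermissionError("execve: Operation not permitted")
    new_prm = from_file | new_amb
    new_eff = new_prm if f_eff else new_amb
    return {"inh": p_inh, "prm": new_prm, "eff": new_eff,
            "bnd": p_bnd, "amb": new_amb}


if __name__ == "__main__":
    # Plain binary run as UID 33: nothing permitted, nothing effective.
    print(execve_caps(0, DEFAULT_BND, 0))
    # Binary with cap_setuid+ep: granted, because setuid is in the bounding set.
    print(execve_caps(0, DEFAULT_BND, 0, f_prm=CAP_SETUID, f_eff=True))
```

Under this model, a file capability that is outside the container's bounding set (for example `CAP_SYS_ADMIN` with the default set) can never be gained, and with the file's effective flag set the exec itself is refused.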

There's a good longer read on capabilities in general here, one on capabilities in practice here, and a piece that covers file capabilities in containers more specifically here.

Rory McCune
  • OK, thank you very much. I didn't know amicontained, very useful :) ... I understand caps on a specific executable file, fine ... but what's the point of adding or dropping capabilities on containers if they have no effect when the user is not root? No effect at all? Also, should I understand that, for example, if CAP_CHOWN is active and the user is not "root", he is able to use `chown` but only on files he has permissions for? – cactuschibre Sep 16 '22 at 08:00
  • So if you're running as non-root you're best off dropping all capabilities, as you won't be able to use them anyway :) There is an exception to that, which is that you can give a program capabilities and then it can use them *if* the container it's running in has them. – Rory McCune Sep 16 '22 at 08:06
  • OK, I think I understand ... There is a kind of inheritance, which is logical. I modified your "no-root" container: I added `CAP_SYS_ADMIN` to `/bin/busybox` but left the container with its default capabilities. The container is running "fine", but the "tester" user is not able to use `/bin/busybox` features ("Operation not permitted" on ls, chmod, chown, etc.). It is a very interesting area of research ... I will read your articles more closely, thanks. – cactuschibre Sep 16 '22 at 08:21
  • cool :) if you want a container that has those tools and runs as a non-root user there's `raesene/alpine-noroot-containertools` which has a setuid bash shell for privilege escalation as well https://github.com/raesene/alpine-noroot-containertools – Rory McCune Sep 16 '22 at 09:14