
I have a bunch of libvirt-lxc containers whose configuration I migrated from a Debian jessie host to a fresh Debian buster host. I re-created the containers' root filesystems using lxc-create -t debian -- --release buster and later remapped the uid/gid numbers of each rootfs with a script I know to work correctly.
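
For reference, a minimal sketch of such a remap (not my actual script; the offset of 200000 comes from the idmap in the domain XML below) looks roughly like this:

OFFSET=200000
ROOTFS=/var/lib/lxc/some-container/rootfs

# Shift every file's numeric owner and group by the idmap offset.
find "$ROOTFS" -depth -print | while read -r path; do
    uid=$(stat -c %u "$path")
    gid=$(stat -c %g "$path")
    chown -h "$((uid + OFFSET)):$((gid + OFFSET))" "$path"
done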

The container configuration looks like this:

<domain type='lxc'>
  <name>some-container</name>
  <uuid>1dbc80cf-e287-43cb-97ad-d4bdb662ca43</uuid>
  <title>Some Container</title>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memtune>
    <swap_hard_limit unit='KiB'>2306867</swap_hard_limit>
  </memtune>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
  </os>
  <idmap>
    <uid start='0' target='200000' count='65535'/>
    <gid start='0' target='200000' count='65535'/>
  </idmap>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib/libvirt/libvirt_lxc</emulator>
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/var/lib/lxc/some-container/rootfs/'/>
      <target dir='/'/>
    </filesystem>
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/var/www/some-container/static/'/>
      <target dir='/var/www/some-container/static/'/>
    </filesystem>
    <interface type='bridge'>
      <mac address='52:54:00:a1:98:03'/>
      <source bridge='guests0'/>
      <ip address='192.0.2.3' family='ipv4' prefix='24'/>
      <ip address='2001:db8::3' family='ipv6' prefix='112'/>
      <route family='ipv4' address='0.0.0.0' prefix='0' gateway='192.0.2.1'/>
      <route family='ipv6' address='2000::' prefix='3' gateway='fe80::1'/>
      <target dev='vcontainer0'/>
      <guest dev='eth0'/>
    </interface>
    <console type='pty' tty='/dev/pts/21'>
      <source path='/dev/pts/21'/>
      <target type='lxc' port='0'/>
      <alias name='console0'/>
    </console>
    <hostdev mode='capabilities' type='misc'>
      <source>
        <char>/dev/net/tun</char>
      </source>
    </hostdev>
  </devices>
</domain>

(IP addresses have been changed to the documentation/example IPv4/IPv6 prefixes.) The mount points exist and are prepared. I have about 15 containers similar to this one. The following happens:

  • When the host is freshly booted, I can either:

    • start a container with user namespacing first, after which only containers without user namespacing will start, or
    • start a container without user namespacing first, after which no container with user namespacing will start at all.

When I run virsh -c lxc:/// start some-container after any other container is already started, libvirt claims to have started the container:

# virsh -c lxc:/// start some-container
Domain some-container started

It also shows as running in the virsh -c lxc:/// list output, but there is no process running under the root UID of the container. Running systemctl restart libvirtd makes libvirt recognize that the container is actually dead and mark it as shut off in virsh -c lxc:/// list again.
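
For example, a quick way to see the discrepancy (assuming the idmap above, so the container's root maps to UID 200000 on the host):

virsh -c lxc:/// list            # the domain is listed as "running"
ps -u 200000 -o pid,cmd          # but there is no /sbin/init under the mapped root UID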

When looking into the libvirt logs, I can’t find anything useful:

2019-05-09 15:38:38.264+0000: starting up
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LIBVIRT_DEBUG=4 LIBVIRT_LOG_OUTPUTS=4:stderr /usr/lib/libvirt/libvirt_lxc --name some-container --console 25 --security=apparmor --handshake 52 --veth vnet0
PATH=/bin:/sbin TERM=linux container=lxc-libvirt HOME=/ container_uuid=1dbc80cf-e287-43cb-97ad-d4bdb662ca43 LIBVIRT_LXC_UUID=1dbc80cf-e287-43cb-97ad-d4bdb662ca43 LIBVIRT_LXC_NAME=some-container /sbin/init

(NB: I tried this with and without AppArmor.)

I became quite desperate, attached strace to libvirtd with strace -ff -o somedir/foo -p, and then started a container. After a lot of digging, I found that libvirt starts /sbin/init inside the container, which then quickly exits with status code 255, right after getting EACCES while writing to a cgroup file:

openat(AT_FDCWD, "/sys/fs/cgroup/systemd/system.slice/libvirtd.service/init.scope/cgroup.procs", O_WRONLY|O_NOCTTY|O_CLOEXEC) = -1 EACCES (Permission denied)
writev(3, [{iov_base="\33[0;1;31m", iov_len=9}, {iov_base="Failed to create /system.slice/l"..., iov_len=91}, {iov_base="\33[0m", iov_len=4}, {iov_base="\n", iov_len=1}], 4) = 105
epoll_ctl(4, EPOLL_CTL_DEL, 5, NULL)    = 0
close(5)                                = 0
close(4)                                = 0
writev(3, [{iov_base="\33[0;1;31m", iov_len=9}, {iov_base="Failed to allocate manager objec"..., iov_len=52}, {iov_base="\33[0m", iov_len=4}, {iov_base="\n", iov_len=1}], 4) = 66
openat(AT_FDCWD, "/dev/console", O_WRONLY|O_NOCTTY|O_CLOEXEC) = 4
ioctl(4, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(4, TIOCGWINSZ, {ws_row=0, ws_col=0, ws_xpixel=0, ws_ypixel=0}) = 0
writev(4, [{iov_base="[", iov_len=1}, {iov_base="\33[0;1;31m!!!!!!\33[0m", iov_len=19}, {iov_base="] ", iov_len=2}, {iov_base="Failed to allocate manager objec"..., iov_len=34}, {iov_base="\n", iov_len=1}], 5) = 57
close(4)                                = 0
writev(3, [{iov_base="\33[0;1;31m", iov_len=9}, {iov_base="Exiting PID 1...", iov_len=16}, {iov_base="\33[0m", iov_len=4}, {iov_base="\n", iov_len=1}], 4) = 30
exit_group(255)                         = ?
+++ exited with 255 +++

Digging further, I figured out that libvirt does not create a cgroup namespace for the containers, and apparently they all try to use the same cgroup path. With that, the behaviour makes sense: if the first container started is user-namespaced, it takes ownership of the shared cgroup subtree, and other user-namespaced containers cannot use it. Non-user-namespaced containers can simply take over the cgroup tree because they run as UID 0.
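
One way to see who owns the shared cgroup path that every container's init tries to use (path taken from the strace output above; this is a diagnostic sketch, not output I captured):

ls -ld /sys/fs/cgroup/systemd/system.slice/libvirtd.service
ls -ld /sys/fs/cgroup/systemd/system.slice/libvirtd.service/init.scope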

The question now is: why are the cgroups set up incorrectly? Is it a libvirt bug, or a misconfiguration on my system?


1 Answer


I came up with the idea of using a separate <partition/> for each container, to isolate them from one another.
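
Concretely, I mean something like this in each domain's XML, replacing the shared /machine partition (the partition name is just an example):

<resource>
  <partition>/machine/some-container</partition>
</resource>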

When I tried that, I got the following error:

error: internal error: guest failed to start: Failure in libvirt_lxc startup: Failed to create v1 controller cpu for group: No such file or directory

That error was actually familiar: I had once opened an invalid bug report because of it.

This error is caused by libvirt not detecting systemd correctly, which is in turn caused by systemd-container not being installed. The fix is:

apt install systemd-container

That fixes both the original issue and the error from my attempted workaround.
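
As a quick sanity check (not required for the fix), you can verify that systemd-machined is now available and that started containers get registered with it:

systemctl status systemd-machined
machinectl list        # started libvirt-lxc domains should show up here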
