I am in the process of migrating some services from Ubuntu 18.04 to 20.04. In 18.04 I run these services under a non-root user. All these services start a docker container, and they're working just fine. Under Ubuntu 20.04 these services no longer start.

To illustrate, here's a very simple ~/.config/systemd/user/hello-world.service that works fine on Ubuntu 18.04:

# -*-systemd-*-
[Unit]
Description=Hello world
After=network.service
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
RestartSec=1
TimeoutStartSec=0

ExecStartPre=/bin/echo user = $USER
ExecStartPre=/usr/bin/docker pull hello-world
ExecStart=/usr/bin/docker run \
  --name hello-world \
  --rm -a STDIN -a STDOUT -a STDERR \
  hello-world

ExecStop=/usr/bin/docker stop -t 2 %n

[Install]
WantedBy=default.target

I run the container in the shell directly as the non-root user and it runs fine, both on the 18.04 machine, as well as on the 20.04 machine:

/usr/bin/docker pull hello-world
/usr/bin/docker run \
  --name hello-world \
  --rm -a STDIN -a STDOUT -a STDERR \
  hello-world

For systemd I run the following:

systemctl --user enable hello-world.service
systemctl --user start hello-world.service

On Ubuntu 18.04 everything runs as expected when I inspect the output with journalctl -xe -f.

On Ubuntu 20.04 I get the dreaded:

Sep 15 14:56:26 m4 docker[107614]: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/create?fromImage=hello-world&tag=latest: dial unix /var/run/docker.sock: connect: permission denied

I checked the permissions and groups, and everything seems to be correct. Again, if I run docker directly on the command line while logged in as username, it runs just fine.

root@m4:/etc/apt> ll /var/run/docker.sock 
srw-rw---- 1 root docker 0 Sep 15 14:08 /var/run/docker.sock=
root@m4:/etc/apt> grep docker /etc/group
docker:x:998:docker,username

The only difference I can find is that systemd is at version 237 on 18.04, while on 20.04 it is at version 245.

Docker is the same on both machines:

Docker version 19.03.12, build 48a66213fe

Both versions of systemd show the user echoed in ExecStartPre as being my non-root user.

It looks like systemd 245 is starting the docker process under the wrong user and/or group. Any thoughts?

Update

As @larsks suggested, I replaced $USER with /usr/bin/id. Here's the output I received:

Sep 15 21:36:09 m4 id[122143]: uid=1001(username) gid=1001(username) groups=1001(username)
Sep 15 21:36:09 m4 docker[122144]: Using default tag: latest
Sep 15 21:36:09 m4 docker[122144]: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/create?fromImage=hello-world&tag=latest: dial unix /var/run/docker.sock: connect: permission denied

username is part of the docker group, as shown above.
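
The mismatch above (the unit's `id` output lacks `docker`, while /etc/group lists it) can be checked mechanically. Below is a minimal sketch, assuming GNU `id`, where `id -Gn` reports the groups of the calling process and `id -Gn NAME` looks the user up in the group database. The function names `missing_groups` and `stale_groups` are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Print every group in the second list that is absent from the first list.
# Both arguments are space-separated group names, as produced by `id -Gn`.
missing_groups() {
    have=" $1 "
    for g in $2; do
        case "$have" in
            *" $g "*) ;;                  # already held, nothing to report
            *) printf '%s\n' "$g" ;;      # configured but not held
        esac
    done
}

# Groups configured for the user that the current process does not hold.
# `id -Gn` describes the calling process; `id -Gn NAME` reads the database.
stale_groups() {
    missing_groups "$(id -Gn)" "$(id -Gn "$(id -un)")"
}
```

Calling `stale_groups` from a login shell and from a small wrapper script run via `ExecStartPre=` would show whether the two environments hold different group sets. With the values in this question, the unit's environment would be expected to report `docker` as missing.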

ovidiu
  • Replace `/bin/echo user = $USER` in your systemd unit with `/usr/bin/id`. This will output your uid, primary gid, and all additional groups of which the user is a member. Update the question to include that output. – larsks Sep 16 '20 at 03:42
  • Thanks @larsks, I updated my question to use `/usr/bin/id` instead of `$USER`. @Michael, I'm starting systemctl as `username` using `systemctl --user start hello-world`. If I were to start it as root it would likely work; I already have other systemd services that start docker like that. – ovidiu Sep 16 '20 at 04:44

3 Answers

Your systemd user unit doesn't specify a Group=, thus the user's default group is used. Since docker is not the default group, systemd doesn't start the process with this group.

Set Group=docker in the [Service] section of the unit.

Michael Hampton
  • Thanks for the suggestion @michael-hampton. Unfortunately that doesn't work: since I'm starting the systemctl process as a non-privileged user, changing groups is not allowed, even though the user is part of the group. ``` [...]: hello-world.service: Failed to connect stdout to the journal socket, ignoring: Operation not permitted [...]: hello-world.service: Changing group credentials failed: Operation not permitted [...]: hello-world.service: Failed at step GROUP spawning /usr/bin/id: Operation not permitted ``` – ovidiu Sep 16 '20 at 07:31
  • BTW, `username` is in the right group too, this is what `groups` returns just before starting `systemctl`: `uid=1001(username) gid=1001(username) groups=1001(username),998(docker)` – ovidiu Sep 16 '20 at 07:42
  • That's strange, a user unit ought to be able to set the group. You might have run into a bug in systemd, or in Ubuntu's implementation of it (why on earth are you using Ubuntu anyway?!). In any case it should certainly work as a system unit, but since you're using Ubuntu you can't easily let the user manage a system unit without using `sudo` which you might not want to allow. – Michael Hampton Sep 16 '20 at 14:51
  • Thanks @Michael. I ended up rebooting the machine; after the reboot the problem seems to have gone away, and everything is working as expected with the original configuration. I do need `username` to be in the `docker` group, but long-running services can now be managed by a regular user instead of root. More than a few hours wasted on this :( – ovidiu Sep 16 '20 at 17:11
  • As to why I'm using Ubuntu: for LTS. I've been running Debian in the past, but support for older OS versions is dropped too soon IMO. I have machines where upgrading the OS is too much of a pain, especially when you run everything in a container that's kept up-to-date. – ovidiu Sep 16 '20 at 17:14

Looking at the example docker container unit file I believe what is missing is at least:

After=docker.socket in the [Unit] section of the unit file, and Delegate=yes in the [Service] section, so that systemd does not reset the cgroups of docker containers.

Setting:

KillMode=process

so that systemd kills only the docker process, not all processes in the cgroup, seems like a good idea to me as well. I would recommend taking a look at the linked example and configuring the unit file accordingly.

Henrik Pingel

It turns out an old-fashioned reboot fixed the problem. :(

Before that I had tried restarting systemd, which didn't fix the problem. I still don't know what happened; I probably hit some bug in the kernel and systemd configuration.

Kernel: 5.4.0-47-generic #51-Ubuntu

Systemd:

systemd 245 (245.4-4ubuntu3.2)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
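
One possible explanation, which nothing in this thread confirms, is that the per-user manager (`systemd --user`) was started before `username` was added to the `docker` group and kept its old supplementary groups until the reboot. The sketch below shows one way to check for that, assuming the manager runs as `user@<uid>.service` and that /proc is readable. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Extract the supplementary GIDs from /proc/<pid>/status content on stdin.
status_gids() {
    awk '/^Groups:/ { for (i = 2; i <= NF; i++) print $i }'
}

# PID of the per-user manager, queried from the system manager.
user_manager_pid() {
    systemctl show --property=MainPID --value "user@$(id -u).service"
}

# GIDs currently held by the running `systemd --user` process.
user_manager_gids() {
    status_gids < "/proc/$(user_manager_pid)/status"
}
```

If `user_manager_gids` does not list the `docker` GID (998 in the question) while `id -G username` does, the manager is likely running with stale credentials. In that case, restarting it with `sudo systemctl restart "user@$(id -u).service"` may be enough instead of a full reboot, though doing so stops every user service owned by that user.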
ovidiu