
My Docker daemon seems to ignore /etc/docker/daemon.json on boot.

Similar to this question, I'm having trouble telling the Docker daemon not to use the default 172.17.* range. That range is already claimed by our VPN, which prevents people connected through the VPN from reaching the server Docker runs on.

The hugely annoying thing is that every time I reboot my server, Docker claims an IP from the VPN's range again, regardless of what I put in /etc/docker/daemon.json. I have to manually issue

# systemctl restart docker

directly after boot before people on the 172.17.* network can reach the server again.

This obviously gets forgotten quite often and leads to many problem tickets.
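
A quick sanity check I now run after every reboot, to see which range docker0 actually ended up in, is roughly the following (just plain ip and the Docker CLI, nothing specific to my setup):

# address the kernel gave the default bridge interface
ip -4 addr show docker0

# subnet Docker thinks the default "bridge" network is using
sudo docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}'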

My /etc/docker/daemon.json looks like this:

{
  "default-address-pools": [
    {
      "base": "172.20.0.0/16",
      "size": 24
    }
  ]
}

and has the following permissions:

-rw-r--r--   1 root root   123 Dec  8 10:43 daemon.json

I have no idea how to even start diagnosing this problem; any ideas?

For completeness:

  • Ubuntu 18.04.5 LTS
  • Docker version 19.03.6, build 369ce74a3c

EDIT: output of systemctl cat docker:

# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
Wants=containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

Output of sudo docker info (after systemctl restart docker):

Client:
 Debug Mode: false

Server:
 Containers: 34
  Running: 19
  Paused: 0
  Stopped: 15
 Images: 589
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-140-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 47.16GiB
 Name: linuxsrv
 ID: <redacted>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: <redacted>
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  http://172.16.30.33:6000/
 Live Restore Enabled: false

WARNING: No swap limit support

2 Answers


Docker uses several separate address pools. The default-address-pools setting applies to newly created user-defined bridge networks; existing networks keep their old ranges, so you may need to delete and recreate them after changing this setting (see the sketch below).
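
Roughly like this (my-app-net is just a placeholder name here, substitute your own networks):

# list networks and check which subnet a given network was allocated
docker network ls
docker network inspect my-app-net --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}'

# stop/disconnect any containers using it, then recreate it so it
# gets allocated from the new default-address-pools range
docker network rm my-app-net
docker network create my-app-net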

There's also bip, set in the daemon.json file with a line like:

"bip": "192.168.63.1/24"

The bip setting applies to the default bridge network (named bridge) and must be set to the CIDR of the gateway on that network, so you can't define it as 192.168.63.0/24; the trailing .1 is important.
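
Putting the two settings together, a daemon.json along these lines (the specific ranges are only example values, pick whatever is free on your network) covers both the default bridge and newly created user networks:

{
  "bip": "192.168.63.1/24",
  "default-address-pools": [
    {
      "base": "172.20.0.0/16",
      "size": 24
    }
  ]
}

followed by a restart of the daemon for the change to take effect.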

And if you are using swarm mode, overlay networks have their own address pools, shared across the nodes in the swarm. Those need to be configured at docker swarm init time with the --default-addr-pool flag.
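
For example (the pool below is purely illustrative; the companion --default-addr-pool-mask-length flag sets the per-network subnet size):

# choose the overlay address pool at swarm creation time
docker swarm init --default-addr-pool 10.20.0.0/16 --default-addr-pool-mask-length 24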

Lastly, if you are running Docker via snap, the file lives at /var/snap/docker/current/etc/docker/daemon.json, and it does not appear to be preserved across updates, so you'll need to put it back after each update.
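
A quick way to tell whether the snap package is involved at all (assuming snapd is present):

# is a docker snap installed?
snap list docker

# if so, this is the daemon.json the snap-packaged daemon reads
ls -l /var/snap/docker/current/etc/docker/daemon.json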

  • I'm not using swarm nor have installed it via snap, so I'm giving the `bip` option a go. Although [the docs](https://docs.docker.com/network/bridge/) say that it's "not recommended for production use"... – Rody Oldenhuis Apr 07 '21 at 23:56
  • @RodyOldenhuis that's correct, using the network is not recommended. But it's the default so docker always creates it even if you don't use it. Which means it will have an IP CIDR block assigned and routing table entry added to the host. Unfortunately you have to set each of the ranges for different parts of docker separately. – BMitch Apr 07 '21 at 23:59
  • Unfortunately, that appears to be ineffective...Does my edit above tell you anything? – Rody Oldenhuis Apr 08 '21 at 01:17
  • @RodyOldenhuis do newly created networks stay in the correct range? Did you cleanup networks in the wrong range? Please include how you are creating networks, and the output of `docker info` in the question. – BMitch Apr 08 '21 at 01:57
  • See my most recent edits. Note that this is _after_ restarting the daemon; since we only have the one server and ~20 people's workdays depend on it, I'm rather reluctant to reboot it. Anyway, if I understand you correctly, running `docker run ` or `docker-compose up ` does create networks in the correct range. Related observation: none of my `restart:always` containers are started directly after reboot... – Rody Oldenhuis Apr 08 '21 at 02:29
  • @RodyOldenhuis They may have added it in a newer release, but my `docker info` includes the address pools at the end. Not sure if this means your install isn't reading the daemon.json or just needs to be updated. The next places to debug are your existing networks (did you delete the old ones?) and for containers not restarting, that should be a different question with a `docker inspect` output of the container that didn't restart. – BMitch Apr 08 '21 at 12:20
  • Thanks for your help, much appreciated. Do you have any idea whether upgrading Ubuntu 18.04 → 20.04 would have an effect? Its repos probably will have a more up-to-date Docker release...I've been meaning to do that anyway, so I'll just see what happens after that – Rody Oldenhuis Apr 08 '21 at 20:26
  • @RodyOldenhuis if you're not seeing newer releases of docker, that's a sign you're installing from the Ubuntu repos rather than the Docker repos: https://docs.docker.com/engine/install/ubuntu/ – BMitch Apr 09 '21 at 00:31
  • Upgrading Docker over the weekend indeed resolved this problem. Many thanks for sticking with me – Rody Oldenhuis Apr 18 '21 at 20:37

Although I thought I had resolved the problem using BMitch's answer, I was wrong: the docker0 address was still in the wrong 172.17.*.* range after boot.

After a lot more digging, it turned out that, somehow, I had multiple versions of dockerd installed:

  1. the one you get if you install as per the docs
  2. ...the one installed via Snap

Apparently, the Snap version was the one started at boot, while sudo systemctl restart docker started the other one.
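
In hindsight this is easy to spot once you know to look for it; a rough check (none of it specific to my machine) is:

# which dockerd binaries are on the PATH?
type -a dockerd

# which one is actually running right now, and from which path?
ps -eo pid,args | grep [d]ockerd

# anything docker-related managed by systemd or snap?
systemctl list-units --all | grep -i docker
snap list | grep -i docker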

Uninstalling and purging the Snap version that had escaped my attention finally solved this pesky problem.
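
For anyone else in the same boat, the cleanup itself was nothing special; roughly along these lines (double-check the package name on your own system, and note that --purge needs a reasonably recent snapd):

# remove the snap-packaged daemon, including its data
sudo snap remove --purge docker

# then restart the remaining, properly configured daemon
sudo systemctl restart docker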
