
We're running a RHEL 7 VM with Docker to host a couple of internal services. The last time I ran docker-compose up to start a new container, I lost my SSH connection to the server and could not re-establish it.

A colleague diagnosed the culprit: docker-compose had created a new Docker network whose IP range happened to overlap with the subnet of the machine I connect from. He restored my access by shutting down the Docker daemon and deleting the static route Docker had created to that network's bridge. He could only do this because he had access to another machine in the same subnet as the server, through which he could connect.
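The core of that diagnosis is a subnet-overlap test: if a Docker network's range overlaps the client's LAN, replies to the SSH client end up routed to the Docker bridge. A minimal sketch using Python's standard ipaddress module follows; the network names and CIDRs are hypothetical stand-ins for what docker network inspect would report and for the client's actual subnet.

```python
import ipaddress


def conflicting_networks(client_subnet, docker_subnets):
    """Return the names of Docker networks whose subnet overlaps client_subnet.

    docker_subnets maps a network name to its CIDR string (hypothetical
    values here; in practice they come from `docker network inspect`).
    """
    client = ipaddress.ip_network(client_subnet, strict=False)
    return sorted(
        name
        for name, cidr in docker_subnets.items()
        if client.overlaps(ipaddress.ip_network(cidr, strict=False))
    )


if __name__ == "__main__":
    subnets = {
        "bridge": "172.17.0.0/16",
        "myapp_default": "192.168.16.0/20",  # hypothetical compose network
    }
    # Hypothetical office LAN the SSH client sits in.
    print(conflicting_networks("192.168.16.0/24", subnets))  # prints ['myapp_default']
```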

Now I can connect to the server again, but I cannot start the Docker daemon without it recreating the misbehaving network and cutting off my connection again. I also can't delete the network first with docker network rm or docker network prune, because those commands only work while the daemon is running. And I have no access to a machine whose IP lies outside the colliding range and that also has the firewall rules needed to reach the server I'm trying to recover.

Is there a way to resolve this gracefully and get the Docker daemon running again without losing access to the machine? If necessary, deleting the containers attached to that network is fine; my first priority is getting the machine back into a working state.

I have learned from this post that Docker can be configured to restrict which IP ranges it uses for its networks, which I will certainly do once things are running again. Would that alone solve the problem, or does it only affect networks Docker creates in the future?
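The setting that post presumably refers to is default-address-pools in /etc/docker/daemon.json. The sketch below merges it into an existing daemon.json while keeping any other keys; the 10.200.0.0/16 base and /24 size are hypothetical values. As far as I know, Docker only consults these pools when allocating a new network, so an already-existing network keeps its subnet until it is removed and recreated. The option also requires a reasonably recent Docker Engine, so check that your version supports it.

```python
import json
import os

# Docker reads its daemon configuration from this file.
DAEMON_JSON = "/etc/docker/daemon.json"

# Hypothetical pool: one /16 carved into /24 networks. Pick a base that
# does not overlap any subnet your clients connect from.
EXAMPLE_POOLS = [{"base": "10.200.0.0/16", "size": 24}]


def set_address_pools(path, pools):
    """Merge a default-address-pools entry into a daemon.json file.

    Existing keys are preserved; only default-address-pools is replaced.
    A missing or empty file is treated as an empty configuration.
    Returns the resulting configuration dict.
    """
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        if text.strip():
            config = json.loads(text)
    config["default-address-pools"] = pools
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    return config
```

Run it as root, e.g. set_address_pools(DAEMON_JSON, EXAMPLE_POOLS), and restart the daemon with systemctl restart docker only once the colliding network is gone, since a restart would otherwise bring that network back.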

Answer


I found an arguably unorthodox solution, but it worked: I wrote a script that starts the Docker daemon, deletes the networks in question, logs which networks were deleted, and then stops the daemon again in case the deletion failed. Running that script in the background meant the commands completed even though my SSH session dropped. Afterwards I reconnected, checked the log to confirm the problematic networks were gone, and finally started the Docker daemon again without getting kicked out.
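A minimal sketch of such a script, written in Python rather than whatever was actually used here. The network names, log path, and --yes flag are hypothetical; it assumes Docker is managed by systemd (systemctl), as on RHEL 7, and a Python 3.5+ interpreter. Removing the attached containers before docker network rm is an addition, based on the note in the question that those containers may be deleted, since Docker refuses to remove a network that still has active endpoints.

```python
import logging
import subprocess
import sys

# Hypothetical values: the names of the colliding networks and a log file.
BAD_NETWORKS = ["myapp_default"]
LOG_FILE = "/root/docker-net-cleanup.log"


def run(cmd):
    """Run a command, log its exit code and output, and return the result."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    logging.info("%s -> rc=%d\n%s%s", " ".join(cmd), result.returncode,
                 result.stdout, result.stderr)
    return result


def nonempty_lines(text):
    """Split command output into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def main():
    logging.basicConfig(filename=LOG_FILE, level=logging.INFO,
                        format="%(asctime)s %(message)s")
    try:
        run(["systemctl", "start", "docker"])
        for net in BAD_NETWORKS:
            # Remove containers attached to the network first, otherwise
            # docker network rm fails on active endpoints.
            ids = nonempty_lines(run(
                ["docker", "ps", "-aq", "--filter", "network=" + net]).stdout)
            if ids:
                run(["docker", "rm", "-f"] + ids)
            run(["docker", "network", "rm", net])
        # Log the remaining networks so the result can be checked later.
        run(["docker", "network", "ls"])
    finally:
        # Always stop the daemon again, in case the cleanup did not work.
        run(["systemctl", "stop", "docker"])


if __name__ == "__main__" and "--yes" in sys.argv[1:]:
    # Require an explicit flag so the destructive steps never run by accident.
    main()
```

To survive the SSH session being cut, it would be started detached, for example with nohup python3 docker_net_cleanup.py --yes & (the file name is hypothetical), and the log read after reconnecting.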

It wasn't the most elegant script, so I'll have some cleanup to do, but Docker is working again.