How can I debug a docker container initialization?

Question

I have an issue with a container: it builds perfectly, but it does not start properly. The cause is a workaround I added to the Dockerfile (to get a self-configured /etc/hosts):

RUN mkdir -p -- /lib-override /etc-override && cp /lib/libnss_files.so.2 /lib-override
ADD hosts.template /etc-override/hosts
RUN perl -pi -e 's:/etc/hosts:/etc-override/hosts:g' /lib-override/libnss_files.so.2
ENV LD_LIBRARY_PATH /lib-override

Obviously there's some error in there, but I wonder how I can get more info on what docker is doing while running. For example, this works:

$ docker run image ls
usr bin ...

But this doesn't:

$ docker run image ls -l
$

There is nothing in the logs and I can't start an interactive shell either. I can use strace to see what's happening, but I was hoping there's a better way.

Is there any way I can set docker to be more verbose?

EDIT: Thanks to Andrew D. I now know what's wrong with the code above (I left it as is so his answer can be understood). The remaining question is how I might debug something like this, or get some insight into why ls -l failed while ls did not.

EDIT: The -D=true might give more output, though not in my case...

Please make the effort to mark one of the answers as "accepted", thanks! — Brian Topping, Dec 23 '17 at 03:04

Peter Lamberg · Accepted Answer · 2018-07-14T11:15:13.907

160

The docker events command may help, and the docker logs command can fetch logs even after the container failed to start.

First, start docker events in the background to see what's going on.

docker events&

Then run your failing docker run ... command. Then you should see something like the following on screen:

2015-12-22T15:13:05.503402713+02:00 xxxxxxxacd8ca86df9eac5fd5466884c0b42a06293ccff0b5101b5987f5da07d: (from xxx/xxx:latest) die

You can take the container's hex id from that message or from the output of the run command, and pass it to the logs command:

docker logs <copy the instance id from docker events messages on screen>

You should now see some output from the failed container startup.

As @alexkb suggested in a comment: docker events& can be troublesome if your container is being constantly restarted from something like AWS ECS service. In this scenario it may be easier to get the container hex id out of the logs in /var/log/ecs/ecs-agent.log.<DATE>. Then use docker logs <hex id>.
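The copy-and-paste step above can be scripted. The following is only a sketch: `last_die_id` is a hypothetical helper name, and it assumes the event lines look either like the older format shown in this answer or like the newer `container die <id> (...)` format, with a full 64-character hex id in both cases.

```shell
# Hypothetical helper: read `docker events` output on stdin and print the
# container id from the most recent "die" event.
# Assumed line formats:
#   old: <time> <id>: (from <image>) die
#   new: <time> container die <id> (exitCode=..., image=..., name=...)
last_die_id() {
  grep -E '(\) die$| container die )' \
    | tail -n 1 \
    | grep -oE '[0-9a-f]{64}' \
    | head -n 1
}

# Possible usage (needs a running daemon; the 10m window is an arbitrary choice):
#   id=$(docker events --since 10m --until "$(date +%s)" --filter event=die | last_die_id)
#   docker logs "$id"
```

Bounding the query with `--since` and `--until` makes `docker events` return instead of streaming, so nothing has to be left running in the background.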

edited Jul 14 '18 at 11:15

answered Dec 22 '15 at 13:24

Peter Lamberg


Very helpful! New to docker and was trying to get portainer running. Solved it with these debugging steps. Found someone on Medium.com with the same problem: https://medium.com/@jameson_37151/hey-linda-had-the-same-issue-im-new-to-docker-i-was-running-30776b44d327 – Jameson Mar 12 '17 at 03:41

I get "container not found"!? – demented hedgehog Feb 04 '18 at 23:17
Strange. Just to make sure, @dementedhedgehog did you try copying the hex id from the log message ending in "`(from xxx/xxx:latest) die`"? – Peter Lamberg Feb 05 '18 at 10:26

@dementedhedgehog Was the container deleted with the `docker run --rm` option? Can you see the container id using `docker container ls -a`? – Paul Jackson Feb 19 '18 at 18:19
Possibly, though not sure anymore. If I see it again I'll post with more details. – demented hedgehog Feb 20 '18 at 06:31

Thank you so much for this answer, it is a life saver. The only thing to add is that `docker events&` can be troublesome if your container is being constantly restarted by something like an AWS ECS service. In this scenario it may be easier to get the container hex id out of the logs in `/var/log/ecs/ecs-agent.log.<DATE>`. Then use `docker logs <hex id>` as suggested by this answer to see why things aren't booting. – alexkb Jul 13 '18 at 02:43

@alexkb Thanks! I added your suggestion to the end of the answer so others may find it more easily. – Peter Lamberg Jul 14 '18 at 11:16

score 25 · Answer 2 · edited Mar 20 '18 at 07:59


Well the best I have found out so far is:

# stop the running daemon and start it in the foreground in debug mode
sudo service docker stop
sudo dockerd -D  # -D is short for --debug

Then just start the client from a new shell. My misconception was thinking that the client does any of the work itself. It only communicates with the daemon, so (normally) you want to debug the daemon, not the client.


answered May 26 '14 at 09:07

estani



I don't really get it. But I ran your commands (with `sudo dockerd -D # --debug`) and now all my containers start as expected :) – Jono Jul 31 '22 at 11:56

score 22 · Answer 3 · answered Mar 29 '16 at 13:27

In my case, the -a (attach to STDOUT/STDERR) flag was enough:

user@machine:~$ docker start -a server_name
Error: The directory named as part of the path /log/log_path/app.log does not exist.
For help, use /usr/bin/supervisord -h

It showed the startup error (in our case, a missing log path used by supervisord). I assume most container startup errors would show up here as well.
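If attaching prints nothing at all (as with the `ls -l` case in the question), the exit status may still carry a hint. Below is a rough sketch, not an exhaustive mapping: `explain_exit` is a hypothetical helper, the 125/126/127 meanings are the ones documented for `docker run`, and values above 128 follow the usual shell convention of 128 plus the signal number. An application can also return any of these codes on its own, so treat the result as a hint only.

```shell
# Hypothetical helper: turn a container exit status into a rough hint.
explain_exit() {
  code=$1
  case "$code" in
    0)   echo "exited normally" ;;
    125) echo "docker itself failed (for example a bad flag or a daemon error)" ;;
    126) echo "command was found but could not be invoked" ;;
    127) echo "command not found in the container" ;;
    *)
      # By shell convention, 128 + N means the process died from signal N.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application exited with status $code"
      fi
      ;;
  esac
}

# Possible usage (needs a running daemon):
#   docker run image ls -l; explain_exit $?
#   explain_exit "$(docker inspect --format '{{.State.ExitCode}}' server_name)"
```

For example, a status of 139 would be reported as signal 11 (SIGSEGV), which is the kind of failure a corrupted shared library could plausibly cause.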

score 4 · Answer 4 · edited May 23 '17 at 12:41

I can't answer your question on how to make docker output more complete, but I can tell you that in-place regex replacement of a string in a .so file is a bit insane: the string only has so much space allocated to it, and if you change the file offsets of other entries, the ELF file becomes corrupted. Try running objdump or readelf on your .so file after running the perl command (before the LD_LIBRARY_PATH change) outside of a container -- dollars to donuts it is now corrupt.

The reason it works in this sadly necessary hack is because "tmp" and "etc" are the same string length so no offsets change. Consider the directory /dkr or similar if you prefer not to use /tmp.
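The length condition can be checked before patching anything. This is a minimal sketch with hypothetical helper names (`same_length`, `check_patch`); it compares lengths in characters, which equals bytes for plain ASCII paths like these.

```shell
# Succeeds only if both strings have the same length, which is the condition
# for replacing one with the other inside a binary without shifting offsets.
same_length() {
  [ "${#1}" -eq "${#2}" ]
}

# Report whether replacing $1 with $2 would keep the file layout intact.
check_patch() {
  if same_length "$1" "$2"; then
    echo "ok: '$1' -> '$2' keeps offsets"
  else
    echo "unsafe: '$1' (${#1}) and '$2' (${#2}) differ in length"
  fi
}

check_patch /etc/hosts /tmp/hosts            # both are 10 characters
check_patch /etc/hosts /etc-override/hosts   # 10 vs 19 characters
```

The second call corresponds to the substitution in the question's Dockerfile, where the replacement path is nine characters longer than the original.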

If you MUST take this approach and your desired paths are unchangeable, rebuild the library and change the default path for /etc/hosts in the source. Or better, when building your modified libnss_files.so rename it to something like libnss_altfiles.so and change nsswitch.conf to use hosts: altfiles when starting your docker container (unless docker has bind mounted nsswitch.conf as well, then you can't change it). This will let you have the libnss_altfiles.so in parallel with your normal libraries in the base system. If docker does bind-mount nsswitch.conf, leave a copy of your rebuilt libnss_files.so in your /lib-override directory ready to be loaded by LD_LIBRARY_PATH.

As a heads up, suid/sgid binaries ignore LD_LIBRARY_PATH and LD_PRELOAD, so some stuff is going to break (read: go back to using the default /etc/hosts) if you use those variables.

Thanks a lot for the great insight... I was too fast and see now what's happening. I still don't know why getting the stat needs to resolve a host (ls -l) while the simple file listing (ls), does not... — estani, May 20 '14 at 08:39

score 3 · Answer 5 · answered Mar 28 '19 at 01:30


Sometimes, you can find useful error messages by sshing into the node running the docker daemon and then doing:

$ tail -f /var/log/containers/* /var/log/docker.log 2>&1

On 'Docker Community edition' on Mac OS, you can connect into the docker vm by doing:

$ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty


user674669


`tty` doesn't exist for me (macOS 11.2.3, Docker 20.10.5) – Grant Birchmeier Apr 16 '21 at 21:54
