I've been trying to get Docker installed and running on a Windows VM to get a better understanding of the runtime for downstream work, and I'm running into issues starting the hello-world container.
Environment:
- VMware virtual hardware:
  - 4 GB RAM
  - Intel Xeon CPU (2 cores)
- Windows Server 2016 Standard (Version 1607)
- Some antivirus and firewall considerations (I'm getting more info on those)
Output from docker version:
Client:
Version: 17.06.2-ee-6
API version: 1.30
Go version: go1.8.3
Git commit: e75fdb8
Built: Mon Nov 27 22:46:09 2017
OS/Arch: windows/amd64
Server:
Version: 17.06.2-ee-6
API version: 1.30 (minimum version 1.24)
Go version: go1.8.3
Git commit: e75fdb8
Built: Mon Nov 27 22:55:16 2017
OS/Arch: windows/amd64
Experimental: false
What's worked:
- Downloading and installing the Docker package following the script install deploy path from docs.docker.com
- Starting the Docker daemon itself.
- Downloading the layers for the hello-world image.
What hasn't:
Running any container. We've tried a few:
hello-world:nanoserver
hello-world:latest
microsoft/nanoserver:latest
microsoft/windowsservercore:latest
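To make the attempts above repeatable between configuration changes, the same images can be tried from a small script. This is only a sketch: the helper name try_image, the 600-second limit, and the injectable run parameter are my own choices for illustration, not anything Docker provides.

```python
import subprocess

# The four images listed above.
IMAGES = [
    "hello-world:nanoserver",
    "hello-world:latest",
    "microsoft/nanoserver:latest",
    "microsoft/windowsservercore:latest",
]


def try_image(image, timeout_s=600, run=subprocess.run):
    """Run one image with `docker run --rm` and return (image, ok, detail).

    `run` defaults to subprocess.run; it is a parameter only so the
    helper can be exercised without a Docker daemon.
    """
    try:
        result = run(
            ["docker", "run", "--rm", image],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return image, False, "no response within %d s" % timeout_s
    except FileNotFoundError:
        return image, False, "docker executable not found on PATH"
    # Prefer stderr, which is where the daemon error message is printed.
    detail = result.stderr.strip() or result.stdout.strip()
    return image, result.returncode == 0, detail
```

Calling try_image(image) for each entry in IMAGES and printing the tuples gives a quick before/after comparison each time a setting is changed.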
What I've already tried (with no success):
- Relaxing our Group Policy Settings
- Enabling the Hyper-V Windows Optional Component
What actually happens:
When I attempt to start a container using docker run {container-name-here}, PowerShell hangs for a substantial amount of time (a couple of minutes) and prints the following message:
C:\Program Files\docker\docker.exe: Error response from daemon: container
{container-id-here} encountered an error during Start: failure in a
Windows system call: This operation returned because the timeout
period expired. (0x5b4).
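For reference, the hex value at the end of that message is a standard Win32 error code: 0x5b4 is 1460 in decimal, which is ERROR_TIMEOUT, and its message text is the same "timeout period expired" sentence shown above. A minimal sketch of pulling the code out of the message follows; the lookup table is hand-written and only covers this one code, and decode_error_code is a hypothetical helper name.

```python
import re

# Hand-written lookup; only the code from the message above is listed.
KNOWN_WIN32_ERRORS = {1460: "ERROR_TIMEOUT"}


def decode_error_code(message):
    """Return (decimal_code, symbolic_name_or_None) for the last '(0x...)' in message."""
    matches = re.findall(r"\(0x([0-9a-fA-F]+)\)", message)
    if not matches:
        return None
    code = int(matches[-1], 16)
    return code, KNOWN_WIN32_ERRORS.get(code)


msg = (
    "container {container-id-here} encountered an error during Start: "
    "failure in a Windows system call: This operation returned because "
    "the timeout period expired. (0x5b4)."
)
print(decode_error_code(msg))  # (1460, 'ERROR_TIMEOUT')
```

So the code itself only confirms a generic timeout inside a Windows system call; it does not say which component caused the wait.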
In the docker events log, I get the following messages at the same time:
2018-04-18T09:36:27.881680400-04:00 container create {container-id-here} (image=hello-world:nanoserver, name=confident_ardinghelli)
2018-04-18T09:36:27.883680800-04:00 container attach {container-id-here} (image=hello-world:nanoserver, name=confident_ardinghelli)
2018-04-18T09:36:28.753726900-04:00 network connect {network-id-here} (container={container-id-here}, name=nat, type=nat)
2018-04-18T09:40:21.373395500-04:00 network disconnect {network-id-here}(container={container-id-here}, name=nat, type=nat)
We get the timeout message between the network connect and the network disconnect.
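To put a number on that gap, the timestamps in the events output can be parsed and subtracted. The sketch below assumes the line layout shown above (timestamp, object type, action); the function names are hypothetical, and the nanosecond fraction is truncated to microseconds because Python's datetime does not store more precision.

```python
import re
from datetime import datetime

# Timestamp with a variable-length fraction, then object type and action.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})\s+(\S+)\s+(\S+)"
)

LINES = [
    "2018-04-18T09:36:28.753726900-04:00 network connect {network-id-here} (container={container-id-here}, name=nat, type=nat)",
    "2018-04-18T09:40:21.373395500-04:00 network disconnect {network-id-here}(container={container-id-here}, name=nat, type=nat)",
]


def parse_event(line):
    """Return (timestamp, object_type, action), or None if the line does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    base, frac, tz, kind, action = m.groups()
    micros = frac[:6].ljust(6, "0")  # truncate nanoseconds to microseconds
    when = datetime.strptime(f"{base}.{micros}{tz}", "%Y-%m-%dT%H:%M:%S.%f%z")
    return when, kind, action


def gap_seconds(lines, first=("network", "connect"), second=("network", "disconnect")):
    """Seconds between the first occurrence of each of the two events."""
    times = {}
    for line in lines:
        parsed = parse_event(line)
        if parsed:
            when, kind, action = parsed
            times.setdefault((kind, action), when)
    return (times[second] - times[first]).total_seconds()


print(f"{gap_seconds(LINES):.1f} s")  # about 232.6 s, just under four minutes
```

For the run shown, the container sat between network connect and network disconnect for just under four minutes before the daemon gave up.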
The references I've found in my searching (here, and here) indicate that this may be an antivirus issue, but I've been unable to find any documentation on how to confirm that, or on which antivirus component may be at fault, short of disabling the antivirus and trying again. I'm working on getting time with the folks who have access to that part of the system so we can try exactly that. I'll update with results.
So, what am I actually asking?
- Has anyone else seen this or a similar issue before? What steps were you able to take to diagnose the root cause, and what ended up being the issue in your case?
- Are there any other Docker or Windows logs I should be looking at to better diagnose the cause of the issue?
- Any other "shots in the dark" we should try? We're running out of ideas after we get through our security debug.
Update (2018-4-20):
We spoke with the security team, and went through enabling and disabling various antivirus components. When we turned off McAfee Host IPS (HIPS), we were able to start any of our containers, as expected. When we turn it back on, the containers break again! We've found an alert in the HIPS log for a denied registry read that matches up time-wise with our debug session, and we've traced that registry access back to the docker.exe process using Process Monitor from Microsoft Sysinternals. Looks like we have our culprit!
I'll report back after we add a whitelist entry for the rule and confirm the fix.