moby: Docker can hang indefinitely waiting for a nonexistent process to pull an image.
Running docker pull will simply hang, waiting for a nonexistent process to download the repository.
root@ip-172-31-18-106:~# docker pull ubuntu:trusty
Repository ubuntu already being pulled by another client. Waiting.
This is the same behavior as #3115; however, there is no other docker process running.
The list of running docker containers:
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
See here for a full process tree: https://gist.github.com/tfoote/c8a30e569c911f1977e2
When this happens my process monitor fails the job after 120 minutes, which happens regularly.
An strace of the docker instance can be found here: https://gist.github.com/tfoote/1dc3905eb9c235cb5c53
It is stuck on an epoll_wait call.
Here’s all the standard info.
root@ip-172-31-18-106:~# docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef
root@ip-172-31-18-106:~# docker -D info
Containers: 132
Images: 6667
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 6953
Execution Driver: native-0.2
Kernel Version: 3.13.0-44-generic
Operating System: Ubuntu 14.04.1 LTS
CPUs: 4
Total Memory: 14.69 GiB
Name: ip-172-31-18-106
ID: SZWS:VD6O:CLP2:WRAM:KWIL:47HZ:HOEY:SR6R:ZOWR:E3HG:PS7P:TCZP
Debug mode (server): false
Debug mode (client): true
Fds: 27
Goroutines: 32
EventsListeners: 0
Init Path: /usr/bin/docker
Docker Root Dir: /var/lib/docker
WARNING: No swap limit support
root@ip-172-31-18-106:~# uname -a
Linux ip-172-31-18-106 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
It’s running on AWS.
I’m running an instance of the ROS buildfarm, which reproduces this bad state once every couple of days when fully loaded, running Debian package builds at ~100% CPU load. This happens when we are preparing a major release.
I have not been able to isolate the cause in a smaller example. It has happened on multiple different repositories: sometimes it’s the official Ubuntu repository, sometimes it’s our own custom repositories. We’ve tracked a few instances of this error recently here. When one repository is failing to pull, others work fine. All the repositories are hosted on the public Docker Hub.
Here’s an example of one hanging while another passes.
root@ip-172-31-18-106:~# docker pull ubuntu:saucy
Pulling repository ubuntu
^Croot@ip-172-31-18-106:~# docker pull ubuntu:saucy^C
root@ip-172-31-18-106:~# docker pull osrf/ubuntu_32bit
Pulling repository osrf/ubuntu_32bit
FATA[0000] Tag latest not found in repository osrf/ubuntu_32bit
root@ip-172-31-18-106:~# docker pull osrf/ubuntu_32bit:saucy
Pulling repository osrf/ubuntu_32bit
d6a6e4bd19d5: Download complete
Status: Image is up to date for osrf/ubuntu_32bit:saucy
As determined in #3115, this can be fixed by restarting docker. However, from that issue it is expected that this should not happen anymore. I think there has been a regression or we’ve found another edge case.
I will keep the machine online for a few days if anyone has suggestions on what I can run to debug the issue. Otherwise I’ll have to wait for it to recur before I can test any debugging.
About this issue
- State: closed
- Created 9 years ago
- Reactions: 3
- Comments: 92 (24 by maintainers)
Commits related to this issue
- Set idle timeouts for HTTP reads and writes in communications with the registry Otherwise, some operations can get stuck indefinitely when the remote side is unresponsive. Fixes #12823 Signed-off-b... — committed to aaronlehmann/docker by aaronlehmann 8 years ago
- Set idle timeouts for HTTP reads and writes in communications with the registry Otherwise, some operations can get stuck indefinitely when the remote side is unresponsive. Fixes #12823 Signed-off-b... — committed to tiborvass/docker by aaronlehmann 8 years ago
- Set idle timeouts for HTTP reads and writes in communications with the registry Otherwise, some operations can get stuck indefinitely when the remote side is unresponsive. Fixes #12823 Signed-off-b... — committed to aditirajagopal/docker by aaronlehmann 8 years ago
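The commit message above describes idle timeouts on reads and writes to the registry. As a rough illustration of that general technique (not the code from the commit itself), the sketch below wraps a net.Conn and refreshes the deadline before each Read and Write. The names idleTimeoutConn and readFromSilentPeer are hypothetical, and net.Pipe stands in for an unresponsive remote.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// idleTimeoutConn (hypothetical name) wraps a net.Conn and pushes the
// deadline forward before every Read and Write. A peer that goes silent
// then produces a timeout error instead of blocking forever.
type idleTimeoutConn struct {
	net.Conn
	idle time.Duration
}

func (c *idleTimeoutConn) Read(p []byte) (int, error) {
	if err := c.Conn.SetReadDeadline(time.Now().Add(c.idle)); err != nil {
		return 0, err
	}
	return c.Conn.Read(p)
}

func (c *idleTimeoutConn) Write(p []byte) (int, error) {
	if err := c.Conn.SetWriteDeadline(time.Now().Add(c.idle)); err != nil {
		return 0, err
	}
	return c.Conn.Write(p)
}

// readFromSilentPeer simulates a remote side that never sends anything,
// using an in-memory pipe, and reports whether the read ended in a timeout.
func readFromSilentPeer(idle time.Duration) bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	conn := &idleTimeoutConn{Conn: client, idle: idle}
	_, err := conn.Read(make([]byte, 1))

	netErr, ok := err.(net.Error)
	return ok && netErr.Timeout()
}

func main() {
	fmt.Println("timed out:", readFromSilentPeer(50*time.Millisecond))
}
```

Because the deadline is reset on every call, a slow but steady transfer keeps going, while a connection with no traffic for the idle period fails with a timeout that the caller can handle.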
Same here (restarting docker daemon solves the issue however)
I’m seeing this happen on AWS / ECS - we do a docker pull and for some reason the network connection drops. Then our deploy is stuck since the pull hangs indefinitely.
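For deploy scripts like the one described above, one possible mitigation is to bound the wait on the client side. This is only a sketch under assumptions: runWithDeadline is a hypothetical helper, the 10 minute limit is arbitrary, and killing the client does not clear whatever state the daemon is stuck in (the thread reports that only a daemon restart does that).

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithDeadline (hypothetical helper) runs a command and kills it if it
// has not finished within limit. timedOut is true when the limit was hit.
func runWithDeadline(limit time.Duration, name string, args ...string) (timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	err = exec.CommandContext(ctx, name, args...).Run()
	return ctx.Err() == context.DeadlineExceeded, err
}

func main() {
	// The image name is taken from the report; the limit is an assumption.
	timedOut, err := runWithDeadline(10*time.Minute, "docker", "pull", "ubuntu:trusty")
	switch {
	case timedOut:
		fmt.Println("pull exceeded the deadline; the daemon may need a restart")
	case err != nil:
		fmt.Println("pull failed:", err)
	default:
		fmt.Println("pull finished")
	}
}
```

This turns an indefinite hang into a prompt, detectable failure, so the calling job can retry or alert rather than wait for an outer monitor to expire.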
Same problem here. This is triggered by a network reconfiguration of the host. “Fixed” via “docker-machine restart”.
After this long, not even a response from the team? It can’t be only us suffering from this problem?
We have been seeing a problem with identical symptoms. Most failed builds of Helios have tests that fail due to this issue.