moby: Ambiguous i/o timeouts

Description of problem:

I’ve had numerous people report an issue connecting to registries (on-prem, Docker Hub, and Quay.io) that has been quite tricky to track down. It can begin and end at seemingly random times.

$ sudo docker pull ...
FATA[0021] Error response from daemon: v1 ping attempt failed with error: Get https://quay.io/v1/_ping: dial tcp: i/o timeout.
If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add `--insecure-registry quay.io` to the daemon's arguments.
In the case of HTTPS, if you have access to the registry's CA certificate, no need for the flag; simply place the CA certificate at /etc/docker/certs.d/quay.io/ca.crt

It doesn’t matter which API call is made: as long as it needs to connect to a registry, docker fails to establish the connection (for every registry except the Docker Hub, the failing endpoint is /v1/_ping). The problem persists after the docker daemon is restarted, but goes away once the machine has been rebooted. Using curl to hit the endpoint works and dig resolves the domain correctly, yet the docker daemon continues to fail to connect to the registry. This leads me to believe the issue is not related to the DNS cache.
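For anyone else hitting this, one way to narrow it down is to probe every resolved address individually rather than relying on a single curl, which only exercises whichever address it happens to pick. The following is a minimal sketch, not a definitive diagnostic: the helper names (resolve_ipv4, probe, probe_all) are hypothetical, it uses the system resolver rather than whatever the docker daemon uses internally, and it only tests a plain TCP connect, not TLS or the registry API.

```python
import socket


def resolve_ipv4(host, port=443):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe(ip, port=443, timeout=5.0):
    """Attempt one plain TCP connect; return 'ok', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # The connect did not complete within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Refused, unreachable, and similar failures end up here.
        return f"error: {exc}"


def probe_all(host, port=443, timeout=5.0):
    """Map each resolved IPv4 address of host to its probe result."""
    return {ip: probe(ip, port, timeout) for ip in resolve_ipv4(host, port)}


# Example (needs network access, so it is left commented out):
# for ip, result in probe_all("quay.io").items():
#     print(ip, result)
```

If only some of the addresses report a timeout while the others connect, that would point at a per-address problem rather than at name resolution as a whole.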

The following data is taken from the most recent person to report this issue.

docker version:

$ docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef

docker info:

N/A

uname -a:

Linux Mint

$ uname -a
3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.2 LTS"
VERSION_ID="14.04"

Uptime for this box was only a few hours.

Environment details (AWS, VirtualBox, physical, etc.):

I’ve seen this occur specifically on version 1.5.0, build a8a31ef, on Debian, Ubuntu, and Amazon Linux, over residential connections, GCE, and AWS. I’m not sure that this version is necessarily tied to the issue, though.

How reproducible:

I haven’t been able to personally reproduce the issue.

Steps to Reproduce:

  1. normal docker usage
  2. docker i/o timeouts on commands that interact with registries

Actual Results:

Receive tcp i/o timeouts from a perfectly functioning registry.

Expected Results:

Never receive tcp i/o timeouts from a perfectly functioning registry.

Additional info:

See description.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 49 (20 by maintainers)

Most upvoted comments

My solution: I got this kind of error when trying to install TensorFlow in docker. Following the TensorFlow tutorial, I ran the command sudo docker run -it -p 8888:8888 b.gcr.io/tensorflow/tensorflow and got the error Unable to find image 'gcr.io/tensorflow/tensorflow:latest' locally docker: Error response from daemon: Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout. I guess it is because of the GFW. I tried a VPN, but it failed again. Finally, I tried pulling the tensorflow image from Docker Hub instead of from https://gcr.io (Google Cloud Platform). In a terminal, I ran sudo docker run -it -p 8888:8888 tensorflow/tensorflow, and it worked for me. Hope this provides some insight for you guys!

I’m not sure I understand exactly why this could be related to IPv4/IPv6 resolution precedence issues. Our DNS setup is the completely vanilla one from Amazon Linux, and Quay.io does not expose AAAA records at all. Plus, our error message clearly shows that the i/o timeout happens on an IPv4 address.

Our /etc/resolv.conf:

search ec2.internal
nameserver 169.254.169.253

And, as done from one of our instances:

$ host -t AAAA quay.io
quay.io has no AAAA record
$ host -t A quay.io
quay.io has address 23.21.59.93
quay.io has address 50.17.199.231
quay.io has address 107.22.188.65
quay.io has address 184.73.154.212

The only thing I can think of is that Quay.io’s set of IP addresses changes and, when it does, we somehow still try to hit one of the previous addresses and the connection fails. Would that make it more of a cache/TTL issue, either in Go’s DNS code or in Amazon’s DNS servers?
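The stale-address theory above could be checked the next time the error appears by pulling the IP out of the error message and comparing it with what the resolver returns at that moment. This is a rough sketch under stated assumptions: the function names are hypothetical, the regular expression only covers the "dial tcp IP:PORT: i/o timeout" shape quoted in this thread, and the local resolver's answer may differ from whatever the docker daemon saw.

```python
import re
import socket

# Matches the IPv4 form seen in this thread, e.g.
# "dial tcp 64.233.188.82:443: i/o timeout".
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+): i/o timeout")


def failing_endpoint(error_text):
    """Return (ip, port) from a dial timeout message, or None if no address is present."""
    match = DIAL_RE.search(error_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def current_a_records(host, port=443):
    """Return the set of IPv4 addresses the system resolver currently gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def is_stale(error_text, resolved):
    """True if the failing IP is absent from `resolved`; None if the message has no IP."""
    endpoint = failing_endpoint(error_text)
    if endpoint is None:
        return None
    return endpoint[0] not in resolved


# Example (needs network access, so it is left commented out):
# print(is_stale(error_message, current_a_records("quay.io")))
```

A True result would be consistent with the daemon holding on to an address that is no longer published; a False result would suggest the timeout is happening on an address that is still current.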