moby: ERROR: failed to do request: Head dial tcp: lookup registry-1.docker.io: no such host, failed to solve with frontend dockerfile.v0: failed to create LLB definition

Description

I am not sure why this happens, but sometimes one of my docker builds fails with:

Untagged: deploy_image:latest
Deleted: sha256:c54236c653a3178e6debc8ffbffe4d2eba4f789ee236089032b9d2feaa0daa66

#1 [internal] load build definition from deploy.Dockerfile
#1 sha256:9048cafd6715b4e09a204545a8d8ef1cef6209684ecd7e3b69a2e38510a20e81 
#1 transferring dockerfile: 45B done
#1 DONE 0.1s

#2 [internal] load .dockerignore                                                                                            
#2 sha256:ef4db4ddb70d86cffc65cd1e615cd739f059f06ea9b7b98a3f4cc38cc834b527                                                  
#2 transferring context: 34B done                                                                                           
#2 DONE 0.1s                                                                                                                
                                                                                                                            
#3 [internal] load metadata for docker.io/library/debian:11                                                                 
#3 sha256:b037bde4559b1b41e623e1a26f1831a526e95f96feca7ab049a8fc9144d77cd7                                                  
#3 ERROR: failed to do request: Head https://registry-1.docker.io/v2/library/debian/manifests/11: dial tcp: lookup registry-1.docker.io: no such host
------                                                                                                                      
 > [internal] load metadata for docker.io/library/debian:11:                                                                
------                                                                                                                      
failed to solve with frontend dockerfile.v0: failed to create LLB definition: failed to do request: Head https://registry-1.docker.io/v2/library/debian/manifests/11: dial tcp: lookup registry-1.docker.io: no such host

When I simply run the build again, the error goes away.

This may be related to the commands I use. I have another docker build in the same pipeline that does not seem to fail with this error. The only difference between them is the use of RUN --mount: the build that uses RUN --mount fails with this error once in a while.
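
Because rerunning the build clears the error, one possible stopgap (not a fix for the underlying lookup failure) is to wrap the flaky build in a retry loop. The sketch below is only an illustration: the retry helper, its attempt count, and its delay are hypothetical names and values, not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original run.sh):
#   retry <max_attempts> <delay_seconds> <command...>
# Reruns the command until it succeeds or the attempts are used up.
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use in run.sh, with the same flags as the original build step:
#   export DOCKER_BUILDKIT=1
#   retry 3 5 docker build --progress=plain --tag deploy_image \
#       --file deploy.Dockerfile .
```

A retry only hides the symptom. If the lookup keeps failing on every attempt, the script still exits with a non-zero status, so the pipeline does not silently continue.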

Steps to reproduce the issue:

  1. Use one Dockerfile without RUN --mount and another with it (for example https://github.com/moby/moby/issues/42864):
    FROM debian:11
    WORKDIR /
    RUN --mount=type=cache,target=/cache/ /bin/bash -c "set -ex;"
  2. Create the run.sh bash file with the following contents and run it. This is the pipeline I use and where the problem happens.
    # Remove the old image while we have a reference to it
    docker rmi build_image:latest || cd .;

    DOCKER_BUILDKIT=1 docker build \
        --progress=plain \
        --tag build_image \
        --file build.Dockerfile \
        .

    docker run --rm \
        --volume root/:/root \
        build_image \
        /root/build.sh \
        .

    docker rmi deploy_image:latest || cd .;

    DOCKER_BUILDKIT=1 docker build \
        --progress=plain \
        --tag deploy_image \
        --file deploy.Dockerfile \
        .

    # Clean up dangling build caches from the buildkit last build cache
    # https://github.com/moby/buildkit/issues/1359
    docker builder prune --force --filter type=regular;

  3. After running run.sh several times, one of the runs is going to fail on the deploy_image creation, which uses RUN --mount. The first docker build, the build_image creation (which does not have a RUN --mount), has not failed so far.

Describe the results you received: Docker image build randomly fails.

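Describe the results you expected: Docker image builds should not fail randomly. The error text (lookup registry-1.docker.io: no such host) suggests a failed DNS lookup for the registry rather than a problem reaching the Docker daemon.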

Output of docker version:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 20.10.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-47-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.52GiB
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support

May be related to: https://github.com/moby/moby/issues/18842

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

FWIW it’s happening to me as well when switching between work and home networks.

I’ve found it’s enough to simply restart your builder container to get it working again.
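
A minimal sketch of that restart, for anyone who wants to script it. It assumes the builder uses the docker-container driver and that its container name contains buildx_buildkit; both are assumptions, so check the output of docker ps before relying on it. The restart_builders function name is made up for this example.

```shell
# Sketch: restart every running container whose name contains
# "buildx_buildkit". Assumption: the builder runs under the
# docker-container driver and follows that naming; verify with `docker ps`.
restart_builders() {
    local names
    names=$(docker ps --filter "name=buildx_buildkit" --format '{{.Names}}')
    if [ -z "$names" ]; then
        echo "no buildx builder container is running" >&2
        return 1
    fi
    # docker ps prints one name per line; restart each in turn.
    printf '%s\n' "$names" | while IFS= read -r name; do
        docker restart "$name"
    done
}
```

If the build runs inside the Docker daemon itself (the default docker driver, as in the original report), there is no separate builder container and this function simply reports that none is running.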

Had a similar problem (https://community.fly.io/t/deploy-errors-fly-can-t-communicate-with-its-own-docker-hub-mirror-part-2-solved/12439/3):

GOOS=linux GOARCH=amd64 go build -o soc-server -tags=release -ldflags='-X github.com/andig/evcc-cloud/main.Commit=35abdff -s -w' github.com/andig/evcc-cloud
docker build --platform linux/amd64 -t andig/evcc-cloud --push .

[+] Building 30.0s (3/3) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                               0.0s
 => => transferring dockerfile: 469B                                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                                    0.0s
 => ERROR [internal] load metadata for docker.io/library/alpine:3.17                                                                                              30.0s
------
 > [internal] load metadata for docker.io/library/alpine:3.17:
------
Dockerfile:1
--------------------
   1 | >>> FROM alpine:3.17 as builder
   2 |     RUN apk update && apk add --no-cache git ca-certificates tzdata && update-ca-certificates
   3 |
--------------------
ERROR: failed to solve: DeadlineExceeded: DeadlineExceeded: DeadlineExceeded: alpine:3.17: failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/3.17": dial tcp 52.1.184.176:443: i/o timeout

Reproducible failure. After

docker buildx rm

things started working again. This looks like an issue inside the builder to me.
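
For reference, removing a builder and creating a fresh one could be scripted roughly as follows. The builder name passed in is a placeholder; use the name shown by docker buildx ls. This assumes a builder that was created with docker buildx create; it is not expected to apply to the built-in default builder.

```shell
# Sketch: drop a buildx builder and create a fresh one with the same name.
# The name is supplied by the caller; "mybuilder" below is a placeholder.
recreate_builder() {
    local name=$1
    # Ignore the error if the builder does not exist yet.
    docker buildx rm "$name" || true
    docker buildx create --name "$name" --use
}

# Example call (placeholder name):
#   recreate_builder mybuilder
```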

Such lookup errors happen on my system after I reconnect my host (a laptop) to another network that has a different DNS server, e.g. when I suspend the laptop and wake it up on another Wi-Fi network. The Docker daemon seemingly keeps using the DNS server it recorded during Docker service startup and does not update its DNS settings when the host's DNS settings change suddenly.
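
One way to tell a stale resolver apart from a host that is simply offline is to check whether the host itself can resolve the registry name. The sketch below is a rough diagnostic under that assumption. The function names are hypothetical, and it relies on getent being available on the host (it is on typical glibc-based Linux systems).

```shell
# Sketch: does the host resolve a given name?
host_resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Print a hint based on the host-side lookup. This only narrows the
# problem down; it does not inspect the daemon's own resolver state.
diagnose_registry_dns() {
    local host=${1:-registry-1.docker.io}
    if host_resolves "$host"; then
        echo "host resolves $host: a stale resolver in the daemon or builder is plausible"
    else
        echo "host cannot resolve $host: check the host network first"
    fi
}

# Example call:
#   diagnose_registry_dns
# If the host resolves the name but builds still fail, restarting the
# daemon (for example: sudo systemctl restart docker) is one thing to try.
```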

I use colima to build & run containers. Tearing down then restarting colima did the trick for me. My builds started working and had no further connectivity issues.

It was just a network problem. I was connected to a remote machine via RDP and figured out that the machine was not connected to the internet.