docker: latest 'dind' tag (19.03) gives error on Gitlab CI "failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?"

We are running a GitLab server and several gitlab-ci-runners. Today we woke up to several failed builds. We did several tests and found that the most likely culprits are the newest tags of the docker:dind and docker:git images. We tested with docker:18-dind and docker:18-git and the errors do not occur anymore.

The error is given below:

time="2019-07-23T06:52:31Z" level=error msg="failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial tcp 172.17.0.3:2375: connect: connection refused"

The gitlab-runners are running in privileged mode.
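
For context, the failing jobs follow the usual docker-in-docker pattern; a minimal .gitlab-ci.yml along these lines is enough to hit the error (the job name, image tag, and build command are illustrative, not our exact configuration):

# Sketch of a typical docker-in-docker job that breaks with the 19.03 'latest' dind image.
# With 19.03, the daemon no longer listens on plain TCP 2375 by default, so this client
# cannot connect.
build:
  image: docker:latest
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375    # pre-19.03 default, no TLS
  script:
    - docker info
    - docker build -t example/app .   # 'example/app' is a placeholder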

EDIT: This is not a bug or unresolved issue: see: https://github.com/docker-library/docker/issues/170#issuecomment-514366149, https://about.gitlab.com/2019/07/31/docker-in-docker-with-docker-19-dot-03/

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 78
  • Comments: 34 (10 by maintainers)

Most upvoted comments

@tianon While I appreciate that using the stable or latest tags on the docker image runs the risk of breaking changes, Docker 19.03 has been in beta and RC for over 4 months and this change to the image was made just 6 days ago. I’ve been testing the docker:19.03.0-rc* images in my GitLab CI pipelines for months in preparation for the release, and didn’t run into this breaking change because it wasn’t in any of the RCs.

I think it’s very poor form to introduce such a breaking change in the last few days of a major release without any notifications.

I fixed my self-hosted runners (Debian, runners installed using apt-get):

$ nano /etc/gitlab-runner/config.toml
[[runners]]
-  environment = ["DOCKER_DRIVER=overlay2"]
+  environment = ["DOCKER_DRIVER=overlay2","DOCKER_TLS_VERIFY=1","DOCKER_CERT_PATH=/certs/client"]
  [runners.docker]
-    tls_verify = false
    image = "docker:dind"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
-    volumes = ["/cache"]
+    volumes = ["/cache","/certs"]

And then:

$ service gitlab-runner restart
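
If you prefer to keep TLS enabled but configure it per project rather than at the runner level, the same idea expressed as pipeline variables looks roughly like this (a sketch based on the image's new defaults; the /certs volume still needs to be shared in config.toml as above):

# Sketch: per-project variables for TLS-enabled docker-in-docker with 19.03.
variables:
  DOCKER_HOST: tcp://docker:2376        # TLS port instead of 2375
  DOCKER_TLS_CERTDIR: "/certs"          # where dind generates the certificates
  DOCKER_TLS_VERIFY: "1"                # make the client use TLS
  DOCKER_CERT_PATH: "/certs/client"     # client certificates written by dind

build:
  image: docker:19.03
  services:
    - docker:19.03-dind
  script:
    - docker info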

@JanMikes as the person responsible for the CI runners at our company: it’s already in full effect. People are assuming the runners are broken. 😣

Best example of why going with “blanket tags” like latest is a no-no.

Since jubel-hans's comments are no longer here, I will repost the part that did the trick for me.

Adding the following variable:

  DOCKER_TLS_CERTDIR: ''

I also changed the Docker image tag from stable to stable-dind, but I'm not sure if that was needed. Edit: after further testing, it was not needed.
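
In .gitlab-ci.yml, setting that variable looks roughly like this (a sketch; the empty value restores the pre-19.03 behaviour of a plain-TCP daemon on port 2375):

variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""    # empty value disables certificate generation, so dind listens without TLS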

There will not be an update in this repository to “fix” this as 19.03.0 is now released and GA and the TLS behavioral change was intentional (and applied to 19.03+ only by default to give folks two separate escape hatches to opt out – environment variable or downgrade).

See https://gitlab.com/gitlab-org/gitlab-runner/issues/4501#note_194648542 for a comment from a GitLab team member that sums up my thoughts even better than I could.

Same thing here. We reverted to the 18-dind tag in GitLab in the meantime.

@kinghuang IMHO it's always poor practice to introduce a breaking change where it could easily be avoided. In this case we have a new feature that breaks old functionality when a variable is enabled, and the problem is that it is enabled by default. I don't quite understand what people doing such things have in mind. Unfortunately, it's not the first time I've seen something like that in a stable and broadly used open source project.

Specifying 18-dind as tag fixed it for us for now 😃

I thought I'd try setting up my Jenkins slaves to work correctly, using this manifest generated by the K8S plugin: https://gist.github.com/REBELinBLUE/97a5c13c2589bb1f3df5a5b330718eb0

But it doesn't seem to generate all the certificates before the job starts. I added ls /certs/** to the start of the job and I end up with:

/certs/ca:
cert.pem
cert.srl
key.pem

/certs/client:
key.pem

/certs/server:
ca.pem
cert.pem
csr.pem
key.pem
openssl.cnf

If I add the liveness probe, it seems to generate the certificates before it fully starts, but then when I try to run docker commands I end up with Error response from daemon: Client sent an HTTP request to an HTTPS server. (Yes, I set the ports to 2376.)

In the end I gave up and just set DOCKER_TLS_CERTDIR to an empty value and set the ports back to 2375, but I'd like to get it working properly.

A TCP connection without tlsverify has been discouraged for years.

@janw Yep, exactly the same here, everyone was all over me this morning. My fault, shouldn’t have set the Jenkins slaves to use stable-dind, setting to 18-dind as suggested has fixed the issue. 🤦‍♂️

GitLab now has a really nice blog post up describing the situation and how to fix it if your environment is affected: https://about.gitlab.com/2019/07/31/docker-in-docker-with-docker-19-dot-03/ 👍

Besides setting DOCKER_HOST to use port 2376, you need to set DOCKER_TLS_VERIFY=1 and DOCKER_CERT_PATH=/certs/client to tell Docker to use TLS (and where to get certificates to handshake with).

Also, you should only share /certs/client with your client containers.

See also:

https://github.com/docker-library/docker/blob/d45051476babc297257df490d22cbd806f1b11e4/docker-entrypoint.sh#L22-L33
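
Outside of GitLab, the same wiring can be sketched with a small Compose file (image tags and service names here are illustrative): the daemon keeps its generated certificates under /certs, and only the /certs/client directory is mounted into the client container.

# Sketch: dind with TLS (19.03 defaults) plus a client that only sees /certs/client.
version: "3.7"
services:
  docker:
    image: docker:19.03-dind
    privileged: true                     # dind still requires privileged mode
    environment:
      DOCKER_TLS_CERTDIR: /certs         # 19.03 default; daemon listens with TLS on 2376
    volumes:
      - docker-certs-client:/certs/client
  client:
    image: docker:19.03
    depends_on:
      - docker
    environment:
      DOCKER_HOST: tcp://docker:2376
      DOCKER_TLS_VERIFY: "1"
      DOCKER_CERT_PATH: /certs/client
    volumes:
      - docker-certs-client:/certs/client:ro   # share only the client certs
    command: docker info
volumes:
  docker-certs-client: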

We hit this error using the stable tags of the stable and stable-dind images, but 18-dind works.

A TCP connection without tlsverify has been discouraged for years.

If you are running everything locally, it’s okay though.

My company and I got caught by this issue. That's fine, but that commit seems a bit rushed, like a big breaking change just before the release… (please don't revert it, though)