docker: latest 'dind' tag (19.03) gives error on Gitlab CI "failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?"
We are running a GitLab server and several GitLab CI runners. Today we woke up to several failed builds.
We did several tests and found that the most likely culprits are the newest tags of the docker:dind and docker:git images.
We tested with docker:18-dind and docker:18-git and the errors no longer occur.
The error is given below:
time="2019-07-23T06:52:31Z" level=error msg="failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial tcp 172.17.0.3:2375: connect: connection refused"
The gitlab-runners are running in privileged mode.
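For illustration, the pinning workaround we ended up with can be sketched like this in .gitlab-ci.yml (the job name, script, and the explicit DOCKER_HOST are placeholders and assumptions here, not a verbatim copy of our pipeline):

```yaml
# .gitlab-ci.yml sketch: pin the helper images to Docker 18.x so the daemon
# keeps its old, non-TLS behaviour on port 2375.
build:
  image: docker:18-git           # instead of docker:git (now 19.03)
  services:
    - docker:18-dind             # instead of docker:dind (now 19.03)
  variables:
    DOCKER_HOST: tcp://docker:2375   # 18.x dind listens here without TLS
  script:
    - docker info                # placeholder; any docker command applies
```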
EDIT: This is not a bug or an unresolved issue; see https://github.com/docker-library/docker/issues/170#issuecomment-514366149 and https://about.gitlab.com/2019/07/31/docker-in-docker-with-docker-19-dot-03/
About this issue
- State: closed
- Created 5 years ago
- Reactions: 78
- Comments: 34 (10 by maintainers)
Commits related to this issue
- work around for regression with docker:dind see https://github.com/docker-library/docker/issues/170 — committed to repomaa/slide_server by deleted user 5 years ago
- bugfix: docker:stable-dind is broken, locking to 18-dind instead. Updated all references to docker:stable-dind because the latest image pushed to stable-dind updates to Docker 19.03. Unlike 18.x, 19.... — committed to tmwack/gitlab-continuous-integration by tmwack 5 years ago
- bugfix: docker:stable-dind is broken, locking to 18-dind instead. (#30) Updated all references to docker:stable-dind because the latest image pushed to stable-dind updates to Docker 19.03. Unlike 18... — committed to Cimpress-MCP/gitlab-continuous-integration by tmwack 5 years ago
- [docker-build] fix docker registry build error (see https://github.com/docker-library/docker/issues/170) + merge cmake_review — committed to siconos/siconos by fperignon 5 years ago
- attempted fix for https://github.com/docker-library/docker/issues/170 — committed to lintol/capstone by philtweir 5 years ago
- instruct Docker not to start with TLS See https://github.com/docker-library/docker/issues/170 — committed to doowzs/DotOJ by doowzs 4 years ago
@tianon While I appreciate that using the stable or latest tags on the docker image runs the risk of breaking changes, Docker 19.03 has been in beta and RC for over 4 months and this change to the image was made just 6 days ago. I've been testing the docker:19.03.0-rc* images in my GitLab CI pipelines for months in preparation for the release, and didn't run into this breaking change because it wasn't in any of the RCs. I think it's very poor form to introduce such a breaking change in the last few days of a major release without any notifications.
I fixed my self-hosted runners (Debian, runners installed using apt-get):

And then:
@JanMikes as the person responsible for the CI runners at our company: it’s already in full effect. People are assuming the runners are broken. 😣
Best example of why going with "blanket tags" like latest is a no-no.

Since jubel-hans comments are no longer here, I will repost the part that did the trick for me. Adding the following variable:
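Judging from the rest of the thread (and the GitLab post linked in the edit above), the variable in question is DOCKER_TLS_CERTDIR set to an empty string; in .gitlab-ci.yml form that would be roughly:

```yaml
# Sketch: opt back out of the TLS behaviour introduced in docker:19.03 dind.
# An empty DOCKER_TLS_CERTDIR makes the daemon listen on plain tcp port 2375
# again, as the 18.x images did.
variables:
  DOCKER_TLS_CERTDIR: ""
```

With that set, the 19.03 daemon skips certificate generation and listens on plain tcp port 2375 again, so existing DOCKER_HOST settings keep working.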
Also changed the docker image tag from stable to stable-dind, though I am not sure if it was needed. Edit: after further testing it was not needed.

There will not be an update in this repository to "fix" this as 19.03.0 is now released and GA and the TLS behavioral change was intentional (and applied to 19.03+ only by default to give folks two separate escape hatches to opt out – environment variable or downgrade).
See https://gitlab.com/gitlab-org/gitlab-runner/issues/4501#note_194648542 for a comment from a GitLab team member that sums up my thoughts even better than I could.
Same thing here. We reverted to the 18-dind tag in GitLab in the meantime.

@kinghuang IMHO it's always a poor practice to introduce a breaking change where it could be easily avoided. In this case we have a new feature that breaks old functionality if a variable is set to true. The problem is that the default value is true. I don't quite understand what people doing such things have in mind. Unfortunately, it's not the first time I have seen something like that in a stable and broadly used open source project.
Specifying 18-dind as the tag fixed it for us for now 😃

I thought I'd try getting my Jenkins slaves to work correctly, using this manifest generated by the K8S plugin: https://gist.github.com/REBELinBLUE/97a5c13c2589bb1f3df5a5b330718eb0
But it doesn't seem to generate all the certificates before the job starts. I added ls /certs/** to the start of the job and I end up with

If I add the liveness probe it seems to generate the certificates before it fully starts, but then when I try to run docker commands I end up with Error response from daemon: Client sent an HTTP request to an HTTPS server. (yes, I set the ports to 2376)

In the end I have given up and just set DOCKER_TLS_CERTDIR to an empty value and set the ports back to 2375, but I'd like to get it working properly.

TCP connection without tlsverify has been unrecommended for years.
@janw Yep, exactly the same here, everyone was all over me this morning. My fault, shouldn't have set the Jenkins slaves to use stable-dind; setting to 18-dind as suggested has fixed the issue. 🤦‍♂️

See related issue: https://gitlab.com/gitlab-org/gitlab-runner/issues/4501
GitLab now has a really nice blog post up describing the situation and how to fix it if your environment is affected: https://about.gitlab.com/2019/07/31/docker-in-docker-with-docker-19-dot-03/ 👍
Besides setting DOCKER_HOST to use port 2376, you need to set DOCKER_TLS_VERIFY=1 and DOCKER_CERT_PATH=/certs/client to tell Docker to use TLS (and where to get certificates to handshake with).

Also, you should only share /certs/client with your client containers.

See also:
https://github.com/docker-library/docker/blob/d45051476babc297257df490d22cbd806f1b11e4/docker-entrypoint.sh#L22-L33
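Putting those pieces together, a TLS-enabled job might look roughly like the sketch below (assumptions: the job name and script are placeholders, and the sharing of the /certs volume between the service and job containers is configured on the runner side, which isn't shown here):

```yaml
# Sketch of a TLS-enabled job for docker:19.03+ dind.
variables:
  DOCKER_TLS_CERTDIR: "/certs"        # let dind generate certificates here
  DOCKER_HOST: tcp://docker:2376      # TLS port (2376), not 2375
  DOCKER_TLS_VERIFY: "1"              # verify the daemon's certificate
  DOCKER_CERT_PATH: "/certs/client"   # client certs shared from the service

build:
  image: docker:19.03
  services:
    - docker:19.03-dind
  script:
    - docker info                     # placeholder; any docker command applies
```

The design point is that the 19.03 entrypoint generates server and client certificates under DOCKER_TLS_CERTDIR, which is why only the client subdirectory needs to be exposed to job containers.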
We met this error when using the stable tags of the stable and stable-dind images, but 18-dind works.

If you are running everything locally, it's okay though.
Me and my company got caught by this issue; that's fine, but that commit seems a bit rushed, like a big breaking change just before the release… (plz don't revert though)