docker: dockerd fails to start - RULE_APPEND failed (No such file or directory): rule in chain DOCKER-ISOLATION-STAGE-1

After the merge of https://github.com/docker-library/docker/pull/461, the docker containers we run as part of our CI jobs stopped working. The container fails to start with the following error:

failed to start daemon: Error initializing network controller: error obtaining controller instance: unable to add return rule in DOCKER-ISOLATION-STAGE-1 chain:  (iptables failed: iptables --wait -A DOCKER-ISOLATION-STAGE-1 -j RETURN: iptables v1.8.10 (nf_tables):  RULE_APPEND failed (No such file or directory): rule in chain DOCKER-ISOLATION-STAGE-1

The affected image is docker.io/library/docker@sha256:ae63bb7c7d3ae23884a2c5d206939640279f6d15730618192b58662a0619f182, while docker.io/library/docker@sha256:c90e58d30700470fc59bdaaf802340fd25c1db628756d7bf74e100c566ba9589 works fine. Both images were published under the 24.0.7-dind tag.

The environment is GKE 1.27 with Container-Optimized OS.

Workaround: use docker:24.0.7-dind-alpine3.18, which still points at the previous version of the image that was overwritten.
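
Since the 24.0.7-dind tag was overwritten in place, pinning by digest is a sturdier variant of the same workaround. A minimal sketch, using the known-good digest quoted above; the `image_ref` helper and the variable names are hypothetical and only illustrate how the reference is composed:

```shell
#!/bin/sh
# Known-good digest quoted earlier in this report.
GOOD_DIGEST="sha256:c90e58d30700470fc59bdaaf802340fd25c1db628756d7bf74e100c566ba9589"

# Hypothetical helper: join a repository and a digest into an
# immutable image reference (repo@digest).
image_ref() {
  printf '%s@%s\n' "$1" "$2"
}

PINNED_IMAGE="$(image_ref docker.io/library/docker "$GOOD_DIGEST")"
echo "$PINNED_IMAGE"
```

The resulting reference can then be used wherever the tag was used before, for example `docker pull "$PINNED_IMAGE"`.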

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 30
  • Comments: 18 (6 by maintainers)

Most upvoted comments

24.0.6-dind worked for us

@stevexuereb for us using 24.0.7 with alpine 3.18 fixed the issue.

But of course this is just a temporary workaround.

The change in #468 isn’t actually pushed all the way to the published images yet: https://github.com/docker-library/official-images/pull/16009

According to https://github.com/docker-library/docker/pull/468#issuecomment-1878086606, COS 105 will probably need --env DOCKER_IPTABLES_LEGACY=1 even with that change.
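
For anyone starting the container by hand, passing that variable might look like the sketch below. It assumes an image that already includes the change from #468; the container name `dind-legacy` and the `DRY_RUN` switch are illustrative assumptions, not part of the image.

```shell
#!/bin/sh
# Sketch: start dind with the legacy iptables backend requested.
# DOCKER_IPTABLES_LEGACY is the variable named in the comment above;
# the container name and the DRY_RUN switch are assumptions.
IMAGE="docker:24.0.7-dind"

run_dind_legacy() {
  # --privileged is required for Docker-in-Docker.
  set -- docker run --detach --privileged \
    --name dind-legacy \
    --env DOCKER_IPTABLES_LEGACY=1 \
    "$IMAGE"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

Calling `run_dind_legacy` with `DRY_RUN=1` set prints the command so it can be reviewed first. On Kubernetes the same variable would be set in the `env` list of the dind container instead.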

Ok, fix should be mostly deployed now. 👍

This is also happening for us when using ARC for GitHub runners.

Since it used dind:latest, it broke our self-hosted runner pipeline.

We will try to pin a previous version as mentioned above and will report back as soon as possible.

EDIT (27/12/2023): This is how we temporarily solved it using the ARC helm chart.

We added this to our helm chart:

image:
  repository: "summerwind/actions-runner-controller"
  actionsRunnerRepositoryAndTag: "summerwind/actions-runner:latest"
  dindSidecarRepositoryAndTag: "docker:24.0.7-dind-alpine3.18"

@stevexuereb would you be able to test or help coordinate a test of #468 on GitLab to make sure I don’t cause a regression again? 😅

(docker build --pull 'https://github.com/docker-library/docker.git#refs/pull/468/merge:24/dind', in case that’s a helpful one-liner for you to get something running/tested)

@tianon certainly, I left my testing results in https://github.com/docker-library/docker/pull/468#issuecomment-1862504835. Let me know if you need something different 🙇


@tianon I’m curious: would it be possible to set up a test in GitHub Actions that builds the image in a PR, publishes it to some registry (GitHub, Docker Hub), and possibly triggers a pipeline on GitLab with that image (so there is no infra cost for you)? That way we would validate each PR going forward instead of running https://github.com/docker-library/docker/pull/468#issuecomment-1862504835 manually. We could also keep it vendor agnostic and trigger jobs in GitHub Actions as well, since ARC was affected.

I’d be happy to see if I can coordinate with the Runner team so they can contribute this to the project, but I wanted to check first whether this is a good idea for this project.

Thank you @tianon and @yosifkit for fixing this problem, we appreciate it 🙇 🚀

We (GitLab.com) have a similar issue (https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17283) with 24.0.7 failing to start on Google Container Optimized OS.

As we can see, Alpine 3.19 changed the iptables version 👉 https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17283#note_1695929693
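
To compare images quickly, the backend can be read from the version string, which carries it in parentheses (as in `iptables v1.8.10 (nf_tables)` from the error above). A small sketch; the `iptables_backend` helper is hypothetical, and it assumes the string has the same shape as the one in the error message:

```shell
#!/bin/sh
# Hypothetical helper: report the backend named in an
# `iptables --version` style string.
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;   # older builds print no backend
  esac
}

iptables_backend "iptables v1.8.10 (nf_tables)"
```

Feeding it the output of `iptables --version` from inside each image shows which backend that image defaults to.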

We’ve also tried multiple Google Container Optimized OS versions and all of them seem to fail 👉 https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17283#note_1696008058. We also tried some fixes in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17283#note_1696085805, which didn’t work.

The only thing that worked for us is changing the host image to Ubuntu 👉 https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17283#note_1696057259 but this is not a viable option for us.


At the moment I’m not sure what our (GitLab.com) next steps are since Alpine 3.19 seems to be incompatible with Google Container Optimized OS, and it also seems like other users are having the same problem.

cc @tianon