docker-ipv6nat: Exits with "unable to detect hairpin mode (is the docker daemon running?)"

docker-ipv6nat version: v0.4.3
Docker version: 20.10.1 and 20.10.2
OS: CentOS Linux release 8.3.2011 (Core)

After a system update, I get this error on launch:

$ docker logs ipv6nat
2021/01/09 17:26:57 unable to detect hairpin mode (is the docker daemon running?)

After which the container exits and restarts.

Thinking it might be a permissions issue, I removed all --cap-add flags (leaving only --cap-drop ALL) to test, but that broke things even further:

2021/01/09 18:07:38 running [/sbin/iptables -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER --wait]: exit status 3: addrtype: Could not determine whether revision 1 is supported, assuming it is.
addrtype: Could not determine whether revision 1 is supported, assuming it is.
iptables v1.8.4 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.

I then tried to give it --cap-add ALL, but that did not fix it.

Since part of the system update was docker-ce, I thought maybe it had changed the backend rules, but:

# /sbin/iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan  9 13:09:03 2021
*nat
...
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
...

Clearly the right rule still exists. And checking manually:

# /sbin/iptables -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER --wait; echo "$?"
iptables: Bad rule (does a matching rule exist in that chain?).
1
# /sbin/iptables -t nat -C OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER --wait; echo "$?"
0

The actual check commands return the expected results. I am using this code section as the reference: https://github.com/robbertkl/docker-ipv6nat/blob/v0.4.3/manager.go#L79-L86
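
For reference, here is a rough sketch of what that detection amounts to (not the verbatim upstream code; I'm going from the referenced lines and assuming the coreos/go-iptables API the project uses, so names are illustrative): it probes the nat OUTPUT chain for the two variants of Docker's rule and fails with exactly this error if neither one is found.

package main

import (
	"fmt"
	"log"

	"github.com/coreos/go-iptables/iptables"
)

// detectHairpinMode is an illustrative approximation of the referenced
// manager.go logic. Docker installs the OUTPUT rule with the
// "! -d 127.0.0.0/8" exclusion when the userland proxy is in use, and
// without it when hairpin NAT is enabled; whichever variant exists tells
// us the mode. If neither is found, we give up with the error from the logs.
func detectHairpinMode(ipt *iptables.IPTables) (bool, error) {
	hairpinOn := []string{"-m", "addrtype", "--dst-type", "LOCAL", "-j", "DOCKER"}
	hairpinOff := append([]string{"!", "-d", "127.0.0.0/8"}, hairpinOn...)

	if found, err := ipt.Exists("nat", "OUTPUT", hairpinOn...); err != nil {
		return false, err
	} else if found {
		return true, nil
	}
	if found, err := ipt.Exists("nat", "OUTPUT", hairpinOff...); err != nil {
		return false, err
	} else if found {
		return false, nil
	}
	return false, fmt.Errorf("unable to detect hairpin mode (is the docker daemon running?)")
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	hairpin, err := detectHairpinMode(ipt)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("hairpin mode:", hairpin)
}

On the host both existence checks behave as expected (as the manual commands above show), so the question is why neither rule variant is visible to the iptables binary inside the container.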

At this point I downgraded dockerd back to 20.10.1, but I got the same error.

What is strange is that when I first did the system upgrade, dockerd restarted itself as usual, and all my containers came back online with IPv6 working. It was after an OS restart that this error started.

I tried to do a system rollback, but the old package versions couldn’t be found, so I’m stuck.

Full package list that I upgraded:

Package New Version Old Version
NetworkManager 1:1.26.0-12.el8_3.x86_64 1:1.26.0-9.el8_3.x86_64
NetworkManager-libnm 1:1.26.0-12.el8_3.x86_64 1:1.26.0-9.el8_3.x86_64
NetworkManager-team 1:1.26.0-12.el8_3.x86_64 1:1.26.0-9.el8_3.x86_64
NetworkManager-tui 1:1.26.0-12.el8_3.x86_64 1:1.26.0-9.el8_3.x86_64
gnutls 3.6.14-7.el8_3.x86_64 3.6.14-6.el8.x86_64
iptables 1.8.4-15.el8_3.3.x86_64 1.8.4-15.el8.x86_64
iptables-ebtables 1.8.4-15.el8_3.3.x86_64 1.8.4-15.el8.x86_64
iptables-libs 1.8.4-15.el8_3.3.x86_64 1.8.4-15.el8.x86_64
iptables-services 1.8.4-15.el8_3.3.x86_64 1.8.4-15.el8.x86_64
iwl100-firmware 39.31.5.1-101.el8_3.1.noarch 39.31.5.1-99.el8.1.noarch
iwl1000-firmware 1:39.31.5.1-101.el8_3.1.noarch 1:39.31.5.1-99.el8.1.noarch
iwl105-firmware 18.168.6.1-101.el8_3.1.noarch 18.168.6.1-99.el8.1.noarch
iwl135-firmware 18.168.6.1-101.el8_3.1.noarch 18.168.6.1-99.el8.1.noarch
iwl2000-firmware 18.168.6.1-101.el8_3.1.noarch 18.168.6.1-99.el8.1.noarch
iwl2030-firmware 18.168.6.1-101.el8_3.1.noarch 18.168.6.1-99.el8.1.noarch
iwl3160-firmware 1:25.30.13.0-101.el8_3.1.noarch 1:25.30.13.0-99.el8.1.noarch
iwl3945-firmware 15.32.2.9-101.el8_3.1.noarch 15.32.2.9-99.el8.1.noarch
iwl4965-firmware 228.61.2.24-101.el8_3.1.noarch 228.61.2.24-99.el8.1.noarch
iwl5000-firmware 8.83.5.1_1-101.el8_3.1.noarch 8.83.5.1_1-99.el8.1.noarch
iwl5150-firmware 8.24.2.2-101.el8_3.1.noarch 8.24.2.2-99.el8.1.noarch
iwl6000-firmware 9.221.4.1-101.el8_3.1.noarch 9.221.4.1-99.el8.1.noarch
iwl6000g2a-firmware 18.168.6.1-101.el8_3.1.noarch 18.168.6.1-99.el8.1.noarch
iwl6050-firmware 41.28.5.1-101.el8_3.1.noarch 41.28.5.1-99.el8.1.noarch
iwl7260-firmware 1:25.30.13.0-101.el8_3.1.noarch 1:25.30.13.0-99.el8.1.noarch
kexec-tools 2.0.20-34.el8_3.1.x86_64 2.0.20-34.el8.x86_64
linux-firmware 20200619-101.git3890db36.el8_3.noarch 20200619-99.git3890db36.el8.noarch
microcode_ctl 4:20200609-2.20201112.1.el8_3.x86_64 4:20200609-2.20201027.1.el8_3.x86_64
systemd 239-41.el8_3.1.x86_64 239-41.el8_3.x86_64
systemd-libs 239-41.el8_3.1.x86_64 239-41.el8_3.x86_64
systemd-pam 239-41.el8_3.1.x86_64 239-41.el8_3.x86_64
systemd-udev 239-41.el8_3.1.x86_64 239-41.el8_3.x86_64
tuned 2.14.0-3.el8_3.1.noarch 2.14.0-3.el8.noarch
tzdata 2020f-1.el8.noarch 2020d-1.el8.noarch
docker-ce 3:20.10.2-3.el8.x86_64 3:20.10.1-3.el8.x86_64
docker-ce-cli 1:20.10.2-3.el8.x86_64 1:20.10.1-3.el8.x86_64
docker-ce-rootless-extras 20.10.2-3.el8.x86_64 20.10.1-3.el8.x86_64

Seems like coreos/go-iptables/issues/79 could be related.
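
In case it helps connect the exit codes above: an existence check like this ultimately runs iptables -C and interprets the exit status. Exit 0 means the rule exists, exit 1 means no matching rule (the "Bad rule" output shown earlier), and anything else, such as the exit status 3 / permission denied error when the capabilities were dropped, is a hard failure rather than "rule absent". A simplified, illustrative sketch of that distinction (not go-iptables' actual code):

package main

import (
	"fmt"
	"log"
	"os/exec"
)

// ruleExists runs iptables -C and maps the exit status onto
// exists / not-found / error. This is a simplified stand-in for what a
// library-level existence check does.
func ruleExists(table, chain string, rulespec ...string) (bool, error) {
	args := append([]string{"-t", table, "-C", chain}, rulespec...)
	args = append(args, "--wait")
	out, err := exec.Command("/sbin/iptables", args...).CombinedOutput()
	if err == nil {
		return true, nil // exit 0: rule exists
	}
	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
		return false, nil // exit 1: no matching rule in that chain
	}
	// any other failure (e.g. exit 3, permission denied) is a real error
	return false, fmt.Errorf("running iptables %v: %v: %s", args, err, out)
}

func main() {
	exists, err := ruleExists("nat", "OUTPUT",
		"!", "-d", "127.0.0.0/8", "-m", "addrtype", "--dst-type", "LOCAL", "-j", "DOCKER")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rule exists:", exists)
}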

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 42 (19 by maintainers)

Most upvoted comments

Hi all,

With Docker 20.10.6 the ipv6nat function is fully integrated (experimental). You can add the following flags to your daemon.json:

{
  "ipv6": true,
  "fixed-cidr-v6": "fd00::/80",
  "experimental": true,
  "ip6tables": true
}

So, to be 100% clear, this “backport” in the RedHat packages happened between versions 1.8.4-15.el8 and 1.8.4-15.el8_3.3?

Correct, see the changelog here: https://centos.pkgs.org/8/centos-baseos-x86_64/iptables-services-1.8.4-15.el8_3.3.x86_64.rpm.html

I spoke to Phil Sutter from RedHat, who authored both the upstream patch and its backport into RHEL 8.3.

The commit in question is here. To quote Phil:

Sadly it is not part of an official release yet, ETA is v1.8.7.

About the issue we’re seeing in the Docker container:

Basically it’s a problem with data representation inside the container. The iptables binary in there doesn’t respect the reduced payload expression length and, due to the absence of the (not needed) bitwise expression, assumes the full address is being matched.

So aside from the workaround (downgrading as detailed here) I guess the only solution would be to either wait for 1.8.7 (and its Alpine edge packages) or build a patched version and ship that in the container image.

Well that fixed it. 🤕

Wow, that was quite the journey. Great you figured it out! And thanks for the detailed fix.

Let’s leave it at this. I’ll keep an eye out for more reports of this issue.

Makes me wonder if I could have mounted the host xtables-nft-multi binary in the container to fix it. Probably only if it was statically linked, since the container runs on Alpine (MUSL based IIRC).

Yeah, that’s usually a no-go, for that exact reason.

Just pushed out a new release v0.4.4 which contains the fix for this issue! Docker images for all architectures are on Docker Hub as :0.4.4 and :latest. Thanks everyone!

If this is the issue, couldn’t it be fixed by upgrading the ipv6nat Docker container to use a newer version of iptables? Maybe as an opt-in (e.g. a new tag)?

That seems to be the plan, once iptables is updated in Alpine. We all seem to agree that we should wait until it is “stable” before doing that.

I can confirm that this error occurs with a fresh install of CentOS. The downgrade solution worked for me.

Well that fixed it. 🤕

Feel free to close this issue if you want, since it seems to be a problem with an external package. Though if you think it is a compatibility issue, I would be happy to help you continue to debug it (though not on my home prod system).

Full fix detailed, in case anyone else has the exact same stuck package scenario:

# Get all "old" packages
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-ebtables-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-libs-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-services-1.8.4-15.el8.x86_64.rpm

$ sudo yum downgrade ./iptables-*

# Destroy the container, just in case any loaded kernel modules stick around
$ docker rm -f ipv6nat

# Rebooting was the only thing that fixed it for me
$ sudo reboot

# Recreate the container
$ docker run -d --name ipv6nat --cap-drop ALL --cap-add NET_ADMIN --cap-add NET_RAW --network host --restart unless-stopped -v /var/run/docker.sock:/var/run/docker.sock:ro robbertkl/ipv6nat
# or whatever your command is

Thanks for all your help tracking down what was causing the problem!

Your system uses nftables, and I think it’s the iptables-nft binary inside the container that talks directly to nftables. I don’t think it’s even using the iptables installed on the host.

You’re right, I think. I guess the new package version somehow has a bug that interacts with the kernel incorrectly and then saves (and later prints) the rules incorrectly? Yeah, it doesn’t make sense to me either. I wouldn’t even know how to go about reporting this as a bug to the package maintainer; I guess I would need to prove that the rule actually got saved wrong somehow.

Also wondering how iptables on the host could have an effect in the first place: I don’t think it’s even used?

Makes me wonder if I could have mounted the host xtables-nft-multi binary in the container to fix it. Probably only if it was statically linked, since the container runs on Alpine (MUSL based IIRC).
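
For what it’s worth: a quick way to check whether a given binary is statically linked (before trying a bind-mount like that) is to look for a PT_INTERP program header, which names the dynamic loader. A small sketch; the default path below is only a guess at where the host binary might live:

package main

import (
	"debug/elf"
	"fmt"
	"log"
	"os"
)

func main() {
	// Default path is an assumption; pass the real one as the first argument.
	path := "/usr/sbin/xtables-nft-multi"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	f, err := elf.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// A dynamically linked executable carries a PT_INTERP segment naming its
	// loader (glibc's ld-linux on CentOS, musl's on Alpine); a fully static
	// binary has none.
	dynamic := false
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			dynamic = true
		}
	}
	if dynamic {
		fmt.Println(path, "is dynamically linked; it would not run against Alpine's musl")
	} else {
		fmt.Println(path, "is statically linked")
	}
}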