moby: RHEL7/CentOS7 cannot reach another container published network service
Description
On CentOS 7 and RHEL 7 (and possibly any Linux OS using firewalld) we have the following problem: when at least 2 containers are running on the same host, one container cannot access the network services offered by the other using the “external” IP address or hostname. The error message returned is “host unreachable” or “no route to host”.
Note that sometimes, even when setting a “hostname” for a docker container (via the --hostname option) and being able to ping that hostname from another container (the hostname is then resolved to the Docker internal IP), it might still not work, because some applications (e.g. gitlab-runner) resolve the given hostname using the external DNS resolver and not the one of the Docker network. Weird but true.
Someone reported already the problem (#24370) but did not provide enough information, and thus the issue was closed. I have all necessary information, and I can provide more on demand.
Steps to reproduce the issue:
I have found a series of steps that are easy for anyone to follow and that reproduce the problem. It assumes that in your home directory you have a html-pub folder containing a static index.html file (mkdir ~/html-pub, then download a simple static HTML file from the internet and put it in that folder). All commands are run on the host where Docker 17.03 is running.
It is also assumed that the IP address of the host is 192.168.1.2.
- docker run --name nginx --detach -p 192.168.1.2:80:80 -v ~/html-pub:/usr/share/nginx/html:ro nginx:stable-alpine
- docker run --rm -it alpine:3.5 wget http://192.168.1.2/
Describe the results you received:
On CentOS 7 with firewalld installed I receive this:
Connecting to 192.168.1.2 (192.168.1.2:80)
wget: can't connect to remote host (192.168.1.2): Host is unreachable
Describe the results you expected:
On Ubuntu without firewalld (but still with a firewall), I get this:
Connecting to 192.168.1.2 (192.168.1.2:80)
index.html 100% |*******************************| 3700 0:00:00 ETA
Additional information you deem important (e.g. issue happens only occasionally):
On CentOS 7, doing the following solved the problem. But I would expect the docker run command to perform those extra steps itself, since I used the -p flag.
sudo firewall-cmd --zone=trusted --add-interface=docker0
sudo firewall-cmd --zone=public --add-port=80/tcp
Note: The above commands are for testing. If one wants them to be permanent, one needs to add the --permanent flag to both commands and then execute sudo firewall-cmd --reload.
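For reference, the permanent variant of the workaround described above would look like the following sketch (zone names as in the original commands):

```shell
# Persist the workaround across reboots: --permanent writes the rules to
# the firewalld configuration instead of only the runtime state.
sudo firewall-cmd --permanent --zone=trusted --add-interface=docker0
sudo firewall-cmd --permanent --zone=public --add-port=80/tcp

# Reload so the permanent configuration becomes active immediately.
sudo firewall-cmd --reload
```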
Update 20170330: actually only the second command is enough; adding docker0 to the trusted zone has no effect.
The above is a dummy example. A real-life test case where this is failing us is running the GitLab and GitLab Runner containers on the same host. We had to use a different hostname for the docker run command than the real hostname users use to access our own internal GitLab instance, in order for the gitlab-runner to register successfully. But then, when trying to use that runner, it cannot clone the repository: GitLab provides the “external” FQDN of the repository the runner should clone, and the runner fails before even starting the job because the host is unreachable. The nginx example is therefore relevant and a much easier way of demonstrating the issue.
Output of docker version:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 28
Running: 10
Paused: 0
Stopped: 18
Images: 300
Server Version: 17.03.0-ce
Storage Driver: devicemapper
Pool Name: vg_spc-thpl_docker
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 23.87 GB
Data Space Total: 1.44 TB
Data Space Available: 1.416 TB
Metadata Space Used: 12.45 MB
Metadata Space Total: 16.98 GB
Metadata Space Available: 16.97 GB
Thin Pool Minimum Free Space: 144 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 32 GiB
Name: *******************
ID: ****************************
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
The host runs CentOS 7.3 on bare-metal (so physical). But I have also reproduced it inside a VM (KVM) during the investigation.
I also tried the above (and got the expected result) on an Ubuntu 16.04 LTS machine running the 4.8 HWE kernel; this is also a bare-metal x86_64 machine, but with only 2 CPUs and 8 GiB RAM, and the storage driver is btrfs.
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 9
- Comments: 26 (20 by maintainers)
Hi there!
I had the same issue, but I added this and it resolved it!
If that doesn’t work, try this.
Hope it helps with your issue.
Hey guys! I faced the same problem using SIP; to solve it I used the following command:
In your case @jcberthon, try this:
@eltonplima saved my day, I spent 6h trying to fix it! THANKS!
But in my case I needed to use:
sudo firewall-cmd --add-service=https --permanent
sudo firewall-cmd --reload
I forgot to give credits to the person who solved the problem: Nena on StackOverflow.
Any updates on this? I am experiencing this problem on a system that doesn’t use firewalld but only plain iptables. This really is a blocker.
I have a mail server and several other services running on a host. The mail server runs in a different docker bridge network than the other services, but publicly exposes its ports. The other services should be able to access the mail server using the publicly exposed ports, but no connection can be made.
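On a plain-iptables host, a minimal sketch of an equivalent workaround is to accept traffic from the Docker bridge subnet before the rules that reject it (the 172.17.0.0/16 subnet is the default for the docker0 bridge and is an assumption here; the DOCKER-USER chain only exists on Docker 17.06 and later):

```shell
# Allow packets originating from the default Docker bridge subnet
# to reach published ports on the host itself.
sudo iptables -I INPUT -s 172.17.0.0/16 -j ACCEPT

# On Docker 17.06+, DOCKER-USER is the recommended chain for user rules
# affecting forwarded container traffic.
sudo iptables -I DOCKER-USER -s 172.17.0.0/16 -j ACCEPT
```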
Hi @whgibbo
I haven’t received any news, and I’m still using the proposed workaround.
Hello @thaJeztah
Any news on this front? How can I help further on this topic?
Hello @thaJeztah
I did some further investigation.
I did a new test using an Ubuntu droplet. I set it up like I did for the CentOS one, but of course using apt instead of yum, etc. To make the test more relevant, I activated ufw before installing Docker. Then I installed Docker and ran the same docker containers (as described for CentOS). But instead of getting host unreachable right away, after a long period (ca. a minute) I get a timeout. So here again the firewall is blocking inter-container communication when using a public IP.
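The ufw activation step is not preserved in this comment; a sketch of what it typically looks like on a fresh droplet (the OpenSSH allowance is an assumption, added so the machine stays reachable) is:

```shell
# Keep SSH reachable before enabling the firewall's default deny policy.
sudo ufw allow OpenSSH

# Enable ufw non-interactively and confirm the active rules.
sudo ufw --force enable
sudo ufw status verbose
```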
To solve that, on CentOS one can use the command from my original issue post, or refer to https://serverfault.com/questions/684602/how-to-open-port-for-a-specific-ip-address-with-firewall-cmd-on-centos if one wants to restrict which source IPs can connect to the opened port (e.g. giving the docker network IP range as source), e.g.
Or you can do (it is similar to the above)
On Ubuntu do:
So it really depends on how the firewall is configured in the first place. Perhaps Docker could make sure that Docker containers can connect to each other when using the external IP address by creating these rules automatically.
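The exact commands were not preserved above; a hedged sketch of the two variants (the zone name, source range 172.17.0.0/16, and port 80 are assumptions based on the nginx example) could be:

```shell
# CentOS/RHEL: allow only the Docker bridge subnet to reach port 80,
# using a firewalld rich rule that restricts the source address
# (as in the serverfault answer linked above).
sudo firewall-cmd --zone=public \
  --add-rich-rule='rule family="ipv4" source address="172.17.0.0/16" port port="80" protocol="tcp" accept'

# Ubuntu with ufw: allow the same subnet to reach port 80 on the host.
sudo ufw allow from 172.17.0.0/16 to any port 80 proto tcp
```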
No problem, thanks for helping 😃
I know I cannot restart Docker on the main server because it is configured to restart all containers (we are running CentOS 7.3, so we have this bug where we need to set mounts as private in the systemd unit in order to avoid leaking LVM mounts, but this implies that we cannot use live restore).
However, I can do that on my KVM instance (virtual machine), on which nothing important is running. So on this VM, I’m also running CentOS 7.3 with all updates applied (as of this weekend); the setup is similar to the one on the other host where we run Docker 17.03.0-ce. The only difference is that the storage driver is overlay instead of LVM.
So when I do this after a clean reboot:
systemctl restart docker
docker run --name nginx --detach -p 192.168.1.2:80:80 -v ~/html-pub:/usr/share/nginx/html:ro nginx:stable-alpine
docker run --rm -it alpine:3.5 wget http://192.168.1.2/
This fails with “host unreachable” as reported. And this is the same error I see in production on the other host.