moby: RHEL7/CentOS7 cannot reach another container published network service
Description
On CentOS 7 and RHEL 7 (and possibly any Linux OS using firewalld) we have the following problem: when at least 2 containers are running on the same host, one container cannot access the network services offered by the other using the “external” IP address or hostname. The error message returned is “host unreachable” or “no route to host”.
Note that sometimes, even when setting a “hostname” for a docker container (via the --hostname option) and being able to ping that hostname from another container (the hostname is then resolved to the Docker internal IP), it might still not work, because some applications (e.g. gitlab-runner) resolve the given hostname using the external DNS resolver and not the one of the Docker network. Weird but true.
Someone reported already the problem (#24370) but did not provide enough information, and thus the issue was closed. I have all necessary information, and I can provide more on demand.
Steps to reproduce the issue:
I have found a series of steps that are easy for anyone to follow and that reproduce the problem. It assumes that in your home directory you have a html-pub folder containing a static index.html file (mkdir ~/html-pub, then download a simple static HTML file from the internet and put it in that folder). All commands are run on the host where Docker 17.03 is running.
It is also assumed that the IP address of the host is 192.168.1.2.
- docker run --name nginx --detach -p 192.168.1.2:80:80 -v ~/html-pub:/usr/share/nginx/html:ro nginx:stable-alpine
- docker run --rm -it alpine:3.5 wget http://192.168.1.2/
Describe the results you received:
On CentOS 7 with firewalld installed I receive this:
Connecting to 192.168.1.2 (192.168.1.2:80)
wget: can't connect to remote host (192.168.1.2): Host is unreachable
Describe the results you expected:
On Ubuntu without firewalld (but still with a firewall), I get this:
Connecting to 192.168.1.2 (192.168.1.2:80)
index.html 100% |*******************************| 3700 0:00:00 ETA
Additional information you deem important (e.g. issue happens only occasionally):
On CentOS 7, doing the following solved the problem. But I would expect the docker run command to perform those extra steps itself, since I used the -p flag.
sudo firewall-cmd --zone=trusted --add-interface=docker0
sudo firewall-cmd --zone=public --add-port=80/tcp
Note: The above commands are for testing. If one wants them to be permanent, one needs to add the --permanent flag to both commands and then execute sudo firewall-cmd --reload.
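For reference, the permanent variant of the workaround described above would look like the following sketch (zone names as in the original commands):

```shell
# Persist the workaround across reboots: --permanent writes the rules to
# the firewalld configuration instead of only the runtime state.
sudo firewall-cmd --permanent --zone=trusted --add-interface=docker0
sudo firewall-cmd --permanent --zone=public --add-port=80/tcp

# Reload so the permanent configuration becomes active immediately.
sudo firewall-cmd --reload
```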
Update 20170330: actually only the second command is enough; adding docker0 to the trusted zone has no effect.
The above is a dummy example. A real-life test case where this is failing us is running the GitLab and GitLab Runner containers on the same host. We had to use a different hostname for the docker run command than the real hostname users use to access our own internal GitLab instance, in order for the gitlab-runner to register successfully. But then, when trying to use that runner, it cannot clone the repository: GitLab provides the “external” FQDN of the repository the runner should clone, and the runner fails before even starting the job because the host is unreachable. The nginx example is therefore relevant and a much easier way of demonstrating the issue.
Output of docker version:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 28
Running: 10
Paused: 0
Stopped: 18
Images: 300
Server Version: 17.03.0-ce
Storage Driver: devicemapper
Pool Name: vg_spc-thpl_docker
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 23.87 GB
Data Space Total: 1.44 TB
Data Space Available: 1.416 TB
Metadata Space Used: 12.45 MB
Metadata Space Total: 16.98 GB
Metadata Space Available: 16.97 GB
Thin Pool Minimum Free Space: 144 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 32 GiB
Name: *******************
ID: ****************************
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
The host runs CentOS 7.3 on bare-metal (so physical). But I have also reproduced it inside a VM (KVM) during the investigation.
I also tried the above (and got the expected result) on an Ubuntu 16.04 LTS machine running the 4.8 HWE kernel; this is also a bare-metal x86_64 machine, but with only 2 CPUs and 8 GiB RAM, and the storage driver is btrfs.
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 9
- Comments: 26 (20 by maintainers)
Hi there!
I had the same issue, but I added this and it resolved it!
If that doesn’t work, try this.
Hope it helps with your issue.
Hey guys! I faced the same problem using SIP; to solve it I used the following command:
In your case @jcberthon, try this:
@eltonplima saved my day, I spent 6h trying to fix it! THANKS!
But in my case I needed to use:
sudo firewall-cmd --add-service=https --permanent
sudo firewall-cmd --reload
I forgot to give credits to the person who solved the problem: Nena on StackOverflow.
Any updates on this? I am experiencing this problem on a system that doesn’t use firewalld but only plain iptables. This really is a blocker.
I have a mail server and several other services running on a host. The mail server runs in a different docker bridge network than the other services, but publicly exposes its ports. The other services should be able to access the mail server using the publicly exposed ports, but no connection can be made.
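On a plain-iptables host, a minimal sketch of an equivalent workaround is to accept traffic from the Docker bridge subnet before the rules that reject it (the 172.17.0.0/16 subnet is the default for the docker0 bridge and is an assumption here; the DOCKER-USER chain only exists on Docker 17.06 and later):

```shell
# Allow packets originating from the default Docker bridge subnet
# to reach published ports on the host itself.
sudo iptables -I INPUT -s 172.17.0.0/16 -j ACCEPT

# On Docker 17.06+, DOCKER-USER is the recommended chain for user rules
# affecting forwarded container traffic.
sudo iptables -I DOCKER-USER -s 172.17.0.0/16 -j ACCEPT
```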
Hi @whgibbo
I haven’t received any news, and I’m still using the proposed workaround.
Hello @thaJeztah
Any news on this front? How can I help further on this topic?
Hello @thaJeztah
I did some further investigation.
I did a new test using an Ubuntu droplet. I set it up like I did for the CentOS one, but of course using apt instead of yum, etc. To make the test more relevant, I activated ufw before installing Docker. Then I installed Docker and ran the same docker containers (as described for CentOS). But instead of getting host unreachable right away, after a long period (ca. a minute) I get a timeout. So here again the firewall is blocking inter-container communication when using a public IP.
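The ufw activation step is not preserved in this comment; a sketch of what it typically looks like on a fresh droplet (the OpenSSH allowance is an assumption, added so the machine stays reachable) is:

```shell
# Keep SSH reachable before enabling the firewall's default deny policy.
sudo ufw allow OpenSSH

# Enable ufw non-interactively and confirm the active rules.
sudo ufw --force enable
sudo ufw status verbose
```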
To solve that, on CentOS one can use the command from my original issue post, or refer to https://serverfault.com/questions/684602/how-to-open-port-for-a-specific-ip-address-with-firewall-cmd-on-centos if one wants to restrict which source IPs can connect to the opened port (e.g. giving the docker network IP range as source), e.g.
Or you can do (it is similar to the above)
On Ubuntu do:
So it really depends on how the firewall is configured in the first place. Perhaps Docker could make sure that Docker containers can connect to each other when using the external IP address by creating these rules automatically.
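The exact commands were not preserved above; a hedged sketch of the two variants (the zone name, source range 172.17.0.0/16, and port 80 are assumptions based on the nginx example) could be:

```shell
# CentOS/RHEL: allow only the Docker bridge subnet to reach port 80,
# using a firewalld rich rule that restricts the source address
# (as in the serverfault answer linked above).
sudo firewall-cmd --zone=public \
  --add-rich-rule='rule family="ipv4" source address="172.17.0.0/16" port port="80" protocol="tcp" accept'

# Ubuntu with ufw: allow the same subnet to reach port 80 on the host.
sudo ufw allow from 172.17.0.0/16 to any port 80 proto tcp
```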
No problem, thanks for helping 😃
I know I cannot restart Docker on the main server because it is configured to restart all containers (we are running CentOS 7.3, so we have this bug where we need to set mounts as private in the systemd unit in order to avoid leaking LVM mounts, but this implies that we cannot use live restore).
However, I can do that on my KVM instance (virtual machine), on which nothing important is running. So on this VM, I’m also running CentOS 7.3 with all updates applied (as of this weekend); the setup is similar to the one on the other host where we run Docker 17.03.0-ce. The only difference is that the storage driver is overlay instead of LVM.
So when I do this after a clean reboot:
systemctl restart docker
docker run --name nginx --detach -p 192.168.1.2:80:80 -v ~/html-pub:/usr/share/nginx/html:ro nginx:stable-alpine
docker run --rm -it alpine:3.5 wget http://192.168.1.2/
This fails with “host unreachable” as reported. And this is the same error I see in production on the other host.