moby: service in docker-compose resolved to the wrong IP, resulting in a connection refused.
Description
I am using docker-compose to manage my Docker services. I have some containers running in the same docker-compose network. To my surprise, when container A connected to container B by service name, the connection was refused because the service name was resolved to the wrong IP.
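When a connection by service name is refused, it helps to see which address the name actually resolved to and what happened at that address. The sketch below resolves a name and tries a TCP connection to each IPv4 address returned. It is a minimal sketch, assuming it is run inside container A; `probe` is a hypothetical helper, and the service name and port you pass in are your own.

```python
from __future__ import annotations

import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> list[tuple[str, str]]:
    """Resolve `host`, then try a TCP connect to each IPv4 address it returns.

    Returns (ip, outcome) pairs; outcome is "ok", "refused", "timeout",
    "unreachable", or the exception class name for any other OSError.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    results = []
    for ip in sorted({info[4][0] for info in infos}):
        try:
            # Connect to the literal IP so the report shows which address failed.
            with socket.create_connection((ip, port), timeout=timeout):
                outcome = "ok"
        except ConnectionRefusedError:
            outcome = "refused"  # nothing accepted the connection at this address/port
        except socket.timeout:
            outcome = "timeout"
        except OSError as exc:
            if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
                outcome = "unreachable"
            else:
                outcome = type(exc).__name__
        results.append((ip, outcome))
    return results
```

For example, `probe("container-b", 8080)` (hypothetical name and port) would list the resolved address next to "refused", which can then be compared with the address Docker actually assigned to container B.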
Steps to reproduce the issue: this is the first time I have seen this strange behavior.
I cannot reproduce it.
It works again after I restarted container B.
Describe the results you received: another container in the same docker-compose network is refused when it tries to connect, because the service name resolves to the wrong IP.
Describe the results you expected: I expect the IP to be resolved correctly.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
Output of docker info:
Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 84
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-117-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.796GiB
Name: hk-gino-dev-03
ID: 3OS3:JEQT:O7VV:4ZPA:TL7E:IIFD:GDEQ:VYP3:4IX5:WABO:7C7X:K25G
Docker Root Dir: /data/docker/docker-data
Debug Mode (client): false
Debug Mode (server): false
Username: zffocus
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.): Ubuntu 16.04
docker-compose info
docker-compose version 1.23.2, build 1110ad01
docker-py version: 3.6.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f  25 May 2017
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (2 by maintainers)
_EDIT: Please use the updated patch of the post below https://github.com/moby/moby/issues/41766#issuecomment-1265190043_
I can confirm this behavior: the DNS system reports the wrong container IPv4 address, one increment off from the actual address, when using Docker in swarm mode inside an LXC container (Arch Linux image) created by LXD.
However, I did find a small workaround: add a "hostname: " entry to the docker-compose stack file and set it equal to the service name. Querying by the service name then returns the correct IPv4 address.
What the fix looks like for the latest discussed situation:
What the fix looks like for the OP situation:
Version info of the test system used:
@debugtux @jigneshkhatri
I found another workaround that might not have the race condition: if you use endpoint_mode dnsrr (https://docs.docker.com/network/overlay/#bypass-the-routing-mesh-for-a-swarm-service), it seems to work as expected.
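With the default VIP endpoint mode, a swarm service name resolves to a single virtual IP; with endpoint_mode dnsrr it resolves to the addresses of the individual tasks (swarm also answers `tasks.<service>` with the task addresses). To check how many replicas a name exposes from inside a container, a Python analogue of PHP's gethostbynamel() is enough. This is a small sketch; `resolve_all_ipv4` and `replicas_visible` are hypothetical helpers.

```python
import ipaddress
import socket


def resolve_all_ipv4(name: str) -> list:
    """Every distinct IPv4 address `name` resolves to, in numeric order.

    Roughly what PHP's gethostbynamel() returns.
    """
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos}, key=ipaddress.ip_address)


def replicas_visible(name: str, expected: int) -> bool:
    """True if the name resolves to exactly `expected` distinct addresses."""
    return len(resolve_all_ipv4(name)) == expected
```

For a service with 4 replicas deployed with endpoint_mode dnsrr, one would expect `replicas_visible("SERVICENAME", 4)` to hold when run on the same overlay network; in VIP mode the same call returns a single address.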
I also noticed that this bug (i.e. without the hostname or endpoint_mode fix) makes replicas undiscoverable. If you have a service with 4 replicas and try to list them with PHP as follows:
print_r(gethostbynamel('SERVICENAME'));
PHP will return only one IP, and that IP is one less than one of the real IPs.
[PATCH UPDATE]
I did some more debugging because I noticed a very rare instability in the containers (failed connections due to a wrong IP address). It turns out the patch creates a race condition on the name being resolved: the name is given both the correct and the incorrect IP address at the same time, so every so often the resolver returns the wrong IP.
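A race like this is easy to miss with a single lookup, so one way to observe it is to resolve the name many times and count the answers. A minimal sketch, assuming it runs inside a container on the same network; `sample_resolution` is a hypothetical helper, and its `resolve` parameter exists only so the function can be exercised without Docker.

```python
import socket
from collections import Counter
from typing import Callable


def sample_resolution(name: str, rounds: int = 200,
                      resolve: Callable[[str], str] = socket.gethostbyname) -> Counter:
    """Resolve `name` `rounds` times and count how often each answer appears.

    More than one key means the resolver handed out different
    addresses for the same name over the sampling period.
    """
    return Counter(resolve(name) for _ in range(rounds))
```

For a hostname that should map to exactly one container, a result such as two keys with very uneven counts would match the "correct and incorrect IP at the same time" behavior; for a replicated dnsrr service, several keys are expected.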
However, the patch still works and is stable if the hostname differs from the service name. Query only by the given hostname and the correct IP address is returned consistently; the service name's IP address will again be off by one. My preferred way of implementing this is to prefix the service name with "service_" and query the desired hostname as before.
An example of the updated patch with an nginx container to be resolved at 'nginx':
One last thing to note: I am unable to reproduce this problematic IP assignment with docker-compose. I only encounter it with Docker swarm (docker stack deploy). I mention this because the issue was originally about docker-compose and I am not the OP.
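The updated patch (service renamed with a "service_" prefix, original name kept as the hostname) can be applied mechanically to a parsed compose file. This is a hypothetical helper, not part of Docker or docker-compose, and it assumes the compose file has already been loaded into a dict (for example with a YAML library).

```python
import copy


def apply_hostname_workaround(compose: dict, prefix: str = "service_") -> dict:
    """Return a copy of a parsed compose file with the hostname workaround applied.

    Each service `name` becomes `<prefix><name>` and gets `hostname: name`, so
    clients keep resolving the original name, which is now only a hostname.
    The input dict is not modified.
    """
    result = copy.deepcopy(compose)
    services = result.get("services") or {}
    result["services"] = {
        prefix + name: {**(spec or {}), "hostname": name}
        for name, spec in services.items()
    }
    return result
```

With `{"services": {"nginx": {"image": "nginx"}}}` as input, the result contains a service `service_nginx` with `hostname: nginx`. The helper does not rewrite `depends_on` or other references to the old service names; those would need updating by hand.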
@thaJeztah I think I am having the same issue. Essentially, the Docker swarm DNS server has the wrong IP address in its A records. The IP addresses are all one less than they should be, i.e. 10.0.4.8 -> 10.0.4.7.
I deploy a stack to Docker swarm, using docker-compose to create three services and add them to the same overlay network. I should be able to ping one from another using `ping stack_service-name.stack_network-name`, i.e. `ping nextcloud-mariadb.infraTest_infraNet`. The resulting output shows that the IP address it resolves to is shifted by 1. Given that it is always shifted by 1, I think it is reasonable to assume there is a bug somewhere in the DNS records of the overlay network. It is looking at IP 10.0.4.7; however, if I inspect the network, I see that the actual IP for that service is 10.0.4.8, and indeed I can ping 10.0.4.8 and it works. This behaviour is the same for all the services that I deploy via a stack: it is always the actual IP -1.
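The "always the actual IP -1" observation can be checked with Python's standard `ipaddress` module, which supports integer arithmetic on addresses, so the comparison also works across octet boundaries. A small sketch; `offset` and `is_off_by_one` are hypothetical names.

```python
import ipaddress


def offset(resolved: str, actual: str) -> int:
    """Signed distance from the actual address to the resolved one."""
    return int(ipaddress.ip_address(resolved)) - int(ipaddress.ip_address(actual))


def is_off_by_one(resolved: str, actual: str) -> bool:
    """True if the resolved address is exactly one below the actual address."""
    return offset(resolved, actual) == -1
```

With the addresses from this report, `offset("10.0.4.7", "10.0.4.8")` is -1. A constant offset like this is worth comparing against the service's virtual IP before assuming the DNS records are wrong.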
Notes:
- `infraTest_nextcloud-mariadb.1.icovmyqaweew7co5il5tef1kh` or the correct IP address `10.0.4.8`.
- `infraTest_nextcloud.infraTest_infraNet`
- `ping infraTest_nextcloud.infraTest_infraNet` does not work, as the IP address that it looks for is the correct IP -1.
- `nslookup infraTest_nextcloud.infraTest_infraNet` does not work, as it does not seem to find a DNS entry.
Steps to reproduce:
Using this command to deploy:
`docker stack deploy --compose-file docker-compose.yml infraTest`
Run `docker network inspect infraNet`.
Exec into any other container, such as busybox:
`docker exec -it infraTest_busybox.1.zesvs77dm5z91drms72mvu0zo /bin/sh`
Ping `infraTest_nextcloud.infraTest_infraNet`; it will show that it is looking for 10.0.4.7 instead of the correct 10.0.4.8. This can be repeated with any service in the stack: the IPs that it looks for are always the actual IP -1. It can also be tested using ping from any container (I have tested busybox and the adminer container that I was using).
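The `docker network inspect` step above can be scripted to collect the container addresses that the resolved name should be compared against. A sketch with hypothetical helpers `inspect_network` and `container_ips`; it assumes the usual `Containers` map in the inspect output, whose entries carry `Name` and `IPv4Address` in CIDR form (e.g. 10.0.4.8/24). As far as I know, on an overlay network only the endpoints on the node where the command runs are listed.

```python
import json
import subprocess


def inspect_network(name: str) -> str:
    """Run `docker network inspect <name>` and return its JSON output."""
    return subprocess.run(
        ["docker", "network", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout


def container_ips(inspect_json: str) -> dict:
    """Map container name -> IPv4 address, with the /prefix stripped."""
    ips = {}
    for network in json.loads(inspect_json):  # the output is a JSON array
        for endpoint in (network.get("Containers") or {}).values():
            cidr = endpoint.get("IPv4Address", "")
            if cidr:
                ips[endpoint.get("Name", "?")] = cidr.split("/")[0]
    return ips
```

On a swarm node, `container_ips(inspect_network("infraTest_infraNet"))` would give the per-container addresses to set beside what `ping` inside busybox resolves.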
Networks can be seen with `docker network ls` and
System Info:
and
@thaJeztah, thanks for your reply. Yes, you are correct: using the verbose option I see that the IP I'm seeing is the VIP of the service (which is 1 less than the container IPs). So that leads to the question: why can't it connect on that IP? Why can't I ping the service from another container on the same overlay network? Using the adminer example, when I try to connect to the database it gives the error "Host is unreachable". Shouldn't requests to the service VIP be redirected to one of the container IPs in that service?
Note: I can connect using `InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw` and `10.0.4.3`, but not `InfraTest_nextcloud-mariadb`, which is what I want. Also, `10.0.4.2`, which is the service VIP, doesn't work either.
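Whether a resolved address is a service VIP or a task IP can be checked from the output of `docker network inspect --verbose`. A sketch, assuming the verbose output contains a `Services` map (service name -> object with a `VIP` string and a `Tasks` list whose entries have `Name` and `EndpointIP`); treat those field names as assumptions to verify against your own output. `classify` is a hypothetical helper: it only identifies what an address is and does not explain why the VIP is unreachable.

```python
import ipaddress
import json


def _ip(value: str):
    """Parse an address, tolerating an optional /prefix suffix."""
    return ipaddress.ip_address(value.split("/")[0])


def classify(address: str, verbose_inspect_json: str) -> str:
    """Report whether `address` is a service VIP or a task IP on the network."""
    target = _ip(address)
    for network in json.loads(verbose_inspect_json):
        for service, info in (network.get("Services") or {}).items():
            if info.get("VIP") and _ip(info["VIP"]) == target:
                return f"VIP of {service}"
            for task in info.get("Tasks") or []:
                if task.get("EndpointIP") and _ip(task["EndpointIP"]) == target:
                    return f"task {task.get('Name', '?')} of {service}"
    return "unknown"
```

With the addresses in this comment, 10.0.4.2 would be reported as the VIP of the mariadb service and 10.0.4.3 as its task, matching what was found by hand.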