moby: service in docker-compose resolves to the wrong IP, resulting in a connection refused

Description

I am using docker-compose to manage my Docker services. I have several containers running on the same docker-compose network, but to my surprise, when container A connects to container B by service name, the connection is refused because the service name resolves to the wrong IP.

Steps to reproduce the issue: this is the first time I have seen this strange behavior

I cannot reproduce it

It works again after I restarted container B

Describe the results you received: a connection to another container on the same docker-compose network is refused, because the service name resolves to the wrong IP

Describe the results you expected: I expect the IP to be resolved correctly

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 84
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-117-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.796GiB
Name: hk-gino-dev-03
ID: 3OS3:JEQT:O7VV:4ZPA:TL7E:IIFD:GDEQ:VYP3:4IX5:WABO:7C7X:K25G
Docker Root Dir: /data/docker/docker-data
Debug Mode (client): false
Debug Mode (server): false
Username: zffocus
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.): Ubuntu 16.04

Output of docker-compose version:

docker-compose version 1.23.2, build 1110ad01
docker-py version: 3.6.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f  25 May 2017

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

_EDIT: Please use the updated patch of the post below https://github.com/moby/moby/issues/41766#issuecomment-1265190043_

I can confirm this behavior: the DNS system reports the wrong container IPv4 address, off by one increment from the actual address, when using Docker in swarm mode inside an LXC container (Arch Linux image) created by LXD.

I did, however, find a small workaround: adding a "hostname:" entry to the docker-compose stack file. Setting the hostname equal to the service name results in the correct IPv4 address when querying by the service name.

What the fix looks like for the most recently discussed situation:

version: "3.8"

services:
  nextcloud-mariadb:
    image: mariadb
    hostname: nextcloud-mariadb
    ...

What the fix looks like for the OP situation:

version: '3'

services:
  nginx:
    container_name: nginx
    hostname: nginx
    image: nginx
    ...
  doxturbo:
    container_name: doxturbo
    hostname: doxturbo
    image: "doxturbo:local"
    ...

Version info of the test system used:

LXD/LXC version: 4.13
Container kernel version: 5.12.1-arch1-1
Docker version: 20.10.7 (OS/Arch: linux/amd64)
Containerd version: v1.5.2
Stack yml compose version: 3.8

@debugtux @jigneshkhatri

I found another workaround that might not have the race condition. It seems that if you use endpoint_mode: dnsrr (https://docs.docker.com/network/overlay/#bypass-the-routing-mesh-for-a-swarm-service), it works as expected.

services:
  nginx:
    image: nginx
    deploy:
        endpoint_mode: dnsrr  

I also noted that this bug (that is, without the hostname or endpoint_mode fix) makes replicas undiscoverable. If you have a service replicated 4 times and you try to list the replicas with PHP, as follows: print_r(gethostbynamel('SERVICENAME'));, PHP will return only one IP, and that IP is one less than one of the real IPs.

[PATCH UPDATE]

I did some more debugging, as I noticed a very rare instability in the containers (failed connections due to a wrong IP address). It turns out the patch creates a race condition in name resolution by giving the name both the correct and the incorrect IP address at the same time. Therefore, every so often the wrong IP is provided by the resolver.

However, the patch still works and is stable with a service name that differs from the hostname. Query only by the given hostname and the correct IP address will be returned consistently; the service-name address will again be off by one increment. My preferred way of implementing this is to prefix the service name with “service_” and query the desired hostname as before.

An example of the updated patch, with an nginx container to be resolved at ‘nginx’:

services:
  service_nginx:
    image: nginx
    hostname: nginx
    ...

One last thing to note: I am unable to reproduce this problematic IP assignment with docker-compose. I only encounter it with Docker swarm (docker stack deploy); this issue was originally filed against docker-compose, and I am not the OP.

@thaJeztah I think I am having the same issue. Essentially, the Docker swarm DNS server has the wrong IP addresses in its A records. The IP addresses are all one less than what they should be, i.e. 10.0.4.8 -> 10.0.4.7.

I deploy a stack to Docker swarm, using a docker-compose file to create three services and add them to the same overlay network. I should be able to ping one from the other using ping stack_service-name.stack_network-name, i.e. ping nextcloud-mariadb.infraTest_infraNet. The resulting output shows that the IP address it resolves to is shifted by 1. Given that it is always shifted by exactly 1, I think it is reasonable to assume there is a bug somewhere in the DNS records of the overlay network.

# ping nextcloud-mariadb.infraTest_infraNet
PING nextcloud-mariadb.infraTest_infraNet (10.0.4.7): 56 data bytes
^C
--- nextcloud-mariadb.infraTest_infraNet ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

It is looking at IP 10.0.4.7; however, if I inspect the network, I see the actual IP for that service is 10.0.4.8, and indeed I can ping 10.0.4.8 and it works. This behaviour is the same for every service I deploy via a stack, and the resolved IP is always the actual IP minus 1.
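A quick way to see both sides of the mismatch at once — a sketch only, assuming the infraTest stack from this report is deployed and the commands are run on the manager node:

```shell
# What the embedded DNS answers from inside a container on the network:
docker exec "$(docker ps -q -f name=infraTest_busybox)" \
  nslookup nextcloud-mariadb.infraTest_infraNet

# What the overlay network actually assigned to each endpoint:
docker network inspect infraTest_infraNet \
  --format '{{range .Containers}}{{printf "%s %s\n" .Name .IPv4Address}}{{end}}'
```

If the A record from the first command differs from the endpoint address in the second, the resolver (not the network allocation) is the suspect.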

Notes:

  • Using adminer, I can connect to the nextcloud-mariadb database using either the full container ID infraTest_nextcloud-mariadb.1.icovmyqaweew7co5il5tef1kh or the correct IP address 10.0.4.8.
  • Using busybox and adminer, I can ping the above host name and IP address as well
  • Using adminer I cannot access the database using infraTest_nextcloud.infraTest_infraNet
  • Using adminer or busybox, I cannot ping infraTest_nextcloud.infraTest_infraNet, because the IP address it looks for is the correct IP minus 1
  • Using busybox, I cannot nslookup infraTest_nextcloud.infraTest_infraNet, as it does not seem to find a DNS entry

Steps to reproduce:

  1. Deploy this docker-compose file, or any compose file that has at least 2 services in a user-defined network:
version: "3.8"

services:

  nextcloud-mariadb:
    image: mariadb
    volumes:
      - /zfs/nextcloud-mariadb:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=xxx
      - MYSQL_PASSWORD=xxx
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    networks:
      - infraNet
    ports:
      - "3306:3306"

  adminer:
    image: adminer
    networks:
      - infraNet
    ports:
      - target: 8080
        published: 8081
        protocol: tcp
        mode: host

  busybox:
    image: busybox
    networks:
      - infraNet
    command: sleep 3000

networks:
  infraNet:
    external: false
  2. Deploy with this command: docker stack deploy --compose-file docker-compose.yml infraTest

  3. Run docker network inspect infraTest_infraNet

"1c1d429ff92f91b0784259ed220729fd854c7f8ed7f1ce63724b19b07d2f0ce2": {
    "Name": "infraTest_busybox.1.o7558rpwomdn2qq49zrp7gdtc",
    "EndpointID": "813071bee520338ba994610492fa1187da2af873f227d564b6eccd8e0adad885",
    "MacAddress": "02:42:0a:00:04:06",
    "IPv4Address": "10.0.4.6/24",
    "IPv6Address": ""
},
"5c5b95629752eefec659dc3a5ca68a3cccd0f5f4a0cb84884eaab2b27c861db1": {
    "Name": "infraTest_nextcloud-mariadb.1.icovmyqaweew7co5il5tef1kh",
    "EndpointID": "b3902f3fb0e47fe32b54ab21b3a8a972cf3b4dc983780e11b0666872a89cf6fd",
    "MacAddress": "02:42:0a:00:04:08",
    "IPv4Address": "10.0.4.8/24",
    "IPv6Address": ""
},
"690e876df510e2daa6705b613b404db689d93ddeb278a36066273e9e4ea94f09": {
    "Name": "infraTest_adminer.1.twjwt7bditsipwme1f3ksun20",
    "EndpointID": "5b6097bbad23b8ab5dfaa9dc2c16be9f6395ba8fbf31d489667b4880bd0e3c1a",
    "MacAddress": "02:42:0a:00:04:03",
    "IPv4Address": "10.0.4.3/24",
    "IPv6Address": ""
},
"lb-infraTest_infraNet": {
    "Name": "infraTest_infraNet-endpoint",
    "EndpointID": "74fbc070ae07a41ad242d8a8011edd6eaa7f90a37051786471259f9d131e9b54",
    "MacAddress": "02:42:0a:00:04:04",
    "IPv4Address": "10.0.4.4/24",
    "IPv6Address": ""
}
  4. Exec into any other container, such as busybox: docker exec -it infraTest_busybox.1.zesvs77dm5z91drms72mvu0zo /bin/sh

  5. Pinging infraTest_nextcloud.infraTest_infraNet shows that it looks for 10.0.4.7 instead of the correct 10.0.4.8

  6. This can be repeated with any service in the stack; the IPs it looks for are always the actual IP minus 1. It can also be tested using ping from any container (I have tested busybox and the adminer container that I was using)

  7. Networks can be seen with docker network ls

# docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
110358e5eaf6        bridge               bridge              local
05f25e23a5b0        docker_gwbridge      bridge              local
07fa0145d282        host                 host                local
ef1v7e8m0f8v        infraTest_infraNet   overlay             swarm
s4enetps5h3q        ingress              overlay             swarm
7585fb8b387e        none                 null                local
  8. See also this output from an Ubuntu container I made for testing:
# dig nextcloud-mariadb.infraTest_infraNet

; <<>> DiG 9.16.1-Ubuntu <<>> nextcloud-mariadb.infraTest_infraNet
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35788
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;nextcloud-mariadb.infraTest_infraNet. IN A

;; ANSWER SECTION:
nextcloud-mariadb.infraTest_infraNet. 600 IN A  10.0.4.7

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Mon Dec 14 00:28:33 PST 2020
;; MSG SIZE  rcvd: 106

and

# nslookup nextcloud-mariadb.infraTest_infraNet
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   nextcloud-mariadb.infraTest_infraNet
Address: 10.0.4.7

System Info:

# docker version
Client: Docker Engine - Community
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        7287ab3
 Built:             Tue Dec  8 18:59:40 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:57:45 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

and

# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 8
  Running: 6
  Paused: 0
  Stopped: 2
 Images: 13
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: q4sjkufcnz78gmr1hy4vto2hm
  Is Manager: true
  ClusterID: ki7cos41zqioz2hzgtl4lkgk9
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.26
  Manager Addresses:
   192.168.1.26:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.73-1-pve
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 16GiB
 Name: dockerHost
 ID: CIMP:V2AO:ZOX2:ZJEU:HH7K:FZEQ:ITJO:QTHP:NPSN:32D5:FCXY:Y2F4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support

@thaJeztah, thanks for your reply. Yes, you are correct: using the verbose option, I see that the IP being returned is the VIP of the service (which is 1 less than the container IPs). So I guess that leads to the question of why it is not possible to connect on that IP. Why can’t I ping the service from another container on the same overlay network? Using the adminer example, I try to connect to the database and it gives the error “Host is unreachable”. Shouldn’t any request to the service VIP get redirected to one of the container IPs in that service?

Note: I can connect using InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw or 10.0.4.3, but not InfraTest_nextcloud-mariadb, which is what I want. The service VIP, 10.0.4.2, doesn’t work either.
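One diagnostic worth trying when the VIP is unreachable: Docker’s embedded DNS also serves a tasks.<service-name> lookup, which is documented to bypass the VIP and return the individual task IPs directly. A hedged sketch, assuming it is run from a shell inside a container attached to InfraTest_infraNet (e.g. the busybox one):

```shell
# The bare service name resolves to the service VIP (10.0.4.2 above),
# which is the address that is not responding:
nslookup InfraTest_nextcloud-mariadb

# tasks.<service-name> skips the VIP and should return the task IPs
# themselves (10.0.4.3 above); if those addresses work while the VIP
# does not, the VIP/load-balancer path is at fault rather than DNS:
nslookup tasks.InfraTest_nextcloud-mariadb
```

This separates the two failure modes: a wrong A record (this issue) versus a correct record pointing at a VIP whose IPVS redirection is broken.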

# docker network inspect -v InfraTest_infraNet 
[
    {
        "Name": "InfraTest_infraNet",
        "Id": "a0p98pu4ixc18fv4awi315o9k",
        "Created": "2020-12-14T21:12:29.314191832Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.4.0/24",
                    "Gateway": "10.0.4.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "0f818cdd9a4106bf5078e595417fb31120898664a571e5b379df6d79add4db56": {
                "Name": "InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw",
                "EndpointID": "a8c0a7fff99201e8bf60f9ac0a1c557ecc83442a515362917d53f5785f75de1a",
                "MacAddress": "02:42:0a:00:04:03",
                "IPv4Address": "10.0.4.3/24",
                "IPv6Address": ""
            },
            "64667a9e80de0cca7a8f31b4fbc2097306c96c986d93f20571a7d2829ce3d34b": {
                "Name": "InfraTest_busybox.1.3lrfk3j6owz9ahmz74oqhpmxt",
                "EndpointID": "916e5b1621cbd140c31128f4dc977894e05267a525fc03a79c97a4bd486f587f",
                "MacAddress": "02:42:0a:00:04:08",
                "IPv4Address": "10.0.4.8/24",
                "IPv6Address": ""
            },
            "6e84e46f454c58ab5c095188e9c7d6bc76ba0bf288084226dbaf83f7b8ead857": {
                "Name": "InfraTest_adminer.1.ri5e33zuh2dhy545woqbcxwbc",
                "EndpointID": "858bfb50ae800058e5b5ff6a69f688aaa60a8bf368c55ac82e1466e09d25dcce",
                "MacAddress": "02:42:0a:00:04:06",
                "IPv4Address": "10.0.4.6/24",
                "IPv6Address": ""
            },
            "lb-InfraTest_infraNet": {
                "Name": "InfraTest_infraNet-endpoint",
                "EndpointID": "53fdfefbd4d51110aec01e9edaf32e964121b6083b975d76e78f57f38812e19f",
                "MacAddress": "02:42:0a:00:04:04",
                "IPv4Address": "10.0.4.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4100"
        },
        "Labels": {
            "com.docker.stack.namespace": "InfraTest"
        },
        "Peers": [
            {
                "Name": "4dc98c7e5f08",
                "IP": "192.168.1.26"
            }
        ],
        "Services": {
            "InfraTest_adminer": {
                "VIP": "10.0.4.5",
                "Ports": [],
                "LocalLBIndex": 266,
                "Tasks": [
                    {
                        "Name": "InfraTest_adminer.1.ri5e33zuh2dhy545woqbcxwbc",
                        "EndpointID": "858bfb50ae800058e5b5ff6a69f688aaa60a8bf368c55ac82e1466e09d25dcce",
                        "EndpointIP": "10.0.4.6",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            },
            "InfraTest_busybox": {
                "VIP": "10.0.4.7",
                "Ports": [],
                "LocalLBIndex": 267,
                "Tasks": [
                    {
                        "Name": "InfraTest_busybox.1.3lrfk3j6owz9ahmz74oqhpmxt",
                        "EndpointID": "916e5b1621cbd140c31128f4dc977894e05267a525fc03a79c97a4bd486f587f",
                        "EndpointIP": "10.0.4.8",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            },
            "InfraTest_nextcloud-mariadb": {
                "VIP": "10.0.4.2",
                "Ports": [],
                "LocalLBIndex": 264,
                "Tasks": [
                    {
                        "Name": "InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw",
                        "EndpointID": "a8c0a7fff99201e8bf60f9ac0a1c557ecc83442a515362917d53f5785f75de1a",
                        "EndpointIP": "10.0.4.3",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            }
        }
    }
]