moby: 1.12 docker service issue: service can't reach another service with internal dns name

Output of docker version:

Client:
 Version:      1.12.0-rc2
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   906eacd
 Built:
 OS/Arch:      linux/amd64
 Experimental: true

Server:
 Version:      1.12.0-rc2
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   906eacd
 Built:
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 3
 Running: 2
 Paused: 0
 Stopped: 1
Images: 2
Server Version: 1.12.0-rc2
Storage Driver: devicemapper
 Pool Name: docker-253:1-50373608-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 296.7 MB
 Data Space Total: 107.4 GB
 Data Space Available: 41.2 GB
 Metadata Space Used: 1.114 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: active
 NodeID: 0cvppkel734omitklgpta1l9k
 IsManager: Yes
 Managers: 1
 Nodes: 2
 CACertHash: sha256:4acd2bf14e1745e0b8183e8fb8831509b715439d32e1bff992c020e2808ba266
Runtimes: default
Default Runtime: default
Security Options: seccomp
Kernel Version: 3.10.0-327.13.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.703 GiB
Name: peti-test-1
ID: CFD3:Y722:6FPU:6J7T:SM6C:D5X4:YLBS:PDAR:XSON:ZWWR:OCEE:ZW3M
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: true
Insecure Registries:
 127.0.0.0/8

Steps to reproduce the issue:

  1. machine-1: docker swarm init
  2. machine-2: docker swarm join machine-1:2377
  3. machine-1: docker network create -d overlay test_network
  4. machine-1: docker service create --name first_service --network test_network instavote/vote
  5. machine-1: docker service create --name second_service --network test_network instavote/vote
  6. machine-1: docker exec -it $(docker ps -q | awk 'NR==1{print $1}') ping first_service OR machine-1: docker exec -it $(docker ps -q | awk 'NR==1{print $1}') ping second_service

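A side note on step 6: `docker ps -q` already prints bare container IDs, one per line, so the `awk 'NR==1{print $1}'` only serves to pick the first line, and `head -n1` is equivalent. A quick sketch using hypothetical container IDs in place of real `docker ps -q` output:

```shell
# docker ps -q prints one container ID per line; selecting the first
# container therefore only needs the first line. The IDs below are
# hypothetical stand-ins for real `docker ps -q` output.
ids='28623fe07d5c
fa3d800440db'
first_awk=$(printf '%s\n' "$ids" | awk 'NR==1{print $1}')
first_head=$(printf '%s\n' "$ids" | head -n1)
echo "$first_awk"   # the first container ID
[ "$first_awk" = "$first_head" ] && echo "equivalent"
```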
Describe the results you received: TIMEOUT

Describe the results you expected: successful ping

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 44 (19 by maintainers)

Most upvoted comments

If we decide to not support ping, we need to update the documentation, first of all to explain it’s not supported, but also update the examples in the “networking” docs, because all examples use “ping” to demonstrate how container networking works.
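Since a service name resolves to a load-balanced virtual IP rather than a container, ICMP ping against it may time out even when the service is reachable. A hedged sketch of checking name resolution and TCP reachability separately (illustrative only; assumes a swarm like the one in the repro steps and an image that ships nslookup and curl):

```shell
# Illustrative only: verify DNS and TCP separately instead of relying on
# ICMP ping to a service VIP. Assumes a container on the overlay network
# whose image provides nslookup and curl.
CID=$(docker ps -q | head -n1)                  # any container on the overlay
docker exec -it "$CID" nslookup second_service  # does the name resolve at all?
docker exec -it "$CID" curl -sS -o /dev/null -w '%{http_code}\n' http://second_service/
```

If the name resolves but curl fails, the problem is forwarding, not service discovery.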

But I have a problem: if the first service is on node1 and the second service is on node2, it does not work.

Node 1:

[root@peti-test-1 centos]# docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS             NAMES
28623fe07d5c   nginx:latest   "nginx -g 'daemon off"   25 minutes ago   Up 25 minutes   80/tcp, 443/tcp   third_service.1.caqdk6789e7ic24omeaiu7ff6
fa3d800440db   nginx:latest   "nginx -g 'daemon off"   26 minutes ago   Up 26 minutes   80/tcp, 443/tcp   second_service.1.1g0hcrd3a8lv26ua25g3bys3v
7d37c676f8bd   nginx:latest   "nginx -g 'daemon off"   33 minutes ago   Up 33 minutes   80/tcp, 443/tcp   test1.1.0vq9zbnpcqshtxtpgs24lbqkk

Node 2:

[root@peti-test-2 centos]# docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS             NAMES
6479648a6e9a   nginx:latest   "nginx -g 'daemon off"   30 minutes ago   Up 30 minutes   80/tcp, 443/tcp   first_service.1.a9pv25uiecjzsfovxjht9ifbp
ed00900d7f33   nginx:latest   "nginx -g 'daemon off"   34 minutes ago   Up 34 minutes   80/tcp, 443/tcp   test2.1.4ux4f5mxjthakt7mbhqzy7av3

If I try on Node 1:

[root@peti-test-1 centos]# docker exec -it test1.1.0vq9zbnpcqshtxtpgs24lbqkk bash
root@7d37c676f8bd:/# curl first_service
curl: (6) Could not resolve host: first_service
root@7d37c676f8bd:/# curl second_service

...
<title>Welcome to nginx!</title>
...

I am experiencing a similar issue. From what I can tell the internal DNS name is resolving to the wrong IP address.

Here second_service resolves to 10.0.1.4

docker exec -it 87055a2bf81a ping second_service
PING second_service (10.0.1.4): 56 data bytes

But inspecting the network lists it as 10.0.1.5

docker network inspect test_network
[
    {
        "Name": "test_network",
        "Id": "75sik5dzqkdsz2jq8v8j2oqk1",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "3d19ff52bb73c2416429737f0e1ca5148b86ae066a53364cfd30a1e6d2077839": {
                "Name": "second_service.1.9umoi0f7nb6oftjjmgwo6yjsl",
                "EndpointID": "2e8862a4a903ccbb7a970bd1b05909271d1426f7677a42e4d7497165fde60ca6",
                "MacAddress": "02:42:0a:00:01:05",
                "IPv4Address": "10.0.1.5/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "258"
        },
        "Labels": {}
    }
]

Pinging from first_service using the actual IP works.

docker exec -it 87055a2bf81a ping 10.0.1.5
PING 10.0.1.5 (10.0.1.5): 56 data bytes
64 bytes from 10.0.1.5: seq=0 ttl=64 time=13.759 ms

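One way to confirm an apparent off-by-one like this is to put the DNS answer next to the addresses the overlay network actually assigned. A hedged sketch, reusing the container ID and network name from the output above (the `--format` template is an assumption about the inspect output shape):

```shell
# Illustrative only, assuming a running swarm with test_network.
# 1. What does the embedded DNS server return inside the container?
docker exec -it 87055a2bf81a nslookup second_service
# 2. What IPs did the overlay network actually assign to its containers?
docker network inspect \
  -f '{{range $id, $c := .Containers}}{{$c.Name}} {{$c.IPv4Address}}{{println}}{{end}}' \
  test_network
```

A mismatch here (e.g. 10.0.1.4 from DNS vs 10.0.1.5/24 in the inspect output) may simply mean the name is resolving to the service VIP rather than to the task's own IP.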
I was running into a similar issue to @ab666 where the IP was off by one, and indeed I was getting the VIP on lookup. While that was correct, the VIP was not forwarding an HTTP request on port 80 to the container. Maybe I was doing something wrong here, but my setup had worked before.

I was using Docker 1.12.1 and upgraded to Docker 1.12.2 with no change. System and service reboots did not change the behavior. I am using namespace remapping and LVM with devicemapper on RHEL 7.

Anyway, I needed the resolution for Apache. I worked around my problem by prefixing the lookup name with 'tasks.', which gives the IPs of the containers associated with a service rather than the VIP.
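The workaround above looks like this in practice (illustrative only, reusing the service and container names from the earlier output; the swarm DNS exposes both names):

```shell
# Illustrative only. "second_service" resolves to the service VIP;
# "tasks.second_service" resolves to the individual task/container IPs.
docker exec -it 87055a2bf81a nslookup second_service        # VIP (10.0.1.4 above)
docker exec -it 87055a2bf81a nslookup tasks.second_service  # task IPs (10.0.1.5 above)
```

For something like an Apache reverse proxy, resolving tasks.<service> bypasses the VIP load balancer entirely, so it trades built-in balancing for direct container addresses.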

I will be revisiting the service configuration at a later date and will update if I come across something.

@ab666 Yes, you should be able to reach all the containers on all nodes as long as they are in the same network. One reason this may not work as intended is when --listen-addr is not configured properly, or not configured at all, and is causing a problem in your environment (e.g. when the host has multiple NICs).

@thaJeztah, I had a similar issue to the one described in the OP, and setting --listen-addr resolved it. What reason could there be for it not working without that flag, given that it defaults to 0.0.0.0? This was on Docker 1.12.6.

@alon-totango If you are testing in VirtualBox, or on any host that has multiple NICs, please make sure you initialized the swarm with a proper IP address in the --listen-addr option. If not, please take a look at the docker logs for any errors.
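On a multi-NIC host (e.g. a VirtualBox VM with both a NAT and a host-only adapter), that means pinning swarm traffic to the NIC the nodes can actually reach each other on. A sketch with hypothetical addresses on a 192.168.56.0/24 host-only network:

```shell
# Hypothetical addresses for a VirtualBox host-only network.
# On the manager: bind and advertise the reachable NIC explicitly
# instead of letting swarm pick one of several interfaces.
docker swarm init --listen-addr 192.168.56.10:2377 --advertise-addr 192.168.56.10

# On the worker: join via that same manager address.
docker swarm join --listen-addr 192.168.56.11:2377 192.168.56.10:2377
```

With the defaults, swarm may bind or advertise an interface (such as the NAT adapter) that the other node cannot route to, which produces exactly the cross-node DNS and overlay failures described above.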

@sodre90 Can you try to reproduce this issue using curl or something other than ping? Thanks