weave: Service Discovery on Docker Swarm not working

Hi there,

I’m trying to get service discovery working with weave-net.

I’m using a docker stack file like this:

version: "3"

services:
  nginx3:
    image: cm6051/nginxcurlping
    ports:
      - 8003:80
    deploy:
      mode: replicated
      replicas: 2

  nginx4:
    image: cm6051/nginxcurlping
    ports:
      - 8004:80
    deploy:
      mode: replicated
      replicas: 2

networks:
  default:
    driver: store/weaveworks/net-plugin:2.4.0

I would expect to be able to ping the service names “nginx3” and “nginx4” from containers in this stack, but it doesn’t work:

root@1e04d376eb0e:/# ping -c 2 nginx3
PING nginx3 (10.0.6.2) 56(84) bytes of data.
From 1e04d376eb0e (10.0.6.7) icmp_seq=1 Destination Host Unreachable
From 1e04d376eb0e (10.0.6.7) icmp_seq=2 Destination Host Unreachable

--- nginx3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1009ms
root@1e04d376eb0e:/# ping -c 2 nginx4
PING nginx4 (10.0.6.5) 56(84) bytes of data.
From 1e04d376eb0e (10.0.6.7) icmp_seq=1 Destination Host Unreachable
From 1e04d376eb0e (10.0.6.7) icmp_seq=2 Destination Host Unreachable

--- nginx4 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1001ms

The error “Unable to find load balancing endpoint for network mh6gnsbfiatinqiluv6aterbb” appears in the Docker logs (see below) - I guess this is a symptom of the problem…
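In case it helps with diagnosis, I believe the VIP handed out by Docker’s embedded DNS can be compared against the VIPs Swarm assigned to the service, roughly like this (untested; assumes the stack was deployed as “nginxweave”, going by the network name in the logs):

root@host1:~# docker service inspect --format '{{json .Endpoint.VirtualIPs}}' nginxweave_nginx3
root@1e04d376eb0e:/# getent hosts nginx3

If DNS returns a VIP that appears in VirtualIPs but is unreachable, the load-balancer endpoint for that network was presumably never set up.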

A similar stack file that doesn’t use weave-net (so the default overlay driver is used) looks like this:

version: "3"

services:
  nginx1:
    image: cm6051/nginxcurlping
    ports:
      - 8001:80
    deploy:
      mode: replicated
      replicas: 2

  nginx2:
    image: cm6051/nginxcurlping
    ports:
      - 8002:80
    deploy:
      mode: replicated
      replicas: 2

With this one it works OK:

root@d0aa2a463e1c:/# ping -c 2 nginx1
PING nginx1 (10.0.4.4) 56(84) bytes of data.
64 bytes from 10.0.4.4 (10.0.4.4): icmp_seq=1 ttl=64 time=0.085 ms
64 bytes from 10.0.4.4 (10.0.4.4): icmp_seq=2 ttl=64 time=0.075 ms

--- nginx1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.075/0.080/0.085/0.005 ms
root@d0aa2a463e1c:/# ping -c 2 nginx2
PING nginx2 (10.0.4.7) 56(84) bytes of data.
64 bytes from 10.0.4.7 (10.0.4.7): icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from 10.0.4.7 (10.0.4.7): icmp_seq=2 ttl=64 time=0.067 ms

--- nginx2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.067/0.068/0.070/0.008 ms

Versions:

$ weave version
root@host1:~# docker plugin ls
ID                  NAME                                DESCRIPTION                   ENABLED
17c5a2fb4ac5        store/weaveworks/net-plugin:2.4.0   Weave Net plugin for Docker   true

root@host1:~# docker version
Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:09:54 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:07:56 2018
  OS/Arch:          linux/amd64
  Experimental:     false

$ uname -a
root@host1:~# uname -a
Linux host1 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
N/A (Docker Swarm)

Logs:

$ journalctl -u docker.service --no-pager
Aug 20 13:20:41 host1 dockerd[7356]: time="2018-08-20T13:20:41Z" level=error msg="INFO: 2018/08/20 13:20:41.202614 [net] NetworkAllocate mh6gnsbfiatinqiluv6aterbb" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45Z" level=error msg="INFO: 2018/08/20 13:20:45.327510 [net] CreateNetwork mh6gnsbfiatinqiluv6aterbb" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45.482865744Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45.483019413Z" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45Z" level=error msg="INFO: 2018/08/20 13:20:45.564019 [net] CreateEndpoint c29ea1de8935ab24151a6b85cdc3083c29ae05f652ca2b85396721a9c2f2ae00" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45Z" level=error msg="INFO: 2018/08/20 13:20:45.585080 [net] JoinEndpoint mh6gnsbfiatinqiluv6aterbb:c29ea1de8935ab24151a6b85cdc3083c29ae05f652ca2b85396721a9c2f2ae00 to /var/run/docker/netns/f70781a1f9a9" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:45 host1 dockerd[7356]: time="2018-08-20T13:20:45Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/fa428d276a9353e04c5641a46f1af77409bc286dd41714bfb5ea8f5280449754/shim.sock" debug=false pid=16359
Aug 20 13:20:47 host1 dockerd[7356]: time="2018-08-20T13:20:47.001048409Z" level=error msg="addLBBackend mh6gnsbfiatinqiluv6aterbb/nginxweave_default: Unable to find load balancing endpoint for network mh6gnsbfiatinqiluv6aterbb"
Aug 20 13:20:47 host1 dockerd[7356]: time="2018-08-20T13:20:47.001321749Z" level=error msg="addLBBackend mh6gnsbfiatinqiluv6aterbb/nginxweave_default: Unable to find load balancing endpoint for network mh6gnsbfiatinqiluv6aterbb"
Aug 20 13:20:48 host1 dockerd[7356]: time="2018-08-20T13:20:48.386548234Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
Aug 20 13:20:48 host1 dockerd[7356]: time="2018-08-20T13:20:48.387729192Z" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
Aug 20 13:20:48 host1 dockerd[7356]: time="2018-08-20T13:20:48Z" level=error msg="INFO: 2018/08/20 13:20:48.496566 [net] CreateEndpoint 4c175ec7baaeaf0d8aabc50c25ce0ab92f1594b736cd3951d981d7583be402d3" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:48 host1 dockerd[7356]: time="2018-08-20T13:20:48Z" level=error msg="INFO: 2018/08/20 13:20:48.511611 [net] JoinEndpoint mh6gnsbfiatinqiluv6aterbb:4c175ec7baaeaf0d8aabc50c25ce0ab92f1594b736cd3951d981d7583be402d3 to /var/run/docker/netns/736ab85021e5" plugin=17c5a2fb4ac5d2b7f6096383dd2f8d4a73c9cc974ed3733a8f20a8239aa2c700
Aug 20 13:20:48 host1 dockerd[7356]: time="2018-08-20T13:20:48Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/1e04d376eb0ea6822c4d875b8b7b66c4cc30822e58382cc5671605232110af2d/shim.sock" debug=false pid=16559
Aug 20 13:20:49 host1 dockerd[7356]: time="2018-08-20T13:20:49.722577530Z" level=error msg="addLBBackend mh6gnsbfiatinqiluv6aterbb/nginxweave_default: Unable to find load balancing endpoint for network mh6gnsbfiatinqiluv6aterbb"
Aug 20 13:20:49 host1 dockerd[7356]: time="2018-08-20T13:20:49.775719997Z" level=error msg="addLBBackend mh6gnsbfiatinqiluv6aterbb/nginxweave_default: Unable to find load balancing endpoint for network mh6gnsbfiatinqiluv6aterbb"

Network:

root@host1:~# ip route
default via 10.0.2.2 dev enp0s3 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
10.0.2.2 dev enp0s3 proto dhcp scope link src 10.0.2.15 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1
192.168.43.0/24 dev enp0s8 proto kernel scope link src 192.168.43.11
224.0.0.0/4 dev enp0s8 scope link
root@host1:~# ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: enp0s3    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3\       valid_lft 71996sec preferred_lft 71996sec
3: enp0s8    inet 192.168.43.11/24 brd 192.168.43.255 scope global enp0s8\       valid_lft forever preferred_lft forever
4: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
9: docker_gwbridge    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge\       valid_lft forever preferred_lft forever
root@host1:~# iptables-save
# Generated by iptables-save v1.6.1 on Mon Aug 20 13:21:42 2018
*mangle
:PREROUTING ACCEPT [106886:232629203]
:INPUT ACCEPT [67747:129705231]
:FORWARD ACCEPT [39139:102923972]
:OUTPUT ACCEPT [55895:10215454]
:POSTROUTING ACCEPT [95034:113139426]
COMMIT
# Completed on Mon Aug 20 13:21:42 2018
# Generated by iptables-save v1.6.1 on Mon Aug 20 13:21:42 2018
*nat
:PREROUTING ACCEPT [5:300]
:INPUT ACCEPT [5:300]
:OUTPUT ACCEPT [5:300]
:POSTROUTING ACCEPT [5:300]
:DOCKER - [0:0]
:DOCKER-INGRESS - [0:0]
:WEAVE - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -o docker_gwbridge -m addrtype --src-type LOCAL -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -j WEAVE
-A DOCKER -i docker_gwbridge -j RETURN
-A DOCKER -i docker0 -j RETURN
-A DOCKER-INGRESS -p tcp -m tcp --dport 8004 -j DNAT --to-destination 172.18.0.2:8004
-A DOCKER-INGRESS -p tcp -m tcp --dport 8003 -j DNAT --to-destination 172.18.0.2:8003
-A DOCKER-INGRESS -j RETURN
COMMIT
# Completed on Mon Aug 20 13:21:42 2018
# Generated by iptables-save v1.6.1 on Mon Aug 20 13:21:42 2018
*filter
:INPUT ACCEPT [409:63110]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [331:81591]
:DOCKER - [0:0]
:DOCKER-INGRESS - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:WEAVE-EXPOSE - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-INGRESS
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -i weave -o weave -j ACCEPT
-A FORWARD -o weave -j WEAVE-EXPOSE
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A DOCKER-INGRESS -p tcp -m tcp --dport 8004 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 8004 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m tcp --dport 8003 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 8003 -j ACCEPT
-A DOCKER-INGRESS -j RETURN
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Aug 20 13:21:42 2018

Most upvoted comments

We had the same problem; it seems to happen on Docker version 18.06.

It works on 18.03.1-ce

I’m talking about the addresses created by Docker for the purpose of routing requests inside the cluster. The “ingress network” is for routing requests that arrive at the host.
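Concretely, the distinction can be seen with something like this (a sketch; the stack/service name is hypothetical):

$ docker network ls --filter driver=overlay
$ docker service inspect --format '{{json .Endpoint.VirtualIPs}}' mystack_myservice

The “ingress” network in the first listing handles published ports arriving at the host; the VirtualIPs entries are the per-network addresses used to route requests inside the cluster.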

I think we are in the same boat (although the title of this issue should be re-worded).

Effectively, weave is not working in swarm mode at all, yet overlay works fine in its place.

We have Docker 18.06.1-ce and launch two stacks in which one container from each joins the same shared network. The only particular networking configuration we apply is an alias when the container attaches to the shared network; we do not specify replicas. When I exec into the containers they can resolve each other, but ping reports the destination unreachable:

activemq@graves:/opt/apache-activemq-5.13.4$ ping billing-activemq
PING billing-activemq (10.101.0.6) 56(84) bytes of data.
From graves (10.101.0.9) icmp_seq=1 Destination Host Unreachable
From graves (10.101.0.9) icmp_seq=2 Destination Host Unreachable
From graves (10.101.0.9) icmp_seq=3 Destination Host Unreachable
From graves (10.101.0.9) icmp_seq=4 Destination Host Unreachable
From graves (10.101.0.9) icmp_seq=5 Destination Host Unreachable
From graves (10.101.0.9) icmp_seq=6 Destination Host Unreachable

Here’s the routing table if it helps at all:

root@graves:/opt/apache-activemq-5.13.4# routel
         target            gateway          source    proto    scope    dev tbl
        default         172.18.0.1                                     eth1 
       10.0.9.0 24                       10.0.9.49   kernel     link   eth0 
     10.101.0.0 28                      10.101.0.9   kernel     link ethwe0 
     172.18.0.0 16                     172.18.0.13   kernel     link   eth1 
       10.0.9.0          broadcast       10.0.9.49   kernel     link   eth0 local
      10.0.9.49              local       10.0.9.49   kernel     host   eth0 local
     10.0.9.255          broadcast       10.0.9.49   kernel     link   eth0 local
     10.101.0.0          broadcast      10.101.0.9   kernel     link ethwe0 local
     10.101.0.9              local      10.101.0.9   kernel     host ethwe0 local
    10.101.0.15          broadcast      10.101.0.9   kernel     link ethwe0 local
      127.0.0.0          broadcast       127.0.0.1   kernel     link     lo local
      127.0.0.0 8            local       127.0.0.1   kernel     host     lo local
      127.0.0.1              local       127.0.0.1   kernel     host     lo local
127.255.255.255          broadcast       127.0.0.1   kernel     link     lo local
     172.18.0.0          broadcast     172.18.0.13   kernel     link   eth1 local
    172.18.0.13              local     172.18.0.13   kernel     host   eth1 local
 172.18.255.255          broadcast     172.18.0.13   kernel     link   eth1 local
        default        unreachable                   kernel              lo 
        default        unreachable                   kernel              lo 

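For reference, the relevant part of our stack files looks roughly like this (a reconstruction; the image name and network name are placeholders):

version: "3"

services:
  activemq:
    image: example/activemq:5.13.4   # placeholder image name
    networks:
      shared:
        aliases:
          - billing-activemq

networks:
  shared:
    external: true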
Now, if we take down our stack and remove the shared network, then re-create the shared network using the overlay driver, the problem disappears.
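That is, something along these lines (the network name is a placeholder):

$ docker network rm shared
$ docker network create --driver overlay --attachable shared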

One other thing (which I can’t imagine is related): the documentation for installing the Swarm plugin tells us to install weaveworks/net-plugin:latest_release, which is not found. Referring to store/weaveworks/net-plugin:latest_release instead does work. Here it is under docker plugin ls:

ID                  NAME                                         DESCRIPTION                          ENABLED
c40c08f82ac5        store/weaveworks/net-plugin:latest_release   Weave Net plugin for Docker          true
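
So the working install command is the one with the store/ prefix, i.e.:

$ docker plugin install store/weaveworks/net-plugin:latest_release --grant-all-permissions

(--grant-all-permissions just skips the interactive permissions prompt; without it you are asked to confirm.)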