moby: Docker 1.12 RC3: network connectivity problem between the containers in a service

Output of docker version:

root@c910f04x19k03:~# docker version
Client:
 Version:      1.12.0-rc3
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   91e29e8
 Built:        Sat Jul  2 00:38:44 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0-rc3
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   91e29e8
 Built:        Sat Jul  2 00:38:44 2016
 OS/Arch:      linux/amd64
root@c910f04x19k03:~# 

Output of docker info:

root@c910f04x19k03:~# docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 1.12.0-rc3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 21
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host overlay bridge
Swarm: active
 NodeID: 07wntcacagsx697lbxwglgabu
 IsManager: Yes
 Managers: 1
 Nodes: 5
 CACertHash: sha256:e6e343bf771b0d6ee561deea4effa386d3c4f73208a091f782dd6bc526fbd0fe
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-22-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859 GiB
Name: c910f04x19k03
ID: HF6O:3RRR:UUWB:WRJE:ZEB2:E7JN:WFRE:MEGC:UKNQ:YA55:L5LY:XG75
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
root@c910f04x19k03:~# 

Additional environment details (AWS, VirtualBox, physical, etc.): The Docker swarm nodes are 5 KVM guests that connect to a Linux bridge on the host.
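
Since the guests sit behind a host bridge, the host-side topology can be checked like this (a minimal sketch; the bridge and domain names are assumptions, not taken from this report):

# on the KVM host: show the bridge and its attached interfaces
brctl show <bridge-name>
# show which bridge each guest NIC is plugged into
virsh domiflist <guest-domain>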

root@c910f04x19k03:~# docker node ls
ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
07wntcacagsx697lbxwglgabu *  c910f04x19k03  Accepted    Ready   Active        Leader
4l4154pxnax251zxhq168ma9t    c910f04x19k06  Accepted    Ready   Active        
4mf6v47xxg5k1ife7j45tx4bx    c910f04x19k07  Accepted    Ready   Active        
8z8ex1v57urkh0jsmbbmlhvgw    c910f04x19k04  Accepted    Ready   Active        
ewv3w07kjncrbi6zmgk1ywa4b    c910f04x19k05  Accepted    Ready   Active        
root@c910f04x19k03:~# 

Docker service tasks:

root@c910f04x19k03:~# docker service tasks httpclient
ID                         NAME          SERVICE     IMAGE                                                   LAST STATE          DESIRED STATE  NODE
5b5aj5a4mtklxodmyje0moypj  httpclient.1  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 12 minutes  Running        c910f04x19k03
7ngcg71fi4tji0tuyia150o19  httpclient.2  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 12 minutes  Running        c910f04x19k07
am57ovhrxatzdbvesbldvtdym  httpclient.3  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 12 minutes  Running        c910f04x19k05
1ov8zznmytamghkg88bm9a6e2  httpclient.4  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 12 minutes  Running        c910f04x19k06
44rag1o41yeyj877auf6y5y30  httpclient.5  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 12 minutes  Running        c910f04x19k04
root@c910f04x19k03:~# 

Steps to reproduce the issue:

1 Set up the swarm with 5 nodes (a join sketch follows the node listing below)

root@c910f04x19k03:~# docker node ls
ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
07wntcacagsx697lbxwglgabu *  c910f04x19k03  Accepted    Ready   Active        Leader
4l4154pxnax251zxhq168ma9t    c910f04x19k06  Accepted    Ready   Active        
4mf6v47xxg5k1ife7j45tx4bx    c910f04x19k07  Accepted    Ready   Active        
8z8ex1v57urkh0jsmbbmlhvgw    c910f04x19k04  Accepted    Ready   Active        
ewv3w07kjncrbi6zmgk1ywa4b    c910f04x19k05  Accepted    Ready   Active        
root@c910f04x19k03:~# 
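
The exact bootstrap commands are not in the original report; with 1.12.0-rc3 the swarm was presumably created along these lines (addresses are placeholders; the RC builds used an auto-accept policy, which matches the "Accepted" MEMBERSHIP column above):

# on the manager (c910f04x19k03)
docker swarm init
# on each of the four workers
docker swarm join <manager-ip>:2377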

2 Run docker service create --replicas 5 --publish 22 --name httpclient liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new to deploy the service with 5 containers

root@c910f04x19k03:~# docker service create --replicas 5  --publish 22 --name httpclient liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new
40b52xd74ds9emyginbusuq9r
root@c910f04x19k03:~#
root@c910f04x19k03:~# docker service tasks httpclient
ID                         NAME          SERVICE     IMAGE                                                   LAST STATE          DESIRED STATE  NODE
5b5aj5a4mtklxodmyje0moypj  httpclient.1  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 25 minutes  Running        c910f04x19k03
7ngcg71fi4tji0tuyia150o19  httpclient.2  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 25 minutes  Running        c910f04x19k07
am57ovhrxatzdbvesbldvtdym  httpclient.3  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 25 minutes  Running        c910f04x19k05
1ov8zznmytamghkg88bm9a6e2  httpclient.4  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 25 minutes  Running        c910f04x19k06
44rag1o41yeyj877auf6y5y30  httpclient.5  httpclient  liguangcheng/ubuntu-16.04-x86_64-apache2-benchmark-new  Running 25 minutes  Running        c910f04x19k04
root@c910f04x19k03:~# 

3 Attach to the containers using docker exec -it <container_id> /bin/bash
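
For example, the task container on a node can be located and entered like this (a minimal sketch; 0ba8a9bd2ce1 is the container that appears in the fping output below):

# list the httpclient task container running on this node
docker ps --filter name=httpclient -q
# attach to it
docker exec -it 0ba8a9bd2ce1 /bin/bash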

4 Run fping in one container to test network connectivity to the other containers. The IP addresses of these containers are 10.255.0.4, 10.255.0.5, 10.255.0.6, 10.255.0.7, and 10.255.0.10; the fping result shows that 10.255.0.4 and 10.255.0.7 are unreachable from the other containers.

root@0ba8a9bd2ce1:/# fping 10.255.0.4 10.255.0.5 10.255.0.6 10.255.0.7 10.255.0.10
10.255.0.5 is alive
10.255.0.6 is alive
10.255.0.10 is alive
10.255.0.4 is unreachable
10.255.0.7 is unreachable
root@0ba8a9bd2ce1:/# 
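
Each task's overlay address can also be confirmed from the host with a standard inspect template (the container ID is a placeholder):

# print the container's address on each network it is attached to
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' <container_id>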

Describe the results you received: Some containers on the overlay network could not reach other containers on the same overlay network.

Describe the results you expected: All containers on the overlay network should be able to reach one another.

Additional information you deem important (e.g. issue happens only occasionally):

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 49 (21 by maintainers)

Most upvoted comments

Also experiencing the exact same issue in the 1.12.1 release. Only recreating the entire cluster solves the issue.
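
For anyone attempting the same workaround, recreating the cluster amounts to roughly the following (a sketch for the 1.12.x GA releases; node addresses and the token are placeholders):

# on every node, discard local swarm state
docker swarm leave --force
# on the manager, re-initialize the swarm
docker swarm init
# on each worker, re-join using the token printed by init
docker swarm join --token <worker-token> <manager-ip>:2377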

@ghoranyi I missed your point on pinging the VIP. Yes, pinging the VIP will not work (IPVS doesn't support ICMP). Can you try pinging either tasks.{service-name} or an individual container IP? Or, if your service exposes a TCP or UDP port, you can try to access the VIP via any L4+ tool such as nc, or tools like https://github.com/tyru/srvtools
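
Putting those suggestions together for the service in this report, the checks from inside a task container would look roughly like this (a sketch; it assumes the image ships nslookup and nc, and that the VIP can be read from docker service inspect httpclient, none of which is shown in the report):

# resolve the individual task IPs, bypassing the VIP
nslookup tasks.httpclient
# ping a specific task IP instead of the VIP (ICMP to the VIP is dropped by IPVS)
fping 10.255.0.5
# L4 check against the VIP on the published port (22 for this service)
nc -zv <service-vip> 22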