moby: Overlay / ingress network routing breaks on service restart
Description
Overlay / ingress network routing breaks on service restart. I was able to reproduce this in Docker swarm mode on a swarm with 5 nodes and on a different swarm with 3 nodes, both running Ubuntu 16.04 and Docker 1.12.1.
Steps to reproduce the issue:
When a service is deployed on a port for the first time, it is reachable from all nodes in the swarm:
docker service create -p 84:80 --name httpd4 httpd
root@swarm2 ~ # curl http://swarm1:84
<html><body><h1>It works!</h1></body></html>
root@swarm2 ~ # curl http://swarm5:84
<html><body><h1>It works!</h1></body></html>
(works as expected)
However, now when I do
root@swarm2 ~ # docker service rm httpd4
and then
root@swarm2 ~ # docker service create -p 84:80 --name httpd4 httpd
root@swarm2 ~ # docker service ps httpd4
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
6avk26h1ks9xs7ovywas0klta httpd4.1 httpd swarm5 Running Running about a minute ago
root@swarm2 ~ # curl http://swarm1:84
curl: (7) Failed to connect to swarm1 port 84: Connection timed out
root@swarm2 ~ # curl http://swarm5:84
<html><body><h1>It works!</h1></body></html>
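To check every node in one pass after recreating the service, a small helper along these lines could be used. This is only a sketch and not part of the original report: the function name check_published_port and the 5-second timeout are assumptions, and the host names are the ones from the session above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): probe a published
# port on each swarm node and report which nodes answer.
# Usage: check_published_port PORT HOST...
check_published_port() {
    port="$1"
    shift
    failed=0
    for host in "$@"; do
        # --max-time bounds the wait, so a node that silently drops the
        # connection is reported as failed instead of hanging the loop.
        if curl --silent --fail --max-time 5 --output /dev/null \
                "http://${host}:${port}"; then
            echo "${host}: ok"
        else
            echo "${host}: FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if at least one node did not answer.
    return "$failed"
}
```

For the 5-node swarm described here it could be called as `check_published_port 84 swarm1 swarm2 swarm3 swarm4 swarm5`, once after the first `docker service create` and again after the `docker service rm` / `docker service create` cycle.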
Describe the results you received:
root@swarm2 ~ # curl http://swarm1:84
curl: (7) Failed to connect to swarm1 port 84: Connection timed out
Describe the results you expected:
root@swarm2 ~ # curl http://swarm1:84
<html><body><h1>It works!</h1></body></html>
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:33:38 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:33:38 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 3
Running: 2
Paused: 0
Stopped: 1
Images: 5
Server Version: 1.12.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 43
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay bridge null host
Swarm: active
NodeID: dvj6xuikc39a6it98sywz7ckk
Is Manager: true
ClusterID: 389htp3t8ruwg4e1cjzg0pg3b
Managers: 5
Nodes: 5
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 138.201.138.24
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.75 GiB
Name: ...
ID: UPSO:UJ57:XQVJ:VU7M:GP7Q:XP5O:VPLR:FEYH:AXHX:ZNWC:WVAJ:UDOV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.): 5 physical nodes (all managers) in Docker swarm mode. I also reproduced this on a second swarm of 3 other physical nodes. All are running Ubuntu 16.04 LTS.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 10
- Comments: 55 (25 by maintainers)
@dangra #25962 brought the fix to master, but it may not be easy to backport that particular change on top of 1.12.
@niau Yes, I know, but that is exactly what I want to do. The httpd should be available to the outside world via port 84. Even if I run curl from another PC using the full domain name, I get the same results, and the same happens from a web browser.
For example, I can easily call http://swarm5.domain.com:84 but not http://swarm1.domain.com:84
That said, some nodes did reboot a while ago, so I assume this problem is connected to #24496.