moby: Swarm service / overlay breakage - starting container failed: Address already in use
Description
Also reported by @nickjj, who said it prevented a production roll-out of Swarm.
Possibly related: https://github.com/moby/moby/issues/31698
Steps to reproduce the issue:
- Deploy a service
- Remove the service
- Deploy the same service with the same name (a command-level sketch follows the list)
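For reference, the sequence looks roughly like this on the CLI. The service name and image are taken from the docker service ps output below; publishing port 8080 (the watchdog port mentioned at the end) is an assumption about how the service was created:

$ docker service create --name node_info --publish 8080:8080 alexellis2/faas-node_info
$ docker service rm node_info
$ docker service create --name node_info --publish 8080:8080 alexellis2/faas-node_info
# the second create succeeds, but its tasks fail with "Address already in use"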
Describe the results you received:
"starting container failed: Address already in use"
$ docker service ps --no-trunc=true node_info
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ig7sqbw8bz554fw2fckqr20gq node_info.1 alexellis2/faas-node_info moby Shutdown Failed 59 seconds ago "starting container failed: Address already in use"
Describe the results you expected:
1/1 replicas.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
$ docker version
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:31:53 2017
OS/Arch: darwin/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:51:55 2017
OS/Arch: linux/amd64
Experimental: true
Dockerfile:
https://github.com/alexellis/faas-cli/blob/master/template/python/Dockerfile
(watchdog process binds to port 8080)
Python file:
https://github.com/alexellis/faas-cli/blob/master/sample/url_ping/handler.py
About this issue
- State: open
- Created 7 years ago
- Reactions: 6
- Comments: 22 (13 by maintainers)
Any update on this? We're running into it quite a lot.
We follow a docker stack deploy CI pipeline to deploy our services to the cluster, as opposed to using docker service create.
Attempt #1 (engine version 17.06-ce): We ran into this very problem on a 1-master/1-worker Swarm cluster. We tried "rescuing" the cluster by manually deleting the stale VIP endpoints (roughly as sketched below), which worked well for a while before things went back to how they were. This cluster had a few services that did not expose their ports; forcing them to expose their ports did not resolve the problem.
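For anyone attempting the same kind of rescue, the manual cleanup amounts to finding endpoints on the overlay network that no longer map to live tasks and force-disconnecting them. A minimal sketch with placeholder network/endpoint names, not necessarily the exact commands we ran:

$ docker network inspect <overlay-network>
# look for endpoints whose containers/tasks no longer exist
$ docker network disconnect -f <overlay-network> <stale-endpoint-name>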
Attempt #2: Upgrading to a 3-master, 2-worker cluster (while also going from a /24 network to a /16 custom overlay network) delayed the problem for a while before it hit us again. This time around the Swarm was loaded with around 48 services. The endpoint mode on the services that did not expose their ports was set to dnsrr (since a few users reported problems with the default VIP mode); it did not help much.
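For context, dnsrr is set per service. On the CLI it looks like the line below (service name and image are placeholders); with docker stack deploy, the equivalent is endpoint_mode: dnsrr under the service's deploy key in a version 3.3+ compose file:

$ docker service create --name <service-name> --endpoint-mode dnsrr <image>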
It has been a major roadblock for us; we are not really sure how everyone else has been getting their clusters to work. We rebuilt the cluster with 17.06.1-ce today (we ran into some problems with iptables and node-communication breakdown while upgrading sequentially, but that is for another day). We will monitor how it pans out and update here accordingly.
Would be more than happy to share relevant logs. 👍