moby: Swarm service / overlay breakage - starting container failed: Address already in use

Description

Swarm service / overlay breakage - starting container failed: Address already in use

Also reported by @nickjj who said it prevented a production roll-out of Swarm.

Possibly related: https://github.com/moby/moby/issues/31698

Steps to reproduce the issue:

  1. Deploy a service
  2. Remove service
  3. Deploy same service with same name
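The repro steps above can be sketched with a minimal stack file. This is an illustrative sketch, not the reporter's exact file: the filename `node_info.yml` and the stack name `node` are assumptions; the image matches the Dockerfile linked below, whose watchdog listens on 8080.

```yaml
# node_info.yml -- minimal stack file (names are illustrative)
version: "3.3"
services:
  node_info:
    image: alexellis2/faas-node_info
    ports:
      - "8080:8080"   # watchdog process binds to 8080 inside the container
```

Deploy with `docker stack deploy -c node_info.yml node`, remove with `docker stack rm node`, then deploy again under the same name; the redeployed task is the one that fails with "starting container failed: Address already in use". (`docker service create -p 8080:8080 ...` followed by `docker service rm` reproduces the same sequence without a stack file.)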

Describe the results you received:

"starting container failed: Address already in use"  


docker service ps --no-trunc=true node_info
ID                          NAME                IMAGE                       NODE                DESIRED STATE       CURRENT STATE           ERROR                                                 PORTS
ig7sqbw8bz554fw2fckqr20gq   node_info.1         alexellis2/faas-node_info   moby                Shutdown            Failed 59 seconds ago   "starting container failed: Address already in use"   

Describe the results you expected:

1/1 replicas.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: true

Dockerfile:

https://github.com/alexellis/faas-cli/blob/master/template/python/Dockerfile

(watchdog process binds to port 8080)

Python file:

https://github.com/alexellis/faas-cli/blob/master/sample/url_ping/handler.py

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 6
  • Comments: 22 (13 by maintainers)

Most upvoted comments

Any update on this? We're running into it quite a lot.

We use a docker stack deploy CI pipeline to deploy our services to the cluster, as opposed to docker service create.

Attempt #1 (engine version 17.06-ce): We ran into this very problem on a 1-master/1-worker Swarm cluster. We tried “rescuing” the cluster by manually deleting the stale VIP endpoints; that worked well for a while before the cluster went back to how it was before. This cluster had a few services that did not expose their ports, and forcing them to expose their ports did not resolve the problem.

Attempt #2: Upgrading to a 3-master, 2-worker cluster (while also moving from a /24 network to a /16 custom overlay network) delayed the problem for a while before it hit us again. By this point the Swarm was loaded with around 48 services. The endpoint mode on the services that did not expose their ports was set to dnsrr (since a few users reported problems with the default VIP mode). It didn’t help much.
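For reference, the dnsrr endpoint mode mentioned above is set per service. In a compose file it looks like the fragment below (the service name and image are hypothetical); note that `endpoint_mode` requires compose file format 3.3 or later:

```yaml
services:
  internal_api:                # hypothetical service that publishes no ports
    image: example/internal-api
    deploy:
      endpoint_mode: dnsrr     # DNS round-robin instead of the default VIP
```

The equivalent for `docker service create` is the `--endpoint-mode dnsrr` flag.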

It has been a major roadblock for us; we're not really sure how everyone else has been getting their clusters to work. We rebuilt the cluster with 17.06.1-ce today (we encountered some problems with iptables and a node-communication breakdown while upgrading sequentially, but that is for another day). We will monitor how it pans out and update here accordingly.

Would be more than happy to share relevant logs. 👍