moby: Docker swarm randomly stops taking connections
Description
I have 2 swarm managers and one swarm node in a testing/dev environment, and every so often, at random times, it just stops accepting connections on the ports. The docker service is still running and is not throwing any errors in the error log. If I restart the docker swarm service on the node, everything starts back up and works again, but only for a few days, and then it stops again. I thought it might be the firewall, so I turned it off, but the problem still happens even with the firewall off. Is anyone else having this issue?
Steps to reproduce the issue: I am unable to reproduce this; it just happens at random times.
Describe the results you received: I cannot connect to any of the open ports or any of the services.
Describe the results you expected: For the service to be running properly.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 11:02:43 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 11:02:43 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 8
Running: 8
Paused: 0
Stopped: 0
Images: 6
Server Version: 17.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: 53cwvu71y8c2cso3cg6ojb4fz
Is Manager: false
Node Address: 10.0.1.0
Manager Addresses:
10.0.0.2:2377
10.0.0.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 488.4 MiB
Name: sms-swarm-01
ID: EGUA:D6CP:ZERE:BHEU:YUBE:GTK3:VGEU:R3Z4:VVVC:B7IZ:UCIY:3ATE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.): Running Ubuntu 16.04 on Digital Ocean on all nodes.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 20 (5 by maintainers)
Just had a similar problem on 17.05.0-ce, build 89658be
In production, one of our applications started having connection timeout issues. The replicas were sometimes flapping because the application healthcheck wasn't working, but all the application dependencies were running fine (mysql, redis, etc.) and the container healthcheck was working locally inside the containers.
We were able to reproduce the timeout from the docker node with curl 0.0.0.0:{servicePort}.
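The curl check above can also be scripted so that a hang is told apart from a refused connection. This is only a minimal sketch in Python using the standard library; the function name `probe` and the 3-second default timeout are illustrative choices, not something from the original report.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds: the connection attempt hung.
        return "timeout"
    except OSError:
        # Anything else (unreachable network, name resolution failure, ...).
        return "error"
```

For example, `probe("0.0.0.0", 8080)` (8080 is a placeholder for the service port) returning "timeout" rather than "refused" would be consistent with the symptom described here, where the port still appears published but nothing answers.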
Since it was impossible at that point to know which container might be failing, we scaled the containers up and down to restart them all. Same problem.
So we suspected that one of the docker nodes had a network issue. We drained the nodes one at a time, but the problem persisted.
Finally, we ran docker service rm on the application service from the swarm manager and redeployed it. Problem solved.
Some kind of network glitch in the swarm load balancing layer? No clue. It was the first time this had occurred in 2 years.
Unfortunately, I have no logs to provide at all. Nothing of interest in our whole logging stack regarding that issue. All I can tell is the container healthcheck was receiving a timeout, and it was the only application with that issue in the cluster.
¯_(ツ)_/¯
I am having the same issue with docker 17.03 on Ubuntu on Azure.
I don't have swarm mode enabled. Just a single docker node.
An nginx container is running and binds to 80->80 and 443->443.
At some point (not sure when, I have not looked into it much) I can't reach the container through the eth0 interface.
Localhost works fine.
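One way to narrow this down is to test the same published port through both the loopback address and the address assigned to eth0. This is a rough sketch in Python using only the standard library; `reachable_on` is a hypothetical helper, and `10.0.0.4` in the note below is a placeholder for the VM's eth0 address.

```python
import socket

def reachable_on(hosts, port, timeout=3.0):
    """Return {host: True/False} for a TCP connect to `port` via each address."""
    results = {}
    for host in hosts:
        try:
            # A completed handshake means something accepted on this address.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable routes.
            results[host] = False
    return results
```

Calling `reachable_on(["127.0.0.1", "10.0.0.4"], 80)` and getting True for the loopback address but False for the other would match the symptom described. The non-loopback check is best run from a different machine, because connections a host makes to its own addresses do not actually leave through the interface.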
I actually forgot about this issue. I have not experienced this anymore on the new versions of docker (I have experienced other issues, especially with docker 18.09 on boot2docker), so I am closing this.
Same issue here: Docker version 17.11.0-ce, build 1caf76c