moby: v1.9.1 unable to remove network due to active endpoints
These are the same symptoms as #17555, but I am experiencing them with 1.9.1, and since #17558 was supposed to fix it, I'm assuming this should be a new issue.
I have a Jenkins server running tests for a rails app. The build script sets up all the external dependencies in containers, runs the tests, and then removes all the containers. It was running on a t2.micro (single core) on AWS with no problems. I scaled it up to a t2.medium (2 cores) yesterday and the first time I ran the build, it failed when trying to remove the network, due to still having active endpoints.
The build script is basically cleaning up anything left over from older builds (in case of a failure), and then starts fresh. Here is the relevant portion of the build script:
docker stop ${db} || echo 'db already stopped'
docker rm ${db} || echo 'db already cleaned up'
docker network rm ${net} || echo 'network already cleaned up'
docker network create --driver bridge ${net}
docker run -d --net ${net} --name ${db} postgres
Removing the network doesn't actually fail the build since the || is there, but it errors with active endpoints. The build actually fails when trying to create the network, since it already exists.
I can see there are no containers connected to the network, but I can't remove the network. As a workaround, I don't mind using a sleep between creating the network and running the container, but I'd love to know if there is a way to force-remove the zombie network.
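As a sketch of that workaround, the cleanup portion of the build script could retry the removal for a few seconds instead of sleeping once; the retry count and delay here are arbitrary values for illustration, not anything Docker recommends:

# retry the network removal a few times in case endpoint cleanup lags behind
for i in 1 2 3 4 5; do
  docker network inspect ${net} >/dev/null 2>&1 || break   # network already gone
  docker network rm ${net} && break                        # removed successfully
  echo "network ${net} still has active endpoints, retrying (${i}/5)..."
  sleep 2
done
docker network create --driver bridge ${net}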
$ sudo docker network inspect demo_net
[
    {
        "Name": "demo_net",
        "Id": "ec089055c30c327935772574961f1b25dd2701c4d27ea1e34a70960595bd5f1f",
        "Scope": "local",
        "Driver": "bridge",
        "IPAM": {
            "Driver": "default",
            "Config": [
                {}
            ]
        },
        "Containers": {},
        "Options": {}
    }
]
$ sudo docker network rm demo_net
Error response from daemon: network demo_net has active endpoints
Here’s all the basic docker stuff:
$ sudo docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:12:04 UTC 2015
OS/Arch: linux/amd64
$ sudo docker info
Containers: 8
Images: 133
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 149
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-48-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 2
Total Memory: 3.859 GiB
Name: jenkins
ID: UQJY:AGBB:HSJL:6XSW:4KBD:EVAB:W6A3:X7XN:RJRT:JDLO:MJBL:D3ZS
WARNING: No swap limit support
$ uname -a
Linux jenkins 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
t2.medium AWS instance running an official Ubuntu 14.04 AMI.
About this issue
- State: open
- Created 8 years ago
- Reactions: 4
- Comments: 51 (14 by maintainers)
We fixed a few issues for the bridge driver in 1.11 and we would encourage folks to try the 1.11-RC.
But if the issue is still seen in 1.11-RC for either the bridge or the overlay driver, please execute docker network inspect {network-name} to see whether there are containers or stale endpoints still left over. One can disconnect the container/endpoint using docker network disconnect -f {network-name} {endpoint-name}. Once all the containers / endpoints are disconnected, the network can be removed.

I solved my problem by restarting the docker daemon.
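Putting that suggestion together in one place, here is a rough sketch (not an official procedure) that lists whatever endpoints docker network inspect still reports and force-disconnects them before removing the network; it assumes a docker version whose network inspect supports --format, and demo_net is just the network name from this issue:

net=demo_net
# list whatever container/endpoint names inspect still reports for this network
for ep in $(docker network inspect -f '{{range $k, $v := .Containers}}{{$v.Name}} {{end}}' ${net}); do
  docker network disconnect -f ${net} ${ep}
done
docker network rm ${net}

Note that in the inspect output pasted at the top of this issue, Containers is already empty, so a loop like this finds nothing to disconnect; in that situation a daemon restart (as mentioned above) is what ended up clearing the network.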
Faced the same issue. Unable to remove the docker network. There are no running containers, but I am still unable to remove the network.
Please fix, I have this bug with Docker 1.12 too.
@fitz123 have you tried disconnecting with -f?

Same bug.
Docker version 18.09.0, build 4d60db4
This same issue is still present in Docker version 18.06.1-ce
@thaJeztah docker network disconnect -f <network-id> <container-name> works for me, thanks!!

I have been experiencing the same with v1.12.1 and I suspect that the problem was the etcd KV store, as I have been reproducing it quite often with coreos/etcd:v2.0.8. I cannot reproduce it with coreos/etcd:v2.3.7, but in any case I am providing my setup and the steps I had to do in order to reproduce it almost every time (v2.0.8):
- 2 identical nodes
- both are connected to a coreos/etcd:v2.0.8 discovery service
Steps to reproduce:

Error response from daemon: network test-net has active endpoints

Although inspect would return something like:

Shell script to reproduce: https://gist.github.com/tgeorgiev/b18a3e0d02b0efec09a7341f6d12ec16
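The gist above has the exact reproduction script. Purely as an illustration of the setup described (the hostnames, ports, and interface name here are placeholders, not values from the report), each of the two daemons is pointed at the shared etcd store along these lines, and the overlay network is then created on one of them:

# placeholder endpoints; the real setup used a coreos/etcd:v2.0.8 discovery service
dockerd \
  --cluster-store=etcd://etcd-host:2379 \
  --cluster-advertise=eth0:2376
# on one node, create the overlay network that later refuses to be removed
docker network create --driver overlay test-net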
It happened in 1.9.x, it happened in 1.10.x, it happened in 1.11.x and now it’s happening to us in 1.12.0.
1.10: Same problem here. Changing the network's endpoint count to 0 in the KV store allowed me to remove my overlay network.
I had the same problem after upgrading to 1.10 and I solved it the same way with Consul:
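I don't have the exact commands used for that fix; as a rough sketch only, zeroing the count through Consul's KV HTTP API could look like the following, where the key path and JSON payload are assumptions about how libnetwork stores its endpoint count, so read the key first and match whatever format you actually see:

net_id=$(docker network inspect -f '{{.Id}}' my-overlay)   # my-overlay is a placeholder name
# read the current value first to confirm the key exists and to see its exact format
curl -s "http://localhost:8500/v1/kv/docker/network/v1.0/endpoint_count/${net_id}?raw"
# write a zero count back (assumed payload shape; adjust it to match what you read)
curl -s -X PUT -d '{"Count":0}' \
  "http://localhost:8500/v1/kv/docker/network/v1.0/endpoint_count/${net_id}"
docker network rm my-overlay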
To remove the zombie networks, I had to umount the entries in /var/run/docker/netns that corresponded to the zombie network: the entries named with the network id, as well as the entries starting with lb_ and 1- followed by the id of that zombie network.

Then I deleted the folders in /var/run/docker/containerd/ that corresponded to any containers that were not disconnected from this network (in my case, the containers could not be disconnected because they no longer existed).

Then systemctl restart docker, or your distro's equivalent.

IMO, this is a pretty dangerous solution, so only use it if you don't have any other options. The only reason I chose to take the risk was because this happened on a machine I didn't mind reinstalling if I messed up, and it was important enough for me to be able to access volumes whose names relied on the name of the docker stack. As in, without doing this, I was not able to access a database that was in a container on this stack because the stack couldn't be brought up.
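Spelled out as a sketch (last resort only, as said above; the network id below is a placeholder, not a real value):

net_id=ec089055c30c   # placeholder: the id of the zombie network
# unmount the netns entries for the network itself and the lb_/1- prefixed ones
for ns in /var/run/docker/netns/${net_id}* /var/run/docker/netns/lb_${net_id}* /var/run/docker/netns/1-${net_id}*; do
  [ -e "${ns}" ] && umount "${ns}"
done
# remove leftover containerd state only for containers that no longer exist
# (double-check each directory before deleting anything)
# rm -rf /var/run/docker/containerd/<container-id>
systemctl restart docker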
That is great news - is there a fix scheduled? We have seen this issue happen about 3x a week. Really very annoying.
As a workaround, I wrote this bash script to remove any orphaned endpoints from all docker networks.
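The script itself wasn't posted here; a sketch of the same idea (not the commenter's actual code) walks every network and force-disconnects whatever endpoints it still reports:

# sketch only: force-disconnect every endpoint still listed on every network
for net in $(docker network ls -q); do
  for ep in $(docker network inspect -f '{{range $k, $v := .Containers}}{{$v.Name}} {{end}}' "${net}"); do
    docker network disconnect -f "${net}" "${ep}"
  done
done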
+1 - this is a pretty big bug, would be great to get some feedback on it
v1.10.3: Same problem here with a bridge network.