moby: v1.9.1 unable to remove network due to active endpoints

These are the same symptoms as #17555, but I’m experiencing them with 1.9.1, and since #17558 was supposed to fix it, I’m assuming this should be a new issue.

I have a Jenkins server running tests for a Rails app. The build script sets up all the external dependencies in containers, runs the tests, and then removes all the containers. It was running on a t2.micro (single core) on AWS with no problems. I scaled it up to a t2.medium (2 cores) yesterday, and the first time I ran the build, it failed when trying to remove the network because the network still had active endpoints.

The build script basically cleans up anything left over from older builds (in case of a failure) and then starts fresh. Here is the relevant portion of the build script:

docker stop ${db} || echo 'db already stopped'
docker rm ${db} || echo 'db already cleaned up'
docker network rm ${net} || echo 'network already cleaned up'

docker network create --driver bridge ${net}
docker run -d --net ${net} --name ${db} postgres

Removing the network doesn’t actually fail the build, since the || is there, but it errors with "active endpoints". The build then fails when trying to create the network, since it already exists.

I can see there are no containers connected to the network, but I can’t remove the network. As a workaround, I don’t mind using a sleep between creating the network and running the container, but I’d love to know if there is a way to force-remove the zombie network.
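In the meantime, a retry loop seems to paper over the race more reliably than a fixed sleep. This is just a sketch, using the same ${net} variable as above:

# Workaround sketch: retry the removal a few times before giving up
for attempt in 1 2 3 4 5; do
    docker network rm ${net} && break
    echo "attempt ${attempt}: network still has active endpoints, retrying..."
    sleep 2
done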

$ sudo docker network inspect demo_net
[
    {
        "Name": "demo_net",
        "Id": "ec089055c30c327935772574961f1b25dd2701c4d27ea1e34a70960595bd5f1f",
        "Scope": "local",
        "Driver": "bridge",
        "IPAM": {
            "Driver": "default",
            "Config": [
                {}
            ]
        },
        "Containers": {},
        "Options": {}
    }
]
$ sudo docker network rm demo_net
Error response from daemon: network demo_net has active endpoints

Here’s all the basic docker stuff:

$ sudo docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:12:04 UTC 2015
 OS/Arch:      linux/amd64

$ sudo docker info
Containers: 8
Images: 133
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 149
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-48-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 2
Total Memory: 3.859 GiB
Name: jenkins
ID: UQJY:AGBB:HSJL:6XSW:4KBD:EVAB:W6A3:X7XN:RJRT:JDLO:MJBL:D3ZS
WARNING: No swap limit support

$ uname -a
Linux jenkins 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

t2.medium AWS instance running an official Ubuntu 14.04 AMI.

About this issue

  • Original URL
  • State: open
  • Created 8 years ago
  • Reactions: 4
  • Comments: 51 (14 by maintainers)

Most upvoted comments

We fixed a few issues for the bridge driver in 1.11 and we would encourage folks to try the 1.11-RC.

But if the issue is still seen in the 1.11-RC for either the bridge or overlay driver, please execute docker network inspect {network-name}; there may be containers or stale endpoints still left over. One can disconnect a container/endpoint using docker network disconnect -f {network-name} {endpoint-name}. Once all the containers/endpoints are disconnected, the network can be removed.
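For example (a sketch, with a hypothetical network name my-net and a stale endpoint name taken from the inspect output):

docker network inspect my-net                       # look for leftover entries under "Containers"
docker network disconnect -f my-net stale-endpoint  # force-disconnect the stale endpoint
docker network rm my-net                            # should now succeed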

I solved my problem by restarting docker daemon.

Faced the same issue: unable to remove a Docker network. There are no running containers, but I am still unable to remove the network.

# docker network rm 54f84fde7d37
Error response from daemon: network CFJzQhVL id 54f84fde7d3732777846993f388eda0527939ae65e7e1c45eea76de710bccfd6 has active endpoints
# docker network inspect 54f84fde7d37
[
    {
        "Name": "CFJzQhVL",
        "Id": "54f84fde7d3732777846993f388eda0527939ae65e7e1c45eea76de710bccfd6",
        "Created": "2018-03-22T16:07:15.465038819Z",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.16.173.128/26"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]

$ sudo docker version
Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:04:05 2017
 OS/Arch:      linux/amd64
# sudo docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 17.10.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-693.17.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.7GiB
Name: xxxxxxx
ID: ASCQ:2GJR:XBMZ:GBLK:FJ3D:BZJ7:SSHS:VV5A:KEWN:JCZU:U5ZY:A6TX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true

Please fix, I have this bug with Docker 1.12 too.

@fitz123 have you tried disconnecting with -f?

docker network disconnect -f <network-name> <container-id>

Same bug. Docker version 18.09.0, build 4d60db4

This same issue is still present in Docker version 18.06.1-ce

@thaJeztah docker network disconnect -f <network-id> <container-name> works for me, thanks!!

I have been experiencing the same with v1.12.1, and I suspect that the problem was the etcd KV store, as I have been reproducing it quite often with coreos/etcd:v2.0.8. I cannot reproduce it with coreos/etcd:v2.3.7, but in any case I am providing my setup and the steps I had to take in order to reproduce it almost every time (v2.0.8).

2 identical nodes

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   f1e1b83
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   f1e1b83
 Built:
 OS/Arch:      linux/amd64

Both are connected to a coreos/etcd:v2.0.8 discovery service.

Steps to reproduce:

  1. Create an overlay network on one of the nodes
  2. Create multiple containers on both hosts, connected to this network
  3. Delete the containers in parallel
  4. Try to delete the network
  5. You would get Error response from daemon: network test-net has active endpoints, although inspect would return something like:
[
    {
        "Name": "test-net",
        "Id": "fd773ef7d7639cb736b99111f675173ced5a7f3b1401ee8d757266b07549709c",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.0.26.0/24",
                    "Gateway": "10.0.26.1/24"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]

Shell script to reproduce: https://gist.github.com/tgeorgiev/b18a3e0d02b0efec09a7341f6d12ec16
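Condensed, the repro is roughly the following (a sketch with hypothetical container names; run the container steps on both hosts):

# Steps 1-2: create the overlay network and attach several containers
docker network create -d overlay test-net
for i in $(seq 1 10); do
    docker run -d --net test-net --name "c$i" busybox sleep 3600
done
# Step 3: delete the containers in parallel
seq 1 10 | xargs -P 10 -I{} docker rm -f "c{}"
# Steps 4-5: removal now fails with "network test-net has active endpoints"
docker network rm test-net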

It happened in 1.9.x, it happened in 1.10.x, it happened in 1.11.x and now it’s happening to us in 1.12.0.

$ docker network rm ub_london
Error response from daemon: Error response from daemon: network ub_london has active endpoints
$ docker network inspect ub_london
[
    {
        "Name": "ub_london",
        "Id": "6ca5f4125ed6093b259aa534dd9a175181c794b0c810e544b7fda5b05bec3f20",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1/24"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]
$ docker network rm ub_london
Error response from daemon: Error response from daemon: network ub_london has active endpoints

1.10: same problem here. Setting the network’s endpoint count to 0 in the KV store allowed me to remove my overlay network.

I had the same problem after upgrading to 1.10 and I solved it the same way with Consul:

curl -XPUT http://127.0.0.1:8500/v1/kv/docker/network/v1.0/endpoint_count/<network id>/ -d '{"Count":0}'
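If you’d rather not look up the network id by hand, something like this should work (a sketch; my-net is a hypothetical network name, and it assumes Consul on 127.0.0.1:8500 and a Docker version whose network inspect supports --format):

netid=$(docker network inspect -f '{{.Id}}' my-net)   # resolve the full network id
curl -XPUT "http://127.0.0.1:8500/v1/kv/docker/network/v1.0/endpoint_count/${netid}/" -d '{"Count":0}'
docker network rm my-net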

To remove the zombie networks, I had to umount the entries in /var/run/docker/netns that corresponded to the zombie network: the entries named with the network id, as well as the entries starting with lb_ or 1- followed by that id.

Then I deleted the folders in /var/run/docker/containerd/ that corresponded to any containers that could not be disconnected from this network (in my case, they could not be disconnected because the containers no longer existed).

Then systemctl restart docker, or your distro’s equivalent.

IMO, this is a pretty dangerous solution, so only use it if you don’t have any other options. The only reason I chose to take the risk was that this happened on a machine I didn’t mind reinstalling if I messed up, and it was important enough for me to be able to access volumes whose names relied on the name of the Docker stack. Without doing this, I was not able to access a database that was in a container on this stack, because the stack couldn’t be brought up.
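In script form, the steps above amount to roughly this (a sketch; NETID is a hypothetical zombie network id, and the same warnings apply):

NETID=fd773ef7d763                                     # hypothetical id of the zombie network
sudo umount /var/run/docker/netns/*"$NETID"*           # also matches the lb_<id> and 1-<id> entries
sudo rm -rf /var/run/docker/containerd/<container-id>  # only for containers that no longer exist
sudo systemctl restart docker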

That is great news - is there a fix scheduled? We have seen this issue happen about 3x a week. Really very annoying.

On Wed, Mar 14, 2018 at 9:26 AM, antoinetran notifications@github.com wrote:

We reproduced it again on CentOS 7.4.1708 / docker-ce-17.12.1-ce / swarm image 1.2.8 / ZooKeeper HA 3.4.11. Had to use the workaround of deleting the endpoint in ZooKeeper.


As a workaround, I wrote this bash script to remove any orphaned endpoints from all Docker networks:

docker network ls -q | while read -r networkid; do
        # Stale endpoints show up under "Containers" with keys starting with "ep-"
        docker network inspect "$networkid" | jq -r '.[0].Containers | keys[]' | while read -r endpoint; do
                if [[ $endpoint == "ep-"* ]]
                then
                        container=$(docker network inspect "$networkid" | jq -r ".[0].Containers.\"$endpoint\".Name")
                        echo "docker network disconnect -f $networkid $container"
                        docker network disconnect -f "$networkid" "$container"
                fi
        done
done
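This relies on the behavior mentioned earlier in the thread: with -f, docker network disconnect accepts the name recorded for a stale ep-* endpoint even when the container behind it no longer exists.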

+1 - this is a pretty big bug, would be great to get some feedback on it

v1.10.3: same problem here with a bridge network.