moby: Cannot remove network due to active endpoint, but cannot stop/remove containers

Output of docker version:

Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:30:23 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:30:23 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 15
 Running: 13
 Paused: 0
 Stopped: 2
Images: 215
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 248
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host overlay
Kernel Version: 4.4.0-22-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.686 GiB
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster store: consul://xxx
Cluster advertise: yyy

I am trying to delete a network with docker network rm <network>, but it complains with

Error response from daemon: network xxx_default has active endpoints

Indeed, when I run docker inspect xxx_default I get:

"Containers": {
            "ep-3dd9d8a572c1bfa877da875f3f0640dba9fe0bdf7ff6090a2171dcbebc926b55": {
                "Name": "release_diyaserver_1",
                "EndpointID": "3dd9d8a572c1bfa877da875f3f0640dba9fe0bdf7ff6090a2171dcbebc926b55",
                "MacAddress": "02:42:0a:00:03:04",
                "IPv4Address": "10.0.3.4/24",
                "IPv6Address": ""
            },
            "ep-da1587e9a9fed7d767d79e1ff724a6f6afe56126dae097d9967a9196022ad103": {
                "Name": "release_server-postgresql_1",
                "EndpointID": "da1587e9a9fed7d767d79e1ff724a6f6afe56126dae097d9967a9196022ad103",
                "MacAddress": "02:42:0a:00:03:03",
                "IPv4Address": "10.0.3.3/24",
                "IPv6Address": ""
            }
        }

But when I try to docker stop/rm either of these two containers (by name or by ID) I get:

Error response from daemon: No such container: release_diyaserver_1

So basically I’m stuck with a useless network that I can’t rm, and this is a real problem because I need to recreate containers with those same names, but it complains when I try to recreate them.

Is there a way I can get out of this?

These are overlay networks, and I run consul as the KV store. There is only one consul node, on the same host (because I don’t need multi-host networking right now).

Thx in advance.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 33
  • Comments: 66 (15 by maintainers)

Most upvoted comments

Can you try using --force to disconnect the container?

docker network disconnect --force <network> release_diyaserver_1
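
If the network still lists several stale endpoints, the force-disconnects can be scripted. A minimal sketch, assuming the endpoint names shown in the inspect output above (it only prints the commands, so they can be reviewed before piping to sh):

```shell
#!/bin/sh
# Print a force-disconnect command for each stale endpoint name on a network.
# Review the output, then pipe it to `sh` to execute.
emit_disconnect_cmds() {
    network="$1"; shift
    for name in "$@"; do
        printf 'docker network disconnect --force %s %s\n' "$network" "$name"
    done
}

# In a live session the names could be collected with:
#   docker network inspect -f '{{range .Containers}}{{.Name}} {{end}}' <network>
emit_disconnect_cmds xxx_default release_diyaserver_1 release_server-postgresql_1
```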

If I wanted to fix issues by restarting things, I’d have stayed on Microsoft Windows.

When no other command works, run sudo service docker restart and the problem should be solved.

@michaellenhart use docker network disconnect -f [network] [container name|id] to disconnect a non-existent container from the network.

Restarting the docker daemon seemed to work for me as well, and I’m on 17.04.0-ce-rc1.

Here, docker network prune did not remove the network because it still thought there were containers attached, due to the remaining endpoints. However, there weren’t any; these endpoints are completely invalid, stale, hanging markers.

Like I said here -> https://github.com/moby/moby/issues/17217, it still happens on Docker version 18.06.1-ce, build e68fc7a. I think one of these tickets might be a duplicate of the other (or was that already decided?)

It’s as above: it needs a network disconnect with --force, using the name from the docker network inspect output; IDs won’t be found.

I had the same issue with the swarm running on top of etcd, but was able to recover from it. Issue might have been caused by reboot of the CoreOS box.

Recovery procedure was: find trouble network endpoint in etcd and delete it. Here is what I did:

  • Using docker network inspect <network>, find the troubled endpoint and its ID:
"ep-abaeb8d676bb077553d254b563931bfae0d38275d9aaf2888298ea34a49d0bb3": {
    "Name": "topbeat_[...]",
    "EndpointID": "abaeb8d676bb077553d254b563931bfae0d38275d9aaf2888298ea34a49d0bb3",
    "MacAddress": "02:42:0a:00:00:1f",
    "IPv4Address": "10.0.0.31/24",
    "IPv6Address": ""
},
  • Use etcdctl ls --recursive /docker/network to find the endpoint in etcd and delete it:
etcdctl rm /docker/network/v1.0/endpoint/[...]/abaeb8d676bb077553d254b563931bfae0d38275d9aaf2888298ea34a49d0bb3
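
The same etcd cleanup, generalized (a sketch that prints the etcdctl commands rather than running them; the IDs here are placeholders, and the real ones come from etcdctl ls --recursive /docker/network):

```shell
#!/bin/sh
# Print the etcdctl removals for a set of stale endpoints under one network.
# net_id and the endpoint IDs are placeholders; substitute the values found
# via `etcdctl ls --recursive /docker/network`.
emit_etcd_rm_cmds() {
    net_id="$1"; shift
    for ep_id in "$@"; do
        printf 'etcdctl rm /docker/network/v1.0/endpoint/%s/%s\n' "$net_id" "$ep_id"
    done
}

emit_etcd_rm_cmds my-network-id abaeb8d676bb077553d254b563931bfae0d38275d9aaf2888298ea34a49d0bb3
```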

Hi again, I have some “news”. I keep getting this error, and it has become a real problem: one of our production servers was shut down and restarted, and all (docker) systems were unable to restart due to this error.

I have dug deeper into the problem and I think I have found a way to reproduce the error (at least I have done this three or four times and it failed every time, so I suppose this counts as reproducible!).

NOTE: I previously thought this was a problem due to the fact that I was running a single-node consul cluster and this node was itself a docker container on the host I was making crash. But I have successfully ruled that out: I have created the consul cluster on a remote server, and the consul node that I run on my docker host is only a consul client that connects to the remote consul server.
So I’m going to show the steps to reproduce the error with a remote server, but I’m confident it will be the same if you don’t have a remote server: you just have to create the consul server on the same host (this was my setup before).

So, first versions and stuff:

$ docker info

Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 111
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 143
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: overlay bridge null host
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.616 GiB
Name: nschoe-PC
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster store: consul://127.0.0.1:8500
$ docker version

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

I’m running on Ubuntu 16.04.

Steps to reproduce the error

1. On my remote server (same docker version), I create a consul server inside a docker container. Here is the docker-compose.yml file:

version: '2'
services:
    consul:
        image: consul
        hostname: "node-remote"
        command: "consul agent -server -bootstrap-expect 1 -data-dir /consul/data -bind 192.168.0.1"
        volumes:
            - consul-kv-store:/consul/data
        network_mode: "host"
        restart: always

volumes:
    consul-kv-store:
        driver: local

So nothing fancy: I simply create a consul server node, which bootstraps itself (because only 1 server is needed). Just in case you’re wondering about 192.168.0.1, it’s because I’ve set up an OpenVPN tunnel between the remote and my computer, and that is its interface.

2. On my local computer, I start a consul client that connects to this remote server. The compose file is:

version: '2'
services:
    consul:
        image: consul
        hostname: "node-nschoePC"
        command: "consul agent -data-dir /consul/data -bind 192.168.0.6 -join 192.168.0.1"
        volumes:
            - consul-kv-store:/consul/data
        network_mode: "host"
        restart: always

volumes:
    consul-kv-store:
        driver: local

Very similar: simply a consul client.

3. My docker host daemon is configured with this systemd service file:

[Unit]
Description=Docker Application Container Engine (insecure registry and consul)
After=network-online.target docker.socket openvpn.service

[Service]
TasksMax=infinity
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --cluster-store=consul://127.0.0.1:8500     

Nothing fancy: I set TasksMax to infinity because usually I need to create a large number of stacks and it quickly reaches the default maximum.
The interesting line is the --cluster-store=consul://127.0.0.1:8500 which instructs the docker daemon to contact the consul cluster. This is the address of our dockerized consul client.
Note that we have After=openvpn.service to make sure docker waits for the VPN tunnel to be effective before trying to start and reach the consul server.
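
On systemd hosts, the override above is often kept as a drop-in rather than a full copy of the unit. A sketch under the assumption of a standard package install (the path below is the conventional drop-in location, not one taken from this report):

```ini
# /etc/systemd/system/docker.service.d/cluster-store.conf (conventional drop-in path)
[Service]
# The empty ExecStart= clears the packaged command before setting our own.
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --cluster-store=consul://127.0.0.1:8500
```

After editing, run systemctl daemon-reload and restart docker for the drop-in to take effect.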

4. Now the containers themselves.

First create some overlay networks:

docker network create --driver overlay network1
docker network create --driver overlay network2

Check that they have been created (though we did not get any error, so…)

docker network ls
NETWORK ID          NAME                DRIVER
ce53302f8b87        bridge              bridge               
03d2e9971ea8        docker_gwbridge     bridge              
96bdb8ad89f9        host                host                
d15fa9fe60e1        network1            overlay             
4cbcac116962        network2            overlay             
25f20e6716fd        none                null   

Second, create containers using the overlay network:

docker run -d -t --name cont1 --net network1 --restart=always ubuntu bash
docker run -d -t --name cont2 --net network2 --restart=always ubuntu bash

Just two containers, each one using an overlay network. The -t option and the bash command keep them running. Note the use of --restart=always.

Last, issue a machine reboot: reboot.

Upon machine reboot,

docker ps

NAMES               STATUS
consul_consul_1     2 minutes

So my consul container restarted correctly, but not my two containers cont1 and cont2. When I query with the -a option:

docker ps -a

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                       PORTS               NAMES
5c1c9354b3dd        ubuntu              "bash"                   2 minutes ago       Exited (128) 2 minutes ago                       cont1
c14fd5737e10        ubuntu              "bash"                   3 minutes ago       Exited (128) 2 minutes ago                       cont2
48c31a4f9a74        consul              "docker-entrypoint.sh"   48 minutes ago      Up About a minute                                consul_consul_1

So both of my containers have exited with return code 128 (I’m not sure what this means, btw).

When trying to see the logs, nothing seems abnormal:

docker logs cont1

root@5c1c9354b3dd:/# exit

It seems to have exited gracefully on host restart. But I get the errors that I mentioned in my first post: I cannot start the container because I get the “network already has endpoint with name cont1” error; then I try deleting the container and disconnecting it from the network, but then I get the error saying there is no such container.

The only solution here is to log on to the remote server running the consul server (which has not crashed, of course), run docker-compose down -v to delete the persistent volume, and restart consul.

But there are many things that I don’t understand: why won’t the containers automatically restart upon reboot? They were created with --restart=always, and besides, the container running the consul client does restart! Why not cont1 and cont2?
And then, why is there this broken state? I previously thought it was because the docker daemon was shut down before it could commit the changes to its consul server, since the server was itself a docker container. But now the server is external, so does it still not get a chance to commit the changes?

Does this mean that running overlay networks makes docker installations unreliable, especially in the face of host crashes?

What worries me too is that this was a graceful shutdown, run with reboot; what will happen when the kernel panics, or the machine crashes so hard that it has to restart?

Reading the messages again I see @aboch suggested that it might be the --cluster-advertise parameter. I am re-running my tests right now and will keep you up to date, but with this new intel, do you guys see anything that might be the problem?

Result of docker version:

Client:
 Version:	18.03.0-ce
 API version:	1.30 (downgraded from 1.37)
 Go version:	go1.9.4
 Git commit:	0520e24
 Built:	Wed Mar 21 23:06:22 2018
 OS/Arch:	darwin/amd64
 Experimental:	false
 Orchestrator:	swarm

Server:
 Engine:
  Version:	ucp/2.2.5
  API version:	1.30 (minimum version 1.20)
  Go version:	go1.8.3
  Git commit:	42d28d140
  Built:	Wed Jan 17 04:44:14 UTC 2018
  OS/Arch:	linux/amd64
  Experimental:	false

We also experienced this problem. When we tried to remove our network with docker network rm <network_id> we got this:

[root@server centos]# docker network rm y9ru2bnofd7y
Error response from daemon: network myapp-prod_myapp-prod id y9ru2bnofd7ytdr8kjjm7a01v has active endpoints

When we inspect with docker network inspect <network_id> we get:

[root@server centos]# docker network inspect y9ru2bnofd7y
[
    {
        "Name": "myapp-prod_myapp-prod",
        "Id": "y9ru2bnofd7ytdr8kjjm7a01v",
        "Created": "2018-04-04T05:27:15.897789997Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.12.0/24",
                    "Gateway": "10.0.12.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "f8e6a5676b73dc649f323deb72b0a2691f05e4f5f146815b5eb92bd099fdb90e": {
                "Name": "myapp-prod_app-extranet.1.r4elcwsxnlpu4kscrbn0h3zxw",
                "EndpointID": "f1d31c6c85a23cb6b1d1f59b4890f3bdc5ebdffe114cfa0593828c3ec3dc296e",
                "MacAddress": "02:42:0a:00:0c:90",
                "IPv4Address": "10.0.12.144/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4142"
        },
        "Labels": {
            "com.docker.stack.namespace": "myapp-prod",
            "com.docker.ucp.access.label": "/Shared/Private/deploy",
            "com.docker.ucp.collection": "21694651-5a5d-4f93-8d39-3c74807ea70d",
            "com.docker.ucp.collection.21694651-5a5d-4f93-8d39-3c74807ea70d": "true",
            "com.docker.ucp.collection.private": "true",
            "com.docker.ucp.collection.root": "true",
            "com.docker.ucp.collection.shared": "true",
            "com.docker.ucp.collection.swarm": "true"
        },
        "Peers": [
            {
                "Name": "ip-xxx-xxx-xxx-xxx.ec2.internal-id",
                "IP": "xxx.xxx.xxx.xxx"
            }
        ]
    }
]

So, according to the network there is still a container out there. First, we look for the container with docker ps -a | grep myapp and it does not exist:

[root@server centos]# docker ps -a | grep myapp
319d03200304        dtr.myregistry.io/dev/logistics:latest                  "./run-gunicorn.sh"      2 hours ago         Up 2 hours              8000/tcp                  myapp-dev_app-logistics.1.6zqzfk011b41gai80urbaao4d
ad6565232e69        dtr.myregistry.io/dev/proxy:latest                      "/bin/sh -c 'servi..."   6 hours ago         Up 6 hours              80/tcp                    myapp-dev_proxy.1.l598rw0nw4gg7b5qz0nb9j0vg
a7ee62618f24        dtr.myregistry.io/dev/redis:4.0.2                       "docker-entrypoint..."   9 hours ago         Up 9 hours              6379/tcp                  myapp-dev_app-logistics-redis.1.vqo8czmx2kbrhdeejh5153c5a
b0601d50ae79        dtr.myregistry.io/dev/user-api:dev-7ae88c6              "dotnet user-a..."      14 hours ago         Up 14 hours             8000/tcp                  user-api-dev_user-api.1.0ymlai010ttj6xtto2xsyqoxu
267de3e27f65        dtr.myregistry.io/dev/user-api:prod-11e54b9             "dotnet user-a..."      21 hours ago         Up 21 hours             8000/tcp                  user-api-prod_user-api.1.cmjrap524aojku3240bb5jwjm
d30389e81a7c        dtr.myregistry.io/dev/svc-access:prod-b38c752           "docker-php-entryp..."  21 hours ago         Up 21 hours             80/tcp                    svc-access-prod_svc-dc-access.1.mtw174i8n2ofcuk2racafvsmk
db91df87f58a        dtr.myregistry.io/dev/account-management:dev-917752f    "dotnet myacco..."      21 hours ago         Up 21 hours             8000/tcp                  app-account-management-dev_app-account-management.1.bu2qdvdfgg95q883ncsnin5sn

So when we attempt to stop/remove this container, we get:

Error response from daemon: No such container: f8e6a5676b73

We were stuck with this network we couldn’t remove. This broke our CI/CD process: deploying the stack failed because it could not create the myapp-prod_myapp-prod network as defined in the stack file, since one already existed.

We were eventually able to remove the network after disconnecting the zombie container with docker network disconnect --force myapp-prod_myapp-prod myapp-prod_app-extranet.1.r4elcwsxnlpu4kscrbn0h3zxw. Thanks @thaJeztah for that tip! After that, the network was easily removed with docker network rm <id>.

While we do have a manual workaround, this is still an irritating issue for us.

I see the same issue on Docker 1.13.1 using a 2-host overlay network with a 3-node Consul 0.7.4 cluster. I can reproduce the issue by forcibly shutting down one of the Docker hosts.

The result is that after I start the host (and the container) I can see the endpoint with docker network inspect, but it shows the old container’s ID. docker network disconnect -f doesn’t remove the container; it gives an error message that the endpoint doesn’t exist (it is using the new container ID, I assume).

It would be great if the container ID could be used in network disconnect and not be validated against the containers on the host.

You need to run docker system prune -a -f

Hi, I have the same issue with these versions:

# docker version
Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:23:11 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:23:11 2016
 OS/Arch:      linux/amd64

and, on a second node: calico/node:v1.1.3, etcdctl version 3.1.7

Steps to reproduce:

docker network create --driver calico --ipam-driver calico-ipam myNet
docker run -d -t --name myContainer --net myNet --restart=always ubuntu bash
systemctl stop docker
systemctl start docker
docker rm -f myContainer
docker network rm myNet
Error response from daemon: network myNet has active endpoints
docker network disconnect -f myNet myContainer
Error response from daemon: unable to force delete endpoint myContainer: invalid endpoint locator identifier

The main problem, as stated in a previous comment, is that docker doesn’t remove the etcd entries, so my workaround is to delete them by hand, removing the stopped containers and finally, the created network:

NETWORK_TO_DELETE="myNet"
NET_ID=$(docker network inspect --format '{{.ID}}' "${NETWORK_TO_DELETE}")
ENDPOINTS_TO_DELETE=$(etcdctl ls "/docker/network/v1.0/endpoint/${NET_ID}" | sed 's/.*\///g')
for ep in ${ENDPOINTS_TO_DELETE}; do
    docker rm "$ep" && etcdctl rm "/docker/network/v1.0/endpoint/${NET_ID}/$ep"
done
etcdctl rm "/docker/network/v1.0/endpoint_count/${NET_ID}"
etcdctl rm "/docker/network/v1.0/network/${NET_ID}"

I hope it helps someone… 😃

Hello there, I’m sorry for responding so late. Well, I have changed the docker version to

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

Sometimes I get the same error message “Error response from daemon: network lager has active endpoints” when removing a network using the command docker network rm 8bde, but I can disconnect active endpoints even if the containers are not “existent” anymore. I use the command

docker network disconnect -f <network> <container name>

Example: docker network inspect lager

This shows all active endpoints even if the containers are not available anymore; I think these are called zombie containers 😉

[
    {
        "Name": "lager",
        "Id": "8bde2c60085a5bae1989f36dd55ea89767a60690da537ad23b0570adc19ebdfb",
        "Created": "2017-04-04T06:49:13.531850943Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "252cd073f4f84a4e165a662193a62c44a4fa4ba38c36dc5ee640fc5cb94fa728": {
                "Name": "deploy_lager-client_1",
                "EndpointID": "6c2acca22cfc73f3de47a0711ee6000b7909b0adaf1d427bb91beb0592b7dc66",
                "MacAddress": "02:42:ac:13:00:07",
                "IPv4Address": "172.19.0.7/16",
                "IPv6Address": ""
            }, 

If I want to disconnect the Service deploy_lager-client_1 I use docker network disconnect -f 8bde deploy_lager-client_1

Or you can use docker network disconnect -f lager deploy_lager-client_1

You cannot use the container ID; you must use the container name instead. docker network disconnect -f lager 252cd does NOT work.

Once you have removed all active endpoints, you can delete the network.

@thaJeztah Hi again, it was not long ^^

The issue happened again:

$ docker network inspect appsapps_default

[
    {
        "Name": "appsapps_default",
        "Id": "3cda39493fb0c42f966e3f8a9d4458b3574716062445a7f469f776dc680fa71d",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1/24"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "83bfe13fa1fcb60f75e64f4df44f59029d2e2471ceac81177d23fa3b0cdd2b79": {
                "Name": "appsapps_diya-apps_1",
                "EndpointID": "9ae34cecd1a0190792c84842e510e9d6e363b37581aa8ccbac8bd2ea32f0966f",
                "MacAddress": "",
                "IPv4Address": "",
                "IPv6Address": ""
            },
            "fe4f82c07613d060f78c15a0e285b115e9cb560862af0077cb2d792993ab6e09": {
                "Name": "appsapps_portal-db_1",
                "EndpointID": "44072162d4bc012f9529ee78d257281b557a98b2336bc344a053283cd1d07e76",
                "MacAddress": "",
                "IPv4Address": "",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

And neither container exists when I do docker ps -a.

Then I tried this:

$ docker stop appsapps_diya-apps_1

Error response from daemon: No such container: appsapps_diya-apps_1
$ docker rm appsapps_diya-apps_1 

Error response from daemon: No such container: appsapps_diya-apps_1

Then network disconnect:

$ docker network disconnect appsapps_default appsapps_diya-apps_1

Error response from daemon: No such container: appsapps_diya-apps_1

and with the -f:

$ docker network disconnect -f appsapps_default appsapps_diya-apps_1

Error response from daemon: unable to force delete endpoint appsapps_diya-apps_1: invalid endpoint locator identifier

Note: I have tried every above command by replacing the container name with the container ID, I got the same result.
Weird thing: the container ID doesn’t start with ep- this time.

Note that this is an overlay network, and I have only had this problem with overlay networks.

Any idea? Solution?

Thanks @mavenugo for coming here 😃

I’ve noticed the ep- prefix of the container and suspected that indeed, it was something like that.

However, I can confirm that docker network disconnect <network> ep-xxx did not solve the problem, because the daemon responded with no such container: ep-xxx.

And there were, in fact, no other nodes that are part of the overlay: this is a single-host overlay network (I don’t need multi-host yet, but I do need a high number of subnets, which bridge cannot give me yet; see #21776).

For sanity, next time it happens I will re-check with the --force option to docker network disconnect, but I’m 90% sure I tried it and it failed 😕

Thanks for your support!

Typically, when you see containers in docker network inspect output with an ep- prefix, it can be either of two cases:

  • these are stale endpoints left over in the DB. For those cases, docker network disconnect should help.
  • these are remote endpoints seen on other nodes that are part of the overlay network. The only way to clean them up is from that specific host.
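
For the first case, note that the key shown by docker network inspect is just the EndpointID with an ep- prefix; the raw ID is what appears under /docker/network/v1.0/endpoint/ in the KV store. A tiny helper to strip it (a sketch):

```shell
#!/bin/sh
# Strip the "ep-" prefix to recover the raw EndpointID from an inspect key.
strip_ep_prefix() {
    printf '%s\n' "$1" | sed 's/^ep-//'
}

strip_ep_prefix ep-3dd9d8a572c1bfa877da875f3f0640dba9fe0bdf7ff6090a2171dcbebc926b55
```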

We faced a similar issue, but with a bridge network that we use as part of our docker compose setup. We tried docker-compose down and then docker-compose up, which failed and gave this error:

Creating backend … error

ERROR: for backend Cannot start service api: endpoint with name backend already exists in network ironman-composenet

ERROR: for api Cannot start service api: endpoint with name backend already exists in network ironman-composenet
ERROR: Encountered errors while bringing up the project.

This is what ironman-composenet looked like:

[
    {
        "Name": "ironman-composenet",
        "Id": "9f4f077859791121cfb8660644ed3d783a10c553d3c1e6c5ff82de5bdab6d8e9",
        "Created": "2021-07-26T16:39:42.121699877+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.23.0.0/16",
                    "Gateway": "172.23.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "ep-043878077b5ec2b71a053b2eed51d529a616d26ca2aaa7857ec84434bf1ceaca": {
                "Name": "backend",
                "EndpointID": "043878077b5ec2b71a053b2eed51d529a616d26ca2aaa7857ec84434bf1ceaca",
                "MacAddress": "02:42:ac:17:00:02",
                "IPv4Address": "172.23.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

I ran the following:

  1. docker container rename backend backend2
  2. docker network disconnect -f ironman-composenet backend
  3. docker container rename backend2 backend

Which all ran fine. We then ran docker-compose down and docker-compose up and it worked as normal.
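
The three steps above can be bundled into one sketch; it only prints the commands (pipe the output to sh to run them), and the network and container names are the ones from this report:

```shell
#!/bin/sh
# Print the rename -> force-disconnect -> rename-back sequence for a stale
# endpoint whose name collides with a container you need to (re)create.
emit_rename_workaround() {
    network="$1"; name="$2"
    printf 'docker container rename %s %s-tmp\n' "$name" "$name"
    printf 'docker network disconnect -f %s %s\n' "$network" "$name"
    printf 'docker container rename %s-tmp %s\n' "$name" "$name"
}

emit_rename_workaround ironman-composenet backend
```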

Be aware there is a different flavour of this issue, where no container is actually attached to the network at all, yet the network still cannot be deleted (#42119) - so people in here, if not watching closely, could run into the other case too.

So I was not seeing this issue until I recently updated to 17.12.0-ce.

We are deploying stacks onto docker swarms, running a series of tests against containers in the stack, removing the stack, and then immediately deploying a 2nd stack and running a set of tests; then removing it and deploying a 3rd stack. Somewhere (and it is random) between the removal and deployment of stacks we see this issue.

The process randomly (but frequently) fails to remove the stack (5 out of 6 containers are removed from the stack). Which container fails to be removed is random.

When I try to manually remove the last container, I get the:

Failed to remove network vjbo7hqulyrf1p0uk0ka2nstk: Error response from daemon: network test_default id vjbo7hqulyrf1p0uk0ka2nstk has active endpoints
Failed to remove some resources from stack: test

I then try to remove the test_default network manually, with the same message regarding “has active endpoints”.

I tried restarting the docker daemon - which hung. I was forced to reboot the system.

This is definitely an issue for us - it is regularly breaking our CI/CD process.

@BSWANG As already stated, the --force flag doesn’t work anymore. In my case it is version 1.12, but it worked with 1.10 and (though I’m not sure) 1.11.

Update:

To remove the network in the case where docker network disconnect -f <network> <container> results in an error, you have to pick the network ID (e.g. zp1hd8cmb9h5i1fkiylvsifag for the mentioned gefahr network). Then go into the consul K/V store (assuming you use consul and not etcd), navigate to kv/docker/network/v1.0/network/ and kv/docker/network/v1.0/overlay/, and remove the entry with the found ID from both directories. After this, the network should not be listed anymore. I’ve not observed any side effects, but can’t guarantee this.
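
For the consul case, the same two entries can also be deleted over Consul's HTTP KV API instead of through the UI. A sketch that prints the curl calls (the address and network ID are placeholders, and ?recurse deletes the whole subtree under each key):

```shell
#!/bin/sh
# Print DELETE calls against Consul's KV HTTP API for the two directories
# mentioned above. The address and network ID are placeholders.
emit_consul_delete_cmds() {
    consul_addr="$1"; net_id="$2"
    for dir in network overlay; do
        printf 'curl -s -X DELETE "http://%s/v1/kv/docker/network/v1.0/%s/%s?recurse"\n' \
            "$consul_addr" "$dir" "$net_id"
    done
}

emit_consul_delete_cmds 127.0.0.1:8500 zp1hd8cmb9h5i1fkiylvsifag
```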