moby: Removing stack that failed to fully deploy leaves behind stray networks

Description

I have a four-node swarm running Docker 1.13.0-rc5. I accidentally deployed a stack from a compose file whose service images all come from a private registry, without passing the --with-registry-auth option. The services failed to start, so I removed the stack.

While removing the stack, the Docker client shows that it is removing the networks defined in the stack (all overlay/swarm in this case), as expected. But upon listing the networks on the nodes, some of the stack networks have actually been left behind and now appear as overlay/local.

The stack cannot be deployed again until the stray networks are removed from all the nodes.
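As a workaround, the stray networks have to be deleted on each node individually. A minimal cleanup sketch, assuming SSH access to every node; the hostnames are placeholders, and the network names are the ones from the repro output below:

# Run against every node in the swarm (hostnames are placeholders):
for node in node01 node02 node03 node04; do
  ssh "$node" docker network rm \
    converis-db-configs-master_converis converis-db-configs-master_front
done

If docker network rm complains about active endpoints, they may need to be force-disconnected first (see the last comment on this issue).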

Steps to reproduce the issue:

Given a Docker Compose file with some networks defined, and services whose images are on a private registry (a condensed command sketch follows the steps):

  1. Create a Docker Compose file with services whose images are on a private registry, and with some networks.
  2. Deploy a stack using the compose file, but without the --with-registry-auth option.
  3. Remove the stack because the services failed to start due to missing images (they can’t be pulled).
  4. List the networks on the swarm nodes.
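
For reference, the steps above as a condensed, illustrative shell sequence (the stack and file names are placeholders):

docker stack deploy --compose-file stack.yml mystack   # note: no --with-registry-auth
# ... tasks fail because the private images cannot be pulled ...
docker stack rm mystack
docker network ls                                      # repeat on each swarm node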

Describe the results you received:

Some of the networks that were created as part of the stack deployment are left behind. Their driver is still overlay, but their scope has changed from swarm to local.
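
A leftover network's changed scope can be confirmed on an affected node with docker network inspect; this sketch uses one of the network names from the output below:

docker network inspect \
  --format '{{.Name}}: driver={{.Driver}}, scope={{.Scope}}' \
  converis-db-configs-master_converis
# before docker stack rm: driver=overlay, scope=swarm
# after docker stack rm:  driver=overlay, scope=local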

Describe the results you expected:

The networks that were created as part of the stack deployment should all be removed.

Additional information you deem important (e.g. issue happens only occasionally):

Here’s the Docker Compose file I was trying to deploy, and output showing the problem.

networks:
  converis: {}
  front: {}
  traefik_rms-dev:
    external: true
services:
  converis:
    environment:
      CONVERIS_HOST: converis-db-configs-master.rms-test.ucalgary.ca
      CRA_HOST: converis-db-configs-master.rms-test.ucalgary.ca
    image: docker.ucalgary.ca/tr/converis:5.9.8
    networks:
      converis: null
      front: null
  converis-db:
    environment:
      CONVERIS_HOST: converis-db-configs-master.rms-test.ucalgary.ca
      CRA_HOST: converis-db-configs-master.rms-test.ucalgary.ca
    image: docker.ucalgary.ca/rms/converis-db-configs:master
    networks:
      converis: null
  converis-web:
    environment:
      CAS_LOGIN_URL: http://castestqa.ucalgary.ca/replicant/login
      CAS_VALIDATE_URL: http://castestqa.ucalgary.ca/replicant/ucserviceValidate
      CONVERIS_HOST: converis-db-configs-master.rms-test.ucalgary.ca
      CRA_HOST: converis-db-configs-master.rms-test.ucalgary.ca
    image: docker.ucalgary.ca/tr/converis-web:1.1.0
    networks:
      front: null
      traefik_rms-dev: null
version: '3.0'
volumes: {}
→  ~ docker stack ls
NAME  SERVICES

→  ~ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
58721ada8da1        bridge              bridge              local
45ac998dbf08        docker_gwbridge     bridge              local
3c2e6ce7fecf        host                host                local
68i730xxf68a        ingress             overlay             swarm
5430cf1700b3        none                null                local
vur2f7vb8eue        traefik_rms-dev     overlay             swarm

→  ~ docker stack deploy --compose-file /Users/kchuang/Downloads/converis-db-configs.yml converis-db-configs-master
Creating network converis-db-configs-master_default
Creating network converis-db-configs-master_converis
Creating network converis-db-configs-master_front
Creating service converis-db-configs-master_converis
Creating service converis-db-configs-master_converis-db
Creating service converis-db-configs-master_converis-web

→  ~ docker network ls
NETWORK ID          NAME                                  DRIVER              SCOPE
58721ada8da1        bridge                                bridge              local
svatalmcadi2        converis-db-configs-master_converis   overlay             swarm
k63biko0t0fm        converis-db-configs-master_default    overlay             swarm
pfkak51i75hr        converis-db-configs-master_front      overlay             swarm
45ac998dbf08        docker_gwbridge                       bridge              local
3c2e6ce7fecf        host                                  host                local
68i730xxf68a        ingress                               overlay             swarm
5430cf1700b3        none                                  null                local
vur2f7vb8eue        traefik_rms-dev                       overlay             swarm

(time passes, services fail to start)

→  ~ docker stack rm converis-db-configs-master
Removing service converis-db-configs-master_converis-db
Removing service converis-db-configs-master_converis-web
Removing service converis-db-configs-master_converis
Removing network converis-db-configs-master_converis
Removing network converis-db-configs-master_front
Removing network converis-db-configs-master_default

→  ~ docker network ls
NETWORK ID          NAME                                  DRIVER              SCOPE
58721ada8da1        bridge                                bridge              local
fwe4wv5bc8md        converis-db-configs-master_converis   overlay             local
ks6pzoguzk3c        converis-db-configs-master_front      overlay             local
45ac998dbf08        docker_gwbridge                       bridge              local
3c2e6ce7fecf        host                                  host                local
68i730xxf68a        ingress                               overlay             swarm
5430cf1700b3        none                                  null                local
vur2f7vb8eue        traefik_rms-dev                       overlay             swarm

→  ~ docker stack deploy --compose-file /Users/kchuang/Downloads/converis-db-configs.yml converis-db-configs-master
Creating network converis-db-configs-master_default
Creating service converis-db-configs-master_converis
Error response from daemon: network converis-db-configs-master_converis not found

Output of docker version:

Client:
 Version:      1.13.0-rc5
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   43cc971
 Built:        Thu Jan  5 00:43:46 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0-rc5
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   43cc971
 Built:        Thu Jan  5 00:43:46 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 3
Server Version: 1.13.0-rc5
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 4.406 GB
 Data Space Total: 65.28 GB
 Data Space Available: 60.87 GB
 Metadata Space Used: 1.126 MB
 Metadata Space Total: 683.7 MB
 Metadata Space Available: 682.5 MB
 Thin Pool Minimum Free Space: 6.527 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: active
 NodeID: 3vz5vqb96ab3yh6mrhhbnf5ki
 Is Manager: true
 ClusterID: 9p9qjvvrp991zitpixlm8dpm5
 Managers: 4
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.45.32.40
 Manager Addresses:
  10.45.32.40:2377
  10.45.32.41:2377
  10.45.32.42:2377
  10.45.32.43:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-327.36.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51 GiB
Name: itpocnode01.ucalgary.ca
ID: OM7K:NNN7:75VJ:I34C:W5ZS:TCYH:ZGPH:SRXP:HINS:7O2W:IZQ4:QIGD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

This is on a four-node swarm running Docker 1.13.0-rc5.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 24 (11 by maintainers)

Most upvoted comments

I can still reproduce this issue on 19.03.13 for Windows containers. Can you please reopen the issue?

Thanks @kinghuang. I will close this issue since it is resolved in 17.05. I still don't know which fix solved the issue, though.

Facing the same issue on Docker 17.03.1-ce. Intermittently, docker stack rm leaves a stray overlay network with local scope. This then causes containers of other apps to fail with "Address already in use"; it looks like Docker assigns an IP that is still reserved in the stray network. To resolve this, we have to manually disconnect the endpoints of the stray network and then remove it. Should we be expecting a fix for this?
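
A sketch of that manual cleanup, assuming the stray network still has stale endpoints attached; the network name is taken from the repro above, and <endpoint-name> is a placeholder read off the inspect output:

# List endpoints still attached to the stray network:
docker network inspect converis-db-configs-master_converis \
  --format '{{range .Containers}}{{.Name}} {{end}}'
# Force-disconnect each stale endpoint, then remove the network:
docker network disconnect --force converis-db-configs-master_converis <endpoint-name>
docker network rm converis-db-configs-master_converis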