moby: Unable to remove an unreachable node from swarm cluster

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:28:14 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:28:14 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 21
 Running: 1
 Paused: 0
 Stopped: 20
Images: 13
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 122
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge host null
Swarm: active
 NodeID: 0tcthsr3tyi44mg3jkx0d8e9x
 Is Manager: true
 ClusterID: 0srplt1fljtw87pzzz6zu4glx
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.15.2.39
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.2.0-42-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 5.79 GiB
Name: 489f0ffa-fc3b-4d95-95e6-46e32b771ac6
ID: FCRQ:NTEQ:E7DY:36WF:UTRE:B3YX:OAQI:APVJ:KRHK:QDUN:DP2I:AE65
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Steps to reproduce the issue:

  1. Create a swarm cluster
  2. Stop a manager
  3. Try to remove the manager with docker node rm <id> --force (see the sketch below)
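
A minimal reproduction sketch, assuming a fresh 1.12.x swarm; the advertise address is a placeholder and systemctl is just one way to stop the daemon:

# On the first node, create the swarm (placeholder address)
$ docker swarm init --advertise-addr <manager-ip>

# Print the manager join token; run the join command it shows on two more nodes
$ docker swarm join-token manager

# On one of the other managers, simulate an unrecoverable failure by stopping the daemon
$ sudo systemctl stop docker

# Back on a surviving manager, attempt to remove the downed node
$ docker node rm <id> --force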

Describe the results you received:

A node in the cluster went down unexpectedly and could not be recovered. I tried to remove it before adding a new one, but couldn't.

$ docker node ls
ID                           HOSTNAME                              STATUS   AVAILABILITY  MANAGER STATUS
0tcthsr3tyi44mg3jkx0d8e9x *  489f0ffa-fc3b-4d95-95e6-46e32b771ac6  Unknown  Active        Leader
3wguwco187au4jyh9w70ui1bs    d2e19ffb-2b92-43ee-bbc1-91cd40d923e3  Unknown  Active        Reachable
8buja7impqchibmeevepsvtba    255be2fa-ae02-4835-8bce-de8c6cf55ba6  Down     Active        Unreachable
$ docker node rm 8buja7impqchibmeevepsvtba --force
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded

Describe the results you expected:

The node should be removed.

Additional information you deem important (e.g. issue happens only occasionally):

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 25 (15 by maintainers)

Most upvoted comments

I think the most important thing is that people not use docker node rm --force to forcefully remove managers. We have safeguards in place to prevent people from losing quorum by running docker commands. However, --force overrides those safeguards.
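
For comparison, a sketch of the non-force path, assuming the manager you want to remove is still reachable (unlike the node in this report): demote it first so the built-in safeguards apply, then remove it as a worker. The node ID is the one from the report above.

$ docker node demote 8buja7impqchibmeevepsvtba
$ docker node rm 8buja7impqchibmeevepsvtba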

So what’s the workaround?

I’ve been hit by this when I’ve run out of disk space.

docker --version
Docker version 1.12.2-rc3, build cb0ca64

I have the same problem. I created a two-node swarm for learning purposes on two Ubuntu VirtualBox instances. I promoted the second node to be a manager and then forced it to leave the swarm with docker swarm leave --force (I know I should have done it differently, but this was for testing purposes). Now I cannot remove it from the swarm.

After forcing one of your two manager nodes to leave, your swarm no longer has a quorum, which makes it impossible to make further changes. You shouldn't use docker swarm leave --force on a manager because it can break the swarm like this. If you get into this state, you can use docker swarm init --force-new-cluster to recover.
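
A recovery sketch for that situation, assuming it is run on the manager you still control; the advertise address is a placeholder:

# Rebuild a single-manager swarm from the local state on the surviving manager
$ docker swarm init --force-new-cluster --advertise-addr <manager-ip>

# Once the swarm is healthy again, stale entries for the lost node can be removed
$ docker node rm <old-node-id>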