moby: Unable to remove an unreachable node from swarm cluster
Output of `docker version`:

```
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:28:14 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:28:14 2016
 OS/Arch:      linux/amd64
```
Output of `docker info`:

```
Containers: 21
 Running: 1
 Paused: 0
 Stopped: 20
Images: 13
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 122
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge host null
Swarm: active
 NodeID: 0tcthsr3tyi44mg3jkx0d8e9x
 Is Manager: true
 ClusterID: 0srplt1fljtw87pzzz6zu4glx
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.15.2.39
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.2.0-42-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 5.79 GiB
Name: 489f0ffa-fc3b-4d95-95e6-46e32b771ac6
ID: FCRQ:NTEQ:E7DY:36WF:UTRE:B3YX:OAQI:APVJ:KRHK:QDUN:DP2I:AE65
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
```
Additional environment details (AWS, VirtualBox, physical, etc.):
Steps to reproduce the issue:
- Create a swarm cluster
- Stop a manager
- Try to remove the manager with `docker node rm <id> --force`
Describe the results you received:
I have a node in the cluster that went down unexpectedly and could not be recovered. I tried to remove it before adding a new one, but couldn't.
```
$ docker node ls
ID                           HOSTNAME                              STATUS   AVAILABILITY  MANAGER STATUS
0tcthsr3tyi44mg3jkx0d8e9x *  489f0ffa-fc3b-4d95-95e6-46e32b771ac6  Unknown  Active        Leader
3wguwco187au4jyh9w70ui1bs    d2e19ffb-2b92-43ee-bbc1-91cd40d923e3  Unknown  Active        Reachable
8buja7impqchibmeevepsvtba    255be2fa-ae02-4835-8bce-de8c6cf55ba6  Down     Active        Unreachable

$ docker node rm 8buja7impqchibmeevepsvtba --force
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
```
Describe the results you expected:
The node should be removed.
Additional information you deem important (e.g. issue happens only occasionally):
About this issue
- State: closed
- Created 8 years ago
- Reactions: 4
- Comments: 25 (15 by maintainers)
Doc improvements: https://github.com/docker/docker.github.io/pull/297
I think the most important thing is that people not use `docker node rm --force` to forcefully remove managers. We have safeguards in place to prevent people from losing quorum by running docker commands. However, `--force` overrides those safeguards.

So what's the workaround?
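Assuming the manager you want to retire is still reachable, the safe flow is to demote it to a worker first so the remaining managers keep quorum, and only then take it out. A sketch of that flow (`<node-id>` is a placeholder):

```shell
# On a healthy manager: demote the target so it stops participating
# in the Raft quorum.
docker node demote <node-id>

# On the target node itself: leave the swarm cleanly.
docker swarm leave

# Back on a manager: the node now shows as Down and can be removed
# without needing --force.
docker node rm <node-id>
```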
I’ve been hit by this when I’ve run out of disk space.
After forcing one of your two manager nodes to leave, your swarm no longer has a quorum, which makes it impossible to make further changes. You shouldn't use `docker swarm leave --force` on a manager because it can break the swarm like this. If you get into this state, you can use `--force-new-cluster` to recover.
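For a cluster that has already lost quorum, a rough sketch of the recovery path, run on the last healthy manager (the address below is the leader's `Node Address` from the `docker info` output above; your address will differ):

```shell
# Re-initialize the cluster from this node's current state. This
# drops the old Raft membership but keeps services and the node list.
docker swarm init --force-new-cluster --advertise-addr 10.15.2.39

# With quorum restored, the dead node can be removed normally.
docker node rm 8buja7impqchibmeevepsvtba
```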