moby: Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Description
I just want to change the roles of nodes in an existing swarm, like this:
worker2 -> promote to manager
manager1 -> demote to worker
This is needed for planned maintenance with an IP change on manager1, which should be done like this (see the command sketch below):
manager1 -> demote to worker -> drain mode -> leave swarm -> change ip -> join swarm -> promote to manager
worker2 -> demote to worker again
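For reference, a sketch of that sequence as concrete commands (the join token, the worker2 address and the default port 2377 are placeholders, not output from the affected swarm):
manager1:~# docker node promote worker2                        # make worker2 a second manager
worker2:~# docker node demote manager1                         # then demote the old manager from the new one
worker2:~# docker node update --availability drain manager1    # drain it before maintenance
manager1:~# docker swarm leave                                 # leave as a worker
worker2:~# docker node rm manager1                             # remove the old (now Down) entry before re-joining
# ... change the IP address of manager1 ...
worker2:~# docker swarm join-token worker                      # prints the join command with the current token
manager1:~# docker swarm join --token <worker-join-token> <worker2-ip>:2377
worker2:~# docker node promote manager1                        # make it a manager again
manager1:~# docker node demote worker2                         # finally demote worker2 back to a worker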
Steps to reproduce the issue:
manager1:~# docker node promote worker2
Node worker2 promoted to a manager in the swarm.
worker2:~# docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
mzqms0uiq2f6t9lqvhghiuqmg     manager1   Ready    Active         Leader
vp4dbt8xefe14rqzej5gpdi2u     worker1    Ready    Active
20vbax32k3rc5dla7p86kfgku *   worker2    Ready    Active         Reachable
worker2:~# docker node demote manager1 # or just
worker2:~# docker node update --availability drain manager1
Describe the results you received:
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Describe the results you expected:
Manager manager1 demoted in the swarm.
Additional information you deem important (e.g. issue happens only occasionally): Swarm has been running for half a year.
Output of docker version:
# docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
# docker info
Containers: 22
Running: 0
Paused: 0
Stopped: 22
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: mzqms0uiq2f6t9lqvhghiuqmg
Is Manager: false
Node Address: 10.47.0.2
Manager Addresses:
10.47.0.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-36-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 15.67GiB
Name: manager1
ID: HFFB:LBVB:4TSL:DRVP:JXMR:WZXI:QEDA:N3WP:Z7QL:WAPG:OPVZ:BZLQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu machines running on VMware.
manager1:~# cat /etc/issue
Ubuntu 16.04.4 LTS \n \l
manager1:~# uname -a
Linux manager1 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 6 years ago
- Reactions: 8
- Comments: 37 (5 by maintainers)
Same with 18.06.0-ce
Same here on a manager using 18.03.1-ce. I get this message with docker node ls, docker info, … From the other managers, it can be demoted, promoted, removed from the swarm and re-added without any apparent issue. But when I log into that manager it’s totally broken… 🤨 Tried to reboot and to recreate the instance, to no avail. So it doesn’t seem related to the instance itself but rather to the swarm.
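For context, the remove/re-add cycle described above, driven from a healthy manager, would look roughly like this (hostnames and the token are placeholders, not taken from this cluster):
healthy-manager:~# docker node demote broken-manager
healthy-manager:~# docker node rm --force broken-manager       # drop the stale node entry
broken-manager:~# docker swarm leave --force                   # reset the local swarm state
healthy-manager:~# docker swarm join-token worker              # prints a join command with the current token
broken-manager:~# docker swarm join --token <token> <healthy-manager-ip>:2377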
Just wanted to say that since I’ve been running Swarm on 18.06.1-ce, I’ve been seeing this error every day and all the time. I never understood what it meant, but my general feeling is that it’s related to network issues (e.g. swarm ports not reachable, node is offline). One thing I do know for certain is that this error happens when one of the Swarm nodes is on a Wi-Fi connection on a Type-1 hypervisor (namely Hyper-V), so that any change of state of the Swarm (e.g. docker stack deploy, docker node rm) causes the entire Swarm to ‘hang’ for about 1 minute. See here for an explanation. My issue might be related.

Just take a look at my logs from yesterday when I was doing some docker stack rm and docker stack deploy (not even docker node rm or docker swarm join):

and more details here on the last minute:

10.0.0.1 here is an actual manager node that was offline in a 3-node Swarm. All 3 nodes were managers.

My Swarm runs well. All until the next time I do a docker stack rm, then the frustration returns.

I’m running into this error on a production swarm.
The existing manager node is running docker 20.10.7. I added four new worker nodes running docker 24.0.6.
All was running fine.
Today I added a new node running 24.0.6 in order to replace the old manager node. The node joined the swarm as a “Reachable” manager successfully. But while trying to manage services we run into this error (Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded).
So I decided to remove the newly added manager node:
docker node demote <last-added-manager-node>
and I still run into this error. I get the same error if I try to demote the actual leader node.

If I try to leave the swarm using docker swarm leave, I get this error message:

I really fear using this command in this production cluster.
Can you tell me which commands to run on which node to fix this problem?
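For reference, the usual order of operations for backing out a problematic manager, assuming the remaining managers still have quorum (a sketch with placeholder node names, not advice verified against this particular cluster):
leader:~# docker node demote <last-added-manager-node>          # run from a working manager, not from the node itself
<last-added-manager-node>:~# docker swarm leave                 # as a worker this only detaches the local node; it does not dissolve the swarm
leader:~# docker node rm <last-added-manager-node>              # clean up the entry once it shows as Down
leader:~# docker node ls                                        # verify the remaining managers and the leader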
Not related to memory
Hello,
I confirmed this issue on 18.09.9 and maybe 19.03.12.
I currently have the issue, and 2 of the 3 managers can reach the current manager leader.
I had the same case a month earlier on a non-prod environment.
I am currently stuck. This is not reproducible on a fresh stack with new machines.
OS:
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 16.04.3 LTS
  Release:        16.04
  Codename:       xenial
Kernel: 4.15.0-1060
Message: Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background. Use the “docker info” command to see the current swarm status of your node.
Refs:
https://forums.docker.com/t/docker-19-03-12-the-swarm-does-not-have-a-leader-aferter-swarm-upgrade/98579
https://stackoverflow.com/questions/63843832/docker-19-03-12-the-swarm-does-not-have-a-leader-aferter-swarm-upgrade
I tried several strategies, but without success:
https://github.com/moby/moby/issues/34384
https://stackoverflow.com/questions/50933171/docker-node-is-down-after-service-restart
https://cynici.wordpress.com/2018/05/31/docker-info-rpc-error-on-manager-node/
https://gitmemory.com/issue/docker/swarmkit/2670/481951641
https://forums.docker.com/t/cant-add-third-swarm-manager-or-create-overlay-network-the-swarm-does-not-have-a-leader/50849
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker
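One recovery path documented for a swarm that has lost its leader or quorum (not confirmed as the fix for this particular case, so treat it as a hedged pointer) is to rebuild the cluster from a surviving manager:
surviving-manager:~# docker swarm init --force-new-cluster --advertise-addr <node-ip>:2377
# keeps existing services and data, but makes this node the only manager;
# afterwards, re-promote or re-join the other nodes
surviving-manager:~# docker node ls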
I just promoted a node to manager and was in the process of promoting another one (going from one manager to three), but after promoting the first one I also got the error:
EDIT: Waiting a few more minutes seems to have resolved the issue; the command was accepted and both new nodes are managers now.
@thaJeztah that could be possible. How can this be checked?
Update: It seems that the same problem occurs when adding manager1 to the swarm again (I removed manager1 from the swarm and added it again as a worker). Then:
@rdxmb if you’re running the command immediately after the worker was promoted, a timing issue sounds plausible; perhaps the manager had not yet fully synced the cluster state?
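One possible way to check this before demoting (a sketch using the standard format fields of docker node inspect / docker node ls):
worker2:~# docker node inspect worker2 --format '{{ .ManagerStatus.Reachability }}'   # should print "reachable"
worker2:~# docker node ls --format '{{ .Hostname }}: {{ .ManagerStatus }}'            # every manager should show Reachable or Leader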
Looks similar to the situation discussed in https://github.com/moby/moby/issues/23903 (possibly https://github.com/moby/moby/issues/34384), which (IIUC) should’ve been resolved by https://github.com/docker/swarmkit/pull/1091
ping @nishanttotla @aaronlehmann PTAL