moby: Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Description

I just want to change the roles of nodes in an existing swarm, like:

worker2 -> promote to manager
manager1 -> demote to worker

This is for planned maintenance with an IP change on manager1, which should proceed like this:

manager1 -> demote to worker -> drain mode -> leave swarm -> change ip -> join swarm -> promote to manager 
worker2 -> demote to worker again
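The plan above can be sketched as a shell script. Node names come from the report; the join token and address are placeholders, so treat this as an illustration rather than a vetted procedure. With DRY_RUN=1 (the default here) it only prints each command:

```shell
#!/usr/bin/env sh
# Sketch of the maintenance plan above. Node names come from the report;
# the join token/address is a placeholder, not a real value.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# Phase 1: hand the manager role to worker2, take manager1 out of rotation.
run docker node promote worker2
run docker node demote manager1
run docker node update --availability drain manager1

# Phase 2: on manager1, leave the swarm; after the IP change, rejoin
# (with a current worker token) and restore the original roles.
run docker swarm leave
# run docker swarm join --token <worker-token> <manager-addr>:2377
run docker node promote manager1
run docker node demote worker2
```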

Steps to reproduce the issue:

manager1:~# docker node promote worker2
Node worker2 promoted to a manager in the swarm.
worker2:~# docker node ls                                                                           
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS            
mzqms0uiq2f6t9lqvhghiuqmg     manager1            Ready               Active              Leader                    
vp4dbt8xefe14rqzej5gpdi2u     worker1             Ready               Active                                        
20vbax32k3rc5dla7p86kfgku *   worker2             Ready               Active              Reachable   
worker2:~# docker node demote manager1 # or just
worker2:~# docker node update --availability drain manager1

Describe the results you received: Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Describe the results you expected: Manager manager1 demoted in the swarm.

Additional information you deem important (e.g. issue happens only occasionally): Swarm has been running for half a year.

Output of docker version:

# docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built:         Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

docker info
Containers: 22
 Running: 0
 Paused: 0
 Stopped: 22
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: mzqms0uiq2f6t9lqvhghiuqmg
 Is Manager: false
 Node Address: 10.47.0.2
 Manager Addresses:
  10.47.0.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-36-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 15.67GiB
Name: manager1
ID: HFFB:LBVB:4TSL:DRVP:JXMR:WZXI:QEDA:N3WP:Z7QL:WAPG:OPVZ:BZLQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Ubuntu-Machines running on VMWare.


manager1:~# cat /etc/issue
Ubuntu 16.04.4 LTS \n \l

manager1:~# uname -a
Linux manager1 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 8
  • Comments: 37 (5 by maintainers)

Most upvoted comments

Same with 18.06.0-ce

# docker node ls
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
# docker --version
Docker version 18.06.0-ce, build 0ffa825
# docker version
Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:10:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:08:26 2018
  OS/Arch:          linux/amd64
  Experimental:     false
# docker info
Containers: 38
 Running: 1
 Paused: 0
 Stopped: 37
Images: 108
Server Version: 18.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: xfs
 Dirs: 2096
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: ra5qlktmfa1am2u830g715tp4
 Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
 Is Manager: true
 Node Address: 192.168.20.46
 Manager Addresses:
  192.168.20.30:2377
  192.168.20.45:2377
  192.168.20.46:2377
  192.168.20.60:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340 
init version: fec3683
Security Options:
 apparmor
Kernel Version: 4.2.0-42-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 61.05GiB
Name: codexServer
ID: I3P4:SLVS:BBBQ:SBVF:3XLC:CWAY:3RCM:EFNR:TJPK:RRU7:6WJO:C43J
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: crimsonglory
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Same here on a manager using 18.03.1-ce. I get this message with docker node ls, docker info… From the other managers, it can be demoted, promoted, removed from the swarm and re-added without any apparent issue. But when I log into that manager it’s totally broken… 🤨 Tried to reboot and recreate the instance, to no avail. So it doesn’t seem related to the instance itself but rather to the swarm.

Just wanted to say that since I’ve been running Swarm on 18.06.1-ce I’ve been seeing this error every day, all the time. I never understood what it meant, but my general feeling is that it’s related to network issues (e.g. swarm ports not reachable, node is offline).

One thing I do know for certain is that this error happens when one of the Swarm nodes is on a Wi-Fi connection under a Type-1 hypervisor (namely Hyper-V), where any change of state of the Swarm (e.g. docker stack deploy, docker node rm) causes the entire Swarm to ‘hang’ for about 1 minute. See here for an explanation. My issue might be related.
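One quick way to rule out the network problems suspected here is to probe the swarm control ports (2377/tcp for cluster management, 7946/tcp for node communication) from each node. A minimal sketch, assuming bash and GNU `timeout` are available; note that 7946 and 4789 also use UDP, which a TCP probe like this cannot check:

```shell
#!/usr/bin/env sh
# TCP reachability probe for swarm control ports. Uses bash's /dev/tcp
# pseudo-device inside `bash -c` so the outer shell can be plain sh.
# Host/port values below are examples from this thread, not defaults.
probe() {  # probe <host> <port> -> prints "open" or "closed"
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Example: from a worker, check a manager's Raft and gossip ports.
# probe 10.47.0.4 2377
# probe 10.47.0.4 7946
```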

Just take a look at my logs from yesterday, when I was doing some docker stack rm and docker stack deploy (not even docker node rm or docker swarm join):

$ cat /var/log/syslog | grep 'context deadline exceeded'
Apr 30 00:57:08 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:08.469936110+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:16 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:16.470048208+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:24 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:24.470005513+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:32 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:32.469811627+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:39 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:39.470093866+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:48 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:48.470102369+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:57 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:57.470082781+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:05 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:05.469967521+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:15 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:15.470036328+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:22 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:22.469912401+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:29 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:29.469905377+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:37 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:37.470047939+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.469931679+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.470549865+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:02 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:02.470009052+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:09 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:09.469991053+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:17 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:17.470073345+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:24 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:24.470134555+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

and more detail here on the last minute:

$ cat /var/log/syslog | grep 'Apr 30 00:59'
Apr 30 00:58:38 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:38.238909507+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.0.211:2377: connect: no route to host\""
Apr 30 00:58:38 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:38.469913224+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:39 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:39.469913607+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:40 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:40.469880591+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:41 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:41.469798076+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:42 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:42.469865259+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:43 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:43.469886743+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:44 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:44.469770628+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:44 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:44.523885240+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, CONNECTING" module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.469931679+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582513564+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {10.0.0.1:2377 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\". Reconnecting..." module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582759561+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, TRANSIENT_FAILURE" module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582911860+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.583184257+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:48 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:48.469930163+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:49 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:49.469957047+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:50 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:50.469823733+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:51 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:51.469861617+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:51 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:51.587378557+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, CONNECTING" module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.470549865+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.653933912+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {10.0.0.1:2377 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\". Reconnecting..." module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654009711+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, TRANSIENT_FAILURE" module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654046711+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654083010+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:55 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:55.469919456+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:56 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:56.469938640+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:57 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:57.469761127+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:58 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:58.469787412+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:59 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:59.469860997+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""

10.0.0.1 here is a manager node that was offline in a 3-node Swarm. All 3 nodes were managers.

My Swarm runs well, until the next time I do a docker stack rm; then the frustration returns.

I’m running into this error on a production swarm.

The existing manager node is running Docker 20.10.7. I added four new worker nodes running Docker 24.0.6.

All was running fine.

Today I added a new node running 24.0.6 in order to replace the old manager node. The node joined the swarm as a “Reachable” manager successfully. But while trying to manage services we ran into this error (Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded).

So I decided to demote the newly added manager node (docker node demote <last-added-manager-node>) and I still ran into this error. I get the same error if I try to demote the current leader node.

If I try to leave the swarm using docker swarm leave I get this error message :

Error response from daemon: You are attempting to leave the swarm on a node that is participating as a manager. 
Removing this node leaves 1 managers out of 2. Without a Raft quorum your swarm will be inaccessible. 
The only way to restore a swarm that has lost consensus is to reinitialize it with `--force-new-cluster`. 
Use `--force` to suppress this message.

I really fear using this command in this production cluster.

Can you tell me what commands to run on which node to fix this problem?
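The `--force-new-cluster` warning quoted above is about Raft quorum: with N managers the swarm needs floor(N/2)+1 of them reachable, so a 2-manager swarm tolerates zero failures. The arithmetic behind the message can be sketched as:

```shell
#!/usr/bin/env sh
# Raft quorum arithmetic behind the warning above: with N managers the
# swarm needs floor(N/2)+1 reachable, and so tolerates
# N - (floor(N/2)+1) manager failures. Pure arithmetic, nothing
# docker-specific.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

# A 2-manager swarm (the situation in the comment above) needs both
# managers for quorum: losing either one makes the swarm inaccessible,
# which is exactly why `docker swarm leave` prints that warning.
```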

Hi, I also noticed the same issue today. I am also using docker stack deploy. I am guessing that it happened while my EC2 instance was not responding, so it might be related to memory. I am currently running 3 containers on a 1 GB memory machine. Has anyone pinned down the exact cause?

Not related to memory

Hello,

I can confirm this issue in 18.09.9 and possibly 19.03.12.

I currently have the issue, and 2 of the 3 managers can reach the current leader.

I hit the same case a month earlier on a non-prod environment.

I am currently stuck. This is not reproducible on a fresh stack with new machines.

OS  No LSB modules are available.  Distributor ID: Ubuntu  Description: Ubuntu 16.04.3 LTS  Release: 16.04  Codename: xenial

Kernel  4.15.0-1060

Message  Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background. Use the “docker info” command to see the current swarm status of your node.

Refs:
https://forums.docker.com/t/docker-19-03-12-the-swarm-does-not-have-a-leader-aferter-swarm-upgrade/98579
https://stackoverflow.com/questions/63843832/docker-19-03-12-the-swarm-does-not-have-a-leader-aferter-swarm-upgrade

I tried several strategies, without success:

https://github.com/moby/moby/issues/34384
https://stackoverflow.com/questions/50933171/docker-node-is-down-after-service-restart
https://cynici.wordpress.com/2018/05/31/docker-info-rpc-error-on-manager-node/
https://gitmemory.com/issue/docker/swarmkit/2670/481951641
https://forums.docker.com/t/cant-add-third-swarm-manager-or-create-overlay-network-the-swarm-does-not-have-a-leader/50849
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker

I just promoted a node to manager and was in the process of promoting another one (going from one manager to three), but after promoting the first I also get the error:

$ docker node promote tbqrzclcxe5pgmthtj4zfi20v
Node tbqrzclcxe5pgmthtj4zfi20v promoted to a manager in the swarm.
$ docker node promote weapv3om9qm5xp56caebfl8ig
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

EDIT: Waiting for a few more minutes seems to have resolved the issue, the command was accepted and both new nodes are managers now.
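Since waiting apparently let the new manager finish syncing, a small retry loop can bridge that window. A sketch with illustrative timings, not an official workaround:

```shell
#!/usr/bin/env sh
# Retry a command until it succeeds or the attempt budget runs out.
# Useful for commands like `docker node ls` that fail transiently while
# a freshly promoted manager is still syncing cluster state.
retry() {  # retry <attempts> <delay-seconds> <command...>
  attempts=$1; delay=$2; shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (illustrative timings): up to 10 tries, 30 s apart (~5 min).
# retry 10 30 docker node ls
```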

@thaJeztah that could be possible. How can this be checked?

Update: The same problem occurs when adding manager1 to the swarm again. (I removed manager1 from the swarm and re-added it as a worker.) Then:

worker2:~# docker node promote manager1
Node manager1 promoted to a manager in the swarm.

manager1:~# docker node ls
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
manager1:~# docker node ls
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

worker2:~# docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.

@rdxmb if you’re running this immediately after the worker was promoted, a timing issue sounds plausible; perhaps the manager had not yet fully synced the cluster state?

Looks similar to the situation discussed in https://github.com/moby/moby/issues/23903 (possibly https://github.com/moby/moby/issues/34384), which (IIUC) should’ve been resolved by https://github.com/docker/swarmkit/pull/1091

ping @nishanttotla @aaronlehmann PTAL