moby: Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Description
Steps to reproduce the issue:
- try to remove a service with `docker service rm serviceName`
Describe the results you received: Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Describe the results you expected: expected the service to be removed
Additional information you deem important (e.g. issue happens only occasionally):
Output of `docker version`:
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
**Additional environment details (AWS, VirtualBox, physical, etc.):**
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 4
- Comments: 62 (8 by maintainers)
I have the same issue with docker 1.13.0 on a production server! Is there a solution for it? What is the cause of this problem? In my case, swarm join commands seemed to work, but when trying to create a service, we always get
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Restart or reinstall doesn’t solve anything. Is there a fix for it in 1.13.1?

I hit the same issue as those above (Docker for Windows 10).
Random failures with “docker stack deploy” to swarm. Those failures ALSO correlate with times when “docker login” also fails with the message:
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
However, if I run docker-compose -f file.yml up, then everything comes up during this same time.
Just the “docker stack deploy -c file.yml” fails. It appears as if the stack deploy is trying to authenticate/call back to docker.io, and if that fails, you get the error message:
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Perhaps stack deploy needs to be more forgiving? If all the images are present, why is it calling back to the registry?
Or make sure docker.io times out less often 😃.
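The behavior described above is consistent with `docker stack deploy` contacting the registry to resolve each image to a digest. As a hedged sketch (flag availability depends on the engine version; `--resolve-image` only exists on newer engines, and the stack name `mystack` is a placeholder), two ways to avoid the failing callback:

```shell
# Sketch, assuming a recent engine; check flag support with `docker stack deploy --help`.

# Option 1: forward registry credentials to the swarm nodes,
# so the digest lookup can authenticate:
docker login
docker stack deploy --with-registry-auth -c file.yml mystack

# Option 2: on engines that support it, skip the registry
# digest lookup entirely and use the locally present images:
docker stack deploy --resolve-image never -c file.yml mystack
```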
I just had this issue on 1.13.0 build 49bf474 after installing Shipyard. Rebooting did not fix it; I had to reinstall.
@iglov yes, that’s expected; if you have a 2-manager swarm, you need a quorum of two managers to control the swarm. The moment you `--force` leave one manager, you lose quorum. Note that the other node is not aware of the manager no longer being there; it still assumes there are two managers, so it requires the other manager in order to do anything.

Read https://docs.docker.com/engine/swarm/admin_guide/#add-manager-nodes-for-fault-tolerance and https://docs.docker.com/engine/swarm/raft/ on how the Raft consensus algorithm works.
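For completeness, the admin guide linked above also covers recovery once quorum is lost. A sketch of that procedure (the address is a placeholder, and `--force-new-cluster` discards the old manager set, so use it only as a last resort):

```shell
# Sketch of the documented disaster-recovery path; run on the
# surviving manager only. The advertise address is a placeholder.
docker swarm init --force-new-cluster --advertise-addr 10.0.0.5:2377

# The surviving node is now a single-manager swarm with the old
# services intact; print the join token to re-add managers afterwards:
docker swarm join-token manager
```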
Same error on http://play-with-docker.com 😃
How to reproduce:
Now, in node2 any swarm command fails, and any other command (e.g. `docker info`) takes a very long time to execute. Is that OK? ^_^
@ekatyukhin you should never run a two-manager setup, as it actually cuts fault “tolerance” in half. Swarm uses Raft, which requires a quorum. If you have two managers defined, both managers need to be active for swarm to work (as quorum requires > 50% of the managers). If either one dies, you lose quorum, and control over your swarm.
Read the relevant pages in the documentation;
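The quorum arithmetic behind this can be sketched in plain shell (numbers only, no docker involved): a Raft quorum is floor(N/2)+1, so two managers tolerate exactly as many failures as one manager, namely zero.

```shell
#!/bin/sh
# Raft quorum math: quorum = floor(N/2) + 1; tolerated failures = N - quorum.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "$n manager(s): quorum=$quorum, tolerates $tolerated failure(s)"
done
```

With N=2 both managers must be up; losing either one halts the control plane, which is exactly the symptom reported in this thread.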
We have a swarm cluster of about 5 hosts (one manager for now).
About every 12-24 hours we get the mentioned error (on the manager node) for every swarm command. E.g.:
After `service docker restart` (takes about 1–5 minutes) it works again until… I have the feeling that the more networks (overlay, standard config) and services are started, the more often the Docker API crashes.
By the way, `journalctl -fu docker` is flooded with … and sometimes (about every 30–60 secs) with …
@menxit that’s why I call them magic logs; maybe this is a bug, and I hope it can be fixed in the next version.
In the docker log (`sudo cat /var/log/messages | grep docker`) I can see tons of these messages:
Feb 17 14:29:16 ip-172-31-18-45 dockerd: time=“2017-02-17T14:29:16.319700726-05:00” level=warning msg=“memberlist: failed to receive: No installed keys could decrypt the message from=172.31.19.167:47718”
What does that mean?
Similar issue with Docker Swarm 1.13.1. Created 2 manager nodes, shut down the leading one, and the second throws this error when entering `docker node ls` or `docker info`. Looks like failover to the second manager node fails.
I also regularly encounter this problem with 1.12.6 (as this is the packaged CoreOS stable version of Docker) with a three-manager-node setup. One of the nodes thinks it is still connected and will respond to a `docker node ls` by listing all three nodes, but the other two nodes will time out when doing a node listing with the error:

The solution for me was to restart the docker daemon on the node that still thought it was connected (presumably the cluster leader) by doing `sudo systemctl restart docker.service` (note: the exact command to restart the docker daemon will be system-specific). This causes the other two previously disconnected nodes to respond to `docker node ls` and all other swarm commands appropriately.

Edit: AWS hosts in a VPC, so not a case of changing IP addresses.
I found the answer on this page: https://github.com/moby/moby/issues/31427. It turns out to be a network problem. If we use `imagename@sha256:…` instead of `imagename:tag`, only one of the nodes needs to have already pulled the image; then `service create` works, and the rpc error does not come again.
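A small, hypothetical shell helper to illustrate the distinction the commenter is drawing (the function name is my own, not a docker command): a reference pinned as `name@sha256:…` identifies the image content directly, so it can be deployed without the registry lookup that a `name:tag` reference triggers.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker): succeeds when the
# image reference is pinned by digest rather than by tag.
is_digest_pinned() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_digest_pinned "nginx@sha256:0123abcd" && echo "pinned, no registry lookup needed"
is_digest_pinned "nginx:1.11"            || echo "tag, engine may call the registry"
```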
I’m not sure whether the issue is caused by some network problem; it happens only occasionally. Because I’m in China, I have to use a mirror to pull or push images and can’t connect directly to Docker Hub or services like Google. Instead I tried using a VPN, and then the problem was magically resolved; I tried several times, and it works fine. My docker version: Docker version 17.03.0-ce, build 3a232c8