moby: Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Description
Steps to reproduce the issue:
- try to remove a service with `docker service rm serviceName`
Describe the results you received: Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Describe the results you expected: expected the service to be removed
Additional information you deem important (e.g. issue happens only occasionally):
Output of `docker version`:
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
**Additional environment details (AWS, VirtualBox, physical, etc.):**
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 4
- Comments: 62 (8 by maintainers)
I have the same issue with docker 1.13.0 on a production server! Is there a solution for it? What is the cause of this problem? In my case, swarm join commands seemed to work, but when trying to create a service, we always get
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Restart or reinstall doesn’t solve anything. Is there a fix for it in 1.13.1?

I hit the same issue as those above (Docker for Windows 10).
Random failures with “docker stack deploy” to swarm. Those failures ALSO correlate with times when “docker login” also fails with the message:
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
However, if I run docker-compose -f file.yml up, then everything comes up during this same time.
Just the “docker stack deploy -c file.yml” fails. It appears as if the stack deploy is trying to authenticate/call back to docker.io, and if that fails, you get the error message:
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Perhaps stack deploy needs to be more forgiving? If all the images are present, why is it calling back to the registry?
Or make sure docker.io times out less often 😃.
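The behavior described above is consistent with `docker stack deploy` contacting the registry to resolve each image to a digest. As a hedged sketch (flag availability depends on the engine version; `--resolve-image` only exists on newer engines, and the stack name `mystack` is a placeholder), two ways to avoid the failing callback:

```shell
# Sketch, assuming a recent engine; check flag support with `docker stack deploy --help`.

# Option 1: forward registry credentials to the swarm nodes,
# so the digest lookup can authenticate:
docker login
docker stack deploy --with-registry-auth -c file.yml mystack

# Option 2: on engines that support it, skip the registry
# digest lookup entirely and use the locally present images:
docker stack deploy --resolve-image never -c file.yml mystack
```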
I just had this issue on 1.13.0 build 49bf474 after installing Shipyard. Rebooting did not fix it; I had to reinstall.
@iglov yes, that’s expected; if you have a 2-manager swarm, you need a quorum of two managers to control the swarm. The moment you `--force` leave one manager, you lose quorum. Note that the other node is not aware of the manager no longer being there; it still assumes there are two managers, so it requires the other manager in order to do anything.

Read https://docs.docker.com/engine/swarm/admin_guide/#add-manager-nodes-for-fault-tolerance and https://docs.docker.com/engine/swarm/raft/ on how the Raft consensus algorithm works.
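For completeness, the admin guide linked above also covers recovery once quorum is lost. A sketch of that procedure (the address is a placeholder, and `--force-new-cluster` discards the old manager set, so use it only as a last resort):

```shell
# Sketch of the documented disaster-recovery path; run on the
# surviving manager only. The advertise address is a placeholder.
docker swarm init --force-new-cluster --advertise-addr 10.0.0.5:2377

# The surviving node is now a single-manager swarm with the old
# services intact; print the join token to re-add managers afterwards:
docker swarm join-token manager
```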
Same error on http://play-with-docker.com 😃
How to reproduce:
Now, in node2 any swarm command fails, and any other command (e.g. `docker info`) takes a very long time to execute. Is that OK? ^_^
@ekatyukhin you should never run a two-manager setup, as it actually cuts fault “tolerance” in half. Swarm uses Raft, which requires a quorum. If you have two managers defined, both managers need to be active for swarm to work (as quorum requires > 50% of the managers). If either one dies, you lose quorum, and control over your swarm.
Read the relevant pages in the documentation;
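The quorum arithmetic behind this can be sketched in plain shell (numbers only, no docker involved): a Raft quorum is floor(N/2)+1, so two managers tolerate exactly as many failures as one manager, namely zero.

```shell
#!/bin/sh
# Raft quorum math: quorum = floor(N/2) + 1; tolerated failures = N - quorum.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "$n manager(s): quorum=$quorum, tolerates $tolerated failure(s)"
done
```

With N=2 both managers must be up; losing either one halts the control plane, which is exactly the symptom reported in this thread.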
We have a swarm cluster of about 5 hosts (one manager for now).
About every 12-24 hours we get the mentioned error (on the manager node) for every swarm command. E.g.:
After `service docker restart` (takes about 1–5 minutes) it works again until… I have the feeling that the more networks (overlay, standard config) and services are started, the more often the Docker API crashes.
By the way, `journalctl -fu docker` is flooded with … and sometimes (about every 30–60 secs) with …
@menxit that’s why I call them magic logs; maybe this is a bug, and I hope it can be fixed in the next version.
In the docker log (`sudo cat /var/log/messages | grep docker`) I can see tons of these messages:
Feb 17 14:29:16 ip-172-31-18-45 dockerd: time=“2017-02-17T14:29:16.319700726-05:00” level=warning msg=“memberlist: failed to receive: No installed keys could decrypt the message from=172.31.19.167:47718”
What does that mean?
Similar issue with Docker Swarm 1.13.1. Created 2 manager nodes, shut down the leading one, and the second throws this error when entering `docker node ls` or `docker info`. Looks like failover to the second manager node fails.
I also regularly encounter this problem with 1.12.6 (as this is the packaged CoreOS stable version of Docker) with a three-manager-node setup. One of the nodes thinks it is still connected and will respond to a `docker node ls` by listing all three nodes, but the other two nodes will time out when doing a node listing with the error:

The solution for me was to restart the docker daemon on the node that still thought it was connected (presumably the cluster leader) by doing `sudo systemctl restart docker.service` (note: the exact command to restart the docker daemon will be system-specific). This causes the other two previously disconnected nodes to respond to `docker node ls` and all other swarm commands appropriately.

Edit: AWS hosts in a VPC, so not a case of changing IP addresses.
I found the answer on this page: https://github.com/moby/moby/issues/31427. It turns out to be a network problem. If we use `imagename@sha256:…` instead of `imagename:tag`, only one of the nodes needs to have already pulled the image; then `service create` works, and the rpc error does not come again.
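A small, hypothetical shell helper to illustrate the distinction the commenter is drawing (the function name is my own, not a docker command): a reference pinned as `name@sha256:…` identifies the image content directly, so it can be deployed without the registry lookup that a `name:tag` reference triggers.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker): succeeds when the
# image reference is pinned by digest rather than by tag.
is_digest_pinned() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_digest_pinned "nginx@sha256:0123abcd" && echo "pinned, no registry lookup needed"
is_digest_pinned "nginx:1.11"            || echo "tag, engine may call the registry"
```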
I’m not sure whether the issue is caused by some network problem; it happens only occasionally. Because I’m in China, I have to use a mirror to pull or push images and can’t connect directly to Docker Hub or services like Google. Instead I tried using a VPN, and then the problem was magically resolved; I tried several times, and it works fine. My docker version: Docker version 17.03.0-ce, build 3a232c8