moby: failed to get event and rpc error "connect: connection refused"

Description

In my environment, Docker occasionally gets into the following state: the docker service reports as running, but Docker cannot run any new container or remove any existing one. In effect, the daemon is broken.

It prints the same error over and over in the logs:

nection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:47 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543529097+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:47 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543585977+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:47 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543616057+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:47 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543666737+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543704917+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543743937+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543781397+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543820037+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543858637+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543900237+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543934977+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.543990177+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544012357+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544066977+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544089937+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544147137+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544167057+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544227537+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544302097+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544250097+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544379097+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544451877+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544402857+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544529097+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544550937+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544608457+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
10月 24 17:15:48 slave2 dockerd[3948]: time="2019-10-24T17:15:47.544627677+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
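The log above repeats one message under two namespaces (moby and plugins.moby). A minimal sketch for summarizing such output, assuming the lines keep the msg="..." and namespace=... layout shown here; the function name count_by_namespace and the shortened sample lines are hypothetical and only for illustration:

```python
import re
from collections import Counter

# Pulls the msg and namespace fields out of a dockerd log line.
LINE_RE = re.compile(r'msg="(?P<msg>[^"]*)".*\bnamespace=(?P<ns>\S+)')


def count_by_namespace(lines, message="failed to get event"):
    """Count log lines whose msg field equals `message`, grouped by namespace."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("msg") == message:
            counts[match.group("ns")] += 1
    return counts


# Two shortened sample lines in the same shape as the log above (hypothetical).
sample = [
    'dockerd[3948]: level=error msg="failed to get event" module=libcontainerd namespace=moby',
    'dockerd[3948]: level=error msg="failed to get event" module=libcontainerd namespace=plugins.moby',
]
print(dict(count_by_namespace(sample)))
```

Feeding it the real journal (for example the lines printed by journalctl -u docker) shows how many failures each namespace logs. It does not explain why containerd is unreachable.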

When I run docker run <container>, I get:

docker run 6cf7c80fe444
docker: Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable.
ERRO[0000] error waiting for container: context canceled 
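The error says that dialing the Unix socket /run/containerd/containerd.sock was refused. A small sketch that probes the socket directly can help tell a missing socket file apart from a file that exists but has no listener behind it. The helper name probe_unix_socket is hypothetical; the path is the one from the error message:

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a Unix socket and classify the outcome.

    Returns "ok", "missing" (no such file), "refused" (the file exists but
    nothing is accepting connections), or "error: <reason>".
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except FileNotFoundError:
        return "missing"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the error message above.
    print(probe_unix_socket("/run/containerd/containerd.sock"))
```

A "refused" result while the containerd service shows as running would suggest that dockerd is pointed at a socket no live process is serving. That is a hypothesis to check, not a confirmed diagnosis of this issue.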

Steps to reproduce the issue: I don’t know how to reproduce it. It only returns to normal after I reboot the node that Docker is running on.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

root@slave2:~# docker version
Client:
 Version:           18.09.8
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        0dd43dd
 Built:             Wed Jul 17 17:45:38 2019
 OS/Arch:           linux/arm64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.8
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       0dd43dd
  Built:            Wed Jul 17 17:07:47 2019
  OS/Arch:          linux/arm64
  Experimental:     false

Output of docker info:

root@slave2:~# docker info
Containers: 79
 Running: 59
 Paused: 0
 Stopped: 20
Images: 189
Server Version: 18.09.8
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.4.58-20180615.kylin.server.YUN+-generic
Operating System: Kylin 4.0.2
OSType: linux
Architecture: aarch64
CPUs: 16
Total Memory: 62.89GiB
Name: slave2
ID: RBMJ:WBJ5:VBS5:SKYH:BVJK:6KH6:MIPN:QAWB:2ALL:XGPG:STBD:SMJW
Docker Root Dir: /opt/cke/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 registry.icp.com:5000
 registry.inspurspring.com
 docker.inspur.com:5000
 10.150.0.0/16
 127.0.0.0/8
Live Restore Enabled: true
Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.): physical

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments: 27 (2 by maintainers)

Most upvoted comments

Same problem here. I stopped the Ubuntu snap docker service and the syslog spam stopped:

“snap stop docker”

snap services

Service                                Startup  Current   Notes
bootstack-elasticsearch.elasticsearch  enabled  inactive  -
docker.dockerd                         enabled  inactive  -
graylog.graylog                        enabled  active    -

Same here: Ubuntu 18.04.3 Server, Docker version 18.09.7, build 2d0083d.

Update:

  • Restarting the server doesn’t help.
  • The containerd process looks fine:
service containerd status
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-11-15 07:38:57 UTC; 7min ago
     Docs: https://containerd.io
  Process: 784 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 818 (containerd)
    Tasks: 54
   CGroup: /system.slice/containerd.service
           ├─ 818 /usr/bin/containerd
           ├─2115 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7004aa0e71abe8e9b6a843a638dda9d8f7dfc7834445713e72781ee0841b1a05 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -run
           ├─2123 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/e1c1f156a0974e30ad06d3c40a68bcb7c0e02f041c067c8891f3a8ba216ae0dc -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -run
           ├─2162 tini -- /docker-entrypoint.sh /opt/couchdb/bin/couchdb
           ├─2163 npm
           ├─2174 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/9f3782883b638ab2f1783490ca046b00f0996c608eaedd5a1a7d8052ca0bb8b5 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -run
           ├─2187 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/e735db68b7ebae3f7853374cc75470a708ce23cce9230462eb26dfe75d492be2 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -run
           ├─2208 nginx: master process nginx -g daemon off;
           ├─2226 nginx: master process nginx -g daemon off;
           ├─2427 /opt/couchdb/bin/../erts-8.3.5/bin/beam -K true -A 16 -Bd -- -root /opt/couchdb/bin/.. -progname couchdb -- -home /opt/couchdb -- -boot /opt/couchdb/bin/../releases/2.3.1/couchdb -kernel inet_dist_listen_min 9100 -kernel inet_dist_listen_max 9100
           ├─2460 nginx: worker process
           ├─2493 erl_child_setup 1048576
           ├─2522 nginx: worker process
           ├─2528 node /usr/src/app/node_modules/.bin/nodemon -L --exec node --experimental-modules --inspect=0.0.0.0 src/server.js
           ├─2548 node --experimental-modules --inspect=0.0.0.0 src/server.js
           ├─2549 sh -s disksup
           ├─2551 /opt/couchdb/bin/../lib/os_mon-2.4.2/priv/bin/memsup
           ├─2552 /opt/couchdb/bin/../lib/os_mon-2.4.2/priv/bin/cpu_sup
           ├─2574 inet_gethost 4
           └─2575 inet_gethost 4

Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.083688926Z" level=info msg="containerd successfully booted in 0.045222s"
Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.136826893Z" level=info msg="Start subscribing containerd event"
Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.136871593Z" level=info msg="Start recovering state"
Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.149373744Z" level=info msg="Start event monitor"
Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.149403679Z" level=info msg="Start snapshots syncer"
Nov 15 07:38:58 sekp22 containerd[818]: time="2019-11-15T07:38:58.149420106Z" level=info msg="Start streaming server"
Nov 15 07:41:48 sekp22 containerd[818]: time="2019-11-15T07:41:48.586751538Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/7004aa0e71abe8e9b6a843a638dda9d8f7dfc7834445713e72781ee0841b1a05/shim.sock" debug=false pid=2115
Nov 15 07:41:48 sekp22 containerd[818]: time="2019-11-15T07:41:48.609720111Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/e1c1f156a0974e30ad06d3c40a68bcb7c0e02f041c067c8891f3a8ba216ae0dc/shim.sock" debug=false pid=2123
Nov 15 07:41:48 sekp22 containerd[818]: time="2019-11-15T07:41:48.779031230Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/9f3782883b638ab2f1783490ca046b00f0996c608eaedd5a1a7d8052ca0bb8b5/shim.sock" debug=false pid=2174
Nov 15 07:41:48 sekp22 containerd[818]: time="2019-11-15T07:41:48.809769293Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/e735db68b7ebae3f7853374cc75470a708ce23cce9230462eb26dfe75d492be2/shim.sock" debug=false pid=2187
  • The docker process shows some warnings:
service docker status
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-11-15 07:41:53 UTC; 5min ago
     Docs: https://docs.docker.com
 Main PID: 1754 (dockerd)
    Tasks: 30
   CGroup: /system.slice/docker.service
           ├─1754 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
           ├─2097 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9080 -container-ip 172.22.0.4 -container-port 80
           └─2108 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9070 -container-ip 172.27.0.2 -container-port 80

Nov 15 07:41:47 sekp22 dockerd[1754]: time="2019-11-15T07:41:47.495859061Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint ce9fd102d24d6663c4c3d19f8f02dc89581f6ab83c3a4dfc50a04fc2d961afdc f1ab5133a505d990d47e
Nov 15 07:41:47 sekp22 dockerd[1754]: time="2019-11-15T07:41:47.776169999Z" level=info msg="Removing stale sandbox d219f0db42cd0127b25352f27fbab18746c7b6a54a1cd5c12746a778e0e24a01 (e735db68b7ebae3f7853374cc75470a708ce23cce9230462eb26dfe75d492be2)"
Nov 15 07:41:47 sekp22 dockerd[1754]: time="2019-11-15T07:41:47.814254273Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 6abf6fee07ed833cc1ac2b47e88c04426a81eda581b62b5b17ee3528bad2336f efb881884e0e5c41a356
Nov 15 07:41:47 sekp22 dockerd[1754]: time="2019-11-15T07:41:47.904851001Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Nov 15 07:41:52 sekp22 dockerd[1754]: time="2019-11-15T07:41:52.902235510Z" level=info msg="Loading containers: done."
Nov 15 07:41:52 sekp22 dockerd[1754]: time="2019-11-15T07:41:52.921695079Z" level=warning msg="failed to retrieve runc version: unknown output format: runc version spec: 1.0.1-dev\n"
Nov 15 07:41:53 sekp22 dockerd[1754]: time="2019-11-15T07:41:53.180929540Z" level=info msg="Docker daemon" commit=2d0083d graphdriver(s)=overlay2 version=18.09.7
Nov 15 07:41:53 sekp22 dockerd[1754]: time="2019-11-15T07:41:53.185264448Z" level=info msg="Daemon has completed initialization"
Nov 15 07:41:53 sekp22 systemd[1]: Started Docker Application Container Engine.
Nov 15 07:41:53 sekp22 dockerd[1754]: time="2019-11-15T07:41:53.390505691Z" level=info msg="API listen on /var/run/docker.sock"

Update 2: Snap tells me I’ve got "docker 18.09.9 418 stable canonical✓", but docker reports Docker version 18.09.7, build 2d0083d. Reinstalling docker via snap doesn’t help.

Update 3: FIX

  1. Completely removed the snap-installed docker
  2. Removed snap as well
  3. Installed docker from the official Docker repos

Haven’t had a chance to dig deep, but the problem seems to be a conflict of snap.docker.dockerd.service and containerd.service.

Disabling/stopping containerd seems to have solved the issue for me.

systemctl disable containerd and everything is happier
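The suspected conflict can be checked before disabling anything. The sketch below assumes a systemd host and uses the two unit names mentioned in the comment above; the helper names unit_is_active and likely_conflict are hypothetical. It only reports whether both units are active at once, which is the situation the commenter suspects, and does not prove that this is the cause:

```python
import subprocess

# Unit names as given in the comment above.
SNAP_DOCKERD = "snap.docker.dockerd.service"
SYSTEM_CONTAINERD = "containerd.service"


def unit_is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", unit], check=False
        )
    except FileNotFoundError:
        # systemctl is not installed, so this is not a systemd host.
        return False
    return result.returncode == 0


def likely_conflict(is_active=unit_is_active):
    """True when the snap dockerd and the system containerd are both active."""
    return is_active(SNAP_DOCKERD) and is_active(SYSTEM_CONTAINERD)


if __name__ == "__main__":
    print("both units active:", likely_conflict())
```

The is_active parameter exists so the decision logic can be exercised without a running systemd. Note that the original report comes from a Kylin arm64 machine, so the snap explanation may not apply there.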

This is what I did as a temporary fix:

  1. First, stop the error messages. I ran kill <pid>, where <pid> is the number in dockerd[12345] in the error message (I read it with tail -f /var/log/syslog).

  2. To clear the syslog, I ran cat /dev/null > /var/log/syslog.
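Step 1 reads the PID out of the dockerd[12345] tag by eye. A minimal sketch for extracting it from syslog lines, assuming the process tag has the dockerd[<pid>]: form seen in the logs in this thread; the function name dockerd_pids is hypothetical:

```python
import re

# Matches the process tag that syslog puts in front of each dockerd message.
PID_RE = re.compile(r"\bdockerd\[(\d+)\]:")


def dockerd_pids(lines):
    """Return the distinct dockerd PIDs found in `lines`, in order of first appearance."""
    seen = []
    for line in lines:
        match = PID_RE.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in seen:
                seen.append(pid)
    return seen


if __name__ == "__main__":
    # Shortened sample lines in the shape of the logs above (hypothetical).
    sample = [
        'Nov 15 07:41:52 sekp22 dockerd[1754]: level=info msg="Loading containers: done."',
        'Nov 15 07:38:58 sekp22 containerd[818]: level=info msg="Start event monitor"',
    ]
    print(dockerd_pids(sample))
```

As the commenter says, this is a temporary workaround: killing that process silences the messages but leaves the underlying containerd connection problem in place.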

It is definitely not a cosmic coincidence that 9 people have reported this problem in the last 4 hours…