moby: Docker daemon hanging

I have seen the daemon hang under high load on 1.8.3: https://github.com/docker/docker/issues/13885. However, this appears to be different: I am now running 1.11.2 and hit a hang under little to no load.

BUG REPORT INFORMATION

Output of docker version:

[centos@ip-10-50-185-106 ~]$ docker version
Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:23:11 2016
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
[centos@ip-10-50-185-106 ~]$ sudo docker version
Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:23:11 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:23:11 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 13
 Running: 12
 Paused: 0
 Stopped: 1
Images: 11
Server Version: 1.11.2
Storage Driver: devicemapper
 Pool Name: direct_lvm-thin_pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 2.39 GB
 Data Space Total: 66.57 GB
 Data Space Available: 64.18 GB
 Metadata Space Used: 4.375 MB
 Metadata Space Total: 1.074 GB
 Metadata Space Available: 1.069 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 6.897 GiB
Name: ip-10-50-185-106.internal
ID: DDCD:TC7W:6V5N:QDUA:YQF6:24EU:5SVR:WY3L:VZ7X:4BRW:NKM4:INSA
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://10.50.185.193:80
Https Proxy: http://10.50.185.193:80
No Proxy: 10.50.185.0/24,.internal,registry-k8.api.bskyb.com,master-test-k8.api.bskyb.com,localhost,127.0.0.0/8,::1,/var/run/docker.sock,169.254.169.254
Registry: https://index.docker.io/v1/

Additional environment details (AWS, VirtualBox, physical, etc.): AWS, CentOS 7

Steps to reproduce the issue: Unknown

Describe the results you received: All docker client commands hang

Stracing docker client:

read(6, 0xc8203ea000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)
write(6, "GET /v1.23/containers/json HTTP/"..., 89) = 89
futex(0x21ef2b0, FUTEX_WAIT, 0, NULL

So there is no response coming back from the daemon.
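
One way to confirm the hang is on the daemon side rather than in the client is to query the API socket directly, bypassing the docker binary entirely. A minimal sketch (requires curl 7.40 or newer for --unix-socket, which may be newer than the stock curl on CentOS 7):

sudo curl --unix-socket /var/run/docker.sock http://localhost/v1.23/containers/json

If this also hangs, the daemon itself is stuck, which matches the strace above.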

Stracing docker daemon

[root@ip-10-50-185-112 ~]# strace -p 1095
Process 1095 attached
read(46, 

So the daemon looks to be blocked reading on FD 46.

[centos@ip-10-50-185-112 ~]$ sudo lsof -d 46
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF      NODE NAME
kube-prox   991 root   46u  IPv6              24429      0t0       TCP *:31624 (LISTEN)
docker     1095 root   46r  FIFO               0,18      0t0   3836156 /run/docker/libcontainerd/110d033df2a6bf66ddffce6aeef574148f04b25b61a4f931978933cec4f51116/init-stderr
master     1540 root   46u  unix 0xffff8800e787c740      0t0     24830 public/flush
docker-co  1583 root   46r  FIFO               0,18      0t0     36330 /run/containerd/d6bd98f8998e1e5638a8d0122e44ae452a30ff89a7e6733ab962188762b3785e/init/exit
kubelet    2244 root   46u  sock                0,6      0t0   6102239 protocol: TCPv6
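
Note that lsof -d 46 matches file descriptor 46 in every process on the host, which is why kube-proxy, kubelet, and others show up above. To restrict the listing to the daemon, the PID and FD selections can be ANDed together with -a (a sketch using the PID from the strace above):

sudo lsof -p 1095 -a -d 46

So FD 46 is the init-stderr FIFO of container 110d03…, which the daemon is blocked reading.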

Describe the results you expected: Docker to work

Additional information you deem important (e.g. issue happens only occasionally): Happens occasionally

I can gather any additional information today: I have taken this node out of use and left it hung. The only fix is to restart Docker, which I can hold off on for 24 hours.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 59 (27 by maintainers)

Most upvoted comments

Hi, we’re having a similar issue across all of our Docker machines that were recently upgraded to 1.12. I’ll try to get stack traces out tomorrow. docker ps and all docker-compose commands now hang. We’re going to try rolling back to the previous version of Docker.

I have the same issue with 1.12.0 too.

“docker ps” hangs forever, and “docker stop” etc. invoked via Swarm hang too.

The log is absolutely quiet about this.
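
In case it helps capture more detail the next time this happens: the daemon can be run with debug logging enabled. A minimal sketch for 1.12+ using /etc/docker/daemon.json (takes effect after a daemon restart):

{
    "debug": true
}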

strace docker ps

…snip…
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=144861200, u64=140024668645392}}) = 0
getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0
getpeername(3, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc82002c908, FUTEX_WAKE, 1) = 1
read(3, 0xc82034e000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
write(3, "GET /v1.24/containers/json HTTP/"..., 95) = 95
epoll_wait(4, [], 128, 0) = 0
futex(0x1326ca8, FUTEX_WAIT, 0, NULL

docker info

Containers: 11
 Running: 4
 Paused: 0
 Stopped: 7
Images: 149
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 849
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.795 GiB
Name: content-1
ID: MHM7:O6J2:ACPA:6ZPU:SQY4:WFCG:DV7O:YJ5K:TLLL:MH5N:2GXZ:QRR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 nodetype=worker-content
 provider=generic
Cluster Store: consul://172.17.0.1:8500
Cluster Advertise: 192.168.1.13:2376
Insecure Registries:
 127.0.0.0/8

We appear to be encountering a similar issue after moving to Docker 1.12.2. Restarting Docker appears to be the only way to recover.

We are using an LVM thin-pool devicemapper setup on xfs, and unfortunately overlay is not very stable on CentOS 7.2, so moving to it is not currently an option; doing so would also be quite disruptive to existing workloads.

Here is the stack trace from dockerd: https://gist.github.com/sakserv/6aa7c7a1a8eac147e27d8c060023a36d
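
For anyone else needing to capture a similar trace: sending SIGUSR1 to the daemon makes it dump all goroutine stacks without stopping it. A sketch, assuming a systemd host where the unit is named docker (the dump lands in the daemon log):

sudo kill -USR1 $(pidof dockerd)
sudo journalctl -u docker | tail -n 300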

Docker info (note that even docker info hangs in my case):

Containers: 105
 Running: 19
 Paused: 0
 Stopped: 86
Images: 108
Server Version: 1.12.2
Storage Driver: devicemapper
 Pool Name: vg01-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 274.9 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 853.2 GB
 Data Space Total: 5.63 TB
 Data Space Available: 4.777 TB
 Metadata Space Used: 105.9 MB
 Metadata Space Total: 16.98 GB
 Metadata Space Available: 16.87 GB
 Thin Pool Minimum Free Space: 563 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null overlay bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.13.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.6 GiB
Name: foo.example.com
ID: OSYP:WEPA:N2LF:KFTJ:BZNP:L3PT:LDNV:4OBJ:A4AM:CWFB:HHIK:WN4M
Docker Root Dir: /grid/0/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Docker client strace:

-snip-
getpeername(4, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc82004a908, FUTEX_WAKE, 1)      = 1
read(4, 0xc820327000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)
write(4, "GET /v1.24/info HTTP/1.1\r\nHost: "..., 84) = 84
epoll_wait(5, {}, 128, 0)               = 0
futex(0x132cca8, FUTEX_WAIT, 0, NULL

dockerd strace:

Process 850182 attached
wait4(850188,

docker-containerd (pid 850188) strace:

Process 850188 attached
futex(0xee44c8, FUTEX_WAIT, 0, NULL
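
If the stuck processes can be sacrificed, a Go binary that doesn't catch SIGQUIT will also dump all goroutine stacks when it receives one (note that this terminates the process), e.g. for the docker-containerd PID above:

sudo kill -QUIT 850188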

I have the same issue. I’m not using “docker exec” at all. After running “/etc/init.d/docker start” I can’t run a single docker command; even “docker version” hangs.

Version is “1.12.1_rc1 (Gentoo)”.

Tried to downgrade to 1.11, same issue.

How could I get docker to output all of the logs to one file on the host?

To keep your “current” approach, you can bind-mount a directory from your host and have the container write to that (-v /path/on/host/:/var/log/). That way, you don’t have to exec into the container to access the log files.
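
For example (the host path and image name here are placeholders):

docker run -d --name myapp -v /var/log/myapp:/var/log myapp:latest

Anything the container writes under /var/log is then readable at /var/log/myapp on the host, with no docker exec required.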

However, that approach also doesn’t use Docker’s logging features. To use those, make sure the container writes its logs to stdout / stderr (which can be as simple as symlinking the file, if the process expects a file to write to; see for example the nginx Dockerfile). After that, you can pick a logging driver suitable for your use case to, e.g., log to syslog, collect logs with GELF, or send them to Splunk (see the logging drivers section of the documentation).
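
A sketch of both steps; the symlink lines mirror what the official nginx image does, and syslog is just one possible driver:

# in the Dockerfile: redirect file-based logs to stdout/stderr
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
 && ln -sf /dev/stderr /var/log/nginx/error.log

# at run time: pick a logging driver
docker run -d --log-driver=syslog nginx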

(sorry all for the off topic - back to the issue at hand)