moby: Docker daemon hanging
I have seen the daemon hang under high load on 1.8.3 (https://github.com/docker/docker/issues/13885), but this appears to be different: I am now running 1.11.2 and hit a hang under little to no load.
BUG REPORT INFORMATION
Output of docker version:
[centos@ip-10-50-185-106 ~]$ docker version
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:23:11 2016
OS/Arch: linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
[centos@ip-10-50-185-106 ~]$ sudo docker version
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:23:11 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:23:11 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 13
Running: 12
Paused: 0
Stopped: 1
Images: 11
Server Version: 1.11.2
Storage Driver: devicemapper
Pool Name: direct_lvm-thin_pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 2.39 GB
Data Space Total: 66.57 GB
Data Space Available: 64.18 GB
Metadata Space Used: 4.375 MB
Metadata Space Total: 1.074 GB
Metadata Space Available: 1.069 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 6.897 GiB
Name: ip-10-50-185-106.internal
ID: DDCD:TC7W:6V5N:QDUA:YQF6:24EU:5SVR:WY3L:VZ7X:4BRW:NKM4:INSA
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://10.50.185.193:80
Https Proxy: http://10.50.185.193:80
No Proxy: 10.50.185.0/24,.internal,registry-k8.api.bskyb.com,master-test-k8.api.bskyb.com,localhost,127.0.0.0/8,::1,/var/run/docker.sock,169.254.169.254
Registry: https://index.docker.io/v1/
Additional environment details (AWS, VirtualBox, physical, etc.): AWS, Centos 7
Steps to reproduce the issue: Unknown
Describe the results you received: All docker client commands hang
Stracing docker client:
read(6, 0xc8203ea000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
write(6, "GET /v1.23/containers/json HTTP/"..., 89) = 89
futex(0x21ef2b0, FUTEX_WAIT, 0, NULL
So basically no response back from the daemon.
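One quick way to confirm the daemon itself is wedged, rather than the client, is to hit the API socket directly with a timeout. A minimal sketch, assuming a curl new enough for --unix-socket (7.40+; stock CentOS 7 ships an older curl, so this may need a newer build):

$ sudo timeout 10 curl --unix-socket /var/run/docker.sock http://localhost/v1.23/_ping
# A healthy daemon answers "OK" almost immediately; on a hung daemon
# this should time out with no response, matching the strace above.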
Stracing the docker daemon:
[root@ip-10-50-185-112 ~]# strace -p 1095
Process 1095 attached
read(46,
So the daemon looks to be blocked in a read on FD 46:
[centos@ip-10-50-185-112 ~]$ sudo lsof -d 46
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-prox 991 root 46u IPv6 24429 0t0 TCP *:31624 (LISTEN)
docker 1095 root 46r FIFO 0,18 0t0 3836156 /run/docker/libcontainerd/110d033df2a6bf66ddffce6aeef574148f04b25b61a4f931978933cec4f51116/init-stderr
master 1540 root 46u unix 0xffff8800e787c740 0t0 24830 public/flush
docker-co 1583 root 46r FIFO 0,18 0t0 36330 /run/containerd/d6bd98f8998e1e5638a8d0122e44ae452a30ff89a7e6733ab962188762b3785e/init/exit
kubelet 2244 root 46u sock 0,6 0t0 6102239 protocol: TCPv6
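Note that lsof -d 46 matches FD 46 in every process, so only the docker line above is relevant. To scope the check to the daemon pid from the strace (1095), something like the following should work; -a ANDs the pid and FD filters:

$ sudo lsof -p 1095 -a -d 46   # only the daemon's FD 46
$ sudo ls -l /proc/1095/fd/46  # same answer straight from procfs
# If nothing holds the write end of that init-stderr FIFO open, the
# daemon's read() can block indefinitely.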
Describe the results you expected: Docker to work
Additional information you deem important (e.g. issue happens only occasionally): Happens occasionally
I can add any additional information needed today: I have taken this node out of use and left it in the hung state. The only fix is to restart docker, which I can hold off on for 24 hours.
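Since the node is being kept in the hung state, it is probably worth capturing the daemon's goroutine stacks before restarting; on 1.11/1.12 the daemon dumps its stack traces to the daemon log when it receives SIGUSR1. A sketch, using the daemon pid from the strace above:

$ sudo kill -USR1 1095
# Stacks land in the daemon log; on a systemd host such as CentOS 7:
$ sudo journalctl -u docker --no-pager | tail -n 200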
About this issue
- State: closed
- Created 8 years ago
- Comments: 59 (27 by maintainers)
Hi, we’re having a similar issue across all of our docker machines that were recently upgraded to 1.12. I’ll try to get stack traces out tomorrow. docker ps and all docker-compose commands now hang. Going to try to roll back to the previous version of docker.
I have the same issue with 1.12.0 too.
"docker ps" hangs forever, and access via swarm ("docker stop" etc.) hangs too.
The log is absolutely quiet about this.
strace docker ps:
…snip…
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=144861200, u64=140024668645392}}) = 0
getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0
getpeername(3, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc82002c908, FUTEX_WAKE, 1) = 1
read(3, 0xc82034e000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
write(3, "GET /v1.24/containers/json HTTP/"..., 95) = 95
epoll_wait(4, [], 128, 0) = 0
futex(0x1326ca8, FUTEX_WAIT, 0, NULL
docker info:
Containers: 11
Running: 4
Paused: 0
Stopped: 7
Images: 149
Server Version: 1.12.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 849
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay null bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.795 GiB
Name: content-1
ID: MHM7:O6J2:ACPA:6ZPU:SQY4:WFCG:DV7O:YJ5K:TLLL:MH5N:2GXZ:QRR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
nodetype=worker-content
provider=generic
Cluster Store: consul://172.17.0.1:8500
Cluster Advertise: 192.168.1.13:2376
Insecure Registries:
127.0.0.0/8
We appear to be encountering something similar after moving to docker 1.12.2. Restarting docker appears to be the only way to recover.
We are using an lvm thinpool devicemapper setup backed by xfs. Unfortunately, overlay is not very stable on CentOS 7.2, so moving to it is not currently an option, and doing so would be quite disruptive to existing workloads.
Here is the stack trace from dockerd: https://gist.github.com/sakserv/6aa7c7a1a8eac147e27d8c060023a36d
Docker info (note that even docker info hangs in my case):
Docker client strace:
dockerd strace:
docker-containerd (pid 850188) strace:
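Since dockerd was blocked on a containerd FIFO in the earlier traces, it may help to rule containerd in or out by querying it directly; if containerd is wedged, the same hang should reproduce there. A sketch using the ctr binary bundled with 1.12 (the binary name and socket path below are the 1.12 defaults and may differ with your packaging):

$ sudo docker-containerd-ctr \
    --address unix:///var/run/docker/libcontainerd/docker-containerd.sock \
    containers
# A responsive containerd lists its containers; a hang here points
# below dockerd rather than at the daemon itself.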
I have the same issue, and I'm not using "docker exec" at all. After running "/etc/init.d/docker start" I can't run a single docker command; even "docker version" hangs.
Version is "1.12.1_rc1 (Gentoo)".
Tried to downgrade to 1.11, same issue.
To keep your "current" approach, you can bind-mount a directory on your host and have the container write to that (-v /path/on/host/:/var/log/); that way, you don't have to exec into the container to access the log files. However, that also doesn't use the docker logging features. To do that, make sure the container writes its logs to stdout/stderr (which can be as simple as symlinking the file, if the process expects a file to write to; see for example the nginx Dockerfile). After that, you can use a logging driver suitable for your use case to, e.g., log to syslog, collect logs with GELF, or send them to splunk (see the logging drivers section). (Sorry all for the off-topic; back to the issue at hand.)
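For reference, the symlink trick mentioned above is what the official nginx Dockerfile does; the same pattern works for any process that insists on writing to a file path:

# In a Dockerfile: point the expected log files at the container's
# stdout/stderr so the docker logging driver picks them up
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log

Then pick the driver at run time, e.g. docker run --log-driver=syslog … (syslog shown purely as one example).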