moby: docker ps hangs. How can I diagnose it?

Description

I have 66 containers running on an Ubuntu VM (VMware ESXi):

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"

free -m
             total       used       free     shared    buffers     cached
Mem:         16047       8850       7196         13        370        969
-/+ buffers/cache:       7510       8537
Swap:         4095          0       4095

Docker version 1.12.3, build 6b644ec

Every day when I come to work, my deploy system cannot connect to Docker:

fatal: [dev6]: FAILED! => {"changed": false, "failed": true, "msg": "ReadTimeout(ReadTimeoutError(\"UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)\",),)"}

The docker ps command hangs, and I can't do anything except reboot the server.

How can I diagnose this?
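One way to narrow this down, sketched here under assumptions: query the daemon directly over the socket listed in DOCKER_OPTS (/var/run/docker.sock) with a short timeout, bypassing docker-py and its 60-second read timeout. `docker ps` maps to `GET /containers/json`, while `GET /_ping` touches very little daemon state. If `/_ping` answers but `/containers/json` does not, the API server is still accepting requests and the hang is probably deeper inside the daemon (a stuck lock, for example). `docker_get` and `diagnose` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```python
import socket
import time


def docker_get(path, sock_path="/var/run/docker.sock", timeout=5.0):
    """Send a bare HTTP GET to the Docker API over its unix socket.

    Returns (status_line, elapsed_seconds). Raises socket.timeout if the
    daemon accepts the connection but sends nothing within `timeout`.
    """
    start = time.monotonic()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)  # applies to connect, send and each recv
        s.connect(sock_path)
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: docker\r\n\r\n".encode())
        data = b""
        while b"\r\n" not in data:  # only the status line is needed
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    status = data.split(b"\r\n", 1)[0].decode(errors="replace")
    return status, time.monotonic() - start


def diagnose(sock_path="/var/run/docker.sock", timeout=5.0):
    """Probe a cheap endpoint and the endpoint that `docker ps` uses."""
    results = {}
    for path in ("/_ping", "/containers/json"):
        try:
            status, elapsed = docker_get(path, sock_path, timeout)
            results[path] = f"{status} ({elapsed:.2f}s)"
        except socket.timeout:
            results[path] = f"no response within {timeout}s"
        except OSError as exc:  # e.g. socket missing or connection refused
            results[path] = f"connect failed: {exc}"
    return results
```

Run it as root (or as a user allowed to use the socket), e.g. `print(diagnose())`, while `docker ps` is hanging.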

root@docker-01:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 64100
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@docker-01:~# cat /etc/default/docker

DOCKER_OPTS="-H tcp://10.129.4.103:2375 -H unix:///var/run/docker.sock"

root@docker-01:~# lsof -w | wc -l
167224
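Neither number above shows the daemon's own file-descriptor usage: `ulimit -a` reports the limits of the root shell, not of the running daemon, and `lsof -w | wc -l` counts lines for every process on the host (including memory-mapped files and working directories), so 167224 is not directly comparable to the 65535 limit. A per-process check could read /proc instead, as in this Linux-only sketch. It assumes the daemon writes its PID to the default pidfile /var/run/docker.pid, it must run as root to list another process's descriptors, and the helper names are hypothetical.

```python
import os


def dockerd_pid(pidfile="/var/run/docker.pid"):
    """Read the daemon's PID from its pidfile (default path assumed)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def fd_usage(pid):
    """Return (open_fds, soft_limit) for `pid`, read from /proc (Linux only).

    soft_limit is an int, or the string "unlimited".
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                soft = line.split()[3]
                soft_limit = int(soft) if soft.isdigit() else soft
                break
    return open_fds, soft_limit
```

For example, `fd_usage(dockerd_pid())` returns the daemon's current descriptor count alongside the soft limit it actually runs with.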
root@docker-01:~# docker info
Containers: 65
 Running: 62
 Paused: 0
 Stopped: 3
Images: 601
Server Version: 1.12.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 783
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 4.2.0-42-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.67 GiB
Name: docker-01.public.exness.local
ID: K6WX:ERB4:NB7X:TDBL:DGMN:OKWQ:ECCZ:N5W3:CLPC:GALB:3O5E:4JUS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

First thought: I don’t like the way srslog’s connect function requires callers to hold the Lock before calling it. This means we’re holding the lock far too long and putting responsibility for it in the wrong place.

I need to investigate whether it’s possible to make that locking more granular, focused on getting access to the conn rather than on using it. (Using conn itself should be thread-safe.)
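A minimal Python sketch of the more granular pattern described above. It is not srslog's code (srslog is a Go package), and the class and method names are hypothetical. The lock protects only reading or re-dialing the shared connection; the write itself happens outside the lock, which is safe only if the connection object is itself thread-safe, as assumed for conn.

```python
import threading


class GranularWriter:
    """Hypothetical sketch: the lock guards access to the connection,
    not its use."""

    def __init__(self, dial):
        self._dial = dial          # callable returning a new connection
        self._lock = threading.Lock()
        self._conn = None

    def _get_conn(self):
        # Hold the lock only long enough to read or (re)create the conn.
        with self._lock:
            if self._conn is None:
                self._conn = self._dial()
            return self._conn

    def _drop_conn(self, conn):
        # Forget a broken conn, unless another caller already replaced it.
        with self._lock:
            if self._conn is conn:
                self._conn = None

    def write(self, msg):
        conn = self._get_conn()
        try:
            conn.write(msg)        # runs WITHOUT the lock; conn must be thread-safe
        except OSError:
            self._drop_conn(conn)
            raise
```

With this shape, a slow or blocked write stalls only its own caller instead of every other writer waiting on the lock. Dialing still happens under the lock, so a slow reconnect can still block other writers until it finishes.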

Meanwhile, have you come up with a repeatable test case or is it only happening occasionally/intermittently?
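Since the hang shows up overnight, a simple watcher could help answer the repeatable-vs-intermittent question by recording when the daemon first stops responding, so that time can be compared with whatever else runs on the host overnight. This is a hypothetical sketch: `check` is any callable that raises on failure (for example, one that queries the Docker socket with a short timeout), and the 60-second interval is an arbitrary default.

```python
import time
from datetime import datetime


def watch(check, interval=60.0, max_checks=None, log=print):
    """Call check() every `interval` seconds and log each result.

    Returns the start time of the first failing check, or None if
    max_checks checks all succeed.
    """
    n = 0
    while max_checks is None or n < max_checks:
        n += 1
        started = datetime.now()
        try:
            check()
        except Exception as exc:  # any failure, including timeouts
            log(f"{started.isoformat()} FAIL: {exc!r}")
            return started
        log(f"{started.isoformat()} ok")
        time.sleep(interval)
    return None
```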

Reboot away; I’ll check out this stack trace.
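To capture such a trace before rebooting: sending SIGUSR1 to the Docker daemon makes it dump its goroutine stacks. Depending on the Docker version, the dump goes either to the daemon log or to a file whose path the daemon logs. The sketch below assumes the default pidfile /var/run/docker.pid, and the function name is hypothetical.

```python
import os
import signal


def request_stack_dump(pidfile="/var/run/docker.pid"):
    """Send SIGUSR1 to the process named in `pidfile`; return its PID."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR1)  # dockerd responds by dumping goroutine stacks
    return pid
```

Run it as root while `docker ps` is hanging, then look for the dump in the daemon's log.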