moby: 'docker ps' and 'docker run' commands never finish, while others like 'info' work

Description

Dockerd became unresponsive at 06:04:21 GMT (see logs). Commands like “docker ps” and “docker run” do not complete within a reasonable time (1 min), while “docker info” works. Containers are still running and weren’t restarted.
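
A minimal sketch of how a hung command can be told apart from a merely slow one, assuming GNU coreutils `timeout` is available. The helper name `probe` and its output format are illustrative and not part of the original report.

```shell
# probe LIMIT CMD...: run CMD with a time limit and report whether it finished.
# 'probe' is a hypothetical helper; it relies on GNU coreutils 'timeout',
# which exits with status 124 when the limit is reached.
probe() {
  limit="$1"; shift
  timeout "$limit" "$@" >/dev/null 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "HUNG after ${limit}s: $*"
  else
    echo "finished (exit $rc): $*"
  fi
}
```

On a host in the state described here, `probe 60 docker ps` would be expected to print the HUNG line, while `probe 60 docker info` would print the finished line.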

Steps to reproduce the issue: Unfortunately, I don’t know how to reproduce the problem.

Additional information you deem important (e.g. issue happens only occasionally): This happened after about 2 weeks of running successfully with similar load.

Output of docker version:

# docker version
Client:
 Version:      1.12.6-cs7
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   681cddc
 Built:        Tue Jan 24 18:01:10 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6-cs7
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   681cddc
 Built:        Tue Jan 24 18:01:10 2017
 OS/Arch:      linux/amd64

Output of docker info:

# docker info
Containers: 472
 Running: 355
 Paused: 0
 Stopped: 117
Images: 12929
Server Version: 1.12.6-cs7
Storage Driver: aufs
 Root Dir: /opt/io1/docker/aufs
 Backing Filesystem: extfs
 Dirs: 12143
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host overlay bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-59-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 1.876 TiB
Name: ip-10-69-11-89
ID: UGZS:UFD3:GB4C:W5MX:JU2L:K7PH:6ZWS:4GPM:27Q5:UNNN:X3DC:YDT7
Docker Root Dir: /opt/io1/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 5526
 Goroutines: 4451
 System Time: 2017-02-07T07:56:27.892730562Z
 EventsListeners: 2
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

  • running on AWS, x1.32xlarge instance
  • restarting the daemon takes a very long time:
# time systemctl restart docker

real    25m39.047s
user    0m0.028s
sys     0m0.012s
  • after the restart, dockerd behaved the same way
  • please see the attached syslog from this period (with debug enabled) and the thread dump taken while the commands were unresponsive

2017-02-07-docker.log.txt 2017-02-07-dockerd_threaddump.log.txt
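
For readers who want to capture a comparable dump: dockerd logs the stack traces of all goroutines when it receives SIGUSR1. Depending on the Docker version, the traces end up in the daemon log or in a separate file. A minimal sketch follows; the helper name `request_stack_dump` is hypothetical and not taken from the report.

```shell
# request_stack_dump PID: send SIGUSR1 to the given process.
# For dockerd this requests a goroutine stack dump; the daemon keeps running.
# Note: for most other programs the default action of SIGUSR1 is to terminate.
request_stack_dump() {
  kill -USR1 "$1"
}

# Usage on a Docker host (assumes 'pidof' is installed and dockerd is running):
# request_stack_dump "$(pidof dockerd)"
```

Comparing a dump like this against the list of hung commands can show which lock or system call the blocked goroutines are waiting on.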

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 23 (14 by maintainers)

Most upvoted comments

@mlaventure thanks to your hint I found the related bug. It is indeed caused by the device mapper and the RHEL/CentOS kernel, but here is the correct issue: https://github.com/docker/docker/issues/27381#issuecomment-283676898

As reported by Dan Walsh, there is a kernel bug in RHEL up to 7.3 that will be addressed in the next point release (7.4). Until then, he provides a workaround, but it means that live restore cannot be used for the time being.

This comment is just for the record.