moby: 'docker ps' and 'docker run' commands never finish; others like 'info' work
Description
Dockerd became unresponsive at 06:04:21 GMT (see logs). Commands like “docker ps” and “docker run” never complete within a reasonable time (1 min). “docker info” still works. Containers are running and weren’t restarted.
Steps to reproduce the issue: Unfortunately, I don’t know how to reproduce the problem.
Additional information you deem important (e.g. issue happens only occasionally): This happened after about 2 weeks of running successfully with similar load.
Output of docker version:
# docker version
Client:
Version: 1.12.6-cs7
API version: 1.24
Go version: go1.6.4
Git commit: 681cddc
Built: Tue Jan 24 18:01:10 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6-cs7
API version: 1.24
Go version: go1.6.4
Git commit: 681cddc
Built: Tue Jan 24 18:01:10 2017
OS/Arch: linux/amd64
Output of docker info:
# docker info
Containers: 472
Running: 355
Paused: 0
Stopped: 117
Images: 12929
Server Version: 1.12.6-cs7
Storage Driver: aufs
Root Dir: /opt/io1/docker/aufs
Backing Filesystem: extfs
Dirs: 12143
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host overlay bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-59-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 1.876 TiB
Name: ip-10-69-11-89
ID: UGZS:UFD3:GB4C:W5MX:JU2L:K7PH:6ZWS:4GPM:27Q5:UNNN:X3DC:YDT7
Docker Root Dir: /opt/io1/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 5526
Goroutines: 4451
System Time: 2017-02-07T07:56:27.892730562Z
EventsListeners: 2
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.):
- running on AWS, x1.32xlarge instance
- restarting the daemon takes a very long time
# time systemctl restart docker
real 25m39.047s
user 0m0.028s
sys 0m0.012s
- after the restart, dockerd behaved the same way
- please see attached syslog from this period (with debug) and thread dump requested when the commands were unresponsive
2017-02-07-docker.log.txt 2017-02-07-dockerd_threaddump.log.txt
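For anyone trying to confirm the same symptom, the hang described above can be checked with a small wrapper around `timeout` from GNU coreutils. This is only a sketch: `check_responsive` is a hypothetical helper name, not part of Docker, and the 60-second limit in the usage comments simply mirrors the 1 min threshold mentioned in the description.

```shell
# check_responsive LIMIT CMD [ARGS...]
# Hypothetical helper (not part of Docker): runs CMD under `timeout` and
# reports whether it finished, failed, or was killed for exceeding LIMIT.
# Requires GNU coreutils `timeout`, which exits with status 124 on timeout.
check_responsive() {
  limit="$1"
  shift
  if timeout "$limit" "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "hung: $*"
    else
      echo "failed($status): $*"
    fi
  fi
}

# Example usage on the affected host (assumes the docker CLI is on PATH):
#   check_responsive 60 docker info
#   check_responsive 60 docker ps
```

With the behaviour reported here, the `docker info` check would be expected to report "ok" while the `docker ps` check would report "hung".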
About this issue
- State: closed
- Created 7 years ago
- Comments: 23 (14 by maintainers)
@mlaventure thanks to your hint I found the related bug. And indeed this is due to the device mapper and the RHEL/CentOS kernel, but here is the correct issue: https://github.com/docker/docker/issues/27381#issuecomment-283676898
As reported by Dan Walsh, there is a kernel bug in RHEL up to 7.3 which will be addressed in the next point release (7.4). Until then, he provides a workaround, but this also means that one cannot use live restore for the time being.
This comment is just for the record.