moby: Docker Hangs Under Load
I have a small environment with four Docker containers on a single CentOS 7.2 host. If I put the containers under load, Docker hangs. While it is hung, docker stats shows no values and docker ps still works, but docker exec and docker-compose down both hang.
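To confirm which endpoints are wedged without leaving a terminal blocked, each command can be probed under timeout(1). A rough sketch; the container name here is a placeholder:

```bash
timeout 5 docker ps                 # still returns a container list
timeout 5 docker stats --no-stream  # per the report, values come back blank
timeout 5 docker exec mycontainer true \
  || echo "docker exec did not return within 5s"
```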
I can avoid the hang in some cases by putting a long delay (30 seconds) between putting load on each container. In other words, I put load on container 1, wait 30 seconds, put load on container 2, and so on, so after about two minutes all four containers are under load. However, once all the containers are under load, if I issue a command against any container with docker exec, it hangs.
The only fix is to stop Docker (service docker stop) and then restart it.
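On CentOS 7 the same recovery can go through systemd directly; a minimal sketch of the restart sequence:

```bash
# Stop the wedged daemon; in Docker 1.12 docker-containerd runs as a
# child of dockerd, so it is taken down along with it
sudo systemctl stop docker

# Bring the daemon back; containers with a restart policy come back up
sudo systemctl start docker
```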
strace against the dockerd process shows this:
Process 7097 attached
wait4(7103,
strace against process 7103 (which is docker-containerd) shows this:
Process 7103 attached
futex(0xed8348, FUTEX_WAIT, 0, NULL
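For anyone reproducing these traces, the two attachments look roughly like this. The PIDs below are the ones from my host and will differ elsewhere:

```bash
# dockerd (PID 7097 here) sits in wait4() on its containerd child
sudo strace -p 7097

# docker-containerd (PID 7103 here) sits in futex(FUTEX_WAIT),
# which usually just indicates a parked Go runtime thread
sudo strace -p 7103
```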
I issued SIGUSR1 against dockerd, and here is the trace:
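As context for that step: sending SIGUSR1 to the Docker 1.12 daemon makes it dump all goroutine stacks into the daemon log rather than to the terminal, so on CentOS 7 the dump has to be pulled from journald. A minimal capture sequence, assuming systemd manages the service:

```bash
# Ask the running daemon for a goroutine stack dump
sudo kill -USR1 "$(pidof dockerd)"

# The dump lands in the daemon log (journald on CentOS 7)
sudo journalctl -u docker.service --since "1 minute ago"
```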
Steps to reproduce the issue:
- Start the containers with docker-compose (or by hand).
- Put load on the containers (either real load or by running stress-ng inside each container; see the scripted sketch after these steps).
- Run a command against one of the containers with docker exec.
- It hangs.
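The steps above can be scripted for repeatability. This is a minimal sketch, not the exact commands from my environment: the container names are placeholders, and it assumes stress-ng is already installed inside each image:

```bash
#!/bin/bash
# Hypothetical container names; substitute the real ones
CONTAINERS="app1 app2 app3 app4"

# Put CPU load on each container; the 30s stagger delays
# (but does not ultimately prevent) the hang
for c in $CONTAINERS; do
  docker exec -d "$c" stress-ng --cpu 2 --timeout 600s
  sleep 30
done

# With everything under load, any further exec hangs;
# timeout(1) keeps the probe from wedging this shell too
timeout 10 docker exec app1 true || echo "docker exec hung"
```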
Describe the results you received:
Docker became non-responsive. docker stats shows no values for any container.
Describe the results you expected:
Docker runs as normal and accepts commands.
Additional information you deem important (e.g. issue happens only occasionally):
I can make this issue occur on demand.
Output of docker version:
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64
Output of docker info:
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 10
Server Version: 1.12.1
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 2.337 GB
Data Space Total: 61.2 GB
Data Space Available: 58.86 GB
Metadata Space Used: 766 kB
Metadata Space Total: 641.7 MB
Metadata Space Available: 641 MB
Thin Pool Minimum Free Space: 6.119 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.36.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.64 GiB
Name: usildodnimladock1
ID: KZ2H:IWUH:KBGJ:PFG6:3GK7:HYLY:5AWW:TBQ3:AROJ:OZRJ:3LZP:RMI5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: isaal01
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
127.0.0.0/8
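One side note on the output above: the two bridge-nf warnings are a common CentOS 7 configuration gap and, as far as I can tell, are unrelated to the hang. They can be cleared by loading br_netfilter and enabling the corresponding sysctls:

```bash
# Requires the br_netfilter module on the 3.10 CentOS 7 kernel
sudo modprobe br_netfilter
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
sudo sysctl -w net.bridge.bridge-nf-call-ip6tables=1
```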
Additional environment details (AWS, VirtualBox, physical, etc.): This is Docker running on CentOS 7.2 in a VMware 6 environment. The VM has 2 vCPUs, 8 GB of vRAM, and plenty of disk. The underlying physical host has 12 x 3.3 GHz CPUs and 98 GB of RAM. Both physical and virtual CPU usage is relatively low.
About this issue
- State: closed
- Created 8 years ago
- Comments: 15 (9 by maintainers)
1.12.2-rc1 is passing my basic tests, so things are looking good. I’ll have better results in a day or so.
Is the orphaned children problem you’re referring to also known as the “Zombie Reaping Problem” as detailed here? https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
Or is it something different?
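For readers unfamiliar with that problem: when a container's PID 1 never calls wait() on children it inherits, those children linger as zombies after exiting. A minimal sketch that reproduces it, assuming an ubuntu image with ps available (the container name is illustrative):

```bash
# PID 1 becomes `sleep 300` via exec; the backgrounded `sleep 1`
# stays parented to PID 1, which never reaps its children
docker run -d --name zombie-demo ubuntu:16.04 \
  bash -c 'sleep 1 & exec sleep 300'

sleep 5

# The finished child shows up as <defunct> (a zombie)
docker exec zombie-demo ps -ef
```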