moby: Docker Hangs Under Load

I have a small environment with four Docker containers on a single CentOS 7.2 host. If I put the containers under load, Docker hangs. While it's hung, docker stats shows no values and docker ps still works, but docker exec and docker-compose down both hang.

I can avoid the hang in some cases by putting a long delay (30 seconds) between loading each container. In other words, I put load on container 1, wait 30 seconds, put load on container 2, and so on, so that after two minutes all four containers are under load. However, once all containers are under load, a docker exec command against any container hangs.

The only fix is to stop Docker (service docker stop) and then restart it.

strace against the dockerd process shows this:

Process 7097 attached
wait4(7103,

strace against process 7103 (which is docker-containerd), shows this:

Process 7103 attached
futex(0xed8348, FUTEX_WAIT, 0, NULL

I issued a SIGUSR1 against dockerd, and here is the resulting stack dump:

dockerd-dump.txt
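For reference, a sketch of how such a dump can be captured: dockerd writes all goroutine stacks to its log when it receives SIGUSR1. The systemd unit name and log location below are assumptions for a systemd-managed daemon on CentOS 7; adjust them for your setup.

```shell
# Hedged sketch: capture a goroutine dump from a hung dockerd.
# Assumes a systemd-managed daemon whose unit is named "docker".
DOCKERD_PID=$(pidof dockerd 2>/dev/null || true)   # empty if dockerd is not running
if [ -n "$DOCKERD_PID" ]; then
    kill -USR1 "$DOCKERD_PID"                      # dockerd logs every goroutine stack
    sleep 1
    journalctl -u docker --no-pager | tail -n 200  # the dump lands in the daemon log
fi
```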

Steps to reproduce the issue:

  1. Start the services with docker-compose (or by hand).
  2. Put load on the containers (either real load or stress-ng running inside each container).
  3. Run a command against one of the containers with docker exec
  4. It hangs.
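The steps above can be sketched as a script. The container names and the stress-ng invocation are assumptions, not taken from the report; with DRY_RUN=1 (the default) the commands are only printed, so the sequence can be reviewed before it touches a real daemon.

```shell
#!/bin/sh
# Hedged repro sketch for the steps above. Substitute your own compose
# service/container names for the hypothetical app1..app4.
DRY_RUN=${DRY_RUN:-1}
CONTAINERS="app1 app2 app3 app4"    # hypothetical container names

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"        # dry run: print instead of execute
    else
        "$@"
    fi
}

# Step 2: load every container (stress-ng must be present in the images)
for c in $CONTAINERS; do
    run docker exec -d "$c" stress-ng --cpu 2 --timeout 300s
done

# Step 3: once all four are under load, this exec is where the hang appears
run docker exec app1 echo "still responsive?"
```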

Describe the results you received:

Docker became non-responsive; docker stats listed the containers but showed no values.

Describe the results you expected:

Docker runs as normal and accepts commands.

Additional information you deem important (e.g. issue happens only occasionally):

I can make this issue occur on demand.

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64


Output of docker info:

Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 10
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 2.337 GB
 Data Space Total: 61.2 GB
 Data Space Available: 58.86 GB
 Metadata Space Used: 766 kB
 Metadata Space Total: 641.7 MB
 Metadata Space Available: 641 MB
 Thin Pool Minimum Free Space: 6.119 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.36.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.64 GiB
Name: usildodnimladock1
ID: KZ2H:IWUH:KBGJ:PFG6:3GK7:HYLY:5AWW:TBQ3:AROJ:OZRJ:3LZP:RMI5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: isaal01
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8


Additional environment details (AWS, VirtualBox, physical, etc.): This is Docker running on CentOS 7.2 in a VMware 6 environment. The VM has 2 vCPUs, 8 GB of vRAM, and plenty of disk. The underlying physical host has 12 x 3.3 GHz CPUs and 98 GB of RAM. Both physical and virtual CPU usage is relatively low.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

1.12.2-rc1 is passing my basic tests, so things are looking good. I’ll have better results in a day or so.

Is the orphaned children problem you’re referring to also known as the “Zombie Reaping Problem” as detailed here? https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

Or is it something different?
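For reference, the reaping problem that post describes can be reproduced without Docker at all. A minimal Linux shell sketch (nothing here is Docker-specific, and the temp-file path is arbitrary): the inner sh backgrounds a short-lived child and then exec's into sleep, so the child's parent can never call wait() on it, leaving the child as a zombie until the parent itself exits and PID 1 (or a reaping init like tini) collects it.

```shell
# Sketch of the zombie-reaping problem on a plain Linux shell.
sh -c 'sleep 0.1 & echo $! > /tmp/reap-demo.pid; exec sleep 1' &
PARENT=$!
sleep 0.5                                        # let the child exit
CHILD=$(cat /tmp/reap-demo.pid)
STATE=$(awk '{print $3}' "/proc/$CHILD/stat")    # "Z" means zombie
echo "child $CHILD state: $STATE"
wait "$PARENT"                                   # parent exits; PID 1 reaps the zombie
rm -f /tmp/reap-demo.pid
```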