moby: Upgrade to 1.12.6-cs results in dockerd unresponsive

Description

When docker is upgraded from 1.12.3-cs4 to 1.12.6-cs9 (or cs10), dockerd hangs in specific situations.

Steps to reproduce the issue:

  1. Create an Ubuntu 16.04 box (I used an EC2 r3.large instance, ami-f4cc1de2, us-east-1)
  2. Install docker-engine 1.12.3-cs4
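# Install the prerequisites first so the HTTPS repository can be added and fetched
apt-get update

apt-get install -y --no-install-recommends \
    apt-transport-https \
    curl \
    software-properties-common

# Import the repository signing key
curl -fsSL 'https://sks-keyservers.net/pks/lookup?op=get&search=0xee6d536cf7dc86e2d7d56f59a178ac6c6238f52e' | sudo apt-key add -

# Add the Docker 1.12 repository for this Ubuntu release
add-apt-repository \
   "deb https://packages.docker.com/1.12/apt/repo/ \
   ubuntu-$(lsb_release -cs) \
   main"

apt-get update

# Pin the known-good version
apt-get -y install docker-engine=1.12.3~cs4-0~xenial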
  3. Run 100 containers and try to run docker ps a few times. It will run fast.
for i in {1..100}; do docker run -d -it --restart=always --name poc_$i talves/health_poc; done

then

time docker ps -qa | wc -l
  4. Upgrade to 1.12.6-cs10
apt-get -y install docker-engine
  5. Try to run one container (the command will run forever)
docker run -d -it --restart=always --name poc_1_12_6 talves/health_poc
  6. Try to run docker ps (it will take ages)
time docker ps -qa | wc -l
  7. Downgrade to cs4
apt-get install docker-engine=1.12.3~cs4-0~xenial

  8. Repeat steps 5 and 6. They will work fine. If you upgrade to 17.03-ce, it will also work fine.
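
Because steps 5 and 6 can block for a very long time on the affected versions, it may be easier to run them under a deadline. The following is only a sketch: `run_with_deadline` is a hypothetical helper name, it assumes GNU coreutils `timeout` is installed, and the limit is given as a plain number of seconds.

```shell
# Run a command with a time limit and report whether it finished, failed, or hung.
# run_with_deadline is a hypothetical helper; it relies on GNU coreutils `timeout`,
# which exits with status 124 when the limit is reached.
run_with_deadline() {
    local limit="$1"   # time limit in seconds (plain number)
    shift
    local status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}
```

For example, `run_with_deadline 30 docker ps -qa` should print an `OK:` line on 1.12.3-cs4 and, if the daemon is stuck, a `HUNG after 30s:` line instead of blocking the terminal.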

Describe the results you received: Docker 1.12.3 and 17.03 are fast, but 1.12.6-cs9 and cs10 are very slow under certain conditions

Describe the results you expected: Fast response of dockerd, regardless of docker version

Additional information you deem important (e.g. issue happens only occasionally): I am using a custom docker image for tests, but you will get similar results if you use other images with healthcheck enabled:

docker run -d -it --restart=always --health-cmd='curl --fail http://localhost/ || exit 1' --health-interval 10s --health-timeout 1s --health-retries 3 --name poc_0 php:7.0-apache
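
To repeat the 100-container step with this public image instead of my custom one, the command above can be generated in a loop. This is a rough sketch: `print_healthcheck_repro` is a hypothetical helper name, and it only prints the commands so they can be reviewed before anything is started.

```shell
# Print (do not run) one `docker run` command per container, using the
# health-check flags from the php:7.0-apache example above.
# print_healthcheck_repro is a hypothetical helper name.
print_healthcheck_repro() {
    local count="$1"   # number of containers to generate commands for
    local i=1
    while [ "$i" -le "$count" ]; do
        printf '%s\n' "docker run -d -it --restart=always --health-cmd='curl --fail http://localhost/ || exit 1' --health-interval 10s --health-timeout 1s --health-retries 3 --name poc_$i php:7.0-apache"
        i=$((i + 1))
    done
}
```

Running `print_healthcheck_repro 100` lists the commands for poc_1 through poc_100; once they look right, the output can be pasted into a shell to start the containers.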

Output of docker version:

Client:
 Version:      1.12.6-cs10
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   54bb958
 Built:        Mon Mar  6 03:49:00 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6-cs10
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   54bb958
 Built:        Mon Mar  6 03:49:00 2017
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 101
 Running: 48
 Paused: 0
 Stopped: 53
Images: 1
Server Version: 1.12.6-cs10
Storage Driver: devicemapper
 Pool Name: docker-202:1-275400-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.279 GB
 Data Space Total: 107.4 GB
 Data Space Available: 5.401 GB
 Metadata Space Used: 13.53 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.134 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-64-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.94 GiB
Name: ip-10-69-11-232
ID: NQHR:6JJ6:REF6:6GMR:5PTF:QB4Z:EANI:PYUH:UMFI:Q2TH:ZC4Z:YUVM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): EC2 r3.large instance, ami-f4cc1de2, us-east-1

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

When you get a hang, can you send SIGUSR1 to the docker daemon? A stack trace should then appear in the daemon logs.

Thanks!
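
In case it helps, the SIGUSR1 request could be sent with something like the sketch below. `request_stack_dump` is a hypothetical helper name; the daemon process name (`dockerd`) and the use of journald for daemon logs are assumptions based on a stock Ubuntu 16.04 install of 1.12.

```shell
# Send SIGUSR1 to a process, after checking that the PID is alive.
# request_stack_dump is a hypothetical helper; the daemon is expected to
# respond by writing a stack trace to its own log.
request_stack_dump() {
    local pid="$1"
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    kill -s USR1 "$pid"
}
```

Typical use would be `request_stack_dump "$(pidof dockerd)"` as root, followed by `journalctl -u docker.service` to look for the trace (assuming the daemon logs to journald on this box).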

+1

I also just got an unresponsive docker daemon (i.e. all docker commands stall indefinitely) after upgrading to 1.12.6.

sh-4.2# docker --version
Docker version 1.12.6, build 7392c3b/1.12.6

@thiagoalves can you also provide the daemon logs (on top of the stacktrace requested by @cpuguy83), there may be an issue during startup.

Is live-restore enabled?
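
One quick way to check, sketched under some assumptions: the daemon reads the default config file `/etc/docker/daemon.json`, and the setting appears there as the `live-restore` key. `live_restore_enabled` is a hypothetical helper name, and it does a plain text match rather than real JSON parsing, so it will not notice a `--live-restore` flag passed on the dockerd command line.

```shell
# Succeed (exit 0) if the given daemon config file sets "live-restore": true.
# live_restore_enabled is a hypothetical helper; the default path below is the
# standard daemon config location on Linux. This is a text match, not a JSON parse.
live_restore_enabled() {
    local config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -Eq '"live-restore"[[:space:]]*:[[:space:]]*true' "$config"
}
```

For example: `if live_restore_enabled; then echo "live-restore is on"; else echo "live-restore not set in daemon.json"; fi`.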