moby: Upgrade to 1.12.6-cs results in dockerd unresponsive
Description
When docker is upgraded from 1.12.3-cs4 to 1.12.6-cs9 (or cs10), dockerd hangs in specific situations.
Steps to reproduce the issue:
- Create an Ubuntu 16.04 box (I used an EC2 r3.large instance, ami-f4cc1de2, us-east-1)
- Install docker-engine 1.12.3-cs4
 
curl -fsSL 'https://sks-keyservers.net/pks/lookup?op=get&search=0xee6d536cf7dc86e2d7d56f59a178ac6c6238f52e' | sudo apt-key add -
add-apt-repository \
   "deb https://packages.docker.com/1.12/apt/repo/ \
   ubuntu-$(lsb_release -cs) \
   main"
apt-get update
apt-get install --no-install-recommends \
    apt-transport-https \
    curl \
    software-properties-common
apt-get -y install docker-engine=1.12.3~cs4-0~xenial
- Run 100 containers and try to run docker ps a few times. It will run fast.
 
for i in {1..100}; do docker run -d -it --restart=always --name poc_$i talves/health_poc; done
then
time docker ps -qa | wc -l
- Upgrade to 1.12.6-cs10
 
apt-get -y install docker-engine
- Try to run one container (the command will run forever)
 
docker run -d -it --restart=always --name poc_1_12_6 talves/health_poc
- Try to run docker ps (it will take ages)
 
time docker ps -qa | wc -l
- Downgrade to cs4
 
apt-get install docker-engine=1.12.3~cs4-0~xenial
- Repeat the two steps above that run one container and then run docker ps. They will work fine. If you upgrade to 17.03-ce, it will also work fine.
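To tell "slow" apart from "hung" in the docker run and docker ps steps above, it may help to wrap each command in a time limit. This is only a sketch: the probe function and the PROBE_LIMIT variable are names invented here, not part of docker, and it assumes bash and the coreutils timeout command are available.

```shell
# bash sketch: run a command under a time limit and report how it ended.
# probe and PROBE_LIMIT are illustrative names, not docker features.
probe() {
    local limit="${PROBE_LIMIT:-30}" start end rc=0
    start=$(date +%s)
    # timeout exits with status 124 when the time limit is reached
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    end=$(date +%s)
    if [ "$rc" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    else
        echo "exit=$rc in $((end - start))s: $*"
    fi
    return "$rc"
}

# Example on the affected host (needs a docker client):
#   probe docker ps -qa
#   PROBE_LIMIT=60 probe docker run -d -it --restart=always --name poc_probe talves/health_poc
```

Running the same probe on 1.12.3-cs4 and on 1.12.6-cs10 gives two comparable lines for the report instead of a command that never returns.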
Describe the results you received: Docker 1.12.3 and 17.03 are fast, but 1.12.6-cs9 and cs10 are very slow under certain conditions
Describe the results you expected: Fast response of dockerd, regardless of docker version
Additional information you deem important (e.g. issue happens only occasionally): I am using a custom docker image for tests, but you will get similar results if you use other images with healthcheck enabled:
docker run -d -it --restart=always --health-cmd='curl --fail http://localhost/ || exit 1' --health-interval 10s --health-timeout 1s --health-retries 3 --name poc_0 php:7.0-apache
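To reproduce at the same scale as the 100-container step without the custom image, the loop from the steps and the healthcheck flags above can be combined. A rough sketch: the function name, the DOCKER override (handy for a dry run) and the poc_hc_ name prefix are made up for this example, and it assumes bash.

```shell
# bash sketch: start N containers that use the healthcheck flags shown above.
# start_healthchecked, DOCKER and the poc_hc_ prefix are illustrative names.
start_healthchecked() {
    local count="${1:-100}" runner="${DOCKER:-docker}" i
    for i in $(seq 1 "$count"); do
        # $runner is left unquoted on purpose so it can be e.g. "sudo docker"
        $runner run -d -it --restart=always \
            --health-cmd='curl --fail http://localhost/ || exit 1' \
            --health-interval 10s --health-timeout 1s --health-retries 3 \
            --name "poc_hc_$i" php:7.0-apache
    done
}

# Dry run that only prints the arguments:  DOCKER=echo start_healthchecked 3
# Real run on the test box:                start_healthchecked 100
```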
Output of docker version:
Client:
 Version:      1.12.6-cs10
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   54bb958
 Built:        Mon Mar  6 03:49:00 2017
 OS/Arch:      linux/amd64
Server:
 Version:      1.12.6-cs10
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   54bb958
 Built:        Mon Mar  6 03:49:00 2017
 OS/Arch:      linux/amd64
Output of docker info:
Containers: 101
 Running: 48
 Paused: 0
 Stopped: 53
Images: 1
Server Version: 1.12.6-cs10
Storage Driver: devicemapper
 Pool Name: docker-202:1-275400-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.279 GB
 Data Space Total: 107.4 GB
 Data Space Available: 5.401 GB
 Metadata Space Used: 13.53 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.134 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-64-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.94 GiB
Name: ip-10-69-11-232
ID: NQHR:6JJ6:REF6:6GMR:5PTF:QB4Z:EANI:PYUH:UMFI:Q2TH:ZC4Z:YUVM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.): EC2 r3.large instance, ami-f4cc1de2, us-east-1
About this issue
 - State: closed
 - Created 7 years ago
 - Comments: 17 (16 by maintainers)
 
When you get a hang, can you send SIGUSR1 to the docker daemon? There should be a stack trace in the daemon logs. Thanks!
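For anyone unsure how to send the signal, something along these lines should work when run as root. It is a sketch, assuming bash and that pgrep from procps is installed; dump_stacks is a helper name invented for this example.

```shell
# bash sketch: send SIGUSR1 to the daemon so it logs its goroutine stacks.
# dump_stacks is an illustrative helper; the process name defaults to dockerd.
dump_stacks() {
    local name="${1:-dockerd}" pid
    # -o picks the oldest matching process, -x requires an exact name match
    pid=$(pgrep -o -x "$name" 2>/dev/null) || {
        echo "no running process named $name" >&2
        return 1
    }
    kill -USR1 "$pid" && echo "sent SIGUSR1 to $name (pid $pid)"
}

# On the hung host, as root:
#   dump_stacks
# Then read the daemon log; on a systemd host such as Ubuntu 16.04 that is
# probably:
#   journalctl -u docker.service --since "5 minutes ago"
```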
+1
I also just got an unresponsive docker daemon (i.e. all docker commands stall indefinitely) after upgrading to 1.12.6.
@thiagoalves can you also provide the daemon logs (on top of the stack trace requested by @cpuguy83)? There may be an issue during startup.
Is live-restore enabled?
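The docker info output above does not show this setting, so one way to answer is to look at the daemon configuration directly. A sketch only, assuming bash and that the option, if set, is in the default /etc/docker/daemon.json; live_restore_in_config is a helper name invented here.

```shell
# bash sketch: succeed if the daemon config file sets "live-restore": true.
# live_restore_in_config is an illustrative helper; the path is the default
# daemon.json location and may differ on a customised host.
live_restore_in_config() {
    local cfg="${1:-/etc/docker/daemon.json}"
    # -q: no output, -s: stay silent if the file does not exist
    grep -Eqs '"live-restore"[[:space:]]*:[[:space:]]*true' "$cfg"
}

# Usage:
#   live_restore_in_config && echo "live-restore is enabled in daemon.json"
# The option can also be passed as a flag, so check the command line too:
#   ps -o args= -C dockerd | grep -- --live-restore
```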