moby: docker run stuck when daemon can't umount/remove container
Description
I was investigating some other issue and wrote a test script to reproduce it:
#!/bin/bash
set -e
set -u
set -o pipefail
TOTAL_CTS=100
IMG=busybox
CMD="sleep 0.2"
RUN="docker run --rm --net=none $IMG $CMD"
# load image, preheat
$RUN
time for i in $(seq 1 "$TOTAL_CTS"); do
    printf "Iter: %5d Jobs: %5d\r" "$i" "$(jobs -p | wc -l)"
    $RUN &
done
echo "Waiting..."
time wait
The script is supposed to run 100 short-lived containers via the docker CLI and wait for all docker child processes to finish. The problem is that if the docker daemon can't unmount/remove a container (which is a separate issue), the corresponding docker command gets stuck without reporting any error.
Docker daemon error (NOT the subject of this bug):
Sep 22 19:23:20 kir-ce73-gd dockerd[31494]: time="2017-09-22T19:23:20.322977590Z" level=error msg="Error removing mounted layer 5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74: remove /var/lib/docker/overlay2/228429157bed9afebfb6f2d9c0f15c0fc1e43cf41f621fbbf71b2f11db2f7119/merged: device or resource busy"
Sep 22 19:23:20 kir-ce73-gd dockerd[31494]: time="2017-09-22T19:23:20.323097734Z" level=error msg="error removing container" container=5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74 error="driver \"overlay2\" failed to remove root filesystem for 5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74: remove /var/lib/docker/overlay2/228429157bed9afebfb6f2d9c0f15c0fc1e43cf41f621fbbf71b2f11db2f7119/merged: device or resource busy"
When the above happens, the docker run ... process just sits there, stuck forever.
This was tested with both the overlay2 and overlay drivers, and I guess it is the same with other graph drivers.
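Until the client reports an error by itself, a test script can at least avoid hanging by bounding each run with coreutils `timeout`. The following is only a sketch: the `run_bounded` helper and the `LIMIT` variable are hypothetical names, and the 30-second default is an assumed value, not something taken from the original script.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit so that a hung
# client is reported instead of blocking the script forever.
LIMIT="${LIMIT:-30}"   # seconds; an assumed default, tune as needed

run_bounded() {
    local rc=0
    timeout "$LIMIT" "$@" || rc=$?
    # coreutils timeout exits with status 124 when the limit was hit
    if [ "$rc" -eq 124 ]; then
        echo "stuck (stopped after ${LIMIT}s): $*" >&2
    fi
    return "$rc"
}

# Example use in the reproduction loop (needs a running docker daemon):
#   run_bounded docker run --rm --net=none busybox sleep 0.2 &
```

This only bounds the client process. It does nothing about the container that the daemon failed to remove.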
Steps to reproduce the issue:
- On a clean CentOS 7.4 system, install docker-ce 17.06.
- systemctl start docker
- sysctl fs.may_detach_mounts=0 (this is to reproduce the issue of not being able to umount a container)
- Run the above shell script.
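Since the sysctl step changes a system-wide setting, it may be convenient to restore the previous value once the reproduction is done. A minimal sketch, assuming the setting is written through its /proc/sys file instead of the `sysctl` command; `with_setting` is a hypothetical helper name and `./repro.sh` is a hypothetical file name for the script above.

```shell
#!/bin/bash
# Hypothetical helper: with_setting FILE VALUE CMD...
# Writes VALUE to FILE, runs CMD, then puts the old content back.
with_setting() {
    local file=$1 value=$2 old rc=0
    shift 2
    old=$(cat "$file")          # remember the current value
    echo "$value" > "$file"     # apply the temporary value
    "$@" || rc=$?               # run the command, keep its exit status
    echo "$old" > "$file"       # restore the previous value
    return "$rc"
}

# Example (as root, on a kernel that has this sysctl):
#   with_setting /proc/sys/fs/may_detach_mounts 0 ./repro.sh
```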
Describe the results you received:
One or more docker run commands are stuck forever.
Describe the results you expected:
docker run exits with an error.
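Note that even with the expected behaviour, the test script would not notice a failed run: in bash, `wait` without arguments returns 0 regardless of the children's exit statuses. A sketch of waiting for each PID separately, so that failures are counted; `run_all` and `FAILED` are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: run_all N CMD...
# Starts N background copies of CMD, waits for each PID separately and
# counts the non-zero exit statuses in the global FAILED.
run_all() {
    local n=$1 pids=() pid
    shift
    FAILED=0
    for _ in $(seq 1 "$n"); do
        "$@" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # wait PID returns the exit status of that particular job
        wait "$pid" || FAILED=$((FAILED + 1))
    done
    if [ "$FAILED" -gt 0 ]; then
        echo "$FAILED of $n runs failed" >&2
        return 1
    fi
}

# Example (needs a running docker daemon):
#   run_all 100 docker run --rm --net=none busybox sleep 0.2
```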
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
[root@kir-ce73-gd ~]# docker version
Client:
Version: 17.06.1-ce
API version: 1.30
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:53:49 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.1-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 23:01:50 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
[root@kir-ce73-gd ~]# docker info
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 17.06.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.796GiB
Name: kir-ce73-gd
ID: DF43:5A5K:WZRF:7OS7:BMGM:YPKR:IQVI:UVML:APS7:5ZME:KM2Y:RYG7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.): A physical server running CentOS 7.3 upgraded to CentOS 7.4
About this issue
- State: open
- Created 7 years ago
- Comments: 19 (19 by maintainers)
Commits related to this issue
- ContainerWait on remove: don't stuck on rm fail Currently, if a container removal has failed for some reason, any client waiting for removal (e.g. `docker run --rm`) is stuck, waiting for removal to ... — committed to kolyshkin/moby by kolyshkin 7 years ago
- ContainerWait on remove: don't stuck on rm fail Currently, if a container removal has failed for some reason, any client waiting for removal (e.g. `docker run --rm`) is stuck, waiting for removal to ... — committed to docker/docker-ce by kolyshkin 7 years ago
- ContainerWait on remove: don't stuck on rm fail Currently, if a container removal has failed for some reason, any client waiting for removal (e.g. `docker run --rm`) is stuck, waiting for removal to ... — committed to salah-khan/moby by kolyshkin 7 years ago
As a separate note, the daemon might retry the container removal if it has failed. An analogy would be EAGAIN or EBUSY returned from some Linux system calls: we could return something like this from, say, a graph driver, in which case the daemon is supposed to retry the removal.
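The retry idea can be illustrated with a bounded retry loop. This is a shell sketch of the general pattern only, not the daemon's code; `retry` is a hypothetical helper, and unlike the proposal above it retries on any failure because a shell command's exit status does not say whether the error was a retryable one such as EBUSY.

```shell
#!/bin/bash
# Hypothetical helper: retry ATTEMPTS DELAY CMD...
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # no need to sleep after the last attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (hypothetical mount point):
#   retry 5 0.2 umount /path/to/merged
```

Bounding the number of attempts matters here: an unbounded retry would just move the hang from the client to the daemon.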