moby: docker run stuck when daemon can't umount/remove container

Description

I was investigating some other issue and wrote a test script to reproduce it:

#!/bin/bash

set -e
set -u
set -o pipefail

TOTAL_CTS=100

IMG=busybox
CMD="sleep 0.2"
RUN="docker run --rm --net=none $IMG $CMD"

# load image, preheat
$RUN

time for i in $(seq 1 "$TOTAL_CTS"); do
	printf "Iter: %5d Jobs: %5d\r" "$i" "$(jobs -p | wc -l)"
	$RUN &
done

echo "Waiting..."
time wait

The script starts 100 short-lived containers in parallel via the docker CLI and waits for all docker child processes to finish. The problem (a separate issue) is that if the docker daemon can't unmount/remove a container, the corresponding docker command hangs without reporting any error.
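One possible way to make such hangs visible when re-running the reproducer is to bound each client invocation with coreutils timeout and count the runs that hit the limit. This is a sketch under my own assumptions, not part of the original report: the helper names run_bounded and count_hung, the 10-second kill grace period and the time limit are all hypothetical choices.

```shell
# Hypothetical helpers (not from the original report): start N copies of a
# command in parallel, each bounded by coreutils `timeout`, and report how
# many did not finish on their own.

# run_bounded SECS CMD...: send SIGTERM to CMD after SECS seconds, then
# SIGKILL 10 seconds later if it is still alive (a hung client may not
# exit on SIGTERM alone).
run_bounded() {
	local secs=$1
	shift
	timeout -k 10 "$secs" "$@"
}

# count_hung N SECS CMD...: print how many of the N runs hit the time limit.
count_hung() {
	local n=$1 secs=$2 pids= pid status hung=0 i=0
	shift 2
	while [ "$i" -lt "$n" ]; do
		run_bounded "$secs" "$@" &
		pids="$pids $!"
		i=$((i + 1))
	done
	for pid in $pids; do
		status=0
		wait "$pid" || status=$?
		# 124: timed out, then exited after SIGTERM.
		# 137 (128+9): SIGKILL was needed; it also kills `timeout` itself.
		if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
			hung=$((hung + 1))
		fi
	done
	echo "$hung"
}

# Usage with the reproducer's command (needs a running Docker daemon):
#   count_hung 100 60 docker run --rm --net=none busybox sleep 0.2
```

With a limit far above the expected 0.2 s runtime, a non-zero count points to docker run calls that hung rather than ones that failed with an error.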

Docker daemon error (NOT the subject of this bug):

Sep 22 19:23:20 kir-ce73-gd dockerd[31494]: time="2017-09-22T19:23:20.322977590Z" level=error msg="Error removing mounted layer 5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74: remove /var/lib/docker/overlay2/228429157bed9afebfb6f2d9c0f15c0fc1e43cf41f621fbbf71b2f11db2f7119/merged: device or resource busy"
Sep 22 19:23:20 kir-ce73-gd dockerd[31494]: time="2017-09-22T19:23:20.323097734Z" level=error msg="error removing container" container=5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74 error="driver \"overlay2\" failed to remove root filesystem for 5b920fbba5c5f330a0cb5ec3a296bc31cedaddeddb5ee0189836f6a08dae7d74: remove /var/lib/docker/overlay2/228429157bed9afebfb6f2d9c0f15c0fc1e43cf41f621fbbf71b2f11db2f7119/merged: device or resource busy"

When the daemon logs these errors, the corresponding docker run ... process just hangs forever.
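To correlate hung clients with the daemon-side failures, one option is to pull the container IDs out of the "error removing container" log lines quoted above. This sketch assumes the log format matches those two lines; the function name failed_removals is hypothetical.

```shell
# failed_removals (hypothetical helper): read dockerd log lines on stdin and
# print, once each, the ID of every container the daemon reported it could
# not remove. Lines such as "Error removing mounted layer ..." are ignored.
failed_removals() {
	sed -n 's/.*msg="error removing container" container=\([0-9a-f]*\).*/\1/p' | sort -u
}

# Usage on a systemd host such as CentOS 7:
#   journalctl -u docker --no-pager | failed_removals
```

The resulting IDs can then be compared against docker ps -a --no-trunc.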

I tested this with both the overlay2 and overlay drivers, and I suspect other graph drivers behave the same way.

Steps to reproduce the issue:

  1. On a clean CentOS 7.4 system, install docker-ce 17.06
  2. systemctl start docker.
  3. sysctl fs.may_detach_mounts=0 (this makes the daemon fail to unmount containers, reproducing the error above)
  4. Run the above shell script.
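Because step 3 changes a global kernel setting, it may be worth restoring it once the test is done. The following is a sketch, not part of the original steps: it writes to /proc/sys/fs/may_detach_mounts directly (the same file sysctl -w updates), and both the function name and the SYSCTL_PATH override are hypothetical, the latter only so the logic can be tried without root. It does not restore the value if the shell itself is interrupted.

```shell
# Path of the knob from step 3; it only exists on kernels that carry it
# (such as the CentOS 7 kernel). Overridable for testing (hypothetical).
SYSCTL_PATH=${SYSCTL_PATH:-/proc/sys/fs/may_detach_mounts}

# with_detach_mounts_disabled CMD...: set the knob to 0, run CMD, then put
# the previous value back and return CMD's exit status.
with_detach_mounts_disabled() {
	local saved rc=0
	saved=$(cat "$SYSCTL_PATH") || return 1
	echo 0 > "$SYSCTL_PATH" || return 1
	"$@" || rc=$?
	# Restore the original value even if CMD failed.
	echo "$saved" > "$SYSCTL_PATH"
	return "$rc"
}

# Usage (as root): with_detach_mounts_disabled ./repro.sh
```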

Describe the results you received: One or more docker run commands hang forever.
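While the script sits in its final wait, the hung clients can be identified by how long they have been running. A sketch, assuming procps ps with the etimes column (elapsed time in seconds), which I believe the CentOS 7 procps-ng provides; the helper name and the 60-second threshold are arbitrary illustrations.

```shell
# stuck_docker_runs MAX_SECS (hypothetical helper): read lines of the form
# "PID ELAPSED_SECS ARGS..." on stdin and print the PIDs of `docker run`
# client processes that have been alive longer than MAX_SECS.
stuck_docker_runs() {
	awk -v max="$1" '
		# $3 is the program (possibly a full path), $4 its first argument
		$2 > max && $3 ~ /(^|\/)docker$/ && $4 == "run" { print $1 }
	'
}

# Usage:
#   ps -eo pid=,etimes=,args= | stuck_docker_runs 60
```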

Describe the results you expected: docker run exits with an error.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

[root@kir-ce73-gd ~]# docker version
Client:
 Version:      17.06.1-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   874a737
 Built:        Thu Aug 17 22:53:49 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.1-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   874a737
 Built:        Thu Aug 17 23:01:50 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

[root@kir-ce73-gd ~]# docker info
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 17.06.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.796GiB
Name: kir-ce73-gd
ID: DF43:5A5K:WZRF:7OS7:BMGM:YPKR:IQVI:UVML:APS7:5ZME:KM2Y:RYG7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): A physical server originally running CentOS 7.3, upgraded to CentOS 7.4

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 19 (19 by maintainers)


Most upvoted comments

As a separate note, the daemon could retry container removal when it fails. By analogy with the EAGAIN or EBUSY errors returned by some Linux system calls, a graph driver could return a similar "try again" error, in which case the daemon would retry the removal.
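The retry proposed above would live inside the daemon, with a graph driver returning an EBUSY-like error, and no such daemon code is shown here. As a rough client-side analogue only, a failed docker rm can be retried a bounded number of times with a pause between attempts; the helper name, attempt count and delay are hypothetical.

```shell
# retry_busy TRIES DELAY CMD...: run CMD up to TRIES times in total, sleeping
# DELAY seconds between attempts. Returns 0 on the first success, otherwise
# the exit status of the last attempt.
retry_busy() {
	local tries=$1 delay=$2 n=1 rc
	shift 2
	while :; do
		rc=0
		"$@" || rc=$?
		if [ "$rc" -eq 0 ]; then
			return 0
		fi
		if [ "$n" -ge "$tries" ]; then
			return "$rc"
		fi
		n=$((n + 1))
		sleep "$delay"
	done
}

# Usage (CONTAINER_ID is a placeholder):
#   retry_busy 5 2 docker rm -f CONTAINER_ID
```

A later attempt only helps if the mount is eventually released, so the retry count is bounded to keep the caller from hanging in the same way as the original bug.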