moby: Unable to remove a stopped container: `device or resource busy`

Apologies if this is a duplicate issue; there seem to be several outstanding issues around a very similar error message, but under different conditions. I initially added a comment on #21969 and was told to open a separate ticket, so here it is!


BUG REPORT INFORMATION

Output of docker version:

Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 51
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 81
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-74-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.676 GiB
Name: ip-10-1-49-110
ID: 5GAP:SPRQ:UZS2:L5FP:Y4EL:RR54:R43L:JSST:ZGKB:6PBH:RQPO:PMQ5
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on Ubuntu 14.04.3 LTS HVM in AWS on an m3.medium instance with an EBS root volume.

Steps to reproduce the issue:

  1. $ docker run --restart on-failure --log-driver syslog --log-opt syslog-address=udp://localhost:514 -d -p 80:80 -e SOME_APP_ENV_VAR myimage
  2. The container keeps shutting down and restarting, exiting with an error due to a bug in its runtime
  3. Manually run docker stop <container>
  4. Container is successfully stopped
  5. Trying to rm the container then throws the error: Error response from daemon: Driver aufs failed to remove root filesystem 88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e: rename /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0 /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0-removing: device or resource busy
  6. Restart docker engine: $ sudo service docker restart
  7. $ docker ps -a shows that the container no longer exists.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 55
  • Comments: 204 (48 by maintainers)

Most upvoted comments

Suffered from this issue for quite a long time.

I just experienced this multiple times. I added MountFlags=private to the docker service to prevent further mount leaks, but I was sick of restarting the machine, so I went hunting for a way to get rid of the leaked mounts without restarting.

Looking for these leaked mounts, I noticed here that each pid has its own mountinfo at /proc/[pid]/mountinfo. So, to see where the leaks were I did:

$ grep docker /proc/*/mountinfo
/proc/13731/mountinfo:521 460 8:3 /var/lib/docker/overlay /var/lib/docker/overlay rw,relatime shared:309 - xfs /dev/sda3 rw,seclabel,attr2,inode64,noquota
/proc/13731/mountinfo:522 521 0:46 / /var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/merged rw,relatime shared:310 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/7cbf3db2f8b860ba964c88539402f35c464c36013efcb845bce2ee307348649f/root,upperdir=/var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/upper,workdir=/var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/work
/proc/13731/mountinfo:523 521 0:47 / /var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/merged rw,relatime shared:311 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/d607050a3f9cdf004c6d9dc9739a29a88c78356580db90a83c1d49720baa0e5d/root,upperdir=/var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/upper,workdir=/var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/work
/proc/13731/mountinfo:524 521 0:48 / /var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/merged rw,relatime shared:312 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/5e8f5833ef21c482df3d80629dd28fd11de187d1cbbfe8d00c0500470c4f4af2/root,upperdir=/var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/upper,workdir=/var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/work
/proc/13731/mountinfo:525 521 0:49 / /var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/merged rw,relatime shared:313 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/409a9e5c05600faa82d34e8b8e7b6d71bffe78f3e9eff30846200b7a568ecef0/root,upperdir=/var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/upper,workdir=/var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/work
/proc/13731/mountinfo:526 521 0:50 / /var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/merged rw,relatime shared:314 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/d601cf06e1682c4c30611d90b67db748472d399aec8c84487c96cfb118c060c5/root,upperdir=/var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/upper,workdir=/var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/work

That told me that process 13731 still had references to /var/lib/docker/overlay, so I (as root) entered the mount namespace of that process and removed the mounts:

$ nsenter -m -t 13731 /bin/bash
$ mount
<snipped mount output that verifies that it does see those mount points>
$ umount /var/lib/docker/overlay/*
$ umount /var/lib/docker/overlay
$ exit

At which point I could finally delete /var/lib/docker, restart the docker service (thus recreating everything in /var/lib/docker), and have no more issues.
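
Putting those steps together, here is a rough sketch (my own, not the commenter's) that assumes the overlay storage driver, so the paths need adjusting for aufs, and that nsenter from util-linux is available. Review the printed list before unmounting anything, since the docker daemon itself will legitimately appear in it:

# find every pid whose mount namespace still references the overlay dir
for pid in $(grep -l /var/lib/docker/overlay /proc/[0-9]*/mountinfo | cut -d/ -f3 | sort -u); do
    echo "pid $pid ($(ps -p "$pid" -o comm=)) still holds docker mounts"
    # once you've confirmed the pid is a leaked reference (not dockerd),
    # unmount from inside its namespace:
    # nsenter -m -t "$pid" sh -c 'umount /var/lib/docker/overlay/*/merged; umount /var/lib/docker/overlay'
done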

try stopping the service (systemctl stop docker), then remove /var/lib/docker

Just for the sake of other people finding this thread and asking the same question over and over:

  1. this is fixed in docker-ce >= 17.09 AND an updated kernel (for the CentOS family, >= 7.4)
  2. this is fixed (to be confirmed) in docker-ce 17.12.1 on all kernels.

“Device or resource busy” is a generic error message. Please read your error messages and make sure it’s exactly the error message above (i.e., rename /var/lib/docker/aufs/diff/...).

“Me too!” comments do not help.

@danielfoss There are many fixes in 1.11.0 that would resolve some device or resource busy issues on multiple storage drivers when trying to remove the container. 1.11.1 fixes only a specific case (mounting /var/run into a container).

I had the same problem using docker-compose rm

Driver aufs failed to remove root filesystem 88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e

What I did to fix the problem without restarting docker:

cat /sys/fs/cgroup/devices/docker/88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e/tasks

It gives you the pids of the processes running in the devices cgroup subsystem (i.e., whatever is keeping the mount busy), located in the hierarchy under /docker/<containerid>:

I succeeded in killing them: kill $(cat /sys/fs/cgroup/devices/docker/88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e/tasks)

After their death, the container was gone (successfully removed)
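
Wrapped up as a small sketch (assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup with the cgroupfs driver, so the container's cgroup lives under docker/<full container id>; the ID below is the one from this comment, substitute your own):

cid=88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e
tasks=/sys/fs/cgroup/devices/docker/$cid/tasks
if [ -s "$tasks" ]; then
    ps -fp "$(paste -sd, "$tasks")"   # inspect the stragglers first
    kill $(cat "$tasks")              # then kill them, as in the comment
fi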

Version

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

My solution steps:

$ docker -v
Docker version 17.09.0-ce, build afdb6d4
$ docker rm 805c245dad45
Error response from daemon: driver “overlay” failed to remove root filesystem for 805c245dad451542b44bb1b58c60887fa98a64a61f2f0b8de32fa5b13ccc8ce4: remove /var/lib/docker/overlay/8f666b802f418f4a3dc4a6cafbefa79afc81491a5cb23da8084dd14e33afbea0/merged: device or resource busy
$ grep docker /proc/*/mountinfo
/proc/21163/mountinfo:137 107 253:0 /var/lib/docker/overlay /var/lib/docker/overlay rw,relatime shared:91 - xfs /dev/mapper/cl-root rw,attr2,inode64,noquota
/proc/21163/mountinfo:138 137 0:35 / /var/lib/docker/overlay/8f666b802f418f4a3dc4a6cafbefa79afc81491a5cb23da8084dd14e33afbea0/merged rw,relatime shared:92 - overlay overlay rw,lowerdir=/var/lib/docker/overlay/ad94365f2c83432c97dbcb91eba688d4f8158d01c48b8d6135843abd451d4c4c/root,upperdir=/var/lib/docker/overlay/8f666b802f418f4a3dc4a6cafbefa79afc81491a5cb23da8084dd14e33afbea0/upper,workdir=/var/lib/docker/overlay/8f666b802f418f4a3dc4a6cafbefa79afc81491a5cb23da8084dd14e33afbea0/work

$ ps -p 21163 -o comm=
httpd
$ systemctl stop httpd
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
805c245dad45        dperson/samba       “samba.sh -u usr;1…”     2 days ago          Dead                                    sambaserver
$ docker rm 805c245dad45
805c245dad45
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
$

I met this issue on RHEL 7.2 with Docker 17.06.
When I run docker rm xxx it says:

Error response from daemon: driver "overlay2" failed to remove root filesystem for 22136564e833e518579ecc856408194614904dfa2b2adb10bd9e95d7fd75bf15: remove /var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/merged: device or resource busy

It says 3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59 is busy, so I ran grep docker /proc/*/mountinfo | grep 3cece; the result is:

/proc/1687/mountinfo:641 549 0:48 / /var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/merged rw,relatime shared:198 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/TDQYOJHG4PQI27TKLUKLUVQFE6:/var/lib/docker/overlay2/l/O23B5NBM2RDWRDWROCHGCAZ4M4:/var/lib/docker/overlay2/l/BQ53Z7BQVPMXTA65L44VEZIILI:/var/lib/docker/overlay2/l/46PJNPP32OGYORW42CTXTRUH2Z:/var/lib/docker/overlay2/l/RYJEGP5BUO5U6MFZPO7LCUMBA7:/var/lib/docker/overlay2/l/53I4CSDHFNLGPE3NGEUDYMWLWA:/var/lib/docker/overlay2/l/WO4EMCT272IRDF4S6B2HLTU3EP:/var/lib/docker/overlay2/l/RPASQ2PDBKFRPOOUVYVW5H5EIQ:/var/lib/docker/overlay2/l/QIYYCMFFNA3365DOPN3K6KSCUN:/var/lib/docker/overlay2/l/VLIQBSYVPJNSSAJXPU2OHKNLGD:/var/lib/docker/overlay2/l/M3FVK2JFV5PGPKMP2TIXJBC2KD,upperdir=/var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/diff,workdir=/var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/work
/proc/1696/mountinfo:727 692 0:48 / /var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/merged rw,relatime shared:256 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/TDQYOJHG4PQI27TKLUKLUVQFE6:/var/lib/docker/overlay2/l/O23B5NBM2RDWRDWROCHGCAZ4M4:/var/lib/docker/overlay2/l/BQ53Z7BQVPMXTA65L44VEZIILI:/var/lib/docker/overlay2/l/46PJNPP32OGYORW42CTXTRUH2Z:/var/lib/docker/overlay2/l/RYJEGP5BUO5U6MFZPO7LCUMBA7:/var/lib/docker/overlay2/l/53I4CSDHFNLGPE3NGEUDYMWLWA:/var/lib/docker/overlay2/l/WO4EMCT272IRDF4S6B2HLTU3EP:/var/lib/docker/overlay2/l/RPASQ2PDBKFRPOOUVYVW5H5EIQ:/var/lib/docker/overlay2/l/QIYYCMFFNA3365DOPN3K6KSCUN:/var/lib/docker/overlay2/l/VLIQBSYVPJNSSAJXPU2OHKNLGD:/var/lib/docker/overlay2/l/M3FVK2JFV5PGPKMP2TIXJBC2KD,upperdir=/var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/diff,workdir=/var/lib/docker/overlay2/3cece7f31f5c379174679e185167c867c1ec393fbf80451c2b18f78a2720aa59/work

So I got two processes, 1696 and 1687. I ran ps -elf | grep -e 1696 -e 1687, stopped the two processes, and then docker rm xxx succeeded. Afterwards I started the two processes again. Hope this will help someone.

@oopschen yes, that’s a known issue; the cAdvisor uses various bind-mounts, including /var/lib/docker, which causes mounts to leak, resulting in this problem.

If you get an error such as:

Unable to remove filesystem: /var/lib/docker/container/11667ef16239.../

The solution is here (no need to execute service docker restart to restart docker):

# 1. find which process(pid) occupy the fs system
$ find /proc/*/mounts  |xargs -n1 grep -l -E '^shm.*/docker/.*/11667ef16239' | cut -d"/" -f3
1302   # /proc/1302/mounts

# 2. kill this process
$ sudo kill -9 1302

Restart the VM… should work!

This answer saved me: grep docker /proc/*/mountinfo | grep 3cece

Thanks @imaemo, I am going to upgrade my CentOS and docker.

This answer saved me: $ grep 656cfd09aee399c8ae8c8d3e735fe48d70be6672773616e15579c8de18e2a3b3 /proc/*/mountinfo then find the related pid and kill it

which outputs something like this, where the number after /proc/ is the pid:

/proc/10001/mountinfo:179...

https://stackoverflow.com/a/47965269/2803344
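
The same steps as a short loop, for reference (a sketch; the container ID below is the one quoted above, substitute your own):

cid=656cfd09aee399c8ae8c8d3e735fe48d70be6672773616e15579c8de18e2a3b3
for pid in $(grep -l "$cid" /proc/[0-9]*/mountinfo | cut -d/ -f3); do
    ps -fp "$pid"    # identify the process before deciding to kill it
done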

It seems like there could be several different causes for this issue, but in my case, following @cognifloyd’s comment revealed that I had (quite a lot of) leftover sleeping nginx processes on the host. In my setup, nginx is used as a proxy for various services that run in the docker containers.

Stopping nginx, removing the containers and starting it again was the fastest way to get rid of them for me.

systemctl stop nginx
docker container prune
systemctl start nginx

In my case, I worked around it by trying what @cognifloyd mentioned above:

  1. info
[root@test_node_02 ~]# docker info

Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs

Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active

Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.797GiB

Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
  2. problem:
Error response from daemon: driver "overlay" failed to remove root filesystem for xxx: remove /var/lib/docker/overlay/xxx/merged: device or resource busy
  3. workaround
1. try to remove "dead" containers
[root@test_node_02 ~]# docker rm -f $(docker ps -a --filter status=dead -q |head -n 1)
Error response from daemon: driver "overlay" failed to remove root filesystem for 808acab2716420275cdb135ab964071cfc33406a34481354127635d3a282fa31: remove /var/lib/docker/overlay/88440438ea95b47e7459049fd765b51282afee4ad974107b0bf96d08d9c7763e/merged: device or resource busy


2. find pid in /proc/*/mountinfo
[root@test_node_02 ~]# grep -l --color `docker ps -a --filter status=dead -q |head -n 1` /proc/*/mountinfo


3. whois pid
[root@test_node_02 ~]# ps -f 7360
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root      7360  7344  1 Aug16 ?        Ssl   73:57 /usr/bin/cadvisor -logtostderr

  
4. also, we can determine they are on different mount namespaces
[root@test_node_02 ~]# ls -l /proc/$(cat /var/run/docker.pid)/ns/mnt /proc/7360/ns/mnt
lrwxrwxrwx 1 root root 0 Aug 21 15:55 /proc/11460/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Aug 21 15:55 /proc/7360/ns/mnt -> mnt:[4026532279]
[root@test_node_02 ~]# 



5. try to restart cadvisor
[root@test_node_01 ~]# docker service ls |grep cadvisor
5f001c9293cf        cadvisor            global              3/3                 google/cadvisor:latest
     
[root@test_node_01 ~]# docker service update --force cadvisor
[root@test_node_01 ~]#
  
  
6. remove again
[root@test_node_02 ~]# docker rm -f $(docker ps -a --filter status=dead -q |head -n 1)
808acab27164
[root@test_node_02 ~]#
 

Conclusion: cAdvisor, or any other container whose volumes contain ‘/var/lib/docker’ or ‘/’, will cause the problem. Workaround: find the container/service and restart it. How to fix it properly: unknown.
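
Following that conclusion, one way to spot offending containers might be to list each running container's bind-mount sources (a sketch using docker inspect's Go template; the grep flags sources of '/' or '/var/lib/docker', so adjust the pattern if your docker root dir differs):

for c in $(docker ps -q); do
    docker inspect --format '{{.Name}}:{{range .Mounts}} {{.Source}}{{end}}' "$c"
done | grep -E ' (/|/var/lib/docker)($| )'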

Has anyone a better solution than restarting the docker service (version 1.12)?

There seem to be two different problems here, as I am unable to fix my issue using @simkim’s solution.

# docker rm b1ed3bf7dd6e
Error response from daemon: Driver aufs failed to remove root filesystem b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353: rename /var/lib/docker/aufs/mnt/acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf /var/lib/docker/aufs/mnt/acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf-removing: device or resource busy
# ls /sys/fs/cgroup/devices/docker/b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353
ls: cannot access /sys/fs/cgroup/devices/docker/b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353: No such file or directory
# mount | grep acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf

In my case, the cgroup associated with my container seems to be correctly deleted. The filesystem is also unmounted.

The only solution for me is still to restart the Docker daemon.

I’m also seeing this problem on some machines, and by taking a look at the code I think the original error is being obscured here: https://github.com/docker/docker/blob/master/daemon/graphdriver/aufs/aufs.go#L275-L278

My guess is that the Rename error is happening due to an unsuccessful call to unmount. However, since the error in unmount is logged using Debugf, we won’t see it unless the daemon is started in debug mode. I’ll see if I can spin up some servers with debug mode enabled and catch this error.
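
For anyone wanting to catch that hidden unmount error themselves, two common ways to run the daemon in debug mode (the daemon.json location below is the default on most distros; treat this as a sketch):

# one-off: run the daemon in the foreground with debug logging
dockerd -D

# or persistently, in /etc/docker/daemon.json:
# {
#     "debug": true
# }
# then restart the daemon, e.g. systemctl restart docker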

We have a series of patches coming in that should resolve this for all kernels:

I’m not sure if any of these will land in 18.02, but 18.03 should be doable, though not all of them are in yet.

“resource busy” means some process is using it. Restarting the OS can release the lock. Then you can remove them.

Same issue on docker 17.03.0-ce. No way to get it back to work other than restarting the docker daemon…

Update: stop cAdvisor, then attempt to remove the dead container and re-create it. Works for me.

You should avoid using the mountpoint /var/lib/docker:/var/lib/docker:ro if it is not necessary. It seems cAdvisor, with permission to access other containers’ volumes, can lock them while running.

I encountered the same problem and googled for a while. It seems the cAdvisor container locks the files. After removing the cAdvisor container, I could remove the files under [dockerroot]/containers/xxxxxx.

A workaround was proposed in #25718 to set MountFlags=private in the docker.service configuration file of systemd. See https://github.com/docker/docker/issues/25718#issuecomment-250254918 and my following comment.

So far, this has solved the problem for me.
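
As a concrete sketch of that workaround (the drop-in path is the conventional systemd location, not something mandated by the linked comment):

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/mount-flags.conf <<'EOF'
[Service]
MountFlags=private
EOF
systemctl daemon-reload
systemctl restart docker

Note that a later comment in this thread suggests MountFlags=slave instead; both change mount propagation behavior, so test which one fits your setup.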

Hi, we found a workaround: restarting the ntpd daemon. After that, no containers are in ‘removal in progress’ or ‘dead’ state. Strange, but it worked for us…

Just to add to my previous comment: I still have the issue and am always forced to reboot or umount the way I described… my docker instance info:

Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 19.61GiB
Name: lgh-dev01
ID: OI4M:BVZK:YGCD:M7DS:TD7X:WO3Q:WFHQ:UECY:N5A6:NHSX:4THI:HE5T
Docker Root Dir: /opt/docker-data
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 31
 System Time: 2017-08-21T17:29:34.436543815+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Blaming this problem on the kernel of the most up-to-date release of a current operating system is rather disingenuous. Kernels are not written and tested to meet the specs of any particular app or service (or shouldn’t be). Rather, it might be better to specify which operating systems, kernels, and configurations are necessary to have this working correctly. Such as the environment where docker was developed and tested without error.

Incidentally, I nearly solved this problem by inserting delays in my scripts between docker commands. It’s ugly, but I haven’t seen the problem in a while.

Like I tried to hint, mileage may vary. The (almost?) root cause is that the use of MountFlags in systemd units causes systemd to create a new mount namespace for the service being started. What is set in MountFlags determines how mounts are propagated from the host (or back to the host).

In the case of systemd-udevd, it is using MountFlags=slave, which means that any changes to mounts on the host will propagate to the systemd-udevd mount ns (where the service is running).

What should be happening is when an unmount occurs, that should propagate to systemd-udevd’s mount ns… however for some reason either this propagation isn’t happening or something in the mount ns is keeping the mount active, preventing removal even if the mount appears to be gone in the host’s mount ns.

I’m using systemd-udevd as an example here as I can reproduce it 100% of the time specifically with systemd-udevd, and can mitigate it either by stopping the service or disabling the MountFlags in the unit file for that service (and thus it will live in the host mount ns).

There could be a myriad of things causing the resource to remain busy on your system, including how other containers are running and what they are mounting in. For instance if you are mounting /var/lib/docker into a container on one of these old kernels it is likely going to cause the same problem.
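
To check this on a host of your own, comparing namespace links works the same way as the earlier cAdvisor example (a sketch; it assumes systemd-udevd is running and nsenter from util-linux is installed):

# different mnt:[...] inodes mean the service lives in its own mount ns
readlink /proc/1/ns/mnt /proc/$(pidof systemd-udevd)/ns/mnt

# list any docker mounts visible from inside that namespace
nsenter -m -t $(pidof systemd-udevd) grep docker /proc/self/mounts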

@Vanuan It would, though I’d stick with the LTS kernel, which would be 4.9. The version it was mainlined is 3.15 (IIRC). It’s more of a bug fix than a feature. It lets you unmount resources that are mounted in other namespaces.

It’s independent of storage driver.

This is now one of the top Google search results. In my experience, if docker hangs it’s usually because it loses track of a netns or overlayfs mount… this works for me:

sudo su
service docker stop &
sleep 10; killall -9 dockerd;
rm /var/run/docker.* 
for a in `mount|egrep '(docker|netns)'|awk '{print $3}'`; do umount $a; done; 
service docker stop
killall -9 dockerd;
service docker start

If it still chokes up… dockerd -D

If you have autostarting containers breaking it, then disable them while docker is off:

sed s@always@no@ -i /var/lib/docker/containers/*/hostconfig.json

Disclaimer: not sure where else "always" may occur in the hostconfig.json file, but on my containers I only see it under the RestartPolicy section.
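
A slightly more targeted variant of that sed (a sketch assuming jq is installed) that edits only the RestartPolicy key instead of every occurrence of "always":

# rewrite each container's restart policy in place while docker is off
for f in /var/lib/docker/containers/*/hostconfig.json; do
    jq '.RestartPolicy.Name = "no"' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done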

I’m going to close this as resolved in 17.12.1.

Thanks!

@SalamiArmy : please do not add +1 but use the dedicated smiley button to avoid spam.

@warmchang : please do not add +1 but use the dedicated smiley button to avoid spam.

This may be related to a kernel bug; here is the CentOS-side bug [0]. There are several commits that fix this issue.

  • In CentOS7, this issue is fixed since kernel 3.10.0-693.2.2.el7.x86_64
  • If you are using an upstream kernel, please try to upgrade to kernel-3.18.0

[0] https://bugs.centos.org/view.php?id=10414&nbn=8

Yes, it was backported to 17.09 https://github.com/docker/docker-ce/pull/247

@archenroot Docker sets fs.may_detach_mounts=1 on startup… I want to say starting with 17.10, but this may have been backported, I can’t remember.

@fitz123 did you have to set the detach option in 7.4?

Looks like RHEL/CentOS 7.4 has a “detached mount” option: https://bugzilla.redhat.com/show_bug.cgi?id=1441737 It is “0” by default. Does it mean we should set it to “1”? Or does a recent docker yum package have this option included?

The RHEL 7.4 kernel has introduced a new sysctl knob to control kernel behavior, called /proc/sys/fs/may_detach_mounts. This knob is set to 0 by default. Container runtimes (docker and others) need the new behavior and want it to be set to 1.

So, modify the runc package to drop a file, say /usr/lib/sysctl.d/99-docker.conf. The contents of this file can be the following:

fs.may_detach_mounts=1
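
To check and set the knob by hand rather than waiting for an updated runc package (a sketch; /etc/sysctl.d is the usual admin-side location for persisting it):

cat /proc/sys/fs/may_detach_mounts      # 0 means the old behavior
sysctl -w fs.may_detach_mounts=1        # flip it on the running kernel
echo 'fs.may_detach_mounts = 1' > /etc/sysctl.d/99-docker.conf
sysctl --system                         # persist across reboots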

CentOS Linux release 7.4.1708 (Core)
Kernel: 3.10.0-693.5.2.el7.x86_64
Docker: 17.06.2-ce

LOOKS LIKE IT IS FINALLY FIXED FOR RHEL/CENTOS !!! https://access.redhat.com/articles/2938171 https://seven.centos.org/2017/08/centos-linux-7-1708-based-on-rhel-7-4-source-code/

Still having the same issue under CentOS 7.4:

ERROR: for blubb driver "overlay" failed to remove root filesystem for 7ce02bff8873d4ae7c04481579e67b0a1ff4ffddbfd8b3af739bb87920b8ec43: remove /var/lib/docker/overlay/f54408dd5947eb3d3b6b9321f730d8c5ed6ef6a3a7b3308bcab5dbf549444194/merged: device or resource busy

One good thing: I can move /var/lib/docker/overlay/f54408dd5947eb3d3b6b9321f730d8c5ed6ef6a3a7b3308bcab5dbf549444194 out of the way, and then the command runs successfully…

@antoinetran I think the main reason the configuration even exists is so that RedHat does not break any existing users (beyond dockerd users), as it is a behavior change.

Upstream kernels have supported this since 3.15, just without the configuration knob.

@dobryakov if you are using docker rm -f you would not see the error on 17.03, but the error would still occur.

I don’t know anything about /var/lib/docker but I’m not using any composer. It happens as often as 1 in 5 times that I stop and remove a container during development.

@lievendp The particular kernel feature is being able to do a detached mount while the mount exists in another namespace. I believe RH is planning on including this in RHEL 7.4, btw.

Generally speaking, the only time you would really encounter this issue is if you’ve mounted /var/lib/docker, or one of its parents, into a container.

One potential work-around for this is to set MountFlags=slave in the docker systemd unit file. The reason this isn’t in by default is it can cause problems for certain use-cases.

My team has been facing this problem every time we shut down docker containers. We are running a service with more than 100 docker containers plus container advisors through the swarm system. The only solution I have found is to shut down forcefully, several times, until the message indicating the containers no longer exist is shown. It happens to around 1 out of 5 containers, and that failure rate is really critical for a business service.

OS: Ubuntu Xenial Docker: v1.13.1 cAdvisor: v0.24.1

We had to restart the docker service or, in the worst case, the linux servers, because of the combination of a network allocation bug and this container advisor bug. Luckily, the network allocation bug seems to be fixed in the latest docker binary.

Maybe:

docker volume prune -f

Do you still have images? What does docker images show? If so, try to remove them:

docker rmi -f [image id]

Finally:

docker rmi $(docker images --quiet --filter "dangling=true")

If none of those work, I can’t help you… (reboot the server, if you are able?)