moby: Driver devicemapper failed to remove root filesystem. Device is busy
Description
Cannot remove containers; docker reports `Driver devicemapper failed to remove root filesystem. Device is busy`. This leaves containers in the `Dead` state.
Steps to reproduce the issue:
`docker rm container_id`
Describe the results you received:
Error message is displayed: Error response from daemon: Driver devicemapper failed to remove root filesystem ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535: Device is Busy
Describe the results you expected: Container should be removed.
Additional information you deem important (e.g. issue happens only occasionally): This started to occur after upgrade from 1.11.2 to 1.12.2 and happens occasionally (10% of removals)
Output of `docker version`:
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64
Output of `docker info`:
Containers: 83
Running: 72
Paused: 0
Stopped: 11
Images: 49
Server Version: 1.12.2
Storage Driver: devicemapper
Pool Name: data-docker_thin
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file:
Metadata file:
Data Space Used: 33.66 GB
Data Space Total: 86.72 GB
Data Space Available: 53.06 GB
Metadata Space Used: 37.3 MB
Metadata Space Total: 268.4 MB
Metadata Space Available: 231.1 MB
Thin Pool Minimum Free Space: 8.672 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.305 GiB
Name: us-2.c.evennode-1234.internal
ID: HVU4:BVZ3:QYUQ:IJ6F:Q2FP:Z4T3:MBKH:I4KC:XFIF:W5DV:4HZW:45NJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.): All environments we run servers in - AWS, gcloud, physical, etc.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 60
- Comments: 153 (54 by maintainers)
Commits related to this issue
- Allow longer for containers to be removed. Maybe due to https://github.com/docker/docker/issues/27381 — committed to ClusterHQ/flocker by wallrj 8 years ago
- Revert to stable docker Latest versions 1.12+ have removed a setting, `MountFlags=slave`, that is causing an error on device mapper. References: https://github.com/docker/docker/issues/27381 https:/... — committed to bpinto/dotfiles by bpinto 8 years ago
Just had the same problem on:
Doing `systemctl restart ntpd` fixed the problem instantly.

The following message suggests that the directory removal failed:
remove /var/lib/docker/containers/4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543/shm: device or resource busy
And on older kernels it can fail because the directory is mounted in some other mount namespace. If you disable the deferred deletion feature, this message will stop appearing, but it will be replaced by some other error message.

The core of the issue here is that the container is either still running or some of its mount points have leaked into some other mount namespace. If we can figure out which mount namespace it has leaked into and how it got there, we could try fixing it.
Once you run into this issue, you can try:

`find /proc/*/mounts | xargs grep "4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543"`

Then see which pids have mounts related to containers leaked into them. That might give some idea.
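The diagnostic above can be sketched as a small script. This is a hedged illustration only: it runs against a throwaway fake `/proc` tree (the pids, device names, and mount lines are made up for the demo) so it is safe to execute anywhere; on a real host you would set `PROC_ROOT=/proc` instead.

```shell
# Sketch: find which pids hold a leaked mount for a given container ID.
CID="4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543"
PROC_ROOT=$(mktemp -d)   # stand-in for /proc; use PROC_ROOT=/proc on a real host

# Two fake processes: pid 123 holds the leaked mount, pid 456 does not.
mkdir -p "$PROC_ROOT/123" "$PROC_ROOT/456"
printf '/dev/mapper/thin-%s /var/lib/docker/devicemapper/mnt/%s ext4 rw 0 0\n' \
  "$CID" "$CID" > "$PROC_ROOT/123/mounts"
printf 'tmpfs /run tmpfs rw 0 0\n' > "$PROC_ROOT/456/mounts"

# The actual diagnostic: list pids whose mount table mentions the container.
leaked_pids=$(grep -l "$CID" "$PROC_ROOT"/*/mounts \
  | sed -e "s|$PROC_ROOT/||" -e 's|/mounts||')
echo "leaked into pids: $leaked_pids"
```

Each matching file is `/proc/<pid>/mounts`, so stripping the prefix and suffix leaves exactly the pids whose mount namespaces pinned the container's filesystem.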
Just managed to solve an instance of this issue with Docker 17.05 on Arch Linux with kernel 4.11.9 by:

1. `docker rm -f [myContainer]` (failing with the `driver "devicemapper" failed to remove root filesystem` error as usual)
2. `ls /var/lib/docker/devicemapper/mnt/`

This made the container finally disappear (not sure why, though).
This is fixed in 17.12.1
Thanks all.
This is a pretty major issue; it effectively makes docker unusable on CentOS 7/RHEL (and it has been open for 4 months?). Any ETA?
I wrote up an article explaining why RHEL7 cannot support --live-restore until RHEL7.4, and why docker should be run within a different mount namespace than the host.
https://access.redhat.com/articles/2938171
Last time I had this problem, it was `ntpd` that was holding the mounts. Today I got the same problem, and this time it was a `mariadb` instance running on the host that was the reason.

Example for finding the process holding the mounts…

After restarting mariadb, it let go of the mount points; however, it grabbed a lot of them when it started.
Seriously coming up on a year now and this bug is still here?
This is still an issue in 17.06, FYI, at least with CentOS 7.
@quexer Of course restarting docker can solve this problem, but then all the containers will be restarted too, which is too costly for a production environment.
@jcberthon We recently bit the bullet and made the transition to overlay2, and I’m so glad we did! Performance improved 40% in the benchmarks of our unit tests that do `docker run --rm`. The final straw for us with devmapper was issue #20401. Switching to overlay2 wasn’t very hard, but we have plenty of free disk space. I wrote a script to `docker save` all of our images to tarballs and another script to `docker load` all of the tarballs. We were done in 2-3 hours. I know it seems like a hassle, and it can be if you don’t have enough disk space, but it will be worth it in the long run, I think. Good luck!

These versions completely fix the issue for me, including `--live-restore`.
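The save/load migration described above can be sketched as a loop over `docker images`. The `docker` function below is a stub so the sketch can run anywhere; on a real host you would delete the stub and let the loop drive the actual CLI. The image names and output directory are made up for the demo.

```shell
# Stub standing in for the real docker CLI (remove on a real host).
docker() {
  case "$1" in
    images) printf 'redis:3.2\nnginx:1.11\n' ;;  # fake image list, ignores --format
    save)   : > "$3" ;;                          # fake `docker save -o <file> <image>`
  esac
}

out=$(mktemp -d)   # where the tarballs go

# Save every image to its own tarball, one per repo:tag.
docker images --format '{{.Repository}}:{{.Tag}}' | while read -r img; do
  # replace '/' and ':' so each image gets a filesystem-safe tarball name
  tar="$out/$(printf '%s' "$img" | tr '/:' '__').tar"
  docker save -o "$tar" "$img"
done
ls "$out"
```

Restoring on the overlay2-backed daemon is the mirror image: `for t in "$out"/*.tar; do docker load -i "$t"; done`.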
@rhvgoyal Do you have a plan for which release of docker will include this PR? We are still dealing with the `driver "devicemapper" failed to remove root filesystem` error on a regular basis.

@NeckBeardPrince Please don’t waste our time with such pointless commentary. If you’d like to help solve it, great. If you’d like to report some more data about the problem, great.

Other than that, there are a couple of ways of getting around this issue that have been posted here.
@rhvgoyal @rhatdan @vbatts Running into ‘Device busy’ issues during stopped/dead container deletion/removal on RHEL7.1 running dockerd 1.12.4 with deferred deletion & removal enabled=true without any MountFlags in systemd docker.service unit files. Also seeing kernel messages like: “kernel: device-mapper: thin: Deletion of thin device 120 failed.” (120 being the device id of the container-being-removed’s thinpool device)
In all cases, the devicemapper thinpool device mount point for the container being removed was leaked into mount namespace of another pid on the host which is being started with MountFlag=private/slave.
So it looks like it is very easy to leak mount points into the host mount namespace, as the above system processes unshare some mount namespaces by default, which can’t be changed/controlled individually.

Is running dockerd with MountFlags=slave the only solution here? Also, can you help me understand why MountFlags=slave (and defaulting to shared) was removed some time back from the docker systemd unit file? Under what scenarios does running dockerd with slave mount propagation break other things? Thanks.
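For reference, the `MountFlags=slave` setting being discussed can be restored without editing the shipped unit file, via a systemd drop-in. This is a sketch; the drop-in filename is arbitrary:

```ini
# /etc/systemd/system/docker.service.d/mountflags.conf  (hypothetical path)
# Run dockerd with slave mount propagation so mounts made by host services
# that unshare a mount namespace do not pin container mount points.
[Service]
MountFlags=slave
```

Followed by `systemctl daemon-reload && systemctl restart docker` to apply it.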
The issue has appeared again. The processes from the previous command were `dockerd` and `nginx`.
@mlaventure Can this issue be re-opened? Right now we don’t know if this issue has been fixed or not.
`docker rm -f [container]` will report a failure but eventually clean up the container and filesystem. The `ls` command is a red herring; all you really need is to wait a few seconds. Better than that is to use `MountFlags=slave`. And best is to switch off devicemapper and use overlay2 instead.

Just as a note for others reading this thread: if you’re running cAdvisor from Google, you will see this issue when trying to remove a container. First you need to stop cAdvisor, then remove the container, then start cAdvisor again.
And it’s broken again.
Some improvements are coming down the pipeline.
Specifically, the issue with these other system services appears to be a race condition between setting up mount namespaces (for those other system services) and docker’s attempt to keep its own mounts private. The intention is for Docker to keep its mounts from leaking into containers; unfortunately it’s causing leakages elsewhere, and those namespaces actually end up holding private references to those mount points, which means they can’t be unmounted in those namespaces except either manually or when the process restarts.

In addition, there have been some recent changes to deal with race conditions when using MS_PRIVATE mount propagation, in both runc and docker. Will the next version be perfect? Probably not… but I do expect this to get better.
Restarting ntpd fixed the issue I was having… so confusing. Is there any “recommended” daemon.json configuration for docker on CentOS 7?
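Not an official recommendation, but a sketch of the kind of `daemon.json` several commenters here ended up with after moving off devicemapper. It assumes a kernel and backing filesystem that support overlay2 (e.g. ext4, or XFS formatted with ftype=1); the `log-driver` value just mirrors the journald setup shown in the `docker info` output above:

```json
{
  "storage-driver": "overlay2",
  "log-driver": "journald",
  "live-restore": true
}
```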
I’m using docker 17.03 on CentOS 7.3 with kernel 4.10, and I have been seeing this error a lot. Below are a few more details on MountFlags.
@KevinTHU Restarting the docker service does NOT affect any running container. You can try it yourself.
I created a VM just now and I was able to (quite easily!) reproduce the issue.
This is what I did w/ Virtualbox:

1. `systemctl enable docker`, and reboot the system
2. `docker-compose up -d`
3. `systemctl start nginx`
4. `docker-compose pull`
5. `docker-compose up -d` (it attempts to recreate the containers, and then gives me the “Device is Busy” error for both containers)

Stopping nginx lets me remove the containers.

Here is my `docker-compose.yml` as of now (I imitated my failing system’s docker setup):

I can provide access to the VM upon request; just give me an email to send the login to.
Here we go:
Before the fixed release, rebooting the physical node will solve the problem.
We had a very bad time with stock CentOS 7.2 kernels (their 3.10.x frankenstein). Lots of crashes. We were running Kubernetes in a dev env, so the churn of our containers was very high, but even in relatively quiet installations we found the stock CentOS+overlay combo very unstable. Running a 4.10+ upstream kernel with overlay2 is much better. Haven’t tried a newer CentOS release.
You will need to use an underlying filesystem that is either ext4, or XFS formatted with `-n ftype=1`. Docker will run if you have an improperly formatted XFS, but the results will be unpredictable.
Sure, I have my VM on standby.
Now the situation is as follows: we use `MountFlags=slave` with deferred removal and deletion, and sometimes the remote API throws an error that the device is busy and cannot be removed. However, when `docker rm container` is called right after the error, it removes the container just fine.

So the nginx process is running in another container? Or is it running on the host?