moby: "Driver aufs failed to remove root filesystem" after calling "timedatectl status" on 1.12.0 with aufs

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:40:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:40:59 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 13
 Running: 13
 Paused: 0
 Stopped: 0
Images: 195
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 268
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 29.5 GiB
Name: hello-4-225
ID: KI4A:WOXH:U5XJ:BGKR:HYCI:LCA4:Y35F:E5RD:ICI5:C5FO:V4UX:THIB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS and bare-metal servers

Steps to reproduce the issue: Run this in bash:

docker run -d --name hello ubuntu sleep 1000 ; timedatectl status ; docker stop hello ; docker rm hello

Describe the results you received:

# docker run -d --name hello ubuntu sleep 1000 ; timedatectl status ; docker stop hello ; docker rm hello
196a2f3a2fed3c7e54e38b6b678f252f6f77b3266903c66fef972e34c750bb0b
      Local time: Mon 2016-08-15 15:10:51 UTC
  Universal time: Mon 2016-08-15 15:10:51 UTC
        RTC time: Mon 2016-08-15 15:10:51
       Time zone: Etc/UTC (UTC, +0000)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
hello
Error response from daemon: Driver aufs failed to remove root filesystem 196a2f3a2fed3c7e54e38b6b678f252f6f77b3266903c66fef972e34c750bb0b: rename /var/lib/docker/aufs/mnt/e70acab187bae790b8e920a9420deb6c02ad4f659d51d9f265e9aa7699fec213 /var/lib/docker/aufs/mnt/e70acab187bae790b8e920a9420deb6c02ad4f659d51d9f265e9aa7699fec213-removing: device or resource busy

Describe the results you expected: The container should be successfully deleted.

Additional information you deem important (e.g. issue happens only occasionally): I was not able to reproduce it on 1.9.1.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 3
  • Comments: 47 (29 by maintainers)

Most upvoted comments

@anusha-ragunathan the problem is that we do actually mount directories from the host, and I find it a very popular thing to do 😃

Thanks @anusha-ragunathan for your feedback. Here are my investigation results.

Yes, timedatectl seems to have an effect:

$ docker run -d --name hello ubuntu sleep 1000 ; docker stop hello ; docker rm hello
06a93d065246912f96a5e5386449244f561f3dd444e9711389b3196a104a070c
hello
hello

$ docker run -d --name hello ubuntu sleep 1000 ; timedatectl status ; docker stop hello ; docker rm hello
7311538700b94d123499b1129d8bb32b93da7a047b849b5f3c54beeaaf87e156
      Local time: Wed 2016-09-28 20:44:18 CEST
  Universal time: Wed 2016-09-28 18:44:18 UTC
        RTC time: Wed 2016-09-28 18:44:18
       Time zone: Europe/Paris (CEST, +0200)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2016-03-27 01:59:59 CET
                  Sun 2016-03-27 03:00:00 CEST
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2016-10-30 02:59:59 CEST
                  Sun 2016-10-30 02:00:00 CET
hello
Error response from daemon: Driver aufs failed to remove root filesystem 7311538700b94d123499b1129d8bb32b93da7a047b849b5f3c54beeaaf87e156: rename /var/lib/docker/aufs/mnt/d28da9c02ad0f99bde6911207852af8b7c08e2959fce19f3ba4691301d5ba317 /var/lib/docker/aufs/mnt/d28da9c02ad0f99bde6911207852af8b7c08e2959fce19f3ba4691301d5ba317-removing: device or resource busy

I then added MountFlags=private to the systemd configuration, then ran systemctl daemon-reload and systemctl restart docker (I am not an expert at systemd and I hope those commands are correct).

This change apparently fixed the problem. I do not understand why, but I will investigate this parameter further.

$ cat /lib/systemd/system/docker.service 
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
MountFlags=private

[Install]
WantedBy=multi-user.target

$ docker run -d --name hello ubuntu sleep 1000 ; timedatectl status ; docker stop hello ; docker rm hello
f5b375e948d7513e968b0c60741cd02878336fc2d76676e6bbe94103f6d6685c
      Local time: Wed 2016-09-28 21:01:49 CEST
  Universal time: Wed 2016-09-28 19:01:49 UTC
        RTC time: Wed 2016-09-28 19:01:49
       Time zone: Europe/Paris (CEST, +0200)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2016-03-27 01:59:59 CET
                  Sun 2016-03-27 03:00:00 CEST
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2016-10-30 02:59:59 CEST
                  Sun 2016-10-30 02:00:00 CET
hello
hello
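
The change above edits the packaged unit file in /lib/systemd/system directly, which a package upgrade may overwrite. A minimal sketch of the same override as a systemd drop-in follows. The helper name write_mountflags_dropin and the file name mount-flags.conf are hypothetical choices for illustration; the drop-in directory is the standard location for docker.service overrides.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets MountFlags=private for
# docker.service. The file name mount-flags.conf is an arbitrary choice.
write_mountflags_dropin() {
    # Default to the standard drop-in directory; a different directory can
    # be passed as $1 (useful for a dry run).
    dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nMountFlags=private\n' > "$dropin_dir/mount-flags.conf"
}

# Typical use as root (deliberately not executed by this snippet):
#   write_mountflags_dropin
#   systemctl daemon-reload      # make systemd re-read unit files
#   systemctl restart docker     # restart the daemon with the new flags
#   systemctl cat docker         # show the unit together with its drop-ins
```

Restarting the daemon stops running containers unless they are configured otherwise, so this is best tried on a test host first.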

I can reproduce it on 1.12, same bug as with #22260, but this one is reproducible 👍

Error response from daemon: Driver aufs failed to remove root filesystem 2966cd5b898c822762119b5d8c83f36659e9a891a19e950308dd02c2a729a74e: rename /var/lib/docker/aufs/mnt/41f77aaf79de753f0255d1f4f5894b7df0efa6a3c490337bea5b4641c2e6997d /var/lib/docker/aufs/mnt/41f77aaf79de753f0255d1f4f5894b7df0efa6a3c490337bea5b4641c2e6997d-removing: device or resource busy

sudo fuser -m /var/lib/docker/aufs/mnt/41f77aaf79de753f0255d1f4f5894b7df0efa6a3c490337bea5b4641c2e6997d
/var/lib/docker/aufs/mnt/41f77aaf79de753f0255d1f4f5894b7df0efa6a3c490337bea5b4641c2e6997d: 11303 11305 11306 20416m 20631 21446 24748m 25131 25143 25146 26705 26716

999      25146  0.0  0.0 227580  6432 ?        Ss   août25   0:00 postgres: autovacuum launcher process
root     26705  0.0  0.1 200684 40316 ?        S    août25   0:10 /usr/local/bin/python2 /usr/local/bin/pootle runserver --insecure --noreload 0.0.0.0:8000
999      26716  0.0  0.0 228708 10108 ?        Ss   août25   0:00 postgres: pootle pootle 172.17.0.7(58121) idle
root     24748  0.1  0.2 2070556 83144 ?       Ssl  août25   3:03 /usr/bin/dockerd -H fd://

The PIDs belong to other containers and the Docker daemon.
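
A complementary check to fuser, assuming the EBUSY comes from the mount still being visible in another process's mount namespace: list every process whose mountinfo mentions the layer ID. This is only a sketch; the function name find_mount_holders is hypothetical, and the second argument exists so the function can be pointed at a directory other than /proc.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process whose mount table
# (/proc/<pid>/mountinfo) contains the given string, e.g. an aufs layer ID.
find_mount_holders() {
    mount_id="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mountinfo; do
        # Skip processes that exited meanwhile or that we cannot read.
        [ -r "$f" ] || continue
        if grep -qF -- "$mount_id" "$f" 2>/dev/null; then
            dir=${f%/mountinfo}     # strip the trailing file name
            echo "${dir##*/}"       # keep only the PID component
        fi
    done
}

# Example (run as root so all processes are readable):
#   find_mount_holders 41f77aaf79de753f0255d1f4f5894b7df0efa6a3c490337bea5b4641c2e6997d
```

Any PID printed here that is not the daemon itself still sees the mount in its own namespace; comparing readlink /proc/PID/ns/mnt across those PIDs shows which namespaces are involved.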

@chhsia0 “device or resource busy” issues are known to be more troublesome in older kernels (you’ll find several reports of this, and some workarounds / suggestions to make it less likely); the upcoming 7.4 release of RHEL/CentOS will have a kernel with fixes for this, which may help.

Dug up some history:

The daemon flags were changed to default to “shared” mount propagation in https://github.com/docker/docker/commit/2aee081cad72352f8b0c37ba0414ebc925b022e8 in an effort to ensure that volume mounts are shared. This change made it into 1.12, which is probably why there are more reports since then.

Prior to this, the flags were indeed set to “slave” in https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a to avoid EBUSY errors on container remove. There’s a blog post on why this was done. Although it mentions devicemapper, the problem exists across graphdrivers. http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/
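
To see which propagation mode a given mount point actually has, one option is to read the optional fields of /proc/self/mountinfo, where shared:N marks a shared mount and master:N marks a slave. The following is a sketch under that assumption; the function name mount_propagation is hypothetical, and unbindable mounts are simply reported as private here.

```shell
#!/bin/sh
# Hypothetical helper: report the propagation mode of a mount point by
# parsing a mountinfo file (format described in proc(5)).
# Field 5 is the mount point; optional fields start at field 7 and end
# at the "-" separator.
mount_propagation() {
    target="$1"
    info="${2:-/proc/self/mountinfo}"
    awk -v t="$target" '
        $5 == t {
            shared = 0; slave = 0
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/) shared = 1
                if ($i ~ /^master:/) slave = 1
            }
            if (shared && slave) result = "shared,slave"
            else if (shared)     result = "shared"
            else if (slave)      result = "slave"
            else                 result = "private"
            found = 1            # the last matching line (topmost mount) wins
        }
        END { if (found) print result; else exit 1 }
    ' "$info"
}

# Example: inspect the daemon namespace instead of the current shell
# (the pidof lookup is an assumption about how the daemon is named):
#   mount_propagation /var/lib/docker/aufs "/proc/$(pidof dockerd)/mountinfo"
```

Where util-linux is recent enough, findmnt -o TARGET,PROPAGATION reports the same information without manual parsing.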

@simkim @genezys: Sorry for the multiple questions, but it’s great that you have a consistent repro, hence this line of questioning. What is MountFlags set to on your hosts? You can look at the docker systemd service file, typically located at /lib/systemd/system/docker.service. If there’s no entry for “MountFlags”, then the default is “shared”. If it is indeed “shared”, can you change it to “private” and see if the issue reproduces again?

But we do have extra debug details in the daemon logs now if you have fuser installed, which on Debian looks to be provided by the psmisc package. Can you take a minute to enable debug logging and then reproduce?