moby: overlay2 + linux v4.13: error creating overlay mount to /var/lib/docker/overlay2/ID/merged: device or resource busy

Description

The overlay driver in the kernel, starting with 4.13, will return an error for overlay mounts that re-use the upper dir. This error was introduced in this patch.

Using docker 17.06.1-ce on the 4.13-rc6 kernel, I can reproduce this error message intermittently. I’ve only ever observed it on the first container run, and only infrequently. I assume that two mounts race and sometimes clash.

Steps to reproduce the issue:

  1. Install the 4.13 kernel
  2. Boot the machine with an empty /var/lib/docker directory. Start dockerd, and as soon as possible, run a few dozen containers in parallel.
  3. Occasionally (perhaps 1 out of 30 runs), get the error “error creating overlay mount to /var/lib/docker/overlay2/ID/merged: device or resource busy”
  4. Note that the dmesg output includes “overlayfs: upperdir is in-use by another mount”

Note that this only impacts running multiple containers at once. Serializing all container runs avoids it.
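
The parallel-start step can be scripted. What follows is a minimal sketch, not the exact procedure used for this report: the helper name run_parallel, the busybox image, and the count of 30 are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: start N short-lived containers at once and print how
# many of them exited non-zero. Image name and count are illustrative only.
run_parallel() {
  local n=$1 image=$2 failures=0 pids=() pid i
  for i in $(seq 1 "$n"); do
    # Each start makes the overlay2 driver mount a fresh merged directory.
    docker run --rm "$image" true >/dev/null &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}

# Example (needs a running dockerd):
#   run_parallel 30 busybox
#   dmesg | grep 'upperdir is in-use'
```

A non-zero count together with the dmesg line from step 4 would indicate the race was hit; a zero count only means this particular run did not trigger it.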

Output of docker version:

$ docker version
Client:
 Version:      17.06.1-ce
 API version:  1.30
 Go version:   go1.8.2
 Git commit:   874a737
 Built:        Sat Aug 26 01:07:04 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.1-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.2
 Git commit:   874a737
 Built:        Fri Aug 25 18:06:27 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.13.0-rc6-coreos
Operating System: Container Linux by CoreOS 1506.0.0+2017-08-25-1813 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 996.3MiB
Name: localhost
ID: ZBQY:PD55:UTX2:K2N4:CPQJ:HWIY:SOIQ:IC6P:NNXT:YKUZ:XFNP:ESWJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

I’ve seen it on AWS and QEMU; it presumably happens everywhere.

I’ve also reported this issue over here on the CoreOS bug tracker.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 8
  • Comments: 37 (17 by maintainers)

Most upvoted comments

I just experienced this myself on Arch - thanks for your help resolving it!

For other Arch users looking for the tl;dr version of what people above are suggesting, create a file /etc/modprobe.d/overlay.conf with the following contents:

options overlay index=off

and reboot (or rmmod overlay && modprobe overlay). That resolved the issue for me.
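
To confirm that the option took effect after reloading the module, the current value can be read back from sysfs. This is a sketch that assumes the overlay module exposes its index parameter at /sys/module/overlay/parameters/index and reports it as Y or N; the function name overlay_index_state is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: report whether the overlay "index" feature is on.
# The default path is an assumption about where the module parameter lives;
# pass a different file as $1 to check something else.
overlay_index_state() {
  local param_file=${1:-/sys/module/overlay/parameters/index}
  if [ ! -r "$param_file" ]; then
    echo unknown   # module not loaded, or parameter not exposed
    return
  fi
  case "$(cat "$param_file")" in
    Y|1) echo on ;;
    N|0) echo off ;;
    *)   echo unknown ;;
  esac
}

# Example:
#   overlay_index_state
```

If this prints "on" after the modprobe.d change, the module was probably not reloaded (for example because overlay mounts were still active when rmmod ran).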

I’ve encountered this issue every time I do a system upgrade (Arch, Linux kernel 4.14.6-1). A way to fix this without rebooting your machine is the following chain of commands, which restarts docker and regenerates the dependency tree for system configuration services.

Bear in mind that this will only work if docker is managed as a systemd service rather than run manually.

systemctl stop docker.service && systemctl start docker.service && systemctl daemon-reload

As a sidenote, I haven’t come across this issue during runtime, only after upgrading systemwide dependencies.

This looks like another variant of #34573 (effectively mount leaks are causing EBUSY in a variety of places). Honestly we should re-architect Docker/containerd so that it spawns runC in different mount namespaces (with the rootfs mount only done privately in each mount namespace) so that the mounts won’t be leakable to other runcs.

@huegelc Yes, as of 4.13.6. There are two possible scenarios which will result in this specific issue (based on my understanding):

  1. Kernel between 4.13.0 and 4.13.6: update your kernel or wait for docker to include #34948
  2. Kernel >= 4.13.6, but with CONFIG_OVERLAY_FS_INDEX=y: disable that feature (either at module load time or by setting it to n) or wait for docker to include #34948

Without #34948, there will still be warnings in dmesg in those two scenarios, but they can be safely ignored.
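
The two scenarios can be told apart mechanically. The sketch below is only an illustration of the rule as described in this thread: the function names and the output labels are invented here, it relies on sort -V (GNU coreutils) for version comparison, and it treats kernels older than 4.13 as unaffected because the check was introduced in 4.13.

```shell
#!/bin/bash
# version_lt A B: true if version A sorts strictly before version B.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Hypothetical classifier. $1 is a kernel release such as "4.13.0-rc6-coreos",
# $2 is the CONFIG_OVERLAY_FS_INDEX setting ("y" or "n").
classify_kernel() {
  local kver=${1%%-*} index=$2   # drop any "-rc6-coreos" style suffix
  if version_lt "$kver" 4.13; then
    echo unaffected              # in-use check not present before 4.13
  elif version_lt "$kver" 4.13.6; then
    echo update-kernel           # scenario 1
  elif [ "$index" = y ]; then
    echo disable-index           # scenario 2
  else
    echo unaffected
  fi
}

# Example:
#   classify_kernel "$(uname -r)" y
```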

Because this is not a harmful issue on an up-to-date kernel and because the moby codebase (if not docker) has been updated to handle this issue better, I’m closing this bug.

After a bit more adventuring, I believe I’ve tracked this down to two distinct problems which both cause this issue identically!

First, a helpful trick for reproducing this – add a 1-3 second sleep before pivoting root: https://github.com/opencontainers/runc/blob/593914b8bd5448a93f7c3e4902a03408b6d5c0ce/libcontainer/rootfs_linux.go#L98-L103

And a few hundred ms in the Put code here: https://github.com/moby/moby/blob/ba317637de9b9918cdc2139466dd51c6200bd158/daemon/graphdriver/overlay2/overlay.go#L610

After those changes, it reliably reproduces running just two containers at once, which made it much easier to continue investigating.

Anyways, it turns out that if you look through every mount namespace for references to the overlayfs mount, you’ll find that the runc init process for another container sometimes has a copy of it still mounted, despite it being unmounted and gone from the host mount namespace.

This copy is a private copy of the mount meaning our host umount won’t get it, and we’re at the mercy of this other container’s runc init to eventually clean it up.
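
For anyone who wants to repeat that search, here is a rough sketch of scanning every process's mount table for a given mount point. The function name find_mount_holders is made up, and the second argument exists only so the scan can be pointed at a directory other than /proc; run it as root so that every process's mountinfo is readable.

```shell
#!/bin/bash
# Hypothetical helper: print the PID of every process whose mount namespace
# still contains a mount whose mountinfo line mentions $1.
find_mount_holders() {
  local needle=$1 proc_root=${2:-/proc} f pid
  for f in "$proc_root"/[0-9]*/mountinfo; do
    [ -r "$f" ] || continue          # process exited or permission denied
    if grep -qF -- "$needle" "$f" 2>/dev/null; then
      pid=${f%/mountinfo}            # strip the trailing file name
      pid=${pid##*/}                 # keep only the PID component
      echo "$pid"
    fi
  done
}

# Example (ID stands for the layer ID from the error message):
#   find_mount_holders /var/lib/docker/overlay2/ID/merged
```

Processes that share a mount namespace all report the same mounts, so the interesting PIDs are the ones that are not in the host namespace; comparing readlink /proc/PID/ns/mnt against that of PID 1 is one way to tell them apart.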

I’ve created a commented reproduction of what docker and runc are doing, namespace- and mount-wise, leading up to this EBUSY:

#!/bin/bash
set -x

# c1 and c2 represent two different docker containers starting at once
c1=1
c2=2

function ovlOpts() {
	echo -n "lowerdir=$tmpdir/lower,upperdir=$tmpdir/$1/diff,workdir=$tmpdir/$1/work"
}

tmpdir=$(mktemp -d) # 'overlay2' graphdriver dir

mkdir -p $tmpdir/{$c1,$c2}/{diff,merged,work}
mkdir -p $tmpdir/lower

# overlay2 driver in its setup code does this
# https://github.com/moby/moby/blob/ba317637de9b9918cdc2139466dd51c6200bd158/daemon/graphdriver/overlay2/overlay.go#L178
mount --bind $tmpdir $tmpdir
mount --make-private $tmpdir

# Container 2 sets up 
# https://github.com/moby/moby/blob/ba317637de9b9918cdc2139466dd51c6200bd158/daemon/graphdriver/overlay2/overlay.go#L589
mount -t overlay overlay -o "$(ovlOpts $c2)" $tmpdir/$c2/merged 

# Container 1 starts setting up
mount -t overlay overlay -o "$(ovlOpts $c1)" $tmpdir/$c1/merged 

# Container 2 runs 'runc init' code in parallel
(
  # https://github.com/opencontainers/runc/blob/8b47a242a9aebdfe1c0c2b6513368f736d505bf0/libcontainer/nsenter/nsexec.c#L823
  unshare -m --propagation unchanged -- bash <<EOF
  # Now runc init remounts /
  # https://github.com/opencontainers/runc/blob/e385f67a0e45fa1d8ef8154e2aea5128ea1d331b/libcontainer/rootfs_linux.go#L599-L605
  # Due to how the config conversion works, config.RootPropagation is never 0,
  # and defaults instead to MS_PRIVATE | MS_REC. I'll PR a fix
  mount --make-rprivate /
  # Now a bunch of init stuff happens, including premount cmds and hooks
  sleep 1
  # .. and then pivot root happens which cleans up our old root
  # It's hard to do in shell, so we'll just pretend an umount of / is close enough
  # https://github.com/opencontainers/runc/blob/e385f67a0e45fa1d8ef8154e2aea5128ea1d331b/libcontainer/rootfs_linux.go#L676
  cd /
  umount -l .
EOF
) &
   
sleep 0.5

# While container2 is doing its init, container 1 unmounts and remounts its overlay
umount $tmpdir/$c1/merged
mount -t overlay overlay -o "$(ovlOpts $c1)" $tmpdir/$c1/merged 
# Boom, EBUSY on 4.13+ because `unshare -m` above has a private copy of the mount

sleep 1
# Now that the runc init code has pivoted and umounted its old root, we're able to mount without EBUSY
mount -t overlay overlay -o "$(ovlOpts $c1)" $tmpdir/$c1/merged 
umount $tmpdir/$c1/merged

# Cleanup
umount $tmpdir/$c2/merged
umount $tmpdir
rm -rf $tmpdir

Changing the runc-init mount to be rslave (as I think it was meant to be) and removing the MakePrivate call for the overlay2 graphdriver directory fixes the race. Even with the addition of the above-mentioned sleeps, I am no longer able to get EBUSY with those changes.

I’ll PR each of those changes shortly with suitable commit messages.

@jpalczewski the index option can be set to default off as a module option, as described in the kconfig entry for it.

@adambro Can you check your kernel options for the OVERLAY_FS_INDEX config option (e.g. with zgrep OVERLAY_FS_INDEX /proc/config.gz)?

If that option is set to yes, it’s expected that the kernel will still exhibit the same behaviour that leads to this failure, even with the above referenced patch.
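
The check can be wrapped so it also works on systems that ship the kernel config as a plain file. This is a sketch under assumptions: the function name overlay_index_config is invented, and the only default it knows about is /proc/config.gz, which exists only when the kernel was built with that feature. It uses gzip -cdf, which decompresses gzip input and passes plain text through unchanged.

```shell
#!/bin/bash
# Hypothetical helper: print y, n, or unknown for CONFIG_OVERLAY_FS_INDEX
# in the given kernel config file (gzip-compressed or plain text).
overlay_index_config() {
  local cfg=${1:-/proc/config.gz} line
  if [ ! -r "$cfg" ]; then
    echo unknown
    return
  fi
  line=$(gzip -cdf -- "$cfg" 2>/dev/null |
    grep -E '^(# )?CONFIG_OVERLAY_FS_INDEX(=| is not set)' | head -n1)
  case "$line" in
    CONFIG_OVERLAY_FS_INDEX=y)              echo y ;;
    "# CONFIG_OVERLAY_FS_INDEX is not set") echo n ;;
    *)                                      echo unknown ;;
  esac
}

# Example:
#   overlay_index_config
#   overlay_index_config "/boot/config-$(uname -r)"
```

Note that this reports only the build-time default; the index=off module option can still override it at load time.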

As @banuchka indicates, this error shouldn’t occur on newer kernels so long as they don’t have that option set. That being said, there’s still a mount being leaked and dmesg will still show warnings.

I reboot the computer when this happens 😃

I’ve dug into it more and I think the invalid argument messages are actually red herrings.

The EBUSY is what actually matters. With some stracing and added logging, it appears to me that the overlay2 driver’s locking mechanism is working just fine. I see the EBUSY on a mount call for a given directory even though a umount call to it had returned 0.

On a hunch, I removed the MNT_DETACH flag, but that didn’t make a difference.

At this point I suspect that this is a kernel bug. My next step is to try and write a reproduction that doesn’t involve dockerd.