moby: overlay2: docker push creates overlapping overlay mounts, causing failures on newer kernels

Description

In short, the overlay subsystem in the kernel forbids overlapping layers, that is, one directory being used as an upper dir and a lower dir at the same time. That is exactly what docker push does.

This was undefined behavior (UB) before, and with this patch https://github.com/torvalds/linux/commit/146d62e5a5867fbf84490d82455718bfb10fe824 , such a mount may now fail with an error.
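
To make the overlap concrete, here is a minimal Python sketch of the condition described above. It is a simplified model, not the kernel's actual algorithm: it only compares normalized path strings, whereas the kernel works on the directories themselves. The function names and the /layers/... paths are hypothetical and exist only for illustration.

```python
import os
from typing import Dict, List


def parse_overlay_options(options: str) -> Dict[str, List[str]]:
    """Split an overlay mount option string into lowerdir/upperdir/workdir lists.

    Assumption: paths contain no escaped commas or colons.
    """
    dirs: Dict[str, List[str]] = {"lowerdir": [], "upperdir": [], "workdir": []}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if sep and key in dirs:
            # lowerdir may hold several colon-separated directories.
            dirs[key] = [os.path.normpath(p) for p in value.split(":") if p]
    return dirs


def find_in_use_lowerdirs(active_mounts: List[str], new_mount: str) -> List[str]:
    """Return lowerdirs of new_mount that an active mount uses as upperdir/workdir."""
    busy = set()
    for options in active_mounts:
        parsed = parse_overlay_options(options)
        busy.update(parsed["upperdir"])
        busy.update(parsed["workdir"])
    new_dirs = parse_overlay_options(new_mount)
    return [d for d in new_dirs["lowerdir"] if d in busy]


if __name__ == "__main__":
    # Hypothetical layout: layer b is still mounted read-write while a
    # second mount tries to use b's diff directory as a lower layer.
    active = ["lowerdir=/layers/a/diff,upperdir=/layers/b/diff,workdir=/layers/b/work"]
    new = ("lowerdir=/layers/b/diff:/layers/a/diff,"
           "upperdir=/layers/c/diff,workdir=/layers/c/work")
    print(find_in_use_lowerdirs(active, new))
```

A non-empty result corresponds to the situation the kernel now rejects. Note that Docker may pass lower dirs as symlinks, so a real check would have to resolve them first; this sketch does not.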

I have not looked into the issue in depth, but I think it is illogical for docker push to create overlapping layers. Moreover, why does it use overlay mounts in the first place?

Steps to reproduce the issue:

  1. Install a newer kernel (one containing that patch). I use Fedora 30.
[root@localhost ~]# uname -a
Linux localhost.localdomain 5.1.20-300.fc30.x86_64 #1 SMP Fri Jul 26 15:03:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  2. Pull an image with multiple layers (here I chose mariadb) and tag it so that it can be pushed to a personal registry (or any registry that does not already contain this image). Note that the bug happens while pushing image layers that do not yet exist in the registry, which is why I use a personal registry.
[root@localhost ~]# docker tag mariadb xxx.yyy.zzz/mariadb
  3. docker push xxx.yyy.zzz/mariadb

Describe the results you received:

Some of the layers (often the top 4) fail and are retried:

[root@localhost ~]# docker push xxx.yyy.zzz/mariadb
The push refers to repository [xxx.yyy.zzz/mariadb]
0681153a129b: Retrying in 1 second 
cd7a8d37f569: Retrying in 2 seconds 
...

journalctl -f -u docker outputs error messages related to overlay2

Aug 04 08:50:05 localhost.localdomain dockerd[3012]: time="2019-08-04T08:50:05.459134956-07:00" level=error msg="Upload failed, retrying: error creating overlay mount to /var/lib/docker/overlay2/aea27291a899b955bbed386a4c4a37a4512586a583c5fa05bc37a82cac229426/merged: device or resource busy"
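
When many layers fail at once, it can help to pull the affected overlay2 directory and the error reason out of the daemon log. The following is a minimal sketch that assumes log lines shaped like the one above; the regular expression and the function name parse_mount_error are my own illustration and are not part of Docker.

```python
import re
from typing import Optional, Tuple

# Matches the "error creating overlay mount" message shown in the log line
# above. Assumption: the overlay2 directory name is a lowercase hex string.
MOUNT_ERROR = re.compile(
    r"error creating overlay mount to "
    r"\S*/overlay2/(?P<dir_id>[0-9a-f]+)/merged: "
    r"(?P<reason>[^\"]+)"
)


def parse_mount_error(line: str) -> Optional[Tuple[str, str]]:
    """Return (overlay2 directory name, error reason), or None if no match."""
    match = MOUNT_ERROR.search(line)
    if match is None:
        return None
    return match.group("dir_id"), match.group("reason").strip()


if __name__ == "__main__":
    sample = (
        'level=error msg="Upload failed, retrying: error creating overlay '
        "mount to /var/lib/docker/overlay2/"
        "aea27291a899b955bbed386a4c4a37a4512586a583c5fa05bc37a82cac229426"
        '/merged: device or resource busy"'
    )
    print(parse_mount_error(sample))
```

Feeding it the output of journalctl -u docker line by line would give the set of overlay2 directories that hit the busy error.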

and dmesg complains about the overlap:

[ 1348.372328] overlayfs: lowerdir is in-use as upperdir/workdir
[ 1394.161732] overlayfs: lowerdir is in-use as upperdir/workdir

Describe the results you expected:

The push should succeed.

Additional information you deem important (e.g. issue happens only occasionally):

If the same push command is run many times, the push may eventually complete. However, I first saw this issue on my CI system, and it could make the system unstable.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:20:55 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:28 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.1.20-300.fc30.x86_64
 Operating System: Fedora 30 (Server Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 1.92GiB
 Name: localhost.localdomain
 ID: LGLA:TGK2:YCZC:NSHI:G4AU:CI3M:RWQX:2JN4:4AVD:YORS:RIW7:C2AE
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

The above output was gathered from a repro image inside VMware, but the issue first occurred on a physical machine.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

@fuweid thanks for that link btw; through that issue, I also noticed we have another issue opened: https://github.com/moby/moby/issues/39475 which indicates it may be a kernel / overlays issue

I see CoreOS modified their kernel to suppress this warning https://github.com/coreos/linux/pull/346