moby: overlay2: docker push creates overlapping overlay mounts, causing failures on newer kernels
Description
In short, the kernel's overlay subsystem forbids overlapping layers, that is, one layer being both the upper dir and a lower dir at the same time. That is exactly what docker push does.
This used to be undefined behavior (UB); with the patch https://github.com/torvalds/linux/commit/146d62e5a5867fbf84490d82455718bfb10fe824 , it may now cause an error.
I have not looked into the issue in depth, but it seems illogical for docker push to create overlapping layers. Moreover, why does it use overlay mounts in the first place?
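To make the overlap condition concrete, here is a minimal sketch in Python. It is not kernel or Docker code: the helper names `parse_overlay_opts` and `find_overlaps` and the example paths are hypothetical, and it only compares path strings, whereas the kernel check works on the actual directories. It also assumes paths contain no escaped commas or colons.

```python
import os


def parse_overlay_opts(opts):
    """Split an overlay mount option string into its directory roles."""
    parsed = {"lowerdir": [], "upperdir": None, "workdir": None}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # flags such as "rw" carry no path
        if key == "lowerdir":
            # lowerdir is a colon-separated stack of directories
            parsed["lowerdir"] = [os.path.normpath(p) for p in value.split(":")]
        elif key in ("upperdir", "workdir"):
            parsed[key] = os.path.normpath(value)
    return parsed


def find_overlaps(mounts):
    """Return lowerdirs that some mount also uses as upperdir or workdir."""
    parsed = [parse_overlay_opts(m) for m in mounts]
    writable = {p[k] for p in parsed for k in ("upperdir", "workdir") if p[k]}
    return sorted({low for p in parsed for low in p["lowerdir"] if low in writable})


# Hypothetical example: the second mount stacks on the first mount's upperdir.
mounts = [
    "rw,lowerdir=/l/base,upperdir=/l/a/diff,workdir=/l/a/work",
    "rw,lowerdir=/l/a/diff:/l/base,upperdir=/l/b/diff,workdir=/l/b/work",
]
print(find_overlaps(mounts))  # ['/l/a/diff']
```

In this sketch, `/l/a/diff` is writable in one mount and read as a lower layer in another while both are active, which is the situation described above.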
Steps to reproduce the issue:
- Install a newer kernel (one containing that patch); I used Fedora 30.
[root@localhost ~]# uname -a
Linux localhost.localdomain 5.1.20-300.fc30.x86_64 #1 SMP Fri Jul 26 15:03:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Pull an image with multiple layers (here I chose mariadb) and tag it so it can be pushed to a personal registry (or any registry that does not already contain this image). Note that the bug happens while pushing layers that do not yet exist in the registry, which is why I used a personal registry.
[root@localhost ~]# docker tag mariadb xxx.yyy.zzz/mariadb
[root@localhost ~]# docker push xxx.yyy.zzz/mariadb
Describe the results you received:
Some of the layers (often the top 4) start to fail and retry:
[root@localhost ~]# docker push xxx.yyy.zzz/mariadb
The push refers to repository [xxx.yyy.zzz/mariadb]
0681153a129b: Retrying in 1 second
cd7a8d37f569: Retrying in 2 seconds
...
journalctl -f -u docker outputs error messages related to overlay2:
Aug 04 08:50:05 localhost.localdomain dockerd[3012]: time="2019-08-04T08:50:05.459134956-07:00" level=error msg="Upload failed, retrying: error creating overlay mount to /var/lib/docker/overlay2/aea27291a899b955bbed386a4c4a37a4512586a583c5fa05bc37a82cac229426/merged: device or resource busy"
and dmesg complains about the overlap:
[ 1348.372328] overlayfs: lowerdir is in-use as upperdir/workdir
[ 1394.161732] overlayfs: lowerdir is in-use as upperdir/workdir
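When triaging, it can help to pull the failing layer directory out of the dockerd log and count the matching kernel messages. The following Python sketch is illustrative only: the regular expression is derived from the single log line shown above and may not match other dockerd versions, and the helper names `failed_layers` and `count_overlap_errors` are hypothetical.

```python
import re

# Pattern inferred from the one dockerd log line quoted in this report.
MOUNT_ERR = re.compile(
    r"error creating overlay mount to "
    r"(?P<path>\S+?/overlay2/(?P<layer>[0-9a-f]+)/merged): "
    r'(?P<reason>[^"]+)'
)
KERNEL_ERR = "overlayfs: lowerdir is in-use as upperdir/workdir"


def failed_layers(journal_lines):
    """Return (layer_id, reason) for each overlay mount failure found."""
    found = []
    for line in journal_lines:
        match = MOUNT_ERR.search(line)
        if match:
            found.append((match.group("layer"), match.group("reason")))
    return found


def count_overlap_errors(dmesg_lines):
    """Count kernel messages reporting a lowerdir in use as upperdir/workdir."""
    return sum(1 for line in dmesg_lines if KERNEL_ERR in line)


journal = [
    'Aug 04 08:50:05 localhost.localdomain dockerd[3012]: '
    'time="2019-08-04T08:50:05.459134956-07:00" level=error '
    'msg="Upload failed, retrying: error creating overlay mount to '
    '/var/lib/docker/overlay2/'
    'aea27291a899b955bbed386a4c4a37a4512586a583c5fa05bc37a82cac229426'
    '/merged: device or resource busy"'
]
dmesg = [
    "[ 1348.372328] overlayfs: lowerdir is in-use as upperdir/workdir",
    "[ 1394.161732] overlayfs: lowerdir is in-use as upperdir/workdir",
]
print(failed_layers(journal))
print(count_overlap_errors(dmesg))
```

Feeding it the output of `journalctl -u docker` and `dmesg` line by line gives a quick list of the affected layer directories.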
Describe the results you expected:
The push should succeed.
Additional information you deem important (e.g. issue happens only occasionally):
If the same push command is run many times, the push may eventually complete. However, this issue was first seen on my CI system, and it could make the system unstable.
Output of docker version:
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:20:55 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:28 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 5.1.20-300.fc30.x86_64
Operating System: Fedora 30 (Server Edition)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.92GiB
Name: localhost.localdomain
ID: LGLA:TGK2:YCZC:NSHI:G4AU:CI3M:RWQX:2JN4:4AVD:YORS:RIW7:C2AE
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
The above output was gathered from a reproduction image inside VMware, but the issue first occurred on a physical machine.
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (13 by maintainers)
@fuweid thanks for that link btw; through that issue, I also noticed we have another issue open: https://github.com/moby/moby/issues/39475 which indicates this may be a kernel / overlayfs issue
I see CoreOS modified their kernel to suppress this warning https://github.com/coreos/linux/pull/346