moby: failed to export image: failed to create image: failed to get layer: layer does not exist

Description

  • Docker CE 18.02
  • Linux 4.15.7-1-ARCH #1 SMP PREEMPT Wed Feb 28 19:01:57 UTC 2018 x86_64 GNU/Linux
  • BTRFS as filesystem
  • Docker uses subvolumes

Sometimes I get this error when building my images.

Docker uses subvolumes here, and I know that BTRFS sometimes gets into a broken state where files or subvolumes aren’t visible to “users”, whether that’s “ls” in a shell or, in this case, Docker.

The problem appears after many builds. When it does, Docker can no longer use the build cache either. After running sudo sync, the cache becomes available again and the build continues from where it stopped.
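The sync workaround can be wrapped in a small retry helper. This is a minimal sketch, not a fix for the underlying problem: it assumes a POSIX shell, and the function name build_with_sync_retry is hypothetical. It runs the given build command once; if that fails, it calls sync (the reporter used sudo sync) and retries exactly once.

```shell
# Hypothetical helper automating the manual workaround described above.
# Runs the command given as arguments; on failure, flushes dirty data to disk
# with sync and retries the same command once, returning its exit status.
build_with_sync_retry() {
  if "$@"; then
    return 0
  fi
  echo "build failed; running sync and retrying once" >&2
  sync
  "$@"
}

# Usage (hypothetical image tag):
#   build_with_sync_retry docker build -t myimage .
```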

Steps to reproduce the issue:

  1. Use BTRFS as the main filesystem, with Docker using subvolumes (btrfs storage driver)
  2. Run a multistage build many times in a row until the error appears. Large layers make it more likely.
  3. When the error shows up, try to resume the build: the cache isn’t used.
  4. Run sync as root and wait for the command to finish.
  5. Restart the build: the cache is available and the build resumes.
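Step 2 can be scripted with a loop like the one below. The helper name repeat_until_failure, the iteration cap and the image tag in the usage comment are assumptions for illustration; the loop simply reruns the given command until it exits non-zero and reports on which iteration that happened.

```shell
# Hypothetical helper: rerun a command up to $1 times, stopping at the first failure.
repeat_until_failure() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on iteration $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure after $max iterations"
}

# Usage (hypothetical tag, run from the directory containing the Dockerfile):
#   repeat_until_failure 50 docker build -t repro-btrfs .
```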

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

❭ docker version
Client:
 Version:       18.02.0-ce
 API version:   1.36
 Go version:    go1.9.4
 Git commit:    fc4de447b5
 Built: Tue Feb 13 15:28:01 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   fc4de447b5
  Built:        Tue Feb 13 15:28:34 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 673
 Running: 1
 Paused: 0
 Stopped: 672
Images: 1805
Server Version: 18.02.0-ce
Storage Driver: btrfs
 Build Version: Btrfs v4.15
 Library Version: 102
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.15.7-1-ARCH
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.643GiB
Name: padme
ID: SR5W:2GPM:CDIM:OEQD:GPY4:ATIR:L7B5:AMPP:742G:BVKE:ITND:FLYW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: leryan
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical machine with an SSD.

Most upvoted comments

I’m having this issue too: after 4 COPY commands, Docker fails with this error. Docker version 18.06.1-ce, build e68fc7a

I seem to get this when I have two COPY commands: the first has a --from argument and copies from a previous stage (COPY --from=previous /usr/local /usr/local), while the one after it copies from the build context (COPY . ${TARGET_LOCATION}, to be precise).

It seems to fail every time it builds the whole file from scratch (the file has about 4-5 stages). If I just try again, it seems to work, but if I run docker image prune first or build without the cache, it fails with this error. I have not tested this thoroughly, but this is what seems to happen on two separate machines running the same OS (Arch Linux) and the same version of Docker CE.

$ pacman -Qi docker
Name            : docker
Version         : 1:18.09.0-2
...
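To try the pattern described in the comment above, a minimal multistage Dockerfile can be generated as follows. Only the two COPY lines come from the comment; the base image (alpine:3.7), the marker file, the default TARGET_LOCATION=/app and the helper name write_repro_dockerfile are assumptions. The commenter saw the failure when building without cache or after docker image prune, so the usage comment does both.

```shell
# Hypothetical helper: write a minimal multistage Dockerfile into directory $1,
# mirroring the reported pattern (COPY --from an earlier stage, then COPY from
# the build context). Base image and paths are illustrative assumptions.
write_repro_dockerfile() {
  dir=$1
  mkdir -p "$dir"
  # Quoted 'EOF' keeps ${TARGET_LOCATION} literal so Docker, not the shell, expands it.
  cat > "$dir/Dockerfile" <<'EOF'
FROM alpine:3.7 AS previous
RUN mkdir -p /usr/local/share/repro && echo built > /usr/local/share/repro/marker

FROM alpine:3.7
ARG TARGET_LOCATION=/app
COPY --from=previous /usr/local /usr/local
COPY . ${TARGET_LOCATION}
EOF
}

# Usage (hypothetical directory and tag):
#   write_repro_dockerfile ./repro
#   docker image prune -f
#   docker build --no-cache -t repro ./repro
```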

Same issue here, running CentOS 7 (3.10.0-693.21.1.el7.x86_64), btrfs v4.9.1, docker-ce 18.03.0-ce, build 0520e24.

I think the problem was caused by running:

docker rm -v $(docker ps --filter status=exited -q 2>/dev/null) 2>/dev/null
docker rmi $(docker images --filter dangling=true -q 2>/dev/null) 2>/dev/null

I fixed the problem by wiping Docker’s storage:

btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/*
rm -rf /var/lib/docker/*
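Before wiping /var/lib/docker as above, the daemon should be stopped so it is not using the subvolumes being deleted. The sketch below orders the same commands that way, assuming a systemd-managed docker service, the btrfs storage driver and the default Docker root; reset_docker_storage and step are hypothetical names. It only prints the commands (a dry run) unless RUN=1 is set. Run for real, it deletes every image, container and volume, exactly like the original commands.

```shell
# Print a command, or execute it when RUN=1 (dry run by default).
step() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Hypothetical helper: DESTRUCTIVE with RUN=1. Assumes systemd and btrfs storage.
reset_docker_storage() {
  root=${1:-/var/lib/docker}
  step systemctl stop docker
  # Delete each layer subvolume individually; skip if the glob matched nothing.
  for sv in "$root"/btrfs/subvolumes/*; do
    [ -e "$sv" ] || continue
    step btrfs subvolume delete "$sv"
  done
  step rm -rf "$root"/*
  step systemctl start docker
}

# Usage: preview first, then run as root:
#   reset_docker_storage
#   RUN=1 reset_docker_storage
```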

Happens for me too, but without BTRFS:

Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false

It started happening after I pruned old images.

I just ran sync and everything works flawlessly again! Amazing!

@jsosic Well, next time you may want to try sync first ^^