moby: COPY fail in multistage build: layer does not exist

Description

A multistage build fails when a specific sequence of COPY commands is given. This happens in every version of dockerd later than 17.06 (i.e. 17.06 is not affected, while all later versions are).

The combination of COPY commands is:

  1. COPY --from a parent stage of a file or directory which is already present in the current stage.
  2. COPY from the host (or any stage other than the one used in step 1) of any other file.

Create and run the following shell script:

#!/bin/bash
set -eux

touch 1

cat << EOF > Dockerfile
FROM scratch AS base
COPY 1 /1

FROM base
# "useless" copy works ...
COPY --from=base /1 /1
# ... and the COPY statement after it fails
COPY 1 /2
EOF

docker build --no-cache .

Describe the results you received:

Error:

Step 5/5 : COPY 1 /2 failed to export image: failed to create image: failed to get layer sha256:a8ed352e74d0355836d2a5cbb8365c6e054ac8146b6e47991e48ed3e7331b832: layer does not exist

Full log:

+ touch 1
+ cat
+ docker build --no-cache .
Sending build context to Docker daemon   5.12kB
Step 1/5 : FROM scratch AS base
 ---> 
Step 2/5 : COPY 1 /1
 ---> 3ad67d379696
Step 3/5 : FROM base
 ---> 3ad67d379696
Step 4/5 : COPY --from=base /1 /1
 ---> dddb0549688c
Step 5/5 : COPY 1 /2
failed to export image: failed to create image: failed to get layer sha256:a8ed352e74d0355836d2a5cbb8365c6e054ac8146b6e47991e48ed3e7331b832: layer does not exist

Describe the results you expected:

Successfully built xxxxx

Additional information:

Reproducible 100% on any docker > 17.06 and any graphdriver. Can’t reproduce if DOCKER_BUILDKIT is set (for versions that support buildkit, of course).

This is a continuation of issues https://github.com/moby/moby/issues/33974 and https://github.com/moby/moby/issues/37340. The bug described here first appeared after merging PR https://github.com/moby/moby/pull/33454 and is not fixed by PRs https://github.com/moby/moby/pull/34063 and https://github.com/moby/moby/pull/35579.

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 56
  • Comments: 35 (7 by maintainers)

Most upvoted comments

Adding an extra command (like RUN true) between the two COPY statements makes the issue go away (as well as commenting out the first COPY).
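As a rough sketch of this workaround, the reproduction Dockerfile can be written with a no-op RUN between the two COPY instructions. Note the assumptions: busybox replaces scratch as the base image (an illustrative choice, not from the original report) because RUN needs a shell, and the function name write_workaround_dockerfile is hypothetical.

```shell
#!/bin/sh
# Sketch: write a variant of the reproduction Dockerfile that includes
# the "RUN true" workaround between the two COPY instructions.
# Assumption: busybox is used instead of scratch, since RUN needs a shell.
write_workaround_dockerfile() {
    dir=$1
    touch "$dir/1"
    cat > "$dir/Dockerfile" << 'EOF'
FROM busybox AS base
COPY 1 /1

FROM base
# this COPY changes nothing, because /1 already exists in base
COPY --from=base /1 /1
# workaround: a no-op instruction between the two COPY instructions
RUN true
COPY 1 /2
EOF
}
```

It could be used as `write_workaround_dockerfile . && docker build --no-cache .` from an empty directory.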

@thaJeztah – this issue is still present in 19.03

Thanks for minimizing the bug and posting workarounds.

Just as a mild reminder — this is still an issue and continues occurring in the wild.

Not sure if this was said above, but both workarounds (RUN true and DOCKER_BUILDKIT=1) may not be what you want. If you expect the first COPY to always copy some files, then the real problem lies deeper, in your build script, and these workarounds would only mask it.

For example, check that the target directory is not a VOLUME in your FROM image; docker build won’t commit any changes within volumes to a layer, and you’ll get the "layer does not exist" error. This has happened to me, and was a royal pain to debug.
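One way to run that check, as a sketch only: list the VOLUME paths declared by the base image with docker image inspect, then test whether the COPY destination falls under any of them. The helper names image_volumes and target_under_volume are made up for illustration.

```shell
#!/bin/sh
# Print the VOLUME paths declared in an image's config, one per line.
image_volumes() {
    docker image inspect \
        --format '{{range $k, $v := .Config.Volumes}}{{println $k}}{{end}}' "$1"
}

# Read volume paths on stdin; succeed if the target path equals one of
# them or lies underneath one of them.
target_under_volume() {
    target=$1
    while IFS= read -r vol; do
        [ -n "$vol" ] || continue
        case "$target/" in
            "${vol%/}/"*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `image_volumes myimage | target_under_volume /var/lib/data && echo "destination is inside a VOLUME"`, where myimage and the path are placeholders.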

Of course, if the copied content is allowed to be empty (or already present at the destination, with 0 diff) — you’ll need the RUN true in between. As figured out above, the bug only triggers when there’s a COPY instruction producing null effect, followed immediately by another COPY.
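To find candidate spots for RUN true, a small awk filter can list every COPY that directly follows another COPY. This is only a sketch: the function name find_adjacent_copies is hypothetical, it ignores blank lines and comments, it does not handle backslash line continuations, and it cannot tell whether the first COPY really has a null effect.

```shell
#!/bin/sh
# Print "LINE: instruction" for every COPY that directly follows another COPY.
find_adjacent_copies() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            is_copy = (toupper($1) == "COPY")
            if (is_copy && prev_copy) print NR ": " $0
            prev_copy = is_copy
        }
    ' "$1"
}
```

Running `find_adjacent_copies Dockerfile` on the reproduction Dockerfile from the description would flag the final COPY 1 /2 line.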

Hit this issue as well in GitHub Actions (not part of a multistage build). It errors on the second COPY command below:

COPY docker/images/ci/php.ini /usr/local/etc/php/php.ini
COPY --chown=www-data:www-data config/ /tmp/config/

Setting DOCKER_BUILDKIT=1 before the docker build fixed the problem.
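For a CI script, the variable can be scoped to the build command alone instead of being exported for the whole job. A minimal sketch, where the wrapper name build_with_buildkit and the DOCKER_BIN override are hypothetical (DOCKER_BIN exists only so the docker binary can be swapped out, e.g. for testing):

```shell
#!/bin/sh
# Run "docker build" with BuildKit enabled for this one invocation only.
# DOCKER_BIN is a hypothetical override for the docker executable.
build_with_buildkit() {
    DOCKER_BUILDKIT=1 "${DOCKER_BIN:-docker}" build "$@"
}
```

Usage: `build_with_buildkit --no-cache .`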

I’m hitting this bug in our CI builds, happy to provide info if needed

Adding an extra command (like RUN true) between the two COPY statements makes the issue go away (as well as commenting out the first COPY).

Could you please explain why this is a solution, and what causes the issue? I’m eager to know why.

reminder, there is a workaround: add RUN true between problematic COPY instructions.

Still seeing this with Docker version 20.10.11, build dea9396e18.

In our case, it seems that having 3 or more of our COPY commands in a row will cause the issue:

Dockerfile (partial)

# First stage: install the application's dependencies and copy files from host machine
...

# Second stage: copy installed dependencies and workdir from the previous stage
...

# Copy dependencies from the build stage
COPY --from=builder /opt/venv /opt/venv

# Activate venv copied over from builder
ENV PATH="$VENV/bin:$PATH"

# Copy SSL CA certificates
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy netbase name-to-number mappings
COPY --from=builder /etc/services /etc/rpc /etc/protocols /etc/ethertypes /etc/

# Copy project files
COPY --from=builder /usr/src/app /usr/src/app
...

Build command (partial)

λ docker build . --no-cache --file Dockerfile --tag myapp
...
Step 20/24 : COPY --from=builder /usr/src/app /usr/src/app
failed to export image: failed to create image: failed to get layer sha256:48f7f470b4615dfff28e8ad216e06c6dcfebe67a56b58544ffcee5bf05afbbad: layer does not exist

Log (complete), Dockerfile (complete): this file has a .txt extension, otherwise GitHub does not allow me to upload it. Please remove the .txt suffix to reproduce the issue =)

Solution

ATM, our solution is adding RUN true, as suggested in https://github.com/moby/moby/issues/37965#issuecomment-426853382, but after the third copy command:

...
# Copy dependencies from the build stage
COPY --from=builder /opt/venv /opt/venv

# Activate venv copied over from builder
ENV PATH="$VENV/bin:$PATH"

# Copy SSL CA certificates
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy netbase name-to-number mappings
COPY --from=builder /etc/services /etc/rpc /etc/protocols /etc/ethertypes /etc/

# This is a workaround to https://github.com/moby/moby/issues/37965
RUN true

# Copy project files
COPY --from=builder /usr/src/app /usr/src/app
...

Using DOCKER_BUILDKIT is not a viable solution for our team, given that we ship the application via a CI/CD pipeline and, to achieve this, we use Kaniko. As of the last time we checked, Kaniko did not support DOCKER_BUILDKIT.

Let us know if we could be of further assistance =)

I am hitting this error in Github workflows. Any update?

Same issue here. Server Version: 18.03.1-ce

RUN true did help.

We are seeing this same problem consistently in our CI env. Using the RUN true workaround in between COPY statements does seem to mitigate the issue.

docker info
Containers: 43
 Running: 14
 Paused: 0
 Stopped: 29
Images: 4503
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.16.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.7GiB
Name: xxxxxxx
ID: xxxxxxx
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

After loading the overlay module with redirect_dir=off, the problem stopped. There is more information here.

If you use the overlay2 storage driver, docker info shows:

...
  Native Overlay Diff: false
...

Temporary fix:

echo 0 > /sys/module/overlay/parameters/redirect_dir

Then restart Docker:

systemctl restart docker

Then check docker info again:

docker info
...
  Native Overlay Diff: true
...

Is this bug caused by the overlay2 storage driver using redirect_dir? I’m not sure why this issue has existed for so long.

Permanent fix:

echo "options overlay redirect_dir=0" > /etc/modprobe.d/disable_redirect_dir_overlayfs.conf
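Before changing the module option, it may help to check what the kernel currently reports. A small sketch, assuming the parameter file at the path used above contains a boolean-like value such as Y or N; the function name redirect_dir_state and its optional path argument are hypothetical.

```shell
#!/bin/sh
# Report whether overlayfs redirect_dir is enabled, disabled, or unknown.
# The optional argument overrides the parameter file (useful for testing).
redirect_dir_state() {
    param_file=${1:-/sys/module/overlay/parameters/redirect_dir}
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$param_file")" in
        Y|y|1|on)  echo "enabled" ;;
        N|n|0|off) echo "disabled" ;;
        *)         echo "unknown" ;;
    esac
}
```

Calling `redirect_dir_state` with no argument reads the live kernel parameter.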

reminder, there is a workaround: add RUN true between problematic COPY instructions.

RUN true does not work on images built FROM scratch, due to the missing shell.

export DOCKER_BUILDKIT=1 is really helpful, especially in GitHub Actions pipelines.

@dwmh interesting; we ran into issues with that configuration in the past, and detection was added to switch to the naive (instead of native) overlay differ: https://github.com/moby/moby/pull/34342

From the output of your docker info, it looks like the storage driver did indeed switch to disable the native diff:

Native Overlay Diff: false

@dmcgowan @tonistiigi @kolyshkin ^^ any ideas?

More information on this bug.

After the first build fails (with the COPY --from=base /1 /1 step having succeeded), you can re-run the docker build command. The second time, the build will succeed.
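If re-running is acceptable in your pipeline, that observation can be scripted as a single retry. This is only a sketch of a stopgap, not a fix: the helper name retry_once is hypothetical, and a retry masks the failure in the same way the other workarounds do.

```shell
#!/bin/sh
# Run a command; if it fails, run it exactly one more time.
retry_once() {
    "$@" && return 0
    echo "first attempt failed, retrying once: $*" >&2
    "$@"
}
```

Usage: `retry_once docker build .`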

I have the same issue when copying two folders into an image. The first one is empty; the last one is not. The first build always fails. The second succeeds.

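#!/bin/bash
# Reproduction: COPY an empty directory, then immediately COPY a non-empty one.

EMPTY_DIR=${EMPTY_DIR:-dir_1}
NONEMPTY_DIR=${NONEMPTY_DIR:-dir_2}

# The first COPY source is an empty directory, so it adds nothing to the image.
mkdir -p "$EMPTY_DIR"

# The second COPY source holds five 100 kB files.
if [ ! -d "$NONEMPTY_DIR" ]; then
        mkdir "$NONEMPTY_DIR"
        for i in {1..5}; do
                dd if=/dev/zero of="$NONEMPTY_DIR/file_$i" bs=100k count=1
        done
fi

# The heredoc delimiter is unquoted, so the directory names are expanded here.
cat << EOF > Dockerfile
FROM opensuse/leap:15.0

COPY $EMPTY_DIR /var
COPY $NONEMPTY_DIR /var
EOF

docker build .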

Docker version 18.06.1-ce, build e68fc7a215d7

/var/lib/docker is on a Btrfs partition.