buildkit: docker --cache-from with BUILDKIT_INLINE_CACHE does not work every second time

I am trying to take advantage of BuildKit's cache import (--cache-from with an inline cache) in my CI/CD process, but it does not work as expected.
I created a minimal local example (the same happens in my CI system, AWS CodePipeline, with both Docker Hub and AWS ECR). You need a Dockerfile, run_test.py (any contents) and requirements.txt (any contents) in one folder. The Dockerfile:

# base image
FROM python:3.7-slim

# set working directory
WORKDIR /usr/src/app

# add and install requirements
RUN pip install --upgrade pip
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirements.txt

RUN echo 123
# add app
COPY ./run_test.py /usr/src/app/run_test.py

# run server
CMD ["python", "run_test.py"]

run_test.py is actually not interesting, but here is the code just in case:

import requests
import time

while True:
    time.sleep(1)
    print(requests)

In advance, I export two environment variables:

export DOCKER_BUILDKIT=1  # to activate buildkit
export DUMMY_IMAGE_URL=bi0max/test_docker

To test, I run the following command. The first two commands remove the local cache to mimic the CI environment; the rest builds and pushes the image.
BE CAREFUL, THE CODE BELOW REMOVES THE LOCAL BUILD CACHE:

docker builder prune -a -f && \
(docker image rm $DUMMY_IMAGE_URL:latest || true) && \
docker build \
--cache-from $DUMMY_IMAGE_URL:latest \
--build-arg BUILDKIT_INLINE_CACHE=1 \
--tag $DUMMY_IMAGE_URL:latest "." && \
docker push $DUMMY_IMAGE_URL:latest

As expected, the first run just builds everything from scratch:

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 434B done
#2 DONE 0.0s

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.1s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#12 [1/7] FROM docker.io/library/python:3.7-slim
#12 DONE 0.0s

#7 [internal] load build context
#7 DONE 0.0s

#4 importing cache manifest from bi0max/test_docker:latest
#4 ERROR: docker.io/bi0max/test_docker:latest not found

#12 [1/7] FROM docker.io/library/python:3.7-slim
#12 resolve docker.io/library/python:3.7-slim done
#12 DONE 0.0s

#7 [internal] load build context
#7 transferring context: 204B done
#7 DONE 0.1s

#5 [2/7] WORKDIR /usr/src/app
#5 DONE 0.0s

#6 [3/7] RUN pip install --upgrade pip
#6 1.951 Requirement already up-to-date: pip in /usr/local/lib/python3.7/site-packages (20.1.1)
#6 DONE 2.3s

#8 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#8 DONE 0.0s

#9 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#9 0.750 Collecting requests==2.22.0
#9 0.848   Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
#9 0.932 Collecting idna<2.9,>=2.5
#9 0.948   Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
#9 0.995 Collecting chardet<3.1.0,>=3.0.2
#9 1.011   Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
#9 1.135 Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
#9 1.153   Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
#9 1.264 Collecting certifi>=2017.4.17
#9 1.282   Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
#9 1.378 Installing collected packages: idna, chardet, urllib3, certifi, requests
#9 1.916 Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.9
#9 DONE 2.2s

#10 [6/7] RUN echo 123
#10 0.265 123
#10 DONE 0.3s

#11 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#11 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f98327afae246096725f7e54742fe9b25079f1b779699b099e66c8def1e19052 done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

Then I slightly adjust run_test.py, and the result is again as expected: all layers up to the last step ([7/7] COPY) are downloaded from the repository and reused.

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 434B done
#1 DONE 0.1s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#8 [internal] load build context
#8 DONE 0.0s

#4 [1/7] FROM docker.io/library/python:3.7-slim
#4 DONE 0.0s

#5 importing cache manifest from bi0max/test_docker:latest
#5 DONE 1.2s

#8 [internal] load build context
#8 transferring context: 193B done
#8 DONE 0.0s

#6 [2/7] WORKDIR /usr/src/app
#6 CACHED

#7 [3/7] RUN pip install --upgrade pip
#7 CACHED

#9 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#9 CACHED

#10 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#10 CACHED

#11 [6/7] RUN echo 123
#11 pulling sha256:79fc69c08b391d082b4d2617faed489d220444fa0cf06953cdff55c667866bed
#11 pulling sha256:071624272167ab4e35a30eb1640cb3f15ced19c6cd10fa1c9d49763372e81c23
#11 pulling sha256:04ed4ecd76e1a110f468eb1a3173bbfa578c6b4c85a6dc82bf4a489ed8b8c54d
#11 pulling sha256:79fc69c08b391d082b4d2617faed489d220444fa0cf06953cdff55c667866bed 0.2s done
#11 pulling sha256:d6406c1ce2dc5e841233ebce164ee469388102cb98f1473adaeca15455d6d797
#11 pulling sha256:071624272167ab4e35a30eb1640cb3f15ced19c6cd10fa1c9d49763372e81c23 0.5s done
#11 pulling sha256:04ed4ecd76e1a110f468eb1a3173bbfa578c6b4c85a6dc82bf4a489ed8b8c54d 0.5s done
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
#11 pulling sha256:d6406c1ce2dc5e841233ebce164ee469388102cb98f1473adaeca15455d6d797 0.3s done
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 0.2s done
#11 CACHED

#12 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#12 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f37692114f10b9a3646203569a0849af20774651f4aa0f5dc8d6f133fb7ff062 done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

Now I change run_test.py again and would expect Docker to do the same thing as last time. Instead I get the following result, where it builds everything from scratch:

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 434B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#5 [1/7] FROM docker.io/library/python:3.7-slim
#5 DONE 0.0s

#8 [internal] load build context
#8 DONE 0.0s

#4 importing cache manifest from bi0max/test_docker:latest
#4 DONE 1.7s

#8 [internal] load build context
#8 transferring context: 182B done
#8 DONE 0.0s

#5 [1/7] FROM docker.io/library/python:3.7-slim
#5 resolve docker.io/library/python:3.7-slim done
#5 DONE 0.1s

#6 [2/7] WORKDIR /usr/src/app
#6 DONE 0.0s

#7 [3/7] RUN pip install --upgrade pip
#7 1.774 Requirement already up-to-date: pip in /usr/local/lib/python3.7/site-packages (20.1.1)
#7 DONE 2.1s

#9 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#9 DONE 0.0s

#10 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#10 0.805 Collecting requests==2.22.0
#10 0.905   Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
#10 1.079 Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
#10 1.109   Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
#10 1.242 Collecting certifi>=2017.4.17
#10 1.259   Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
#10 1.336 Collecting idna<2.9,>=2.5
#10 1.353   Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
#10 1.410 Collecting chardet<3.1.0,>=3.0.2
#10 1.428   Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
#10 1.545 Installing collected packages: urllib3, certifi, idna, chardet, requests
#10 2.102 Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.9
#10 DONE 2.4s

#11 [6/7] RUN echo 123
#11 0.259 123
#11 DONE 0.3s

#12 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#12 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f4ffb0e84e334b4b35fe2504de11012e5dc1ca5978eace055932e9bbbe83c93e done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

But the strangest thing for me is that when I change run_test.py for the third time, it uses the cached layers again. And it continues in the same way: the fourth time it doesn’t use them, the fifth time it does, etc…

Am I missing something here?

If I pull the image each time before building, then it always uses cache, but it also works in the same way without the BUILDKIT.
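To make the alternating pattern easier to spot across runs, the cached steps in each build log can be counted. The helper below is only a sketch: `count_cached_steps` is a hypothetical name, and it assumes the log has the plain-progress shape shown in the outputs above, where a reused step ends with a line such as `#6 CACHED`.

```shell
# Hypothetical helper: counts build steps reported as CACHED in a
# BuildKit plain-progress log read from stdin.
count_cached_steps() {
  grep -c -E '^#[0-9]+ CACHED$'
}

# Example use (assumption: the build log is written to stderr, hence 2>&1):
#   docker build --progress=plain \
#     --cache-from "$DUMMY_IMAGE_URL:latest" \
#     --build-arg BUILDKIT_INLINE_CACHE=1 \
#     --tag "$DUMMY_IMAGE_URL:latest" . 2>&1 | count_cached_steps
```

For the three runs shown above this would report 0, 5 and 0 cached steps respectively.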

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 10
  • Comments: 25 (5 by maintainers)

Most upvoted comments

It’s included in the PRs linked in ^ , including the backport to 20.10

Going to close this. If you find that the fix does not apply to your use case, open a new ticket with reproduction steps.

The best way to avoid this issue altogether is to use docker buildx with the container driver, created with docker buildx create. It then does not rely on the system Docker version at all, and you can choose any BuildKit version (for this case all recent BuildKit versions should be OK).
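A minimal local sketch of that suggestion follows. The function name `build_with_buildx`, the builder name `ci-builder`, and the `DOCKER` override used for dry runs are assumptions made for illustration; the BuildKit image tag is a parameter so that a specific version can be chosen, and it defaults here to `moby/buildkit:buildx-stable-1`.

```shell
# Hypothetical wrapper: builds and pushes IMAGE with a docker-container
# builder, importing cache from the registry and exporting inline cache.
# Usage: build_with_buildx <image> [buildkit-image]
build_with_buildx() {
  image="$1"
  buildkit_image="${2:-moby/buildkit:buildx-stable-1}"
  docker_cmd="${DOCKER:-docker}"   # set DOCKER=echo to print the commands only

  # Create a builder backed by a BuildKit container and select it.
  # Note: this fails if a builder named ci-builder already exists.
  "$docker_cmd" buildx create --name ci-builder \
    --driver docker-container \
    --driver-opt "image=$buildkit_image" \
    --use &&
  "$docker_cmd" buildx build \
    --cache-from "type=registry,ref=$image" \
    --cache-to type=inline \
    --tag "$image" \
    --push .
}

# Example: build_with_buildx "$DUMMY_IMAGE_URL:latest"
```

Whether this avoids the alternating cache misses in a given environment still has to be verified there; the sketch only shows the shape of the commands.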

Actually, it still doesn’t work. I have latest docker engine v20.10.7, but my reproducible example still doesn’t work

We’re experiencing the same issue as well, please reopen

I believe this issue should be reopened.

I can still reproduce this in 20.10.11+azure in my CI env. I think it should be reopened 😃 @tonistiigi

My team has been affected by this issue as well, and the original poster’s “pull the image each time before building” workaround (as mentioned right at the end of this issue description) seems to be working for us: 10 out of 10 cache hits so far…

And I know that the docs clearly state that an explicit docker pull beforehand shouldn’t be required (bolded the relevant text for emphasis):

The following example builds an image with inline-cache metadata and pushes it to a registry, then uses the image as a cache source on another machine:

 docker build -t myname/myapp --build-arg BUILDKIT_INLINE_CACHE=1 .
 docker push myname/myapp

After pushing the image, the image is used as cache source on another machine. BuildKit automatically pulls the image from the registry if needed.

https://docs.docker.com/engine/reference/commandline/build/

But without it we experience the same “cache doesn’t work every second time” bug as the original poster 😕

For reference we’re using gitlab CI, with their shared runners and docker:dind/docker:20.10.
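The pull-before-build workaround mentioned above can be sketched as follows. The function name `build_with_pulled_cache` and the `DOCKER` override for dry runs are assumptions made for illustration; the docker flags are the ones used in the original reproduction command.

```shell
# Hypothetical wrapper: pulls IMAGE first so that its layers are available
# locally, then builds with it as a cache source and pushes the result.
# Usage: build_with_pulled_cache <image>
build_with_pulled_cache() {
  image="$1"
  docker_cmd="${DOCKER:-docker}"   # set DOCKER=echo to print the commands only

  # The pull may fail on the very first run, when the image does not exist yet.
  "$docker_cmd" pull "$image" || true

  "$docker_cmd" build \
    --cache-from "$image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --tag "$image" . &&
  "$docker_cmd" push "$image"
}

# Example: build_with_pulled_cache "$DUMMY_IMAGE_URL:latest"
```

The trade-off is that the whole image is downloaded on every run, even when only a few of its layers end up being reused.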

To expand on @tonistiigi’s suggestion of using the buildx plugin directly, this is what worked for me in gitlab-ci:

build_container:
  image: docker:stable
  services:
    - docker:stable-dind
  variables:
    BUILDX_VERSION: v0.6.3
    DOCKER_BUILDKIT: 1
  before_script:
    - mkdir -p "${HOME}/.docker/cli-plugins/"
    - curl -sLo "${HOME}/.docker/cli-plugins/docker-buildx" https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64
    - chmod a+x "${HOME}/.docker/cli-plugins/docker-buildx"
    - docker buildx create --use
  script:
    - >
      docker buildx build
      --tag <image_base>:<image_tag>
      --cache-from=type=registry,ref=<image_base>:<image_tag>
      --cache-to=type=registry,ref=<image_base>:<image_tag>
      --progress plain
      --push .

@cgreening Thanks. I can confirm I can repro this in 20.10.2. BuildKit master (buildx) and 19.03 do not seem to have the issue.

I solved this by building the image with Buildah.

It works with the cache reliably and predictably.

My code for use in GitLab CI looks like:

before_script:
    - buildah login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
script:
    # --cache-from and --cache-to take the repository without a tag
    - >
      buildah build
      --tag $CI_REGISTRY_IMAGE/app:latest
      --cache-from $CI_REGISTRY_IMAGE/app
      --layers
      --cache-to $CI_REGISTRY_IMAGE/app
      .
    - buildah push $CI_REGISTRY_IMAGE/app:latest

The cache is stored in the GitLab registry separate from the app image.

I spent many hours researching and testing this.

It is also recommended to have a “.dockerignore” file that excludes .git*

Using Kaniko is another good solution, but Buildah is closer to the default Docker commands (1:1 replacement) and can easily be added to the default DinD image.

Docker version 20.10.15, build fd82621 on Bitbucket Pipelines. Totally reproduces.

Hi @Bi0max , I am experiencing this issue using docker 20.10 on gitlab. Is there a specific version I should be pinning to?

Hi @alex-treebeard, last time I checked, I had v20.10.7 of Docker Engine (back then it still did not work). I haven’t checked since then.

I have a very similar problem in my CI using DOCKER_BUILDKIT=1, where a COPY instruction is missing the cache. The permissions and SHA of the file are the same.

Here are two layers that have the exact same file contents. It seems that only the times are different. I wasn’t aware that this should change a layer hash?

d19e1c61b0cbe787e9d58d9ea54e2660ab6ae0c6d1fd3b11a410f60154dbe525.tar.gz.txt

2a3db49c74cd0666b6e4d2729cabafa22ea4270a5a0b6a41b92f87ec6f0f1301.tar.gz.txt
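On the question of whether timestamps alone can change a layer hash: a layer digest is computed over the bytes of the layer archive, and tar headers record each file's modification time. The sketch below illustrates only that point. It assumes GNU tar and `sha256sum` are available, `layer_digest` is a hypothetical helper name, and it says nothing about how BuildKit computes its cache keys for COPY.

```shell
# Hypothetical helper: prints the SHA-256 of a tar archive containing one
# file with the given content and modification time.
# Usage: layer_digest <content> <mtime as [[CC]YY]MMDDhhmm>
layer_digest() {
  dir=$(mktemp -d)
  printf '%s' "$1" > "$dir/run_test.py"
  touch -t "$2" "$dir/run_test.py"          # set the file's mtime
  tar -C "$dir" -cf - run_test.py | sha256sum | cut -d' ' -f1
  rm -rf "$dir"
}

# Same content, different mtimes: the two digests differ.
#   layer_digest 'print(1)' 202001010000
#   layer_digest 'print(1)' 202001020000
```

Two archives built from identical file contents therefore get different digests whenever the recorded times differ.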

@tonistiigi Here you go:

https://github.com/cgreening/docker-cache-problem

Let me know if I can do anything to help.

@cgreening Please provide runnable reproduction steps.