buildkit: Images created with buildx sometimes have layers that are incorrect/zero bytes

Cross posting from https://github.com/docker/buildx/issues/637 as it’s probably more relevant directly to buildkit.

I’ve been building images primarily with buildx for the past couple of weeks, and I have seen some very odd behavior where the image manifest appears to be written incorrectly. So far, this only appears to happen when I re-run a build where at least a portion of the image is cached. Here is an example scenario, but it doesn’t happen for just this one project: another user reported the same problem on another image whose build uses the exact same template as the one reported here. Here is the GitHub repo for the app whose logs I am showing below.

docker version:

# docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:48 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Here are the buildx settings:

Creating the builder:

docker buildx create --name builder1 --config ~/buildkit.toml --driver-opt network=host --node builder1

The buildkit.toml just includes values to use my registry cache/mirror to avoid the rate limits of Docker Hub:

[registry."docker.io"]
  mirrors = ["registry-mirror.casa.mbentley.net"]

Here is my docker buildx ls & docker buildx inspect output:

$ docker buildx ls
NAME/NODE  DRIVER/ENDPOINT                            STATUS  PLATFORMS
builder1   docker-container
  builder1 unix:///var/lib/jenkins/docker/docker.sock running linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
default *  docker
  default  default                                    running linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

$ docker buildx inspect builder1
Name:   builder1
Driver: docker-container

Nodes:
Name:      builder1
Endpoint:  unix:///var/lib/jenkins/docker/docker.sock
Status:    running
Platforms: linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

Image used by the buildx container:

moby/buildkit:buildx-stable-1@sha256:171689e43026533b48701ab6566b72659dd1839488d715c73ef3fe387fab9a80

I have also tried the image from master, in case something had been fixed since then, and I have seen the same issue:

moby/buildkit:master@sha256:e0b50ede98f8d241d051b09fceae8956d0e07656657fcd86421e0feff04838ad

I am seeing some warnings in the buildx builder container logs. I am not sure exactly what they mean, but they don’t seem to indicate that something is wrong, as they also appear when an image was built fine:

...
time="2021-06-20T07:00:11Z" level=warning msg="invalid image config with unaccounted layers"
time="2021-06-20T07:00:12Z" level=warning msg="failed to update distribution source for layer sha256:5c126ace4b8e4d5e2d1fa6699ac82dddc270a48ac66d40ecca9bbccf7e61d697: content digest sha256:5c126ace4b8e4d5e2d1fa6699ac82dddc270a48ac66d40ecca9bbccf7e61d697: not found"
time="2021-06-20T07:00:12Z" level=warning msg="failed to update distribution source for layer sha256:53380879c22c14c31df822b3976e5f7dc41d082e46dc33b5821183449cdd6be3: content digest sha256:53380879c22c14c31df822b3976e5f7dc41d082e46dc33b5821183449cdd6be3: not found"
time="2021-06-20T07:00:13Z" level=warning msg="reference for unknown type: application/vnd.buildkit.cacheconfig.v0"
time="2021-06-20T07:00:21Z" level=warning msg="failed to update distribution source for layer sha256:aa4b46473fcf70f92312bdb4920f0c4c342327d0db0329763a1d1751106fe362: content digest sha256:aa4b46473fcf70f92312bdb4920f0c4c342327d0db0329763a1d1751106fe362: not found"
time="2021-06-20T07:00:21Z" level=warning msg="failed to update distribution source for layer sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87: content digest sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87: not found"
time="2021-06-20T07:00:21Z" level=warning msg="failed to update distribution source for layer sha256:31e53a1e85966a7f2a77ed3627df81b3bf248754c8434e504426fcf4cf8c982f: content digest sha256:31e53a1e85966a7f2a77ed3627df81b3bf248754c8434e504426fcf4cf8c982f: not found"
time="2021-06-20T07:00:21Z" level=warning msg="failed to update distribution source for layer sha256:3bcb7e2703c0b6d494dcddb5ac5904498385a9f4495a886bc2262651fd2644d4: content digest sha256:3bcb7e2703c0b6d494dcddb5ac5904498385a9f4495a886bc2262651fd2644d4: not found"
time="2021-06-20T07:00:21Z" level=warning msg="failed to update distribution source for layer sha256:2e97a69f7f22afda6401e1fcc261f1e8cf243aeafb5a5217f895fc8f40660467: content digest sha256:2e97a69f7f22afda6401e1fcc261f1e8cf243aeafb5a5217f895fc8f40660467: not found"
time="2021-06-20T07:00:21Z" level=warning msg="reference for unknown type: application/vnd.buildkit.cacheconfig.v0"
...
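
To cross-reference these warnings with the layers listed in the build output, the layer digests can be pulled out of the builder log. The following is only a sketch, assuming bash and a grep that supports -o. The function name warned_layer_digests is hypothetical, and buildx_buildkit_builder1 is the builder container name that shows up in the build log of this report.

```shell
#!/usr/bin/env bash
# Sketch: list the unique layer digests mentioned in
# "failed to update distribution source" warnings.
# Reads builder log lines on stdin, prints one digest per line.
# warned_layer_digests is a made-up name for illustration.
warned_layer_digests() {
  grep -o 'failed to update distribution source for layer sha256:[0-9a-f]\{64\}' \
    | awk '{ print $NF }' \
    | sort -u
}

# Possible usage (container name taken from this report):
#   docker logs buildx_buildkit_builder1 2>&1 | warned_layer_digests
```

The resulting digests can then be compared by hand with the "writing layer" lines of the cache export step.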

Here is an example of a build command where I am using the same basic structure across many projects:

docker buildx build \
  --builder builder1 \
  --pull \
  --push \
  --progress plain \
  --build-arg AIRSONIC_VER=$(wget -q -O - https://api.github.com/repos/airsonic/airsonic/releases/latest | jq -r .tag_name) \
  --platform linux/amd64 \
  -t mbentley/airsonic:latest \
  -f Dockerfile \
  --cache-from=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache \
  --cache-to=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache,mode=max \
  .

My registry is just a simple v2 open source registry with valid SSL certs for https.

An example error that you might see is something like this:

# docker start airsonic
Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/entrypoint.sh": stat /entrypoint.sh: no such file or directory: unknown
Error: failed to start containers: airsonic

Rebuilding the image appears to re-write the manifest and it works fine. Here are some outputs from a few commands showing the issue on these two images:

Broken: mbentley/airsonic@sha256:e78d073b03e7802217825e7763d33ebf5f07ed4d2adc0bd94d4345854a15d4c3

Working: mbentley/airsonic@sha256:29aa0bb26757325405da60b4aa7d92fa3cf0fb489982ed4c19bc3580ad94ada1

docker history of the broken image (dh_output.txt):

IMAGE          CREATED      CREATED BY                                      SIZE      COMMENT
fa972d2adb4a   4 days ago   CMD ["java" "-Dserver.address=0.0.0.0" "-Dse…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   ENTRYPOINT ["/entrypoint.sh"]                   0B        buildkit.dockerfile.v0
<missing>      4 days ago   VOLUME [/data]                                  0B        buildkit.dockerfile.v0
<missing>      4 days ago   EXPOSE map[4040/tcp:{}]                         0B        buildkit.dockerfile.v0
<missing>      4 days ago   WORKDIR /var/airsonic                           0B        buildkit.dockerfile.v0
<missing>      4 days ago   USER airsonic                                   0B        buildkit.dockerfile.v0
<missing>      4 days ago   COPY entrypoint.sh /entrypoint.sh # buildkit    0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (AIRS…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   4.71kB    buildkit.dockerfile.v0
<missing>      4 days ago   ENV AIRSONIC_MAJOR_VER=10                       0B        buildkit.dockerfile.v0
<missing>      4 days ago   ARG AIRSONIC_VER                                0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN /bin/sh -c (apk --no-cache add ca-certif…   181MB     buildkit.dockerfile.v0
<missing>      4 days ago   MAINTAINER Matt Bentley <mbentley@mbentley.n…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      4 days ago   /bin/sh -c #(nop) ADD file:f278386b0cef68136…   5.6MB

docker history of the working image (dh_output2.txt):

IMAGE          CREATED      CREATED BY                                      SIZE      COMMENT
2f71e4b4043a   4 days ago   CMD ["java" "-Dserver.address=0.0.0.0" "-Dse…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   ENTRYPOINT ["/entrypoint.sh"]                   0B        buildkit.dockerfile.v0
<missing>      4 days ago   VOLUME [/data]                                  0B        buildkit.dockerfile.v0
<missing>      4 days ago   EXPOSE map[4040/tcp:{}]                         0B        buildkit.dockerfile.v0
<missing>      4 days ago   WORKDIR /var/airsonic                           0B        buildkit.dockerfile.v0
<missing>      4 days ago   USER airsonic                                   0B        buildkit.dockerfile.v0
<missing>      4 days ago   COPY entrypoint.sh /entrypoint.sh # buildkit    865B      buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   168B      buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   15B       buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (AIRS…   84.7MB    buildkit.dockerfile.v0
<missing>      4 days ago   RUN |1 AIRSONIC_VER=v10.6.2 /bin/sh -c (mkdi…   4.71kB    buildkit.dockerfile.v0
<missing>      4 days ago   ENV AIRSONIC_MAJOR_VER=10                       0B        buildkit.dockerfile.v0
<missing>      4 days ago   ARG AIRSONIC_VER                                0B        buildkit.dockerfile.v0
<missing>      4 days ago   RUN /bin/sh -c (apk --no-cache add ca-certif…   181MB     buildkit.dockerfile.v0
<missing>      4 days ago   MAINTAINER Matt Bentley <mbentley@mbentley.n…   0B        buildkit.dockerfile.v0
<missing>      4 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      4 days ago   /bin/sh -c #(nop) ADD file:f278386b0cef68136…   5.6MB
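
The difference between the two listings is in the SIZE column: in the broken image, the COPY entrypoint.sh layer and three of the RUN layers show 0B. A rough way to spot this automatically is sketched below, assuming bash and awk. The function name flag_empty_layers is made up for illustration, and the check is only a heuristic, since a RUN step that changes no files legitimately produces a 0B layer.

```shell
#!/usr/bin/env bash
# Sketch: flag RUN/COPY/ADD steps whose layer size is reported as 0B.
# Expects "SIZE CREATED_BY" lines on stdin, for example from:
#   docker history --no-trunc --format '{{.Size}} {{.CreatedBy}}' IMAGE
# Returns a non-zero status if at least one suspicious step is found.
# flag_empty_layers is a hypothetical helper, not part of docker or buildx.
flag_empty_layers() {
  awk '
    {
      size = $1
      # Everything after the first field is the instruction text.
      instr = $0
      sub(/^[^ ]+ +/, "", instr)
      if (size == "0B" && instr ~ /^(RUN|COPY|ADD) /) {
        print "suspicious: " instr
        found = 1
      }
    }
    END { exit found ? 1 : 0 }
  '
}
```

Metadata-only steps such as WORKDIR, ENV or CMD are ignored on purpose, because they show 0B in the working image as well.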

In case it is helpful, here are the docker inspect outputs: Broken Working

I am using Jenkins to perform my builds. I see nothing to indicate why this is failing:

Broken build:

+ docker buildx build --builder builder1 --pull --push --progress plain --build-arg AIRSONIC_VER=v10.6.2 --platform linux/amd64 -t mbentley/airsonic:latest -f Dockerfile --cache-from=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache --cache-to=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache,mode=max .
#1 [internal] load build definition from Dockerfile
#1 sha256:8df12d67db120c22f2cd768148b423f1e8db4bac2fa3d77864d01a0b90e8d2eb
#1 transferring dockerfile: 2.27kB done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:ba34cb6f62f35c9741fb8b46667ff98c3ec347a1c1461c4294cb6081f05d292c
#2 transferring context: 2B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.io/library/alpine:latest
#3 sha256:d4fb25f5b5c00defc20ce26f2efc4e288de8834ed5aa59dff877b495ba88fda6
#3 DONE 5.5s

#4 importing cache manifest from registry.casa.mbentley.net/mbentley/airsonic:latest-cache
#4 sha256:54d6ce93e51693820c800f2f73075a6183237f347857e154552181a870bbb9d1
#4 DONE 0.0s

#5 [1/8] FROM docker.io/library/alpine:latest@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0
#5 sha256:83cc88488a3bbdd9d2e20ed4b0eb8ecd26a4dff43878c2e68a842fed290ad078
#5 resolve docker.io/library/alpine:latest@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 0.0s done
#5 DONE 0.0s

#11 [internal] load build context
#11 sha256:d4647f1f5d80ffc907a5589365df0ab882d64d7f2a5ad2e945093014bfc1f410
#11 transferring context: 907B 0.0s done
#11 DONE 0.0s

#12 [7/8] COPY entrypoint.sh /entrypoint.sh
#12 sha256:1c45220cbed07bd67cd26c7e52bce81257cd0eee7ee6be4a94cb3e515a92b92d
#12 CACHED

#7 [3/8] RUN (mkdir /var/airsonic &&  addgroup -g 504 airsonic &&  adduser -h /var/airsonic -D -u 504 -g airsonic -G airsonic -s /sbin/nologin airsonic &&  chown -R airsonic:airsonic /var/airsonic)
#7 sha256:1dd82e7179d413ef03ab329f8e1daf6d9d82624ebb6124f071fd8a3690381720
#7 CACHED

#10 [6/8] RUN (mkdir /data &&  cd /data &&  mkdir db index16 lucene2 lastfmcache thumbs music Podcast playlists .cache .java &&  touch airsonic.properties rollback.sql &&  cd /var/airsonic &&  ln -s /data/db &&  ln -s /data/index16 &&  ln -s /data/lucene2 &&  ln -s /data/lastfmcache &&  ln -s /data/thumbs &&  ln -s /data/music &&  ln -s /data/Podcast &&  ln -s /data/playlists &&  ln -s /data/.cache &&  ln -s /data/.java &&  ln -s /data/airsonic.properties &&  ln -s /data/rollback.sql &&  chown -R airsonic:airsonic /data)
#10 sha256:537ff7293a632ef34d77a985eca7229badd7d2b9309d5e5e11e78ce6cb790cbf
#10 CACHED

#6 [2/8] RUN (apk --no-cache add ca-certificates ffmpeg ttf-dejavu openjdk8 wget jq)
#6 sha256:8eaba8e869c391b7166657cc0a1455b604737370729a18d94ef373a087941341
#6 CACHED

#8 [4/8] RUN (AIRSONIC_VER="$(wget -q -O - https://api.github.com/repos/airsonic/airsonic/releases/latest | jq -r .tag_name)" &&  if [ "$(echo v10.6.2 | awk -F '.' '{print $1}')" != "v10" ]; then echo "Latest version number is no longer 10"; exit 1; fi &&  wget "https://github.com/airsonic/airsonic/releases/download/v10.6.2/airsonic.war" -O /var/airsonic/airsonic.war &&  chown airsonic:airsonic /var/airsonic/airsonic.war)
#8 sha256:3835982beb295b97e3aa4974aca74b0ba79f245ab2e64d5c62615c58cbf37051
#8 CACHED

#9 [5/8] RUN (mkdir /var/airsonic/transcode &&  ln -s /usr/bin/ffmpeg /var/airsonic/transcode/ffmpeg &&  chown -R airsonic:airsonic /var/airsonic/transcode)
#9 sha256:ad84205011c43d7cd451e5f369174eb417683b646df092e20fd67b7a4f745a5a
#9 CACHED

#13 [8/8] WORKDIR /var/airsonic
#13 sha256:cb1ba2e204c69fdfd1ec2b5618052a48c8c9457e19355452ae95b325ec244d09
#13 CACHED

#14 exporting to image
#14 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#14 exporting layers done
#14 exporting manifest sha256:e78d073b03e7802217825e7763d33ebf5f07ed4d2adc0bd94d4345854a15d4c3 0.0s done
#14 exporting config sha256:fa972d2adb4a1fcc619a9ae6c02662c74ba0b564ef40675a0a52d40b110cd21c 0.0s done
#14 pushing layers
#14 ...

#15 [auth] mbentley/airsonic:pull,push token for registry-1.docker.io
#15 sha256:af6bfb8d65f9654c0328f15c4fcd08bd98be5645b915ca1105506ee7e5ea8467
#15 DONE 0.0s

#14 exporting to image
#14 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#14 pushing layers 1.2s done
#14 pushing manifest for docker.io/mbentley/airsonic:latest
#14 pushing manifest for docker.io/mbentley/airsonic:latest 0.3s done
#14 DONE 1.6s

#16 exporting cache
#16 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
#16 preparing build cache for export done
#16 writing layer sha256:47fcdba1db8f6bb2e24cadfbf88957aecd675c89553230b5266ef1876b863ac3 done
#16 writing layer sha256:53380879c22c14c31df822b3976e5f7dc41d082e46dc33b5821183449cdd6be3 done
#16 writing layer sha256:5843afab387455b37944e709ee8c78d7520df80f8d01cf7f861aae63beeddb6b done
#16 writing layer sha256:5c126ace4b8e4d5e2d1fa6699ac82dddc270a48ac66d40ecca9bbccf7e61d697 done
#16 writing layer sha256:8057d354b41f93a2c1cf01265f97392dbd0c415c8ead5425d90a082c97db2fb7 done
#16 writing layer sha256:bfed4388b8390995898ec1f304e4cce6874d6dd620f4b84dd3b1fb0de36da27a done
#16 writing layer sha256:f74a07ff9c7f30e0525b94e45f4a04ba8130ce9fd8070cd7e8c3bc2b4e9ad810 done
#16 writing config sha256:c2a253a67a918a0511147e5777544935738f87265bb7dcbf5300d697b573b0aa 0.0s done
#16 writing manifest sha256:57e405aaf72e4bd5937ac8e80c39858b7414caec2cc2fc0e1f9d98d6606f31b8 0.0s done
#16 DONE 0.1s

Working build:

+ docker buildx build --builder builder1 --pull --push --progress plain --build-arg AIRSONIC_VER=v10.6.2 --platform linux/amd64 -t mbentley/airsonic:latest -f Dockerfile --cache-from=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache --cache-to=type=registry,ref=registry.casa.mbentley.net/mbentley/airsonic:latest-cache,mode=max .
#1 [internal] booting buildkit
#1 sha256:bd448caf6032bd5a5d77c0fc37ff5cecfdc320c10d188966d0df614b61366592
#1 starting container buildx_buildkit_builder1
#1 starting container buildx_buildkit_builder1 2.7s done
#1 DONE 2.7s

#2 [internal] load build definition from Dockerfile
#2 sha256:8d15bda46382c1246f234725de3207970eedeeee9d9898bb171e5645f0b778a7
#2 transferring dockerfile: 2.27kB done
#2 DONE 0.0s

#3 [internal] load .dockerignore
#3 sha256:3f4f9de50cf45693d066a337c96dfab339bf23c2f5ef8898b6736adafd224959
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [internal] load metadata for docker.io/library/alpine:latest
#4 sha256:d4fb25f5b5c00defc20ce26f2efc4e288de8834ed5aa59dff877b495ba88fda6
#4 DONE 0.6s

#6 [1/8] FROM docker.io/library/alpine:latest@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0
#6 sha256:83cc88488a3bbdd9d2e20ed4b0eb8ecd26a4dff43878c2e68a842fed290ad078
#6 resolve docker.io/library/alpine:latest@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 0.0s done
#6 DONE 0.0s

#5 importing cache manifest from registry.casa.mbentley.net/mbentley/airsonic:latest-cache
#5 sha256:54d6ce93e51693820c800f2f73075a6183237f347857e154552181a870bbb9d1
#5 DONE 0.0s

#12 [internal] load build context
#12 sha256:24d567b1f92090557f3d2c271a92fc28093ed628db2a310a00a55b0b94422448
#12 transferring context: 907B done
#12 DONE 0.0s

#8 [3/8] RUN (mkdir /var/airsonic &&  addgroup -g 504 airsonic &&  adduser -h /var/airsonic -D -u 504 -g airsonic -G airsonic -s /sbin/nologin airsonic &&  chown -R airsonic:airsonic /var/airsonic)
#8 sha256:1dd82e7179d413ef03ab329f8e1daf6d9d82624ebb6124f071fd8a3690381720
#8 CACHED

#7 [2/8] RUN (apk --no-cache add ca-certificates ffmpeg ttf-dejavu openjdk8 wget jq)
#7 sha256:8eaba8e869c391b7166657cc0a1455b604737370729a18d94ef373a087941341
#7 CACHED

#9 [4/8] RUN (AIRSONIC_VER="$(wget -q -O - https://api.github.com/repos/airsonic/airsonic/releases/latest | jq -r .tag_name)" &&  if [ "$(echo v10.6.2 | awk -F '.' '{print $1}')" != "v10" ]; then echo "Latest version number is no longer 10"; exit 1; fi &&  wget "https://github.com/airsonic/airsonic/releases/download/v10.6.2/airsonic.war" -O /var/airsonic/airsonic.war &&  chown airsonic:airsonic /var/airsonic/airsonic.war)
#9 sha256:3835982beb295b97e3aa4974aca74b0ba79f245ab2e64d5c62615c58cbf37051
#9 CACHED

#10 [5/8] RUN (mkdir /var/airsonic/transcode &&  ln -s /usr/bin/ffmpeg /var/airsonic/transcode/ffmpeg &&  chown -R airsonic:airsonic /var/airsonic/transcode)
#10 sha256:ad84205011c43d7cd451e5f369174eb417683b646df092e20fd67b7a4f745a5a
#10 CACHED

#11 [6/8] RUN (mkdir /data &&  cd /data &&  mkdir db index16 lucene2 lastfmcache thumbs music Podcast playlists .cache .java &&  touch airsonic.properties rollback.sql &&  cd /var/airsonic &&  ln -s /data/db &&  ln -s /data/index16 &&  ln -s /data/lucene2 &&  ln -s /data/lastfmcache &&  ln -s /data/thumbs &&  ln -s /data/music &&  ln -s /data/Podcast &&  ln -s /data/playlists &&  ln -s /data/.cache &&  ln -s /data/.java &&  ln -s /data/airsonic.properties &&  ln -s /data/rollback.sql &&  chown -R airsonic:airsonic /data)
#11 sha256:537ff7293a632ef34d77a985eca7229badd7d2b9309d5e5e11e78ce6cb790cbf
#11 CACHED

#13 [7/8] COPY entrypoint.sh /entrypoint.sh
#13 sha256:13912f5cdaad79f8c1b2301e80261308ce58310a7153e1bb02bf11d6af685ad3
#13 CACHED

#14 [8/8] WORKDIR /var/airsonic
#14 sha256:3319582f1b3b240ceb7589d54dbcdbcc712d4dfacfdff4eb8ed1d743fce52cff
#14 CACHED

#15 exporting to image
#15 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#15 exporting layers done
#15 exporting manifest sha256:29aa0bb26757325405da60b4aa7d92fa3cf0fb489982ed4c19bc3580ad94ada1 0.0s done
#15 exporting config sha256:2f71e4b4043ae8203965cb91faa008af18fcb2c15286b433c1148d36fb4d4f88 0.0s done
#15 pushing layers
#15 ...

#16 [auth] mbentley/airsonic:pull,push token for registry-1.docker.io
#16 sha256:6f923cb5824dd7970ff88871d551967c6161fbf066ccde63a0e0b8cfaa0e9ae8
#16 DONE 0.0s

#15 exporting to image
#15 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#15 pushing layers 0.7s done
#15 pushing manifest for docker.io/mbentley/airsonic:latest
#15 pushing manifest for docker.io/mbentley/airsonic:latest 0.2s done
#15 DONE 1.0s

#17 exporting cache
#17 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
#17 preparing build cache for export done
#17 writing layer sha256:47fcdba1db8f6bb2e24cadfbf88957aecd675c89553230b5266ef1876b863ac3 done
#17 writing layer sha256:53380879c22c14c31df822b3976e5f7dc41d082e46dc33b5821183449cdd6be3 done
#17 writing layer sha256:5843afab387455b37944e709ee8c78d7520df80f8d01cf7f861aae63beeddb6b done
#17 writing layer sha256:5c126ace4b8e4d5e2d1fa6699ac82dddc270a48ac66d40ecca9bbccf7e61d697 done
#17 writing layer sha256:8057d354b41f93a2c1cf01265f97392dbd0c415c8ead5425d90a082c97db2fb7 done
#17 writing layer sha256:bfed4388b8390995898ec1f304e4cce6874d6dd620f4b84dd3b1fb0de36da27a done
#17 writing layer sha256:f74a07ff9c7f30e0525b94e45f4a04ba8130ce9fd8070cd7e8c3bc2b4e9ad810 done
#17 writing config sha256:c2a253a67a918a0511147e5777544935738f87265bb7dcbf5300d697b573b0aa done
#17 writing manifest sha256:57e405aaf72e4bd5937ac8e80c39858b7414caec2cc2fc0e1f9d98d6606f31b8 done
#17 DONE 0.0s

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

I have investigated this issue further and made another repro, based on @Patrick-Remy’s excellent work. It does not need to import 2 identical caches to make the issue appear, and has an even shorter Dockerfile.

I summarised my understanding of the issue in the repro README (mind you, I have never looked at the buildkit codebase before, so this might be wrong) and copied it below for convenience.

Why is this broken?

Let’s take a look at the Dockerfile provided in this repository:

FROM alpine:latest

# create a layer (empty or not)
RUN echo 1

# create a layer that also depends on the context
COPY repro.txt /

# create an empty layer
RUN echo 2

When importing the cache of a run that has empty layers removed, some vertexes will point to the same result, e.g. COPY repro.txt / and RUN echo 2.

In cache.remotecache.v1.(*cacheResultStorage).LoadWithParents, we try to load a cache result with its parents. We start by looking up the corresponding item in a map, and because there are 2 possible values, it will randomly return one or the other.

If the ‘wrong’ item gets used (COPY repro.txt / in our example), then only a partial list of results will be loaded. They get returned to solver.(*cacheManager).LoadWithParents, which will filter them and end up with the same partial list of results.

Those will eventually be saved in the buildkitd cache in solver.(*combinedCacheManager).Load, thus missing the entry for RUN echo 2.

During a second run with the same cache, but this time with a partially populated buildkitd cache, something different from the previous run might happen. Suppose the ‘wrong’ item gets used again in cache.remotecache.v1.(*cacheResultStorage).LoadWithParents, so that the partial list of results is again loaded and returned to solver.(*cacheManager).LoadWithParents.

During the result filtering, results originating from both caches could be walked, and the result for RUN echo 1 could end up being returned as the first element of the list, instead of the one for COPY repro.txt / or RUN echo 2.

Unfortunately, solver.(*combinedCacheManager).Load assumes that the first result is the parent and will return that one, which eventually results in an image missing a layer!

I made one PR regarding the repro: https://github.com/moby/buildkit/pull/2261. It fixes the repro, but that is obviously not the main issue here. I am quite puzzled by how changing trivial things in the repro changes the behavior. The 1 and 2 caches contain identical files, but it does not work if 2 does not run. Even if I clear the local state, so that 2 should have no effect, it still changes the behavior. I’ll continue to look into this, but we have hit the release deadline and need to move on there.

I’ve had the chance to do a couple of rebuilds of all 90 of my images that I push to Docker Hub, using buildkit from your PR @jgiannuzzi. While two full runs is a small sample size, I am not seeing any zero-byte layers being detected 🤞

Nice, thanks @Patrick-Remy! I’ve been able to use your repro myself on Docker for Mac, and it fairly reliably takes anywhere from 10 to 30 seconds to reproduce the issue.

Just to add a quick note, I tested a number of versions with your script (v0.8.3, v0.8.2, v0.8.1, v0.8.0, v0.7.2, v0.7.1) and found that the issue first appears in v0.8.0.
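
For anyone who wants to repeat that kind of version comparison, one possible setup is a separate docker-container builder per BuildKit release, pinned through the image driver option. This is only a sketch, assuming bash: the repro script itself is not shown here, and both the function name print_builder_cmds and the repro- builder name prefix are hypothetical. The function prints the commands instead of running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) one "docker buildx create" command per
# BuildKit version, each pinning the builder image to that release tag.
# print_builder_cmds and the "repro-" prefix are made up for illustration.
print_builder_cmds() {
  local v
  for v in "$@"; do
    printf 'docker buildx create --name repro-%s --driver docker-container --driver-opt image=moby/buildkit:%s\n' "$v" "$v"
  done
}

# Versions mentioned in this thread:
print_builder_cmds v0.8.3 v0.8.2 v0.8.1 v0.8.0 v0.7.2 v0.7.1
```

Each builder can then be selected with --builder when running the repro, in the same way builder1 is selected in the build command of the original report.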