buildkit: ADD from url fails with 'invalid not-modified ETag'

Build fails at:

ADD https://getcomposer.org/composer.phar /usr/local/bin/composer

Log:

 => [1/17] FROM docker.io/library/php:7.3-fpm@sha256:cf8e94d24d94329f13bcd430ae586f80278247e1c43e5f8b3d52c4ab16d2464f                                                                                                         0.0s
 => ERROR https://getcomposer.org/composer.phar                                                                                                                                                                               0.3s
 => [internal] load build context                                                                                                                                                                                             0.0s
------
 > https://getcomposer.org/composer.phar:
------
invalid not-modified ETag: "5c912760-1d3e0e"

I’m wondering if this is related to https://github.com/moby/buildkit/pull/835 but thought worth reporting so it can also be tested.

The ETag itself seems to work just fine, i.e. curl -I -H 'if-none-match: "5c912760-1d3e0e"' https://getcomposer.org/composer.phar returns 304.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 18 (5 by maintainers)

Most upvoted comments

I’m also still having this issue (with the same exact URL) with docker 20.10.5

Which version is this supposed to have been fixed in?

To get around the docker system prune annoyance, we stopped using ADD and just used a RUN command using curl

Please be aware that ADD is evaluated on every build and will not use the cache if the resource has changed, while a RUN curl will cache the result once and use the build cache from that point on. So using RUN and curl is not a full replacement here.

I believe I understand the cause.

The fix should be very simple. (Originally: however I’m not set up for building in this ecosystem. Hopefully someone can provide a patch.) EDIT: see (untested) PR #1159.

I was seeing this too with a Google Cloud Storage URL and so decided to investigate. It seems unlikely that both GitHub releases and Google Cloud Storage are “misbehaving” with respect to ETags, and it’s likely that instead BuildKit may be misinterpreting the If-None-Match spec.

Background info

Docker version info: Docker for Mac 2.1.1.0 (27260)
Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0)
  buildx: Build with BuildKit (Docker Inc., v0.2.2-10-g3f18b65-tp-docker)

Server:
 Containers: 15
  Running: 0
  Paused: 0
  Stopped: 15
 Images: 2126
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.131-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 1.945GiB
 Name: docker-desktop
 ID: WESG:FJU6:DS3Q:6OS3:D7SX:VK5N:Q4UH:A5BS:T46V:2MJH:YKWB:XQPX
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 29
  Goroutines: 44
  System Time: 2019-09-04T19:03:23.537795672Z
  EventsListeners: 2
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Reproduction Steps

Dockerfile

FROM scratch
ADD https://storage.googleapis.com/gpt-2/models/774M/checkpoint .

The first run (or a run after the cache is cleared) succeeds:

DOCKER_BUILDKIT=1 docker build .
[+] Building 0.3s (5/5) FINISHED
 => [internal] load .dockerignore                                                              0.0s
 => => transferring context: 244B                                                              0.0s
 => [internal] load build definition from Dockerfile                                           0.0s
 => => transferring dockerfile: 121B                                                           0.0s
 => https://storage.googleapis.com/gpt-2/models/774M/checkpoint                                0.0s
 => [1/1] ADD https://storage.googleapis.com/gpt-2/models/774M/checkpoint .                    0.0s
 => exporting to image                                                                         0.0s
 => => exporting layers                                                                        0.0s
 => => writing image sha256:9a630453cc77705a4f57c121393a38a4f45f65eba6155ba6c3f27ecab18e2b05   0.0s

Running the same command again:

[+] Building 0.2s (3/4)
 => [internal] load build definition from Dockerfile                                           0.0s
 => => transferring dockerfile: 36B                                                            0.0s
 => [internal] load .dockerignore                                                              0.0s
 => => transferring context: 35B                                                               0.0s
 => ERROR https://storage.googleapis.com/gpt-2/models/774M/checkpoint                          0.1s
------
 > https://storage.googleapis.com/gpt-2/models/774M/checkpoint:
------
invalid not-modified ETag:

Clearing the cache with docker builder prune resets the state.

Cause

If only a single ETag is sent in If-None-Match, the server may omit the (unambiguous) ETag header from its 304 response. BuildKit then sees an empty ETag and rejects it as invalid.

Detailed demonstration

Requesting the file directly:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Response headers. Notice the ETag.

HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2Up6PhdYRb_UZ18VAl30f6XLzFkOoBPnSYSjSKqzk90Go8Zqk-zZoenkbL3inKQz1ozoLjcObKKuIbvOV7XFlSSFO6aW0Q
Expires: Wed, 04 Sep 2019 19:23:16 GMT
Date: Wed, 04 Sep 2019 19:23:16 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
ETag: "ca0368fcd3c4c1a99aca42511d0c1f12"
x-goog-generation: 1566316208157027
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 77
Content-Type: application/octet-stream
x-goog-hash: crc32c=BI0EFw==
x-goog-hash: md5=ygNo/NPEwamaykJRHQwfEg==
x-goog-storage-class: MULTI_REGIONAL
Accept-Ranges: bytes
Content-Length: 77
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"

Requesting the file with If-None-Match set to that ETag:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null -H 'If-None-Match: "ca0368fcd3c4c1a99aca42511d0c1f12"' https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Notice that the ETag is not included in the response headers.

HTTP/1.1 304 Not Modified
X-GUploader-UploadID: AEnB2UpEqNrI5fDJjglD2f4--3CCzskMyUg-Fo1RZxoqqHq17HG8W_gURMO6uUVy9B6Mg4450GyA4yRTjPqEJY8v6dtxhuHcLQ
Expires: Wed, 04 Sep 2019 19:23:36 GMT
Date: Wed, 04 Sep 2019 19:23:36 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
Content-Length: 0
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"

Requesting the file with multiple ETags in If-None-Match:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null -H 'If-None-Match: "ca0368fcd3c4c1a99aca42511d0c1f12", "foobar"' https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Now the ETag is included to disambiguate.

HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2UpD_nKA4ZMNpJvC97lMJfyXjcr9myMAxojFypxW8lUNiEGiwdaOtezf74-OBCHDXn7T4Ru57oelrDHb01wY9IMU1Qdl6A
Expires: Wed, 04 Sep 2019 19:26:02 GMT
Date: Wed, 04 Sep 2019 19:26:02 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
ETag: "ca0368fcd3c4c1a99aca42511d0c1f12"
x-goog-generation: 1566316208157027
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 77
Content-Type: application/octet-stream
x-goog-hash: crc32c=BI0EFw==
x-goog-hash: md5=ygNo/NPEwamaykJRHQwfEg==
x-goog-storage-class: MULTI_REGIONAL
Accept-Ranges: bytes
Content-Length: 77
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"

Fix

The following lines of code should be updated so that, if only one ETag was requested, and no ETag header was returned in the response, the requested ETag is assumed:
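
As a rough illustration of that rule, here is a minimal Go sketch. The function resolveNotModifiedETag and its signature are hypothetical; this is neither the BuildKit code nor the patch in PR #1159, only the fallback logic described above under the assumption that the caller knows which ETags it sent.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveNotModifiedETag is a hypothetical helper (not BuildKit's actual code).
// requested holds the ETags that were sent in If-None-Match; respETag is the
// ETag header of the 304 response, or "" if the server omitted it.
func resolveNotModifiedETag(requested []string, respETag string) (string, error) {
	if respETag == "" {
		// Only one candidate was sent, so the 304 can only refer to it.
		if len(requested) == 1 {
			return requested[0], nil
		}
		return "", errors.New("invalid not-modified ETag: none returned for an ambiguous request")
	}
	// An ETag was returned: it must be one of the ETags we asked about.
	for _, t := range requested {
		if t == respETag {
			return respETag, nil
		}
	}
	return "", fmt.Errorf("invalid not-modified ETag: %v", respETag)
}

func main() {
	// Single requested ETag, server omitted the header on 304.
	etag, err := resolveNotModifiedETag([]string{`"ca0368fcd3c4c1a99aca42511d0c1f12"`}, "")
	fmt.Println(etag, err)
}
```

With several requested ETags and no ETag in the response, the sketch still returns an error, since there is no way to tell which cached entry the server meant.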

The error is produced by code that we use from BuildKit, in this part of the BuildKit code: https://github.com/moby/buildkit/blob/efc63357b84c844d72d68e017dcdbe8741122380/source/http/httpsource.go#L214-L226

I’m not sure if it’s an issue with the code or with the server you’re downloading from. I’m not very familiar with that part of the BuildKit code, but from reading it, the error seems to be produced when BuildKit has previously downloaded that URL (and stored the ETag) and then checks whether the URL is still current (i.e. whether the cache can be used): it gets a 304 “StatusNotModified” status from the server, but the server replies with a different ETag. In other words, the server responds that the content wasn’t modified, but the ETag indicates that it was?

Still happening with Docker 20.10.18. No OAuth or authentication is involved:

$ curl -I "https://oss.sonatype.org/service/local/artifact/maven/content?r=releases&g=org.openrefine&a=openrefine&v=3.6.1&c=linux&p=tar.gz"
HTTP/2 200 
date: Wed, 26 Oct 2022 13:25:28 GMT
content-type: application/x-gzip
content-length: 141540183
server: Nexus/2.15.1-02 Noelios-Restlet-Engine/1.1.6-SONATYPE-5348-V8
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
last-modified: Mon, 22 Aug 2022 19:56:56 GMT
etag: "{SHA1{cf151677b2a0184b73d637dbce9e6c82d98684de}}"
content-disposition: attachment; filename="openrefine-3.6.1-linux.tar.gz"
vary: Accept-Charset,Accept-Encoding,Accept-Language,Accept

The original issue was fixed two years ago; if you encounter this, and have more details, please use https://github.com/moby/buildkit/issues/2420 instead.

I’m also still having this issue with docker 20.10.8

I was getting this issue, but I found doing a docker system prune meant that I could rebuild again. Maybe this will help someone.

Not that this should change the fix, as it’s clearly needed for real-world cases, but, out of interest, RFC 7232, in describing the 304 response, does appear to require the server to send back the ETag:

The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary.