moby: Dockerfile ADD remote url does not use any HTTP header so always re-downloads

  • To Reproduce:
    • Build the Dockerfile below a couple of times with docker build . Each time, the jar file is downloaded again even though it has not changed on Nexus.
FROM ubuntu

ADD ["https://oss.sonatype.org/service/local/artifact/maven/content?r=public&g=org.eclipse.xtext&a=org.eclipse.xtext.builder&v=2.9.0-SNAPSHOT", "/opt/xtext-builder.jar"]

CMD ["echo", "hello!"]
  • However, Nexus returns HTTP headers like Last-Modified or ETag that could be used to prevent the re-download and just use what’s in the Docker cache.
# curl -I "https://oss.sonatype.org/service/local/artifact/maven/content?r=public&g=org.eclipse.xtext&a=org.eclipse.xtext.builder&v=2.9.0-SNAPSHOT"

HTTP/1.1 200 OK
Content-Disposition: attachment; filename="org.eclipse.xtext.builder-2.9.0-20150820.042448-99.jar"
Content-Length: 341045
Content-Type: application/java-archive
Date: Thu, 20 Aug 2015 10:14:34 GMT
ETag: "{SHA1{44253ea5406c02ead80789cbc763c8c27ba87124}}"
Last-Modified: Thu, 20 Aug 2015 04:24:49 GMT
Server: nginx
Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Connection: keep-alive
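
For illustration, the validators in the response above are exactly what an HTTP client can send back in a conditional GET. The Python sketch below is not what the Docker builder does; the helper name `build_conditional_request` is hypothetical and only shows how such a request could be constructed from the headers Nexus already returns.

```python
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Return a GET request carrying HTTP cache validators.

    Hypothetical helper for illustration only; it is not part of Docker.
    """
    request = urllib.request.Request(url)
    if etag:
        # Ask the server to send the body only if the ETag differs.
        request.add_header("If-None-Match", etag)
    if last_modified:
        # Ask the server to send the body only if it changed after this date.
        request.add_header("If-Modified-Since", last_modified)
    return request


# Validators copied from the curl output above.
request = build_conditional_request(
    "https://oss.sonatype.org/service/local/artifact/maven/content"
    "?r=public&g=org.eclipse.xtext&a=org.eclipse.xtext.builder&v=2.9.0-SNAPSHOT",
    etag='"{SHA1{44253ea5406c02ead80789cbc763c8c27ba87124}}"',
    last_modified="Thu, 20 Aug 2015 04:24:49 GMT",
)
```

A server that honours these headers answers 304 Not Modified with an empty body when the artifact is unchanged. Note that `urllib.request.urlopen(request)` reports a 304 by raising `urllib.error.HTTPError`, so a real caller would need to catch that and check its `code`.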

docker version: Docker version 1.8.1, build d12ea79

docker info:
Containers: 2
Images: 365
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 369
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-49-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 4
Total Memory: 15.67 GiB
Name: UBUNTU4116V
ID: OTB6:5W7M:D7PU:M6Q2:KKCI:PKI3:2TS4:SPZ5:6LQC:CYIW:N4AJ:XREU
WARNING: No swap limit support

uname -a: Linux UBUNTU4116V 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu image running on VMware

About this issue

  • State: open
  • Created 9 years ago
  • Reactions: 16
  • Comments: 19 (11 by maintainers)

Most upvoted comments

@ed-alertedh I think this is implemented in BuildKit; can you try with BuildKit enabled? (set the DOCKER_BUILDKIT=1 environment variable before running docker build)

After a lengthy discussion in our maintainers meeting:

  • We’re open to adding this in future
  • However, the current caching mechanism makes it very complicated to check either the ETag or the checksum of the downloaded files; doing so would require “hacky” changes to the cache store (which we want to avoid)
  • There are plans to refactor the caching store to be more flexible
  • After that refactor is done, we can revisit, and see if we can implement this

Didn’t spot that, thanks. ETag seems like a safer way to do it, though. That’s typically a hash of the file’s contents.

There’s probably a certain degree of “good enough” for busting the cache too. E.g. a RUN wget https://... step is served from the cache and never downloads again, even if the remote file changes, and people seem to be alright with that. If you absolutely want a URL to be downloaded again, you can run docker build --no-cache.

It would be great for my build workflow to have this option. Actually, I’m downloading a 200 MB file every time! I don’t want to put this file in my git repo! Also, the file could eventually change.

https://github.com/docker/docker/issues/12361#issuecomment-93992321

The decision was made to, basically, not trust things like the URL or timestamps, and instead actually check the data itself to make sure nothing has changed.

@ORESoftware that output looks like you’re using the classic builder; can you try using the buildkit builder? See my comment above https://github.com/moby/moby/issues/15717#issuecomment-493854811

I am in favour of comparing Last-Modified/ETag as an initial check and falling through to a data check; I was just noting a previous position.
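
The two-stage check described here could look roughly like the Python sketch below. It is only an illustration of the idea, not the builder's code: the function name, the shape of the `cached` record, and the `fetch_body` callback are all assumptions, and header lookup is a plain case-sensitive dict access.

```python
import hashlib


def is_cache_valid(cached, remote_headers, fetch_body):
    """Decide whether a cached download can be reused.

    cached: dict with optional "etag" / "last_modified" and a "sha256" digest.
    remote_headers: dict of response headers (e.g. from a HEAD request).
    fetch_body: zero-argument callable returning the remote bytes; it is
        only called when the validators cannot confirm a match.
    All of these names are assumptions made for this sketch.
    """
    # Stage 1: cheap check on the validators, preferring ETag.
    for cached_key, header in (("etag", "ETag"), ("last_modified", "Last-Modified")):
        cached_value = cached.get(cached_key)
        remote_value = remote_headers.get(header)
        if cached_value and remote_value:
            if cached_value == remote_value:
                return True
            break  # validator changed; only the data can tell now
    # Stage 2: fall through to comparing the data itself.
    digest = hashlib.sha256(fetch_body()).hexdigest()
    return digest == cached["sha256"]
```

The download only happens when the validators are missing or have changed, so an unchanged artifact costs a single header round trip.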

Why bother parsing it? Just hash the last-modified/etag, then compare it to the existing hash. If it matches then great, no need to download. I guess you’d store this in addition to the file hash.
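
As a rough sketch of that suggestion in Python (the function name `validator_key` and the choice of SHA-256 are assumptions, not anything Docker defines), the header values are hashed as opaque strings and the result is stored alongside the file hash:

```python
import hashlib


def validator_key(headers):
    """Hash the raw Last-Modified and ETag values into one opaque key.

    Returns None when the server sent neither header, in which case the
    caller has to fall back to hashing the file contents.
    """
    etag = headers.get("ETag", "")
    last_modified = headers.get("Last-Modified", "")
    if not etag and not last_modified:
        return None
    # The values are never parsed, only hashed; the newline keeps the two
    # fields from running together.
    raw = f"{last_modified}\n{etag}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Header values taken from the curl output in the issue description.
headers = {
    "ETag": '"{SHA1{44253ea5406c02ead80789cbc763c8c27ba87124}}"',
    "Last-Modified": "Thu, 20 Aug 2015 04:24:49 GMT",
}
stored_key = validator_key(headers)  # saved next to the file hash
can_skip_download = validator_key(headers) == stored_key
```

Any change to either header yields a different key, which would trigger a fresh download and a new file hash.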