moby: Dockerfile ADD remote url does not use any HTTP header so always re-downloads
- To Reproduce:
- Build the below Dockerfile a couple of times
docker build .
Each time the jar file gets downloaded even though the file has not changed on Nexus.
FROM ubuntu
ADD ["https://oss.sonatype.org/service/local/artifact/maven/content?r=public&g=org.eclipse.xtext&a=org.eclipse.xtext.builder&v=2.9.0-SNAPSHOT", "/opt/xtext-builder.jar"]
CMD ["echo", "hello!"]
- However, Nexus returns HTTP headers like Last-Modified or ETag that could be used to prevent the re-download and just use what’s in the Docker cache.
# curl -I "https://oss.sonatype.org/service/local/artifact/maven/content?r=public&g=org.eclipse.xtext&a=org.eclipse.xtext.builder&v=2.9.0-SNAPSHOT"
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="org.eclipse.xtext.builder-2.9.0-20150820.042448-99.jar"
Content-Length: 341045
Content-Type: application/java-archive
Date: Thu, 20 Aug 2015 10:14:34 GMT
ETag: "{SHA1{44253ea5406c02ead80789cbc763c8c27ba87124}}"
Last-Modified: Thu, 20 Aug 2015 04:24:49 GMT
Server: nginx
Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Connection: keep-alive
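To illustrate how the ETag and Last-Modified values above could be used to skip a re-download, here is a minimal Python sketch of an HTTP conditional request. It is not how Docker implements ADD; the function names conditional_headers and is_unchanged are hypothetical, and it assumes the validators from a previous download were saved somewhere.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build revalidation headers from validators saved after an earlier download."""
    headers = {}
    if etag:
        # The server compares this against the current ETag of the resource.
        headers["If-None-Match"] = etag
    if last_modified:
        # Fallback validator for servers that do not send an ETag.
        headers["If-Modified-Since"] = last_modified
    return headers


def is_unchanged(url: str, etag: Optional[str] = None, last_modified: Optional[str] = None) -> bool:
    """Return True when the server answers 304 Not Modified (hypothetical helper)."""
    request = urllib.request.Request(
        url, method="HEAD", headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request):
            # A 2xx answer means the server considers the cached copy stale.
            return False
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError because it is not a 2xx status.
        if err.code == 304:
            return True
        raise
```

With the headers shown above, a caller would pass the saved ETag and Last-Modified strings and only download the jar again when is_unchanged returns False.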
docker version: Docker version 1.8.1, build d12ea79
docker info:
Containers: 2
Images: 365
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 369
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-49-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 4
Total Memory: 15.67 GiB
Name: UBUNTU4116V
ID: OTB6:5W7M:D7PU:M6Q2:KKCI:PKI3:2TS4:SPZ5:6LQC:CYIW:N4AJ:XREU
WARNING: No swap limit support
uname -a: Linux UBUNTU4116V 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu image running on VMWare
About this issue
- Original URL
- State: open
- Created 9 years ago
- Reactions: 16
- Comments: 19 (11 by maintainers)
@ed-alertedh I think this is implemented in BuildKit; can you try with BuildKit enabled? (Set the DOCKER_BUILDKIT=1 environment variable before running docker build.)
After a lengthy discussion in our maintainers meeting;
Didn’t spot that, thanks. ETag seems like a safer way to do it, though. That’s typically a hash of the file’s contents.
There’s probably a certain degree of “good enough” for busting the cache too. E.g. RUN wget https://... will never get downloaded again if it changes, and people seem to be alright with that. If you absolutely want a URL to be downloaded again, you can do a docker build --no-cache.
It would be great for my build workflow to have this option. Actually, I’m downloading a 200 MB file every time! I don’t want to put this file in my git repo! Also, the file could eventually change.
https://github.com/docker/docker/issues/12361#issuecomment-93992321
@ORESoftware that output looks like you’re using the classic builder; can you try using the BuildKit builder? See my comment above: https://github.com/moby/moby/issues/15717#issuecomment-493854811
I am in favour of comparing last-modified/etag as an initial check and falling through to a data-check, was just noting a previous position.
Why bother parsing it? Just hash the last-modified/etag, then compare it to the existing hash. If it matches then great, no need to download. I guess you’d store this in addition to the file hash.
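The idea in the last two comments (hash the validators, compare against a stored hash, and fall through to a download and data check otherwise) could look roughly like the Python sketch below. The names validator_digest and needs_download are hypothetical, and the header mapping is assumed to be a plain dict with the exact header spellings from the curl output; none of this is taken from Docker's code.

```python
import hashlib
from typing import Mapping, Optional


def validator_digest(headers: Mapping[str, str]) -> Optional[str]:
    """Hash the raw ETag/Last-Modified values without parsing them."""
    etag = headers.get("ETag", "")
    last_modified = headers.get("Last-Modified", "")
    if not etag and not last_modified:
        # No validators at all: nothing to compare, so no digest.
        return None
    material = f"etag={etag}\nlast-modified={last_modified}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def needs_download(headers: Mapping[str, str], stored_digest: Optional[str]) -> bool:
    """Return True when the file must be fetched and checked by content."""
    current = validator_digest(headers)
    if current is None or stored_digest is None:
        # Fall through to downloading and comparing the file hash.
        return True
    return current != stored_digest
```

The digest would be stored next to the file hash, so a matching digest skips the download while a mismatch still ends in the existing content comparison.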