moby: Docker pull/push won't recognize images from docker load... sometimes?
If you do a `docker load` of an image/repo exported with `docker save <repo>:<tag>`, you’ll have the images and layers available for building. If you, however, attempt to do a `docker push <repo>:<tag>`, it will push up all of the layers, even if they are identical to the ones in the remote registry for that `<repo>:<tag>`.
If, instead, you do a `docker pull <repo>:<tag>` before doing a `docker push`, it will download all of the layers (even though you’ve already got them via your previous `docker load`), but then subsequently be smart about only pulling/pushing new layers.
What’s even more mystifying is that if you’ve done a `docker pull`, then used `docker rmi` to remove all of the images you pulled, then doing `docker load` followed by `docker push` will properly track ‘already present’ layers.
There’s something a `docker pull` does to the local docker cache that establishes a link to the remote registry that I don’t fully understand. Is there some way this link can be established without downloading all of the (already present via `docker load`) remote images/layers?
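For concreteness, here is a minimal sketch of the two paths, assuming a hypothetical `registry.example.com/myrepo:latest` image that already exists in the remote registry and an `image.tar` produced earlier by `docker save` of that same image:

```sh
# Path 1: load, then push -- every layer is re-uploaded, even though the
# registry already holds identical layers for this repo:tag.
docker load -i image.tar
docker push registry.example.com/myrepo:latest

# Path 2: pull first (re-downloading data we already have via the load),
# after which push/pull correctly report "Layer already exists" /
# "Already exists" for unchanged layers.
docker pull registry.example.com/myrepo:latest
docker push registry.example.com/myrepo:latest
```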
Output of `docker version`:
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:20:08 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:20:08 2016
OS/Arch: linux/amd64
Output of `docker info`:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.11.2
Storage Driver: btrfs
Build Version: Btrfs v3.17
Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 4.4.12-boot2docker
Operating System: Buildroot 2016.02 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.955 GiB
Name: c79f2d0a-a4b0-4248-4d41-bab905aad64c
ID: WJMK:7VEO:PFBY:NSII:JD6N:AIHE:M3QK:YGS4:EBA7:DHJN:JRTO:WLQN
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Additional environment details (AWS, VirtualBox, physical, etc.):
Running inside of a runC-based container, but I’ve duplicated the behavior using an OS X docker-machine/VirtualBox setup.
Steps to reproduce the issue:

First:
1. Do a `docker pull <repo>:<tag>`.
2. Do a `docker save -o image.tar <repo>:<tag>`.
3. In a fresh environment, run `docker load -i image.tar` on the file created in step 2.

Follow steps 1-2, then do each scenario below after doing step 3 in a fresh environment (scenario C is also sketched as a script after the list):
- A. Do `docker push <repo>:<tag>` to try to push the loaded image. It will push all of the layers, even though they’re already present in the remote registry.
- B. Do `docker pull <repo>:<tag>`. It will pull down all of the layers, even though they are already present from doing `docker load`.
- C.
  - a. Do `docker pull <repo>:<tag>`. (Everything is downloaded.)
  - b. Do `docker rmi <repo>:<tag>`. (All of the images are removed; another pull will download everything again.)
  - c. Do `docker load -i image.tar`. (Images will be back again.)
  - d. Do `docker pull <repo>:<tag>`. (Layers will not be downloaded, saying they already exist!)
  - e. Instead of d, do `docker push <repo>:<tag>`. (Layers will not be pushed, saying they already exist!)
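Scenario C is the counter-intuitive one, so here is a minimal sketch of it, again using the hypothetical `registry.example.com/myrepo:latest` and an `image.tar` saved from that image:

```sh
# Start from a fresh daemon (no images present).
docker pull registry.example.com/myrepo:latest   # everything is downloaded
docker rmi  registry.example.com/myrepo:latest   # all of those images are removed
docker load -i image.tar                         # images are back again

# Now, unlike scenarios A and B, the daemon recognizes the layers:
docker pull registry.example.com/myrepo:latest   # layers are not re-downloaded
# ...or, instead of the pull:
docker push registry.example.com/myrepo:latest   # reports "Layer already exists"
```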
Describe the results you received:
A. It will push all of the layers, even though they’re already present in the remote registry.
B. It will pull down all of the layers, even though they are already present from doing `docker load`.
C. After doing a `docker pull` followed by a `docker rmi`, the images/layers loaded via `docker load` will be recognized as ‘already exists’ when pushing to/pulling from the remote repo, even though they weren’t before.
Describe the results you expected:
A. It should say “Layer already exists” when trying to push each layer, since they were loaded already.
B. It should say “Already exists” for all of the layers loaded in 3, and not download them.
C. The images and layers loaded via `docker load` should be treated as ‘already exists’ when doing a `docker pull` or `docker push`, regardless of whether or not a `docker pull` has happened before. Downloading everything via `docker pull`, then deleting everything via `docker rmi`, and then loading everything again via `docker load` should not be different from just doing a `docker load` in the first place.
Additional information you deem important (e.g. issue happens only occasionally):
This is a huge impediment for an automated CI build process. Having no way to ‘prime the cache’ via `docker load` in a fresh build environment (which may be spun up dynamically!), and having to pay the price of a full `docker pull` of a potentially multi-hundred-megabyte image just to ‘sync up’ with the remote repository when you’ve already got all of the data loaded, is unacceptable.

Building and pushing only the new layers should be a very quick process (in some cases, 30 seconds or less), but the forced download of the entire image can turn this into a minimum of 5-10 minutes. Using a local registry configured for pull-through caching could help, but as per the documentation and the roadmap, this cannot be used for private registries. Because using Docker Hub to host repos (even private repos) isn’t an acceptable solution for many (including my organization), the local registry mirror isn’t a solution either. And even if it were possible, needing to download data again for any reason when it has already been loaded via `docker load` is silly.
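For reference, "a local registry configured for pull-through caching" means something like the following sketch; the port, container name, and mirror URL are arbitrary, and, as noted above, this proxying only works against Docker Hub, not private registries:

```sh
# Run a registry:2 container as a pull-through cache of Docker Hub.
# REGISTRY_PROXY_REMOTEURL is the environment override for the
# proxy.remoteurl setting in the registry's config.yml.
docker run -d -p 5000:5000 --name registry-mirror \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# Then point the daemon at the mirror, e.g.:
#   docker daemon --registry-mirror=http://localhost:5000
```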
@tonistiigi Ok; even if the solution I proposed does not work, the underlying problem remains.
What is the “Docker-approved” pattern for building and pushing new versions of a production image? In combination with the problems described in issue #20316, it seems as though the expected usage is to perform all builds and pushes of a particular docker image from the same workstation. If you don’t, you either lose the build cache or, in the case that you’ve worked around that issue by using `docker save`/`docker load`, you are forced to `docker pull` the entire image every time.

The very first paragraph of the Docker Overview documentation states that the goal of the technology is to ship, test, and deploy code faster. When that is an explicitly stated goal, adding up to half an hour to the build process is an extremely big problem.
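A hedged sketch of the save/load workaround being described, with a hypothetical image name and cache path:

```sh
# After a successful build on one machine, stash the image (and the parent
# chain it carries) as a tarball:
docker save -o /cache/myrepo.tar registry.example.com/myrepo:latest

# On a fresh, dynamically provisioned builder, prime the local cache from
# the tarball instead of pulling the whole image:
docker load -i /cache/myrepo.tar
docker build -t registry.example.com/myrepo:latest .   # intended to reuse cached layers
docker push registry.example.com/myrepo:latest
# ...but, per this issue, the push re-uploads every layer unless a full
# `docker pull` was done first.
```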
🙏