moby: Docker pull/push won't recognize images from docker load... sometimes?
If you do a `docker load` of an image/repo exported with `docker save <repo>:<tag>`, you’ll have the images and layers available for building. If you, however, attempt to do a `docker push <repo>:<tag>`, it will push up all of the layers, even if they are identical to the ones in the remote registry for that `<repo>:<tag>`.
If, instead, you do a `docker pull <repo>:<tag>` before doing a `docker push`, it will download all of the layers (even though you’ve already got them via your previous `docker load`), but then subsequently be smart about only pulling/pushing new layers.
What’s even more mystifying is that if you’ve done a `docker pull`, then used `docker rmi` to remove all of the images you pulled, then doing `docker load` followed by `docker push` will properly track ‘already present’ layers.
There’s something a `docker pull` does to the local docker cache that establishes a link to the remote registry that I don’t fully understand. Is there some way this link can be established without downloading all of the (already present via `docker load`) remote images/layers?
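For concreteness, here is a minimal sketch of the two paths, assuming a hypothetical `registry.example.com/myrepo:latest` image that already exists in the remote registry and an `image.tar` produced earlier by `docker save` of that same image:

```sh
# Path 1: load, then push -- every layer is re-uploaded, even though the
# registry already holds identical layers for this repo:tag.
docker load -i image.tar
docker push registry.example.com/myrepo:latest

# Path 2: pull first (re-downloading data we already have via the load),
# after which push/pull correctly report "Layer already exists" /
# "Already exists" for unchanged layers.
docker pull registry.example.com/myrepo:latest
docker push registry.example.com/myrepo:latest
```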
Output of `docker version`:
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:20:08 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:20:08 2016
OS/Arch: linux/amd64
Output of `docker info`:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.11.2
Storage Driver: btrfs
Build Version: Btrfs v3.17
Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 4.4.12-boot2docker
Operating System: Buildroot 2016.02 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.955 GiB
Name: c79f2d0a-a4b0-4248-4d41-bab905aad64c
ID: WJMK:7VEO:PFBY:NSII:JD6N:AIHE:M3QK:YGS4:EBA7:DHJN:JRTO:WLQN
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Additional environment details (AWS, VirtualBox, physical, etc.):
Running inside of a runC-based container, but I’ve duplicated the behavior using an OS X docker-machine/VirtualBox setup.
Steps to reproduce the issue:

First:
1. Do a `docker pull <repo>:<tag>`.
2. Do a `docker save -o image.tar <repo>:<tag>`.
3. In a fresh environment, run `docker load -i image.tar` on the file created in step 2.

Follow steps 1-2, then do each scenario below after doing step 3 in a fresh environment (scenario C is also sketched as a script after the list):
- A. Do `docker push <repo>:<tag>` to try to push the loaded image. It will push all of the layers, even though they’re already present in the remote registry.
- B. Do `docker pull <repo>:<tag>`. It will pull down all of the layers, even though they are already present from doing `docker load`.
- C.
  - a. Do `docker pull <repo>:<tag>`. (Everything is downloaded.)
  - b. Do `docker rmi <repo>:<tag>`. (All of the images are removed; another pull will download everything again.)
  - c. Do `docker load -i image.tar`. (Images will be back again.)
  - d. Do `docker pull <repo>:<tag>`. (Layers will not be downloaded, saying they already exist!)
  - e. Instead of d, do `docker push <repo>:<tag>`. (Layers will not be pushed, saying they already exist!)
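Scenario C is the counter-intuitive one, so here is a minimal sketch of it, again using the hypothetical `registry.example.com/myrepo:latest` and an `image.tar` saved from that image:

```sh
# Start from a fresh daemon (no images present).
docker pull registry.example.com/myrepo:latest   # everything is downloaded
docker rmi  registry.example.com/myrepo:latest   # all of those images are removed
docker load -i image.tar                         # images are back again

# Now, unlike scenarios A and B, the daemon recognizes the layers:
docker pull registry.example.com/myrepo:latest   # layers are not re-downloaded
# ...or, instead of the pull:
docker push registry.example.com/myrepo:latest   # reports "Layer already exists"
```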
Describe the results you received:
A. It will push all of the layers, even though they’re already present in the remote registry.
B. It will pull down all of the layers, even though they are already present from doing `docker load`.
C. After doing a `docker pull` followed by a `docker rmi`, the images/layers loaded via `docker load` will be recognized as ‘already exists’ when pushing to/pulling from the remote repo, even though they weren’t before.
Describe the results you expected:
A. It should say “Layer already exists” when trying to push each layer, since they were loaded already.
B. It should say “Already exists” for all of the layers loaded in 3, and not download them.
C. The images and layers loaded via `docker load` should be treated as ‘already exists’ when doing a `docker pull` or `docker push`, regardless of whether or not a `docker pull` has happened before. Downloading everything via `docker pull`, then deleting everything via `docker rmi`, and then loading everything again via `docker load` should not be different from just doing a `docker load` in the first place.
Additional information you deem important (e.g. issue happens only occasionally):
This is a huge impediment for an automated CI build process. Having no way to ‘prime the cache’ via `docker load` in a fresh build environment (which may be spun up dynamically!), and having to pay the price of a full `docker pull` of a potentially multi-hundred-megabyte image just to ‘sync up’ with the remote repository when you’ve already got all of the data loaded, is unacceptable.

Building and pushing only the new layers should be a very quick process (in some cases, 30 seconds or less), but the forced download of the entire image can turn this into a minimum of 5-10 minutes. Using a local registry configured for pull-through caching could help, but as per the documentation and the roadmap, this cannot be used for private registries. Because using Docker Hub to host repos (even private repos) isn’t an acceptable solution for many (including my organization), the local registry mirror isn’t a solution either. And even if it were possible, needing to download data again for any reason when it has already been loaded via `docker load` is silly.
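For reference, "a local registry configured for pull-through caching" means something like the following sketch; the port, container name, and mirror URL are arbitrary, and, as noted above, this proxying only works against Docker Hub, not private registries:

```sh
# Run a registry:2 container as a pull-through cache of Docker Hub.
# REGISTRY_PROXY_REMOTEURL is the environment override for the
# proxy.remoteurl setting in the registry's config.yml.
docker run -d -p 5000:5000 --name registry-mirror \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# Then point the daemon at the mirror, e.g.:
#   docker daemon --registry-mirror=http://localhost:5000
```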
@tonistiigi Ok; even if the solution I proposed does not work, the underlying problem remains.
What is the “Docker-approved” pattern for building and pushing new versions of a production image? In combination with the problems described in issue #20316, it seems as though the expected usage is to perform all builds and pushes of a particular docker image from the same workstation. If you don’t, you either lose the build cache or, in the case that you’ve worked around that issue by using `docker save`/`docker load`, you are forced to `docker pull` the entire image every time.

The very first paragraph of the Docker Overview documentation states that the goal of the technology is to ship, test, and deploy code faster. When that is an explicitly stated goal, adding up to half an hour to the build process is an extremely big problem.
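A hedged sketch of the save/load workaround being described, with a hypothetical image name and cache path:

```sh
# After a successful build on one machine, stash the image (and the parent
# chain it carries) as a tarball:
docker save -o /cache/myrepo.tar registry.example.com/myrepo:latest

# On a fresh, dynamically provisioned builder, prime the local cache from
# the tarball instead of pulling the whole image:
docker load -i /cache/myrepo.tar
docker build -t registry.example.com/myrepo:latest .   # intended to reuse cached layers
docker push registry.example.com/myrepo:latest
# ...but, per this issue, the push re-uploads every layer unless a full
# `docker pull` was done first.
```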
🙏