test-infra: E2E test images: httpd images failed to push to staging
What happened:
The Image Builder postsubmit jobs post-kubernetes-push-e2e-httpd-test-images and post-kubernetes-push-e2e-new-httpd-test-images are failing with a 401 Unauthorized error while trying to push to gcr.io/k8s-staging-e2e-test-images.
What you expected to happen:
It should have been able to push the images.
How to reproduce it (as minimally and precisely as possible):
Rerun the jobs.
Please provide links to example occurrences, if any:
[1] https://testgrid.k8s.io/sig-testing-images#post-kubernetes-push-e2e-httpd-test-images
[2] https://testgrid.k8s.io/sig-testing-images#post-kubernetes-push-e2e-httpd-new-test-images
[3] https://testgrid.k8s.io/sig-testing-images#kubernetes-e2e-windows-servercore-cache
Anything else we need to know?:
It is worth noting that the job passed on 2021.02.09 but failed on 2021.02.15. The Prow job config is fine: rerunning the k8s-staging-e2e-test-images.sh script that generated the jobs produces no diff.
Additionally, the kubernetes-e2e-windows-servercore-cache job [3], which is defined similarly to the other two jobs, passed on 2021.02.11.
About this issue
- State: closed
- Created 3 years ago
- Comments: 46 (46 by maintainers)
It seems that it was successful for the following images:
- glusterdynamic-provisioner: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-glusterdynamic-provisioner-test-images/1366557273958125568
- httpd: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-httpd-test-images/1366557274016845824
- nginx: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-nginx-test-images/1366557274096537600
- nginx-new: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-nginx-test-images/1366557274096537600
- perl: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-perl-test-images/1366557274436276224
It seems that it failed for:
- busybox: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-busybox-test-images/1366557273911988224
- httpd-new: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-kubernetes-push-e2e-httpd-test-images/1366557274016845824
It seems that for the httpd-new image, the build generated the same SHA as an image that already exists in the registry.
Somehow, I missed adding the label to the Dockerfile… oops! I sent a PR to fix it: https://github.com/kubernetes/kubernetes/pull/99631
For the busybox image, the build failed on the Windows images; I’ll take a look at that. But the idea worked. 😃
I have been looking into this for a bit. The --output=type=registry option is the only viable solution for us, considering we also want the Windows images there. It is a known limitation that you simply cannot have Windows images in a “normal docker” on Linux.
From my experiments with docker buildx build --output=type=docker, the other (Linux) images can actually be referenced locally through docker images and so on, but the Windows images cannot. When building the Windows images with docker buildx and the output type=docker, we can see that it actually tries to import them into the local Docker images. No error is printed, but they don’t end up in docker images.
I’ve also tried to build the Windows images with output type=oci, which generated a .tar file. You can actually import it locally with docker import image.tar, but inspecting the imported image [1] shows that it sets the os/arch to linux/amd64, which is not quite right. It also shows that the User and the environment variables (including PATH) are stripped away, which is problematic. Trying to import the image with --platform "windows/amd64" instead confirms that docker buildx build --output=type=docker probably ran into the same issue. Furthermore, having Windows images on a Linux node is a pretty frequently asked question; I think [2] covers the reason nicely.
So, I still think --output=type=registry is the way to go. --output=type=oci or tar could work, but then we’d have to push the images to the registry ourselves, which is supposed to be docker’s / buildx’s job in the first place.
[1] https://paste.ubuntu.com/p/m2tmX3yGMW/
[2] https://forums.docker.com/t/docker-daemon-on-ubuntu-pull-windows-containers-or-create-my-own/28823/6
Oh. That might be it. It might be because the exact same hash is being pushed.
Right, gcr.io/k8s-staging-e2e-test-images/nginx:1.14-monkeys-linux-amd64 and gcr.io/k8s-staging-e2e-test-images/nginx:1.14-alpine-linux-amd64 have the same SHA, which is identical to that of nginx:1.14-alpine, since the image was mirrored. I had the same SHA on my own registry as well. So it should work if we generate a new SHA. We could make a trivial change in the Dockerfile; building it, I then get a SHA that is different from the previous a2d0ea7d3550b0853d04263025e6cfcc353f3e102fe725d19b9fc51282603f02. Being a different SHA, it should be pushable.
If we look at the Prow job history, we’d see that the httpd and nginx image jobs worked exactly once: when the Docker Hub images were mirrored for the first time. This doesn’t affect pushing images to Docker Hub; otherwise, I would have encountered this issue before.
IMO, we could go ahead with this fix, add a note in the README.md too, and push for the fix to be merged in Docker.
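The same-SHA reasoning rests on content addressing: an image config is identified by the sha256 of its bytes, and the manifest references it by that digest, so a mirrored image keeps its digest while any trivial Dockerfile change that alters the config (such as a label) produces a new one. The sketch below is a simplified illustration under stated assumptions: it hashes a hand-written config dict with a fixed JSON serialization rather than the exact bytes a real builder emits, and the label key and layer diff_id are placeholders. Whether the registry really rejects re-pushing an existing digest with a 401 is this thread’s hypothesis; the sketch only shows why the digests collide.

```python
import copy
import hashlib
import json


def config_digest(config: dict) -> str:
    """sha256 digest of a serialized image config (simplified sketch)."""
    # Fixed serialization so the example is deterministic; real builders
    # produce the config bytes themselves.
    blob = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Placeholder config; the layer diff_id is a dummy value, not a real layer.
base = {
    "architecture": "amd64",
    "os": "linux",
    "config": {"Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"]},
    "rootfs": {"type": "layers", "diff_ids": ["sha256:" + "0" * 64]},
}

# Mirroring copies the content unchanged, so the digest is unchanged too.
mirrored = copy.deepcopy(base)

# A trivial change, e.g. a label (hypothetical key), yields new bytes and a new digest.
relabeled = copy.deepcopy(base)
relabeled["config"]["Labels"] = {"example.rebuild": "1"}

print(config_digest(base) == config_digest(mirrored))   # True
print(config_digest(base) == config_digest(relabeled))  # False
```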