minikube: Can't pull modestly sized image from Quay.io

What Happened?

Pulling a 600MB or larger image from Quay.io fails with context deadline exceeded.

The image is there; I can pull it with docker / podman / singularity.

I tagged the ubuntu:20.04 base image and pushed it to our Quay.io repo. That image is 27MB, and minikube pulls and runs it successfully. But it fails to pull the 600MB image from the same repo.
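For reference, the retag-and-push was nothing exotic, roughly the following (quay.io/USER/IMAGE:TAG here is just the same redacted placeholder that appears in the kubelet logs below, not the real repo name):

$ docker login quay.io
$ docker pull ubuntu:20.04
$ docker tag ubuntu:20.04 quay.io/USER/IMAGE:TAG
$ docker push quay.io/USER/IMAGE:TAG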

This is not an authentication issue with the private repo because the 27MB image works.

In my network monitoring I can see the traffic from the 600MB image coming in, and the full image is transferred. On Quay.io I can see the pull logs for the image, so the pull is happening, but it always fails due to context deadline exceeded.

This is not a network issue. I have a stable and extremely fast connection.

I am at my wits' end here. This is only a 600MB image, and larger images also fail. What is happening?

Attach the log file

$minikube start
  minikube v1.26.1 on Ubuntu 20.04 (kvm/amd64)
✨  Automatically selected the docker driver. Other choices: ssh, none
  Using Docker driver with root privileges
  Starting control plane node minikube in cluster minikube
  Pulling base image ...
  Downloading Kubernetes v1.24.3 preload ...
    > preloaded-images-k8s-v18-v1...:  405.75 MiB / 405.75 MiB  100.00% 317.54 
  Creating docker container (CPUs=2, Memory=25100MB) ...
  Preparing Kubernetes v1.24.3 on Docker 20.10.17 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
  Enabled addons: storage-provisioner, default-storageclass
  kubectl not found. If you need it, try: 'minikube kubectl -- get pods -A'
  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

I cannot give you anything more than these relevant lines from minikube logs, redacted for (hopefully) obvious reasons.

Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.120188    2069 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unknown desc = context deadline exceeded" image="quay.io/USER/IMAGE:TAG"
...
Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.120724    2069 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"REDACTED\" with ErrImagePull: \"rpc error: code = Unknown desc = context deadline exceeded\"" pod="REDACTED/REDACTED" podUID=28b393c0-3573-41de-9747-386d911ab8fd
Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.889526    2069 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"REDACTED\" with ImagePullBackOff: \"Back-off pulling image \\\"quay.io/USER/IMAGE:TAG\\\"\"" pod="REDACTED/REDACTED" podUID=28b393c0-3573-41de-9747-386d911ab8fd

Operating System

Ubuntu

Driver

Docker

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 35

Most upvoted comments

It was working with Docker in Kubernetes 1.23 and earlier (dockershim), and it will be working again in 1.24+ with cri-dockerd 0.2.6.

Turns out using a different container runtime, as Joss talked about earlier, fixed our problem, e.g. minikube start --container-runtime=containerd. The images are pulling and building just fine now. Wouldn't have been able to figure that out without the debugging help here, so thanks to both of you!
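For anyone else landing here, the switch is roughly the following; as far as I know the runtime of an existing cluster can't be changed in place, so the delete is deliberate:

$ minikube delete
$ minikube start --container-runtime=containerd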

Not really, fixed in cri-dockerd 0.2.6

A workaround would be to upgrade it
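If you want to confirm which cri-dockerd version the node is actually running before worrying about an upgrade, something along these lines should work (assuming cri-dockerd is on the node's PATH inside the minikube container/VM):

$ minikube ssh -- cri-dockerd --version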

Workaround, meanwhile:

minikube ssh docker pull
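Spelled out with the redacted image name from the kubelet logs as a placeholder, the manual pre-pull looks roughly like this; once the image is in the node's Docker cache the kubelet no longer needs the long pull, as long as the pod's imagePullPolicy isn't Always:

$ minikube ssh -- docker pull quay.io/USER/IMAGE:TAG
$ minikube ssh -- docker images | grep IMAGE    # confirm it landed in the node's cache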

Thanks again for all your help. Enjoy your Sunday 😃

Short term, moving to a different runtime seems to be the solution.

That was supposed to be the long-term solution… 😃 But, as the saying goes, now you have two problems?

Once we can sort out the legacy docker socket issues, we too might move over to containerd as the default runtime.

Then again, the code says the timeout for short operations is two minutes (not the one minute being observed).

It's definitely closer to 1 minute than 2. I saw the 2-minute default the other day, but we ruled it out at the time because we were definitely seeing it fail sooner.

Also, I can see the whole 600MB image being pulled in via my network monitoring. It's more than a trickle; it's a firehose 😃
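For anyone trying to pin down the timing themselves, one rough way to see how long the kubelet waits before giving up is to compare the Pulling and Failed events for the pod (REDACTED is the placeholder namespace from the logs above, and minikube kubectl is used since kubectl itself wasn't installed):

$ minikube kubectl -- get events -n REDACTED --sort-by=.lastTimestamp | grep -i pull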