cri-o: Error: Image not known after cri-o upgrade

Description

I am updating cri-o across the nodes in my Kubernetes cluster. Prior to the upgrade, everything was working fine. After the upgrade, some (but not all) workloads will no longer run: they get stuck in a CreateContainerError state. When I describe the pods for these workloads, I see the following events:

      Successfully assigned wazuh/wazuh-7bb996795-2zjfb to ip-10-240-5-229.eu-central-1.compute.internal
      Normal   Pulled     5m33s (x8 over 7m4s)  kubelet, ip-10-240-5-229.eu-central-1.compute.internal  Successfully pulled image "datica/wazuh:3.6.1"
      Warning  Failed     5m33s (x8 over 7m4s)  kubelet, ip-10-240-5-229.eu-central-1.compute.internal  Error: image not known
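
For reference, a minimal sketch of the commands I use to surface these events (the pod name and namespace are taken from the output above; adjust for other workloads):

      # Describe the failing pod to see its recent events
      kubectl describe pod wazuh-7bb996795-2zjfb -n wazuh

      # Or list all events in the namespace, most recent last
      kubectl get events -n wazuh --sort-by=.metadata.creationTimestamp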

Also, when I attempt to delete the pods, they seem to get stuck in a Terminating state.
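
If it is useful, pods stuck in Terminating can usually be force-removed; a minimal sketch (this bypasses graceful termination, so use with care):

      # Force-remove a pod stuck in Terminating (skips the grace period)
      kubectl delete pod wazuh-7bb996795-2zjfb -n wazuh --grace-period=0 --force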

Steps to reproduce the issue:

  1. Upgrade CoreOS on the hosts from v1967.6.0 to v2191.5.0 (to get glibc 2.29)
  2. Upgrade cri-o from v1.13.3 to v1.15.1-dev
  3. Upgrade the Kubernetes control plane components from 1.13.10 to 1.15.3 (I’m testing now to see whether this issue occurs without upgrading the k8s components; will update after). The version checks I run after each step are sketched below.
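
A rough sketch of the per-node checks I run to confirm each step actually took effect (these assume CoreOS defaults and a cri-o systemd unit named crio):

      # 1. Confirm the Container Linux and glibc versions on the host
      cat /etc/os-release
      ldd --version | head -1

      # 2. Confirm the cri-o version and that the service restarted cleanly
      crio --version
      systemctl status crio --no-pager

      # 3. Confirm the control plane and kubelet versions
      kubectl version --short
      kubectl get nodes -o wide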

Describe the results you received: Workloads are no longer running correctly. Some can’t start, and have the error above in their events. Deleting pods results in them being stuck indefinitely in a Terminating state.

Describe the results you expected: All workloads would continue running.

Additional information you deem important (e.g. issue happens only occasionally): I found one Red Hat thread that discussed a similar issue and indicated that upgrading cri-o from certain versions could cause problems. I’m not sure it’s relevant, though, and the thread did not really explain why their issue occurred.

Output of crio --version: Prior to upgrade:

crio version 1.13.3
commit: "5a3c24900797986fd3f1f39094aeea8c4a4354ef"

After upgrade:

crio version 1.15.1-dev
commit: "7eb0fb039a8e379bcda319826a34caec983e8519-dirty"

Additional environment details (AWS, VirtualBox, physical, etc.): CoreOS nodes running in AWS.
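
Since the error is “image not known”, it may also help to check what cri-o itself thinks is in its image store on an affected node; a minimal sketch using crictl (the image name is taken from the events above):

      # List the images cri-o currently knows about
      crictl images

      # Check for the specific image and try re-pulling it manually
      crictl images | grep wazuh
      crictl pull datica/wazuh:3.6.1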

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Also, 1.14.10 would probably be best too.

Can you first try to upgrade to the latest on the 1.13 branch before going to 1.15? There were known issues in container/image storage that were fixed after 1.13.9.
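
Following that suggestion, a rough sketch of how each intermediate upgrade can be verified across the cluster before moving on to 1.15 (the exact upgrade mechanism depends on how cri-o is installed on CoreOS, so that part is omitted):

      # After upgrading a node to the latest 1.13.x (and later 1.14.10),
      # check which runtime version each node actually reports
      kubectl get nodes -o wide   # CONTAINER-RUNTIME column shows cri-o://<version>

      # On the node itself
      crio --version
      systemctl restart crio && systemctl status crio --no-pager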