cri-o: Error log: Failed to create existing container
Description
I checked the similar issues #3259 and #4465, but I could not fix this with the workarounds described there.
In my case the error messages start appearing after restarting crio and kubelet.
The messages do go away after a reboot, but is there any way to resolve this without rebooting the node?
Steps to reproduce the issue:
systemctl stop kubelet
systemctl restart crio
systemctl start kubelet
journalctl -xeu kubelet -f
Describe the results you received:
I tested with different cgroup drivers and received the errors below in the kubelet logs every minute (a journalctl filter that surfaces them is sketched after the excerpts).
# kubelet & crio cgroup driver: systemd
kubelet[16645]: E1119 12:07:05.393398 16645 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1e900d5d_3559_4bc2_9b52_761eb5ed3c3f.slice/crio-217e7ff7799d62d02282afd1a0f2a8bbba85e62c8614ad4bf0aa9c029f2a0661.scope: Error finding container 217e7ff7799d62d02282afd1a0f2a8bbba85e62c8614ad4bf0aa9c029f2a0661: Status 404 returned error &{%!s(*http.body=&{0xc001297aa0 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x55fd88bf3020) %!s(func() error=0x55fd88bf2fa0)}
# kubelet & crio cgroup driver: cgroupfs
kubelet[7608]: E1203 12:30:49.996168 7608 manager.go:1123] Failed to create existing container: /kubepods/burstable/pod5f65e3d8d489e8fea295a7cd01aff842/crio-a3e66d2af18c551b3a281b5369de361a19ed1ea1ed508563fbdf94366cb321eb: Error finding container a3e66d2af18c551b3a281b5369de361a19ed1ea1ed508563fbdf94366cb321eb: Status 404 returned error &{%!s(*http.body=&{0xc000b6cfd8 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x55fe06c23f40) %!s(func() error=0x55fe06c23ec0)}
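A quick way to confirm the one-minute cadence is to filter the kubelet journal for the message. This is just a convenience sketch, assuming kubelet runs as the systemd unit named kubelet:

# follow new occurrences as they appear
journalctl -u kubelet -f | grep "Failed to create existing container"

# count occurrences over the last hour (roughly 60 expected at a 60s cadence)
journalctl -u kubelet --since "-1h" | grep -c "Failed to create existing container"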
Describe the results you expected:
No “Failed to create existing container” errors or warnings in the kubelet logs.
Additional information you deem important (e.g. issue happens only occasionally):
I received these logs every 60 seconds after restarting crio and kubelet.
Output of crio --version:
crio version 1.22.1
Version: 1.22.1
GitCommit: 63ca93845d5fe05cdca826367afcb601ece8d7ad
GitTreeState: clean
BuildDate: 2021-11-11T20:24:17Z
GoVersion: go1.16.8
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
BuildTags: exclude_graphdriver_devicemapper, seccomp
SeccompEnabled: true
AppArmorEnabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
OS: CentOS 7
Kubernetes v1.22.1
crio configuration:
...
[crio.runtime]
conmon_cgroup = "system.slice"
cgroup_manager = "systemd"
...
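On the CRI-O side, the effective runtime settings can be double-checked against what kubelet expects; this is a sketch assuming the crio-status helper that ships with CRI-O 1.22 is installed on the node:

# dump the running CRI-O configuration and pick out the cgroup-related keys
crio-status config | grep -E "cgroup_manager|conmon_cgroup"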
Kubelet configuration:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 5s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReserved:
  cpu: 500m
kubeletCgroups: /systemd/system.slice
logging: {}
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
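Since a cgroup-driver mismatch is the usual first suspect for cgroup-related kubelet errors, it may be worth confirming both sides agree. The paths below are the kubeadm defaults and are an assumption about this setup:

# kubelet's cgroup driver
grep cgroupDriver /var/lib/kubelet/config.yaml

# CRI-O's cgroup manager (main config plus any drop-ins)
grep -R cgroup_manager /etc/crio/crio.conf /etc/crio/crio.conf.d/ 2>/dev/null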
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 28 (7 by maintainers)
So I got bitten by this, and decided to troubleshoot it a bit. It looks like that, when kubelet is not running and you delete a pod from cri-o directly (using crictl rmp $POD_ID), its corresponding systemd slice is left around for some reason, and the system logs show none of the slice cleanup that happens when kubelet is running and a pod is deleted through the k8s API server.

This is a bit surprising because, the slice description being libcontainer container kubepods-besteffort-pod2e0090b5_3385_4d73_93fa_31e5e17999c7.slice, I would have expected runc to manage its lifecycle, but the only explanation for the above behaviour is that it’s the kubelet that manages the slices?

Later edit: turns out that kubelet also uses libcontainer to manage systemd cgroups, hence the slice name (which is set here, btw: https://github.com/opencontainers/runc/blob/255fe4099ed06edf5416c6af7dd736fcd8f3c5d2/libcontainer/cgroups/systemd/v1.go#L177 ).

So in order to fix this, after stopping kubelet and deleting all the pods with crictl rmp, I also manually deleted the leftover slices (roughly as sketched below). I then started kubelet, which created new pods, and the error messages went away.
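The exact cleanup commands did not survive in this thread; a minimal sketch of what that can look like with the systemd cgroup driver, using the slice naming visible in the error messages above, is:

# list kubepods pod slices that systemd still has loaded
systemctl list-units --type=slice | grep kubepods

# stop a leftover slice (unit name is illustrative, taken from the error message format)
systemctl stop kubepods-besteffort-pod1e900d5d_3559_4bc2_9b52_761eb5ed3c3f.slice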
Been digging into this for a while. I’ve discovered that the UIDs of the pods are actually there, but kubelet still gets a 404 for them.
Is there a way to manually simulate what kubelet was trying to request and verify it against the CRI?
I figured the only way to solve this is to manually step through the process.
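As far as I can tell, the 404 here comes from kubelet’s embedded cAdvisor asking CRI-O about a container ID it found under the kubepods cgroups. A rough way to poke at the same data by hand, reusing the container ID from the systemd-driver error message above, is:

# does CRI-O still know about the container from the error message?
crictl ps -a | grep 217e7ff7799d
crictl inspect 217e7ff7799d62d02282afd1a0f2a8bbba85e62c8614ad4bf0aa9c029f2a0661

# is there a stale cgroup left behind even though CRI-O says "not found"?
find /sys/fs/cgroup -type d -name "*217e7ff7799d*"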
Version Info
Edit: Just upgraded kubelet to 1.26.1 and the results are still the same.
Here is the CRI-O debug-level log:
crio_debug.txt
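For anyone reproducing with the same level of detail: debug logs like the attached file can usually be captured by raising CRI-O’s log level and re-reading the journal. The drop-in path and config section below follow the usual crio.conf layout and may differ on other installs:

cat <<'EOF' > /etc/crio/crio.conf.d/01-debug.conf
[crio.runtime]
log_level = "debug"
EOF
systemctl restart crio
journalctl -u crio --since "-10m" > crio_debug.txt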
crictl pods:
crictl ps -a:
Also, the workaround did not work, and crictl rmp -fa reported some errors.
After crictl rmp -fa:
crictl pods:
crictl ps -a:
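Putting the pieces of this thread together, the non-reboot recovery that worked above amounts to resetting all pods on the node and clearing the leftover slices. A hedged sketch of that sequence (destructive: it removes every pod on the node, so drain it first):

systemctl stop kubelet

# remove all containers and pod sandboxes known to CRI-O
crictl rm -fa
crictl rmp -fa

# inspect and stop any leftover kubepods slices (see the earlier sketch)
systemctl list-units --type=slice --no-legend | grep kubepods

systemctl restart crio
systemctl start kubelet

# the recurring error should stop appearing
journalctl -u kubelet -f | grep "Failed to create existing container"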