cri-o: Pods with memory requests/limits set cannot start on Kubernetes 1.23.1 + Ubuntu 20.04 + cri-o 1.23.0
Description
This cluster was originally created with k8s 1.22.2 on Ubuntu 20.04 VMs using kubeadm with no special config. After upgrading to 1.23.1, pods with resource requests and/or limits set fail to start with the following error:
Error: container create failed: time="2022-01-03T10:22:18Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-35fa56fb8d2ad4995c85027176ef30ffcddc67f6ec96a8ea34ba4318670b7d99.scope/memory.memsw.limit_in_bytes: no such file or directory"
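For what it's worth, the missing file can be checked directly on the node. This is a rough sketch assuming a cgroup v1 memory controller mounted at the usual path; on Ubuntu the memory.memsw.* files only exist when the kernel was booted with swap accounting enabled (swapaccount=1), which is not the default:
# Does the kernel expose memsw accounting at all? (path assumes a cgroup v1 layout)
ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes
# Was swap accounting enabled at boot? (Ubuntu default is off)
grep -o 'swapaccount=[01]' /proc/cmdline || echo "swapaccount not set"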
cri-o logs show the following:
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.927188409Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.928357846Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.929932099Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.930778697Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931419905Z" level=info msg="Creating container: kube-system/coredns-8554ccb6dd-tzdcj/coredns" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931482393Z" level=warning msg="Allowed annotations are specified for workload [] "
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931494869Z" level=warning msg="Allowed annotations are specified for workload []"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948575637Z" level=warning msg="Failed to open /etc/passwd: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/passwd: no such file or directory"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948617867Z" level=warning msg="Failed to open /etc/group: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/group: no such file or directory"
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.007934302Z" level=error msg="Container creation error: time=\"2022-01-03T10:41:09Z\" level=error msg=\"container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816.scope/memory.memsw.limit_in_bytes: no such file or directory\"\n" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.015875129Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016122205Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016293899Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.038543960Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
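(For reference, the lines above were pulled from the node's journal; this assumes cri-o runs as the crio systemd unit:)
journalctl -u crio --since "2022-01-03 10:41" | grep -iE 'memsw|CreateContainer'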
Steps to reproduce the issue:
- Use kubeadm to upgrade the cluster to 1.23.1.
- Upgrade cri-o from the suse/libcontainers repo from 1.22 to 1.23 (a rough command sketch follows this list).
- Schedule pods with resource limits onto the upgraded node; they fail to start.
- Downgrade cri-o back to 1.22.
- Pods start again.
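A rough sketch of the commands involved, in case it helps; the exact package versions depend on what the suse/libcontainers repo publishes, so the version strings below are placeholders (check apt-cache madison cri-o):
# on the control plane node
kubeadm upgrade apply v1.23.1
# on the worker node
kubeadm upgrade node
apt-get update && apt-get install -y cri-o cri-o-runc    # pulls 1.23.0 from the repo
systemctl restart crio kubelet
# downgrade path used to confirm the regression (placeholder version string)
apt-get install -y --allow-downgrades cri-o=1.22.1~0
systemctl restart crio kubelet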
Describe the results you received:
Pods with resource limits/requests set fail to start.
Describe the results you expected:
Pods should start.
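For reference, any pod with a memory request/limit reproduces it; a minimal example (pod name and image are just illustrative) looks like this:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memsw-repro
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.6
    resources:
      requests:
        memory: 32Mi
      limits:
        memory: 64Mi
EOF
kubectl get pod memsw-repro    # sits in CreateContainerError on the affected node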
Additional information you deem important (e.g. issue happens only occasionally):
I note that when running crio manually, I see the following logs:
WARN[2022-01-03 10:59:48.273861323Z] node configuration validation for memoryswap cgroup failed: node not configured with memory swap
INFO[2022-01-03 10:59:48.273886219Z] Node configuration value for memoryswap cgroup is false
INFO[2022-01-03 10:59:48.273900456Z] Node configuration value for cgroup v2 is false
This feels relevant: the pod cannot start because memory.memsw.limit_in_bytes is missing, and as I understand it that file is related to swap accounting (swap is disabled on these nodes). I'm also puzzled by the cgroup v2 value being reported as false: crio is configured to use systemd as the cgroup manager, and systemd is using cgroup v2.
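In case it helps narrow this down, the two things cri-o is reporting on can be checked from a shell on the node (standard systemd mount paths assumed):
# cgroup2fs means the unified (v2) hierarchy; tmpfs means v1/hybrid
stat -fc %T /sys/fs/cgroup/
# present only when the unified hierarchy is mounted at /sys/fs/cgroup
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || echo "not a pure cgroup v2 mount"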
Downgrading cri-o to 1.22 allows pods to start as normal.
Output of crio --version:
crio version 1.23.0
Version: 1.23.0
GitCommit: 9b7f5ae815c22a1d754abfbc2890d8d4c10e240d
GitTreeState: clean
BuildDate: 2021-12-21T21:40:34Z
GoVersion: go1.17.5
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
BuildTags: apparmor, exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp
SeccompEnabled: true
AppArmorEnabled: true
Additional environment details (AWS, VirtualBox, physical, etc.):
- Ubuntu 20.04
- kernel 5.4.0
- cgroupv2 in use by systemd
- systemd 245
I’m unsure if it’s related, but containers-common was also upgraded at the same time from 1-21 to 1-22.
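(The package versions on the node can be confirmed with dpkg if that's useful:)
dpkg -l | grep -E 'cri-o|containers-common'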
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (12 by maintainers)
Commits related to this issue
- Downgrade cri-o to 1.22 - See https://github.com/cri-o/cri-o/issues/5527 — committed to memes/lab-config by memes 2 years ago
oopsies, this is definitely just a bug in cri-o: https://github.com/cri-o/cri-o/pull/5539 (we used to do this but accidentally dropped it when swap support was added)
fix is merged in the main branch, I'm backporting to 1.23 and intend to cut a 1.23.1 soon