cri-o: Pods with memory requests/limits set cannot start on Kubernetes 1.23.1 + Ubuntu 20.04 + cri-o 1.23.0
Description
This cluster was originally created with k8s 1.22.2 on Ubuntu 20.04 VMs using kubeadm with no special config. After upgrading to 1.23.1, pods with resource requests and/or limits set fail to start with the following error:
Error: container create failed: time="2022-01-03T10:22:18Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-35fa56fb8d2ad4995c85027176ef30ffcddc67f6ec96a8ea34ba4318670b7d99.scope/memory.memsw.limit_in_bytes: no such file or directory"
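For what it's worth, the missing file can be checked directly on the node. This is a rough sketch assuming a cgroup v1 memory controller mounted at the usual path; on Ubuntu the memory.memsw.* files only exist when the kernel was booted with swap accounting enabled (swapaccount=1), which is not the default:
# Does the kernel expose memsw accounting at all? (path assumes a cgroup v1 layout)
ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes
# Was swap accounting enabled at boot? (Ubuntu default is off)
grep -o 'swapaccount=[01]' /proc/cmdline || echo "swapaccount not set"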
cri-o logs show the following:
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.927188409Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.928357846Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.929932099Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.930778697Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931419905Z" level=info msg="Creating container: kube-system/coredns-8554ccb6dd-tzdcj/coredns" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931482393Z" level=warning msg="Allowed annotations are specified for workload [] "
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931494869Z" level=warning msg="Allowed annotations are specified for workload []"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948575637Z" level=warning msg="Failed to open /etc/passwd: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/passwd: no such file or directory"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948617867Z" level=warning msg="Failed to open /etc/group: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/group: no such file or directory"
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.007934302Z" level=error msg="Container creation error: time=\"2022-01-03T10:41:09Z\" level=error msg=\"container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816.scope/memory.memsw.limit_in_bytes: no such file or directory\"\n" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.015875129Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016122205Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016293899Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.038543960Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
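(For reference, the lines above were pulled from the node's journal; this assumes cri-o runs as the crio systemd unit:)
journalctl -u crio --since "2022-01-03 10:41" | grep -iE 'memsw|CreateContainer'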
Steps to reproduce the issue:
- Use kubeadm to upgrade the cluster to 1.23.1.
- Upgrade cri-o from the suse/libcontainers repo from 1.22 to 1.23 (a rough command sketch follows this list).
- Schedule pods with resource limits onto the upgraded node; they fail to start.
- Downgrade cri-o back to 1.22.
- Pods start again.
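A rough sketch of the commands involved, in case it helps; the exact package versions depend on what the suse/libcontainers repo publishes, so the version strings below are placeholders (check apt-cache madison cri-o):
# on the control plane node
kubeadm upgrade apply v1.23.1
# on the worker node
kubeadm upgrade node
apt-get update && apt-get install -y cri-o cri-o-runc    # pulls 1.23.0 from the repo
systemctl restart crio kubelet
# downgrade path used to confirm the regression (placeholder version string)
apt-get install -y --allow-downgrades cri-o=1.22.1~0
systemctl restart crio kubelet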
Describe the results you received:
Pods with resource limits/requests set fail to start.
Describe the results you expected:
Pods should start.
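For reference, any pod with a memory request/limit reproduces it; a minimal example (pod name and image are just illustrative) looks like this:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memsw-repro
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.6
    resources:
      requests:
        memory: 32Mi
      limits:
        memory: 64Mi
EOF
kubectl get pod memsw-repro    # sits in CreateContainerError on the affected node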
Additional information you deem important (e.g. issue happens only occasionally):
I note that when running crio manually, I see the following logs:
WARN[2022-01-03 10:59:48.273861323Z] node configuration validation for memoryswap cgroup failed: node not configured with memory swap
INFO[2022-01-03 10:59:48.273886219Z] Node configuration value for memoryswap cgroup is false
INFO[2022-01-03 10:59:48.273900456Z] Node configuration value for cgroup v2 is false
This feels relevant: the pod cannot start because memory.memsw.limit_in_bytes is missing, and as I understand it that file is related to swap accounting (swap is disabled on these nodes). I'm also puzzled by the cgroup v2 value being reported as false: crio is configured to use systemd as the cgroup manager, and systemd is using cgroup v2.
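In case it helps narrow this down, the two things cri-o is reporting on can be checked from a shell on the node (standard systemd mount paths assumed):
# cgroup2fs means the unified (v2) hierarchy; tmpfs means v1/hybrid
stat -fc %T /sys/fs/cgroup/
# present only when the unified hierarchy is mounted at /sys/fs/cgroup
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || echo "not a pure cgroup v2 mount"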
Downgrading cri-o to 1.22 allows pods to start as normal.
Output of crio --version:
crio version 1.23.0
Version: 1.23.0
GitCommit: 9b7f5ae815c22a1d754abfbc2890d8d4c10e240d
GitTreeState: clean
BuildDate: 2021-12-21T21:40:34Z
GoVersion: go1.17.5
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
BuildTags: apparmor, exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp
SeccompEnabled: true
AppArmorEnabled: true
Additional environment details (AWS, VirtualBox, physical, etc.):
- Ubuntu 20.04
- kernel 5.4.0
- cgroupv2 in use by systemd
- systemd 245
I’m unsure if it’s related, but containers-common was also upgraded at the same time from 1-21 to 1-22.
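(The package versions on the node can be confirmed with dpkg if that's useful:)
dpkg -l | grep -E 'cri-o|containers-common'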
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (12 by maintainers)
Commits related to this issue
- Downgrade cri-o to 1.22 - See https://github.com/cri-o/cri-o/issues/5527 — committed to memes/lab-config by memes 2 years ago
oopsies, this is definitely just a bug in cri-o: https://github.com/cri-o/cri-o/pull/5539 (we used to do this but accidentally dropped it when swap support was added)
fix is merged in the main branch, I'm backporting to 1.23 and intend to cut a 1.23.1 soon