cri-o: Pods with memory requests/limits set cannot start on Kubernetes 1.23.1 + Ubuntu 20.04 + cri-o 1.23.0

Description

This cluster was originally created with Kubernetes 1.22.2 on Ubuntu 20.04 VMs using kubeadm with no special configuration. After upgrading to 1.23.1, pods with resource requests and/or limits set fail to start with the following error:

Error: container create failed: time="2022-01-03T10:22:18Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-35fa56fb8d2ad4995c85027176ef30ffcddc67f6ec96a8ea34ba4318670b7d99.scope/memory.memsw.limit_in_bytes: no such file or directory"

cri-o logs show the following:

Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.927188409Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.928357846Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=278e27cc-031c-44f5-8773-c5c65133ead0 name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.929932099Z" level=info msg="Checking image status: k8s.gcr.io/coredns/coredns:v1.8.6" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.930778697Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:a4ca41631cc7ac19ce1be3ebf0314ac5f47af7c711f17066006db82ee3b75b03,RepoTags:[k8s.gcr.io/coredns/coredns:v1.8.6],RepoDigests:[k8s.gcr.io/coredns/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e k8s.gcr.io/coredns/coredns@sha256:8916c89e1538ea3941b58847e448a2c6d940c01b8e716b20423d2d8b189d3972],Size_:46959895,Uid:nil,Username:,Spec:nil,},Info:map[string]string{},}" id=b995fb6f-8820-4d3e-995a-87498db9e66a name=/runtime.v1.ImageService/ImageStatus
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931419905Z" level=info msg="Creating container: kube-system/coredns-8554ccb6dd-tzdcj/coredns" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931482393Z" level=warning msg="Allowed annotations are specified for workload [] "
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.931494869Z" level=warning msg="Allowed annotations are specified for workload []"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948575637Z" level=warning msg="Failed to open /etc/passwd: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/passwd: no such file or directory"
Jan 03 10:41:08 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:08.948617867Z" level=warning msg="Failed to open /etc/group: open /var/lib/containers/storage/overlay/83c9c9ec5684fa2fd1c943d300507751866efb0be5ae15652cc6a97d2be47571/merged/etc/group: no such file or directory"
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.007934302Z" level=error msg="Container creation error: time=\"2022-01-03T10:41:09Z\" level=error msg=\"container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f5254d9_5b91_4987_8ea9_ddf323e3623b.slice/crio-4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816.scope/memory.memsw.limit_in_bytes: no such file or directory\"\n" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.015875129Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016122205Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.016293899Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer
Jan 03 10:41:09 worker0-k8s-mgmt crio[731]: time="2022-01-03 10:41:09.038543960Z" level=info msg="createCtr: deleting container ID 4894546f2ac322dd116a01ae0da1c05cae1b1e079f8552ea5a3f84ef9a3fa816 from idIndex" id=68474b21-6263-43c1-a002-50998549538d name=/runtime.v1.RuntimeService/CreateContainer

Steps to reproduce the issue:

  1. Use kubeadm to upgrade the cluster.
  2. Upgrade cri-o from the SUSE/libcontainers repo from 1.22 to 1.23.
  3. Schedule pods with resource limits onto the upgraded node; they fail to start (a reproduction sketch follows this list).
  4. Downgrade cri-o back to 1.22.
  5. Pods start again.
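
For reference, here is a rough reproduction sketch of steps 1-3. The repository, package names, node name, and version strings are illustrative (taken from my setup or assumed) and may differ on other clusters:

# Upgrade the Kubernetes components on the node with kubeadm
sudo apt-get install -y kubeadm=1.23.1-00
sudo kubeadm upgrade node
sudo apt-get install -y kubelet=1.23.1-00 kubectl=1.23.1-00
sudo systemctl restart kubelet

# Upgrade cri-o 1.22 -> 1.23 (package names assume the libcontainers apt repo)
sudo apt-get update
sudo apt-get install -y cri-o cri-o-runc
sudo systemctl restart crio

# Schedule a pod with memory requests/limits onto the upgraded node;
# with cri-o 1.23.0 it fails with the memory.memsw.limit_in_bytes error shown above
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: memsw-repro
spec:
  nodeName: worker0-k8s-mgmt   # the upgraded worker from the logs above
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.6
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "128Mi"
EOF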

Describe the results you received:

Pods with resource limits/requests set fail to start.

Describe the results you expected:

Pods should start.

Additional information you deem important (e.g. issue happens only occasionally):

I note that when running crio manually, I see the following logs:

WARN[2022-01-03 10:59:48.273861323Z] node configuration validation for memoryswap cgroup failed: node not configured with memory swap
INFO[2022-01-03 10:59:48.273886219Z] Node configuration value for memoryswap cgroup is false
INFO[2022-01-03 10:59:48.273900456Z] Node configuration value for cgroup v2 is false

This feels relevant: the pod cannot start because memory.memsw.limit_in_bytes is missing, and as I understand it that file relates to swap accounting (swap is disabled on these nodes). I'm also puzzled by the log reporting the cgroup v2 node configuration as false: cri-o is configured to use systemd as the cgroup manager, and systemd is using cgroupv2.
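
Given that the error path sits under /sys/fs/cgroup/memory/..., the memory controller may in fact still be mounted as cgroup v1 (the hybrid layout), which would also explain cri-o reporting cgroup v2 as false. A few standard commands to check this; this is just a sketch of what I would run on the node, with paths as I expect them on this setup:

# cgroup2fs means the unified cgroup v2 hierarchy; tmpfs means cgroup v1 / hybrid
stat -fc %T /sys/fs/cgroup/

# On cgroup v1, memory.memsw.* files only exist when swap accounting is enabled
# via swapaccount=1 on the kernel command line; Ubuntu kernels typically leave it off
grep -o 'swapaccount=[01]' /proc/cmdline || echo "swapaccount not set"
ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes 2>/dev/null || echo "no memsw files"

# Confirm swap is disabled on the node
swapon --show
free -h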

Downgrading cri-o to 1.22 allows pods to start as normal.
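
In case it helps anyone else, the downgrade itself is just an apt operation (the exact 1.22 version string depends on the repository; the one below is only an example):

# List the cri-o versions the configured repositories offer
apt-cache madison cri-o

# Pin back to a 1.22 build and restart the runtime (version string is an example)
sudo apt-get install -y --allow-downgrades cri-o=1.22.1~0
sudo systemctl restart crio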

Output of crio --version:

crio version 1.23.0
Version:          1.23.0
GitCommit:        9b7f5ae815c22a1d754abfbc2890d8d4c10e240d
GitTreeState:     clean
BuildDate:        2021-12-21T21:40:34Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        apparmor, exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp
SeccompEnabled:   true
AppArmorEnabled:  true

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Ubuntu 20.04
  • kernel 5.4.0
  • cgroupv2 in use by systemd
  • systemd 245

I’m unsure whether it’s related, but containers-common was also upgraded at the same time, from 1-21 to 1-22.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (12 by maintainers)

Most upvoted comments

oopsies, this is definitely just a bug in cri-o: https://github.com/cri-o/cri-o/pull/5539 (we used to do this but accidentally dropped it when swap support was added)

The fix is merged in the main branch; I’m backporting it to 1.23 and intend to cut a 1.23.1 release soon.