kubernetes: Huge amount of logs: kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/86cba354-348c-4826-9837-df9c616a8862/volumes"
What happened?
After we updated two of our clusters from v1.22 to v1.24.4, we noticed a huge amount of kubelet log entries (tens of entries per second) like this:
kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/86cba354-348c-4826-9837-df9c616a8862/volumes"
Neither the volumes subdirectory nor the corresponding pod directory (/var/lib/kubelet/pods/86cba354-348c-4826-9837-df9c616a8862 in this case) exists.
Since the upgrade, kubelet has been producing multiple gigabytes of logs each day.
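For reference, the spam can be quantified on an affected node roughly like this (a hedged example; it assumes the kubelet runs as a systemd unit named kubelet and logs to the journal, and it reuses the pod UID from the log line above):

```bash
# Count today's "Path does not exist" entries from the kubelet
# (assumes the kubelet logs to the systemd journal as unit "kubelet").
journalctl -u kubelet --since today | grep -c 'Path does not exist'

# Confirm the pod directory the kubelet complains about really is gone.
ls -ld /var/lib/kubelet/pods/86cba354-348c-4826-9837-df9c616a8862
```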
What did you expect to happen?
No log spam (at least not with verbosity=0).
How can we reproduce it (as minimally and precisely as possible)?
Not really sure. We changed nothing in the setup; we just upgraded to v1.24.4. The cluster does not have any volume providers running, just a 'vanilla' Kubernetes (Ubuntu 20.04, kubeadm, Cilium, CRI-O).
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:47:37Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ uname -a
Linux k8s-master-p-1 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Version: 1.24.2
GitCommit: bd548b04f78a30e1e9d7c17162714edd50edd6ca
GitTreeState: clean
BuildDate: 2022-08-09T16:42:39Z
GoVersion: go1.18.2
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
BuildTags: apparmor, exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp
SeccompEnabled: true
AppArmorEnabled: true
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 37 (17 by maintainers)
I stumbled upon this issue, as we have the same problem on all our Rancher RKE (k8s v1.24 and v1.25) clusters where the kubelet container is spamming "Path does not exist" messages. This resulted in gigabytes of logs from one of our clusters, which runs dozens of Kubernetes Jobs on ~3 nodes.
For a quick and ugly fix I created a script that periodically cleans the corresponding /sys/fs/cgroup/misc folders via a simple cronjob (* * * * * root /opt/scripts/cgroup_pod_garbage_collector.sh -i 50s >/dev/null 2>&1). Not sure if what I'm doing is super smart, and the script can surely be done in a much better way, but we needed a quick fix: the kubelet containers on the nodes caused extremely high CPU usage writing those logs, and our ELK also complained about space usage 😉 The general idea is sketched below.
I haven't seen any issues yet on the clusters where this script is active, and docker logs for the kubelet container also look much, much cleaner. Log volume in our ELK was reduced from, for example, ~15G per hour from one of the nodes to ~3G per hour. CPU usage also dropped significantly, from 20% to 4%, after executing the script manually on one of the nodes.
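A minimal sketch of such a cleanup script, assuming the leftover entries live under /sys/fs/cgroup/misc and can be matched against the pod directories in /var/lib/kubelet/pods (the exact paths and UID matching are assumptions, not the commenter's actual script):

```bash
#!/usr/bin/env bash
# Illustrative sketch only: remove leftover pod cgroup directories under
# /sys/fs/cgroup/misc whose pod no longer exists under /var/lib/kubelet/pods.
# Paths and UID matching are assumptions; adjust for your cgroup layout.
set -u

CGROUP_MISC="/sys/fs/cgroup/misc"
KUBELET_PODS="/var/lib/kubelet/pods"

find "$CGROUP_MISC" -depth -type d -name '*pod*' 2>/dev/null | while read -r dir; do
  # Pod UIDs in systemd-style cgroup names typically use '_' instead of '-'.
  uid=$(basename "$dir" | grep -oE 'pod[0-9a-f_-]{36}' | head -n1 | sed -e 's/^pod//' -e 's/_/-/g')
  [ -n "$uid" ] || continue

  # Only touch cgroups whose pod the kubelet no longer tracks on disk.
  if [ ! -d "$KUBELET_PODS/$uid" ]; then
    rmdir "$dir" 2>/dev/null && echo "removed orphaned cgroup: $dir"
  fi
done
```

Run it from cron (as in the crontab line above) or manually on an affected node; rmdir refuses to remove a cgroup that still has child cgroups or attached processes, so active pods are left alone.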
Looks like it's a problem with runc <= 1.1.3. After upgrading runc, everything works again.
edit: nope, the log lines are still present with k8s v1.24.4 and runc 1.1.4, but the memory leak and the cgroup error messages seem to be gone.
edit: typo in the runc versions: 1.1.3 and 1.1.4, not 1.3/1.4.
Hi,
We are observing the same issue on the official Bottlerocket AMI images. During the night a node suddenly gets flooded with those errors and fills up its disk.
Does anyone else run Bottlerocket OS?
In my case the same IDs kept showing up for hours. This was due to the misc cgroup not being properly cleaned up, and because not everything was cleaned up, the pods kept getting picked up by the housekeeping loop.
On the first run the directory is removed; on all consecutive runs this log line is produced.
So while this log line isn't the problem itself, it is (the only) indication that the kubelet fails to completely delete pods from the system.
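A quick way to check whether a node is in this state is to take one of the pod UIDs from the spammed log lines and look for a leftover entry in the misc cgroup hierarchy (a hedged example; the cgroup naming and layout may differ per setup):

```bash
# Pod UID taken from a spammed "Path does not exist" log line.
POD_UID="86cba354-348c-4826-9837-df9c616a8862"

# systemd-style cgroup names usually encode the UID with underscores.
find /sys/fs/cgroup/misc -type d -name "*pod${POD_UID//-/_}*" 2>/dev/null

# The corresponding kubelet pod directory should already be gone.
ls -d "/var/lib/kubelet/pods/$POD_UID" 2>/dev/null || echo "pod dir already removed"
```

If the find keeps returning the same directories while the pod directories are gone, the node is hitting exactly this leftover-cgroup case.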