kubernetes: Volume unmount failures saying "could not get consistent content of /proc/mounts"

We saw errors like the following in the kubelet when it tried to clean up bind mounts for the pod's mounted subPaths:

Jun 25 06:57:48 ip-172-18-228-96.ec2.internal kubelet[4955]: E0625 06:57:48.750465 4955 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/d4e712ef-fc5b-476c-aae4-3b37eff4a947-php-config podName:d4e712ef-fc5b-476c-aae4-3b37eff4a947 nodeName:}" failed. No retries permitted until 2021-06-25 06:57:49.25043195 +0000 UTC m=+138568.074599660 (durationBeforeRetry 500ms). *Error*: "error cleaning subPath mounts for volume \"php-config\" (UniqueName: \"kubernetes.io/configmap/d4e712ef-fc5b-476c-aae4-3b37eff4a947-php-config\") pod \"d4e712ef-fc5b-476c-aae4-3b37eff4a947\" (UID: \"d4e712ef-fc5b-476c-aae4-3b37eff4a947\") : error processing /var/lib/kubelet/pods/d4e712ef-fc5b-476c-aae4-3b37eff4a947/volume-subpaths/php-config/main: error cleaning subpath mount /var/lib/kubelet/pods/d4e712ef-fc5b-476c-aae4-3b37eff4a947/volume-subpaths/php-config/main/15: could not get consistent content of /proc/mounts after 3 attempts"

They happen when the kubelet tries to get consistent content from the /proc/mounts file across consecutive reads, but fails after 3 attempts. With heavy churn of volumes/mounts on that node, the /proc/mounts file changes very frequently, which makes this failure more likely. The exact call trace seems to be the following:

Kubelet calling UnmountVolume
which calls GenerateUnmountVolumeFunc which calls subpather.CleanSubPaths
which calls doCleanSubPaths which calls doCleanSubPath
which calls mount.CleanupMountPoint (with extensiveMountPointCheck set to true) which calls doCleanupMountPoint
which calls removePathIfNotMountPoint which calls IsNotMountPoint (since extensiveMountPointCheck == true)
which calls mounter.List that calls ListProcMounts
where we try to do the ConsistentRead of the /proc/mounts file
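
To make the last step of the trace concrete, here is a minimal sketch of what a "consistent read" loop does: read the file, read it again, and accept the content only once two consecutive reads are byte-identical. This is an illustration, not the kubelet's actual code; `readFunc`, `consistentRead` and `readProcMounts` are names made up for this sketch, and the reader is passed in as a function only so the loop can be exercised without a real /proc.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// readFunc is a hypothetical abstraction for "read the whole file once".
type readFunc func() ([]byte, error)

// consistentRead (illustrative helper, not the kubelet implementation) calls
// read repeatedly until two consecutive reads return identical bytes. It
// gives up after `attempts` re-reads.
func consistentRead(read readFunc, attempts int) ([]byte, error) {
	prev, err := read()
	if err != nil {
		return nil, err
	}
	for i := 0; i < attempts; i++ {
		cur, err := read()
		if err != nil {
			return nil, err
		}
		if bytes.Equal(prev, cur) {
			return cur, nil // two reads in a row agree
		}
		prev = cur // content changed under us, compare against the newer read
	}
	return nil, fmt.Errorf("could not get consistent content after %d attempts", attempts)
}

// readProcMounts reads /proc/mounts once (Linux only).
func readProcMounts() ([]byte, error) {
	return os.ReadFile("/proc/mounts")
}
```

Calling `consistentRead(readProcMounts, 3)` on a node where mounts are added or removed between every pair of reads would never see two identical reads, which matches the failure in the log above.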

The reason for this consistency check is unclear to me. After some digging, it seems to date back to this old pull request https://github.com/kubernetes/kubernetes/pull/3180 by @thockin. Tim - could you explain the reasoning behind it? 😃

The effect of this issue seems to be that pod cleanup is delayed by up to 1s (due to the retries) on busy nodes. We saw this in the unmount code path, but it could also be happening in other places where we list the mounts.

W.r.t. the fix, do any of these make sense?

increase the number of attempts for the consistent read
get rid of the consistent read check if unneeded
ignore these errors
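
As a rough way to compare the three options, the sketch below expresses each one as a knob on a single read helper. Everything here is hypothetical: `readPolicy`, `readWithPolicy` and `errInconsistent` are invented for illustration and are not existing kubelet options, and "ignore these errors" is interpreted as "fall back to the last read", which is only one possible reading of that option.

```go
package main

import (
	"bytes"
	"errors"
)

// errInconsistent is a hypothetical sentinel for "reads never stabilized".
var errInconsistent = errors.New("content kept changing between reads")

// readPolicy is an illustrative set of knobs, one per proposed fix.
type readPolicy struct {
	Attempts        int  // option 1: raise this on busy nodes
	RequireStable   bool // option 2: set to false to skip the consistency check
	TolerateFailure bool // option 3: on failure, return the last read instead of an error
}

// readWithPolicy reads via the supplied function and applies the policy.
func readWithPolicy(read func() ([]byte, error), p readPolicy) ([]byte, error) {
	prev, err := read()
	if err != nil || !p.RequireStable {
		return prev, err // single read, no comparison
	}
	for i := 0; i < p.Attempts; i++ {
		cur, err := read()
		if err != nil {
			return nil, err
		}
		if bytes.Equal(prev, cur) {
			return cur, nil
		}
		prev = cur
	}
	if p.TolerateFailure {
		return prev, nil // possibly a torn snapshot, the caller accepts that risk
	}
	return nil, errInconsistent
}
```

The trade-off the sketch makes visible: more attempts cost extra reads on exactly the nodes that are already busy, while skipping the check or tolerating failure returns content that may not reflect any single point in time.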

/sig storage /sig scalability

About this issue

State: closed
Created 3 years ago
Reactions: 2
Comments: 19 (14 by maintainers)

Most upvoted comments

This should be fixed by https://github.com/kubernetes/kubernetes/pull/109217

jsafrane on Sep 26, 2022

Hi @jsafrane

QQ: Why do we need an extensiveMountPointCheck for subPath volumes? https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/subpath/subpath_linux.go#L297

FYI: I am investigating this at https://github.com/kubernetes/kubernetes/issues/104976

manugupt1 on Sep 14, 2021