kubernetes: Volume unmount failures saying "could not get consistent content of /proc/mounts"
We saw errors like the following in the Kubelet when it tried to clean up the bind mounts for a pod’s mounted subPaths:
Jun 25 06:57:48 ip-172-18-228-96.ec2.internal kubelet[4955]: E0625 06:57:48.750465 4955 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/d4e712ef-fc5b-476c-aae4-3b37eff4a947-php-config podName:d4e712ef-fc5b-476c-aae4-3b37eff4a947 nodeName:}" failed. No retries permitted until 2021-06-25 06:57:49.25043195 +0000 UTC m=+138568.074599660 (durationBeforeRetry 500ms). *Error*: "error cleaning subPath mounts for volume \"php-config\" (UniqueName: \"kubernetes.io/configmap/d4e712ef-fc5b-476c-aae4-3b37eff4a947-php-config\") pod \"d4e712ef-fc5b-476c-aae4-3b37eff4a947\" (UID: \"d4e712ef-fc5b-476c-aae4-3b37eff4a947\") : error processing /var/lib/kubelet/pods/d4e712ef-fc5b-476c-aae4-3b37eff4a947/volume-subpaths/php-config/main: error cleaning subpath mount /var/lib/kubelet/pods/d4e712ef-fc5b-476c-aae4-3b37eff4a947/volume-subpaths/php-config/main/15: could not get consistent content of /proc/mounts after 3 attempts"
These errors occur when the Kubelet tries to get consistent content of the /proc/mounts file across consecutive reads but still fails after 3 attempts. With heavy churn of volumes/mounts on that node, /proc/mounts changes very frequently, which makes this failure more likely. The exact call trace seems to be the following:
- Kubelet calls UnmountVolume
- which calls GenerateUnmountVolumeFunc, which calls subpather.CleanSubPaths
- which calls doCleanSubPaths, which calls doCleanSubPath
- which calls mount.CleanupMountPoint (with extensiveMountPointCheck set to true), which calls doCleanupMountPoint
- which calls removePathIfNotMountPoint, which calls IsNotMountPoint (since extensiveMountPointCheck == true)
- which calls mounter.List, which calls ListProcMounts
- where we do the ConsistentRead of the /proc/mounts file
The reason for doing this consistency check is unclear to me. After some digging, it seems to date back to this old pull request https://github.com/kubernetes/kubernetes/pull/3180 by @thockin. Tim, could you explain the reasoning you had in mind for it? 😃
The effect of this issue is that it seems to increase pod cleanup latency by up to 1s (due to the retries) on busy nodes. We saw this in the unmount code path, but it could potentially be happening in other places where we list the mounts too.
W.r.t. the fix, do any of these make sense?
- increase the number of attempts for the consistent read
- get rid of the consistent read check if unneeded
- ignore these errors
/sig storage /sig scalability
About this issue
- State: closed
- Created 3 years ago
This should be fixed by https://github.com/kubernetes/kubernetes/pull/109217
Hi @jsafrane
QQ: Why do we need an extensiveMountPointCheck for subPath volumes? https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/subpath/subpath_linux.go#L297
FYI: I am investigating this at https://github.com/kubernetes/kubernetes/issues/104976
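The question about extensiveMountPointCheck comes down to what a cheaper check can miss. A minimal sketch, assuming a Linux node: the hypothetical `isLikelyNotMountPoint` compares the device ID of a path with that of its parent (the usual stat-based heuristic), while the hypothetical `isMountPointInList` looks for an exact mount point match in /proc/mounts content, such as content obtained through a consistent read. A subPath bind mount whose source lives on the same filesystem as its target has the same device ID as its parent, so the stat-based heuristic would report it as not mounted; the mount-utils documentation notes that its `IsLikelyNotMountPoint` heuristic does not detect Linux bind mounts. This is one plausible reason for the extensive check, not a confirmed answer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
	"syscall"
)

// isLikelyNotMountPoint is a hypothetical cheap check: a path on the same
// device as its parent directory is assumed NOT to be a mount point.
// A bind mount from the same filesystem keeps the same device ID, so this
// heuristic wrongly reports such a bind mount as "not a mount point".
func isLikelyNotMountPoint(path string) (bool, error) {
	var st, parentSt syscall.Stat_t
	if err := syscall.Stat(path, &st); err != nil {
		return true, err
	}
	if err := syscall.Stat(filepath.Dir(path), &parentSt); err != nil {
		return true, err
	}
	return st.Dev == parentSt.Dev, nil
}

// isMountPointInList is a hypothetical extensive check: scan /proc/mounts
// content (device, mount point, fs type, options, dump, pass) for an exact
// mount point match. Octal escapes such as \040 are not decoded here.
func isMountPointInList(path string, procMounts []byte) bool {
	scanner := bufio.NewScanner(bytes.NewReader(procMounts))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == path {
			return true
		}
	}
	return false
}

func main() {
	sample := []byte("tmpfs /var/lib/kubelet/pods/x/volume-subpaths/cfg/main/0 tmpfs rw 0 0\n")
	fmt.Println(isMountPointInList("/var/lib/kubelet/pods/x/volume-subpaths/cfg/main/0", sample))
	notMnt, err := isLikelyNotMountPoint("/proc")
	fmt.Println(notMnt, err)
}
```

The trade-off is visible in the sketch: the stat-based check never touches /proc/mounts and so cannot hit the inconsistent-read error, but it cannot see same-filesystem bind mounts, which is exactly what subPaths are.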