kubernetes: CRI getFsInfo logs errors for valid filesystems mounted after kubelet start

What happened:

New overlay filesystems created for Pods after kubelet start trigger repeated error messages. For example:

cri_stats_provider.go:375] Failed to get the info of the filesystem with mountpoint "/var/lib/containers/storage/overlay/2243263dc8ba6c55d8867d3e47bcaaa145d15f0ace19420824d639d86defe2e3/merged": failed to get device for dir "/var/lib/containers/storage/overlay/2243263dc8ba6c55d8867d3e47bcaaa145d15f0ace19420824d639d86defe2e3/merged": could not find device with major: 0, minor: 83 in cached partitions map.

Error messages are logged every 10 seconds for every filesystem created after kubelet start. On large nodes, this produces an excessive volume of error logs.

What you expected to happen:

New overlay filesystems created for Pods after kubelet start should not log error messages.

How to reproduce it (as minimally and precisely as possible):

Use cri_stats_provider with CRI-O and configure overlay storage. For cri_stats_provider to be used, UsingLegacyCadvisorStats() must return false (see https://github.com/cri-o/cri-o/pull/3054). Start with an empty /var/lib/containers/storage and schedule Pods on the node.

Anything else we need to know?:

As part of https://github.com/kubernetes/kubernetes/pull/59475, cadvisor.GetFsInfoByFsUUID() was changed to cadvisor.GetDirFsInfo() in getFsInfo(); however, the check for cadvisorfs.ErrNoSuchDevice was not updated, which changed how getFsInfo() handles a cache miss from cadvisor.

cadvisor caches UUIDs/partitions/mounts on kubelet start and the cache is never refreshed. Given that getFsInfo() is expected to fail to retrieve data from cadvisor in this case (see comments in ImageFsStats() and https://github.com/kubernetes/heapster/issues/1793), this condition should not be logged as an error. Alternatively, cadvisor fs could be modified to refresh on a cache miss, enabling getFsInfo() to return data for these filesystems.
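A refresh-on-miss cache, as suggested in the alternative above, could look roughly like the following. This is a hypothetical, simplified model rather than cadvisor's actual data structures: devKey, partitionCache, and the injected load function are assumptions, and a real implementation would rescan the mount table (for example /proc/self/mountinfo) instead of copying an in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// devKey identifies a block device by its major:minor numbers.
type devKey struct{ major, minor uint }

// partitionCache is a hypothetical model of a partition map that is filled
// once at start-up and, unlike the behaviour described above, reloaded on a miss.
type partitionCache struct {
	mu      sync.Mutex
	entries map[devKey]string        // major:minor -> mountpoint
	load    func() map[devKey]string // assumed rescan of the current mounts
	reloads int                      // number of rescans triggered by misses
}

// lookup returns the mountpoint for k, rescanning once if k is not cached.
func (c *partitionCache) lookup(k devKey) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if mp, ok := c.entries[k]; ok {
		return mp, nil
	}
	// Cache miss: the filesystem may have been mounted after start-up.
	c.entries = c.load()
	c.reloads++
	if mp, ok := c.entries[k]; ok {
		return mp, nil
	}
	return "", fmt.Errorf("could not find device with major: %d, minor: %d in cached partitions map", k.major, k.minor)
}

func main() {
	// Simulated mount table; snapshot copies it, standing in for a real rescan.
	mounts := map[devKey]string{{8, 1}: "/"}
	snapshot := func() map[devKey]string {
		m := make(map[devKey]string, len(mounts))
		for k, v := range mounts {
			m[k] = v
		}
		return m
	}

	c := &partitionCache{load: snapshot}
	c.entries = c.load() // initial scan at start-up

	// An overlay filesystem is mounted after start-up (hypothetical path).
	mounts[devKey{0, 83}] = "/var/lib/containers/storage/overlay/example/merged"

	// The first lookup misses, triggers one rescan, and then succeeds.
	mp, err := c.lookup(devKey{0, 83})
	fmt.Println(mp, err, c.reloads)
}
```

One design consideration: a lookup for a device that never appears would trigger a rescan on every call (here, every 10 seconds per filesystem), so a real implementation would probably want to rate-limit reloads.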

Environment:

Kubernetes version (use kubectl version): 1.18.6
CRI: CRI-O

About this issue

State: closed
Created 4 years ago
Comments: 45 (19 by maintainers)

Most upvoted comments

Hi folks, I still see this issue on the most recent Kubernetes, 1.24.6. cAdvisor 0.44.0, which contains the fix from google/cadvisor/pull/3018, went into Kubernetes a long time ago, already in 1.24.0:
https://github.com/kubernetes/kubernetes/pull/109029 Dep bump to runc 1.1.0, cadvisor 0.44.0
https://github.com/kubernetes/kubernetes/pull/109675 Automated cherry pick of #109658: Bump cAdvisor to v0.44.1 (28.4.2022)

Is there any chance of moving the review of this PR forward, please? https://github.com/kubernetes/kubernetes/pull/100448

@haircommander @ml-

111andre111 on Sep 22, 2022

Right now I am thinking of either handling the cache miss better,

diff --git a/pkg/kubelet/stats/cri_stats_provider.go b/pkg/kubelet/stats/cri_stats_provider.go
index 32802f9b823..9f2754f2310 100644
--- a/pkg/kubelet/stats/cri_stats_provider.go
+++ b/pkg/kubelet/stats/cri_stats_provider.go
@@ -370,7 +370,7 @@ func (p *criStatsProvider) getFsInfo(fsID *runtimeapi.FilesystemIdentifier) *cad
        fsInfo, err := p.cadvisor.GetDirFsInfo(mountpoint)
        if err != nil {
                msg := fmt.Sprintf("Failed to get the info of the filesystem with mountpoint %q: %v.", mountpoint, err)
-               if err == cadvisorfs.ErrNoSuchDevice {
+               if err == cadvisorfs.ErrNoSuchDevice || strings.Contains(err.Error(), "in cached partitions map") {
                        klog.V(2).Info(msg)
                } else {
                        klog.Error(msg)

or returning the DeviceInfo even if the fs type is not btrfs in cadvisor,

diff --git a/vendor/github.com/google/cadvisor/fs/fs.go b/vendor/github.com/google/cadvisor/fs/fs.go
index cb45c33c933..906f4a6e1c1 100644
--- a/vendor/github.com/google/cadvisor/fs/fs.go
+++ b/vendor/github.com/google/cadvisor/fs/fs.go
@@ -559,6 +559,10 @@ func (i *RealFsInfo) GetDirFsDevice(dir string) (*DeviceInfo, error) {
                mount, found = i.mounts[dir]
        }
 
+       if found && mount.FsType != "btrfs" && mount.Major != 0 && strings.HasPrefix(mount.Source, "/dev/") {
+               return &DeviceInfo{mount.Source, uint(mount.Major), uint(mount.Minor)}, nil
+       }
+
        if found && mount.FsType == "btrfs" && mount.Major == 0 && strings.HasPrefix(mount.Source, "/dev/") {
                major, minor, err := getBtrfsMajorMinorIds(&mount)
                if err != nil {

or both.

harche on Nov 17, 2020