kubernetes: Missing labels in kubelet's cAdvisor metrics

What happened:

I’m using cri-containerd as the container runtime for Kubernetes. Due to architecture requirements, the containerd socket lives at the custom location /u01/data/run/containerd/containerd.sock instead of the standard path /run/containerd/containerd.sock. The kubelet is started with --container-runtime=remote --container-runtime-endpoint=unix:///u01/data/run/containerd/containerd.sock and works well, with one exception: with this configuration, the cAdvisor embedded in the kubelet cannot connect to containerd and does not populate the container-related fields, i.e. metrics such as container_cpu_usage_seconds_total or container_memory_working_set_bytes are missing the “container” label that carries the container name.

With the container label missing, there is no data for container-level resource consumption (kubectl top pod --containers), and it breaks the queries used by Prometheus-Adapter / Metrics Server, since they rely on that label:

containerQuery: sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}[5m])) by (<<.GroupBy>>)  
containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}) by (<<.GroupBy>>)
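
For reference, a quick way to confirm the missing label is to scrape the kubelet's cAdvisor endpoint through the API server; the node name below is a placeholder:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
  | grep '^container_cpu_usage_seconds_total' \
  | head
# every sample comes back with container="" instead of the real container name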

What you expected to happen:

The kubelet should pass the socket provided in the --container-runtime-endpoint parameter on to its embedded cAdvisor, where it can be used to connect to the container runtime (containerd in my case).

How to reproduce it (as minimally and precisely as possible):

Place your container runtime socket somewhere other than where cAdvisor expects it, and pass that location to the kubelet with the --container-runtime-endpoint parameter. In other words, the socket must not be at /run/containerd/containerd.sock when using cri-containerd, or at unix:///var/run/docker.sock when using Docker.
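
A rough reproduction sketch for the containerd case; the config snippet is illustrative only (a real /etc/containerd/config.toml has many more sections) and is not taken from a particular distribution:

# move the containerd socket to a non-default path
cat >/etc/containerd/config.toml <<'EOF'
[grpc]
  address = "/u01/data/run/containerd/containerd.sock"
EOF
systemctl restart containerd

# point the kubelet at the same socket; the cAdvisor embedded in the kubelet
# still probes the default /run/containerd/containerd.sock and finds nothing
kubelet \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///u01/data/run/containerd/containerd.sock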

Anything else we need to know?:

I have two workarounds for now, but both are somewhat unacceptable (both are sketched below):

  1. Create the directory /run/containerd with a symbolic link pointing to the real socket file. In my case, /run/containerd/containerd.sock is a symbolic link to /u01/data/run/containerd/containerd.sock. /run/containerd/containerd.sock is the default location for the containerd socket in the cAdvisor implementation (https://github.com/google/cadvisor/blob/master/container/containerd/factory.go#L34), so cAdvisor can find and use it there. But the whole point of the custom socket location is lost if I have to link it back to the default one.

  2. Use the deprecated kubelet parameter: --containerd=/u01/data/run/containerd/containerd.sock

That works because it points cAdvisor at the real containerd socket: the value is passed to the cAdvisor flag mentioned above. A kubelet started with that parameter logs a warning: kubelet[6694]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.

That is because the parameter is marked as DEPRECATED and was mistakenly registered with the kubelet (https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/options/globalflags_linux.go#L55), and it will be removed in a future release of Kubernetes. So this is only a very short-term solution.
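
A minimal sketch of both workarounds, using the paths from my setup above:

# Workaround 1: symlink the real socket to the path cAdvisor probes by default
mkdir -p /run/containerd
ln -sf /u01/data/run/containerd/containerd.sock /run/containerd/containerd.sock

# Workaround 2: additionally pass the deprecated cAdvisor flag to the kubelet
kubelet \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///u01/data/run/containerd/containerd.sock \
  --containerd=/u01/data/run/containerd/containerd.sock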

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}   
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: KVM-based Virtual Machines
  • OS (e.g: cat /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): 3.10.0-1062.12.1.el7.x86_64
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug): cni-v0.8.5
  • Others: containerd-v1.3.3 runc-v1.0.0-rc10

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 36 (15 by maintainers)

Most upvoted comments

I don’t think we will be able to backport changes to 1.19-1.22. As I mentioned earlier in this thread, there is a circular dependency with CRI that we hope to resolve over time, but it is not resolved in earlier versions.

The recommendation is to use cAdvisor standalone to collect these metrics for now.
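
For anyone going that route, a rough sketch of running standalone cAdvisor as a container against a containerd socket at a custom path; the image tag, mounts and flags are assumptions to adapt, not an officially supported invocation:

docker run -d --name cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/sys:/sys:ro \
  --volume=/u01/data/run/containerd/containerd.sock:/run/containerd/containerd.sock:ro \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor:v0.43.0 \
  --containerd=/run/containerd/containerd.sock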

/close

I have the same issue. I use containerd as my container runtime, and when I access the cAdvisor metrics through the kubelet, some metrics are missing the pod or namespace label and all of them are missing the container name:

container_fs_io_time_weighted_seconds_total{container="",device="/dev/vda1",id="/",image="",name="",namespace="",pod=""} 0.039979188 1601917426087
container_fs_io_time_weighted_seconds_total{container="",device="tmpfs",id="/",image="",name="",namespace="",pod=""} 0 1601917426087
# HELP container_fs_limit_bytes Number of bytes that can be consumed by the container on this filesystem.
# TYPE container_fs_limit_bytes gauge
container_fs_limit_bytes{container="",device="/dev/vda1",id="/",image="",name="",namespace="",pod=""} 2.0794411008e+11 1601917426087
container_fs_limit_bytes{container="",device="tmpfs",id="/",image="",name="",namespace="",pod=""} 6.85989888e+09 1601917426087
# HELP container_fs_read_seconds_total Cumulative count of seconds spent reading
# TYPE container_fs_read_seconds_total counter
container_fs_read_seconds_total{container="",device="",id="/",image="",name="",namespace="",pod=""} 0 1601917426087
container_fs_read_seconds_total{container="",device="/dev/vda1",id="/",image="",name="",namespace="",pod=""} 0.000454528 1601917426087
container_fs_read_seconds_total{container="",device="tmpfs",id="/",image="",name="",namespace="",pod=""} 0 1601917426087
# HELP container_fs_reads_bytes_total Cumulative count of bytes read
# TYPE container_fs_reads_bytes_total counter
container_fs_reads_bytes_total{container="",device="",id="/",image="",name="",namespace="",pod=""} 0 1601917426087
container_fs_reads_bytes_total{container="",device="/dev/rbd1",id="/kubepods",image="",name="",namespace="",pod=""} 0 1601917426357
container_fs_reads_bytes_total{container="",device="/dev/rbd1",id="/kubepods/burstable",image="",name="",namespace="",pod=""} 0 1601917421357
container_fs_reads_bytes_total{container="",device="/dev/rbd1",id="/kubepods/burstable/pod0d394880-82d4-4567-8eb1-af1d59558afa",image="",name="",namespace="ceph-csi",pod="csi-rbdplugin-fnt6w"} 0 1601917425086

Any help?

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:47:40Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-13T12:24:51Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

containerd version

containerd github.com/containerd/containerd v1.2.13 7ad184331fa3e55e52b890ea95e65ba581ae3429

Just an update: both issues are fixed on the cAdvisor side and work is happening now to vendor the fixes into k/k for 1.23.

About EKS: the symlink approach did not work for me, but using the deprecated flag in the kubelet args:

--containerd=/run/dockershim.sock

solved the problem.
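
On EKS that flag can typically be injected through the node bootstrap script; a sketch assuming the standard EKS-optimized AMI and a hypothetical cluster name:

/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--containerd=/run/dockershim.sock'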

@SergeyKanzhelev is this fixed in v1.23? I know you have mentioned that it is not supported in v1.22 and v1.21.

cAdvisor 0.43.0 fixes this issue, but the standalone cAdvisor's metric labels are not identical to those of the cAdvisor embedded in the kubelet.

Please backport to 1.19-1.22, thank you.

Thank you for the ping. We failed to vendor the fixes into Kubernetes because of a circular dependency on CRI. Standalone cAdvisor needs to be used in these cases. There is some information at https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#metrics-missing that needs to be ported to https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/. Filed a docs task: https://github.com/kubernetes/website/issues/30681

I see different groups of metrics mentioned in this issue:

  • container_cpu - fixed with the AMI here
  • container_memory - fixed with the AMI here
  • container_fs - still broken?
  • container_network - still broken?

Do the workaround and fixes apply to the container_fs and container_network metrics too? Or is there a separate issue tracking those?

Hi,

On EKS, to avoid creating new issues with other components that use /run/dockershim.sock, and to avoid the deprecated kubelet parameter (--containerd), I created a symlink. Basically:

1. Create a symlink (ln -sf /run/dockershim.sock /run/containerd/containerd.sock) before the kubelet daemon starts.

I’m using terraform-eks-module, so I created a pre_userdata block with this content:

echo "Creating cAdvisor containerd symlink"
mkdir /run/containerd/ && ln -sf /run/dockershim.sock /run/containerd/containerd.sock

This gets added to the LaunchTemplate user data, so the containerd directory is created (because it does not exist) and the symlink is in place before the kubelet starts.

I hope that helps.

Regards

I confirm the same issue. After migrating to the containerd CRI (I use AWS EKS 1.20/1.21), part of the metrics disappeared. [screenshot: eks-containerd] For example, in the screenshot there is no container label at all, only a pod one.