kubernetes: Missing labels in kubelet's cAdvisor metrics
What happened:
I’m using cri-containerd as the container runtime for Kubernetes. Due to architecture needs, I have the containerd socket at the custom location /u01/data/run/containerd/containerd.sock instead of the standard path /run/containerd/containerd.sock.
My kubelet is started with the parameters --container-runtime=remote --container-runtime-endpoint=unix:///u01/data/run/containerd/containerd.sock and works well, with one exception.
With this configuration, cAdvisor in the kubelet can’t connect to containerd and does not populate container-related fields, i.e. metrics such as container_cpu_usage_seconds_total or container_memory_working_set_bytes have no “container” label that provides the container name.
With the container label missing, there is no data for container-level resource consumption (kubectl top pod --containers), and it breaks the queries used in Prometheus-Adapter / Metrics Server, since they rely on that label:
containerQuery: sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}[5m])) by (<<.GroupBy>>)
containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}) by (<<.GroupBy>>)
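To make the effect of these matchers concrete, here is a minimal sketch that mimics them in plain Python. It is not how Prometheus evaluates queries internally; the function name and the sample series are made up for illustration. It relies only on the PromQL rule that a missing label behaves like a label with an empty value, so a series without the container label is rejected by container!="".

```python
# Illustrative only: mimics the matchers container!="POD", container!="", pod!="".
# The sample series below are invented, not taken from a real cluster.

def matches(labels):
    """Return True if a series would survive the adapter's label matchers."""
    container = labels.get("container", "")  # a missing label behaves like ""
    pod = labels.get("pod", "")
    return container not in ("", "POD") and pod != ""

series = [
    {"pod": "web-0", "container": "nginx"},  # healthy: container label present
    {"pod": "web-0", "container": "POD"},    # pause container, excluded on purpose
    {"pod": "web-0"},                        # symptom from this issue: no container label
]

kept = [s for s in series if matches(s)]
print(kept)
```

Only the first series is kept. When every series looks like the third one, the query returns nothing, which is why container-level data disappears.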
What you expected to happen:
The kubelet should pass the socket provided in the --container-runtime-endpoint parameter to its cAdvisor, where it can be used to connect to the container runtime (containerd in my case).
How to reproduce it (as minimally and precisely as possible):
Place your container runtime socket in a location other than the one cAdvisor expects, and pass it to the kubelet with the --container-runtime-endpoint parameter. That is, it must not be at /run/containerd/containerd.sock when using cri-containerd, or at unix:///var/run/docker.sock when using Docker.
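As a rough way to tell whether a node falls into this case, the sketch below compares the socket path from a --container-runtime-endpoint value with the containerd default quoted in this report. The helper names are hypothetical, only the containerd default is covered (not Docker), and the comparison is a plain string check that does not resolve symlinks.

```python
from urllib.parse import urlparse

# Default containerd socket that cAdvisor looks for, as quoted in this report.
CADVISOR_DEFAULT_SOCKET = "/run/containerd/containerd.sock"

def socket_path(endpoint):
    """Strip a unix:// scheme from a runtime endpoint; bare paths pass through."""
    parsed = urlparse(endpoint)
    if parsed.scheme == "unix":
        return parsed.path
    return endpoint

def misses_cadvisor_default(endpoint):
    """True if the endpoint points somewhere other than the default socket path."""
    return socket_path(endpoint) != CADVISOR_DEFAULT_SOCKET

print(misses_cadvisor_default("unix:///u01/data/run/containerd/containerd.sock"))
```

With the endpoint from this report the check is true, meaning cAdvisor would look in the wrong place.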
Anything else we need to know?:
I have two workarounds for now, but both are somewhat unacceptable:
- Create the directory /run/containerd with a symbolic link pointing to the real socket file. In my case, /run/containerd/containerd.sock is a symbolic link to /u01/data/run/containerd/containerd.sock. The file /run/containerd/containerd.sock is the default location for the containerd socket in the cAdvisor implementation (https://github.com/google/cadvisor/blob/master/container/containerd/factory.go#L34), so cAdvisor can find and use it. But I did not move this socket to a custom location just to link it back to the default one.
- Use a deprecated kubelet parameter:
--containerd=/u01/data/run/containerd/containerd.sock
That works because it points cAdvisor at the real containerd socket, and that value is passed to the parameter mentioned above. A kubelet started with that parameter logs a warning:
kubelet[6694]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
That is because the parameter is marked as DEPRECATED and was mistakenly registered with the kubelet in https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/options/globalflags_linux.go#L55, and it will be deleted in future releases of Kubernetes. So this is a very short-term solution.
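The first workaround (the symbolic link) can be sketched as a small idempotent helper. This is only an illustration under the assumptions of this report: the function name is hypothetical, the paths in the usage note are the ones quoted above, and it would need to run as root before the kubelet starts.

```python
import os

def link_socket(real_socket, expected_socket):
    """Make expected_socket a symlink to real_socket, creating parent directories."""
    os.makedirs(os.path.dirname(expected_socket), exist_ok=True)
    if os.path.islink(expected_socket):
        if os.readlink(expected_socket) == real_socket:
            return  # already linked correctly, nothing to do
        os.remove(expected_socket)  # stale link pointing elsewhere
    # Raises FileExistsError if a real (non-symlink) file is already there,
    # which is deliberate: a live socket should not be silently replaced.
    os.symlink(real_socket, expected_socket)
```

For the setup described here, the call would be link_socket("/u01/data/run/containerd/containerd.sock", "/run/containerd/containerd.sock").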
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: KVM-based Virtual Machines
- OS (e.g. cat /etc/os-release): CentOS Linux release 7.7.1908 (Core)
- Kernel (e.g. uname -a): 3.10.0-1062.12.1.el7.x86_64
- Install tools: kubeadm
- Network plugin and version (if this is a network-related bug): cni-v0.8.5
- Others: containerd-v1.3.3 runc-v1.0.0-rc10
About this issue
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 36 (15 by maintainers)
I don’t think we will be able to backport changes to 1.19-1.22. As I mentioned earlier in this thread, there is a circular dependency with CRI that we hope to resolve over time, but it is not resolved in earlier versions.
The recommendation is to use cAdvisor standalone to collect these metrics for now.
/close
I have the same issue. I use containerd as my container runtime, and when I access cAdvisor metrics through the kubelet, some of the metrics are missing the pod or namespace label and all of them are missing the container name.
Any help?
Kubernetes version:
containerd version:
Just an update: both issues are fixed on the cAdvisor side, and work is under way to vendor the fixes into k/k in 1.23.
About EKS: the symlink approach did not work for me, but using the deprecated flag in the kubelet args:
solved the problem.
@SergeyKanzhelev is this fixed in v1.23? I know you have mentioned that it is not supported in v1.22 and v1.21.
cAdvisor 0.43.0 fixes this issue, but its metric labels are not the same as those from the cAdvisor in the kubelet.
Please backport to 1.19-1.22, thank you.
Thank you for the ping. We failed to vendor the fixes into Kubernetes because of a circular dependency on CRI. Standalone cAdvisor needs to be used in these cases. There is some information at https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#metrics-missing that needs to be ported to https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/. Filed a docs task: https://github.com/kubernetes/website/issues/30681
I see different groups of metrics mentioned in this issue:
- container_cpu - fixed with AMI here
- container_memory - fixed with AMI here
- container_fs - still broken?
- container_network - still broken?
Do the workaround and fixes apply to the container_fs and container_network metrics too? Or is there a separate issue tracking those?
Hi,
On EKS, to avoid creating new issues with other components that use /run/dockershim.sock, and to avoid using the deprecated parameter (--containerd) on the kubelet, I created a symlink. Basically:
1. Create a symlink (ln -sf /run/dockershim.sock /run/containerd/containerd.sock) before the kubelet daemon starts.
I’m using terraform-eks-module, so I created a pre_userdata with this content:
It is added to the LaunchTemplate userdata; it creates the containerd directory (because it does not exist) and creates the symlink before the kubelet starts.
I hope that helps.
Regards
I confirm the same issue. After migrating to the containerd CRI (I use AWS EKS 1.20/1.21), some of the metrics disappeared.
For example, in the screenshot there is no container label, only the pod one.