kubernetes: Missing container metrics in kubelet (cAdvisor) in v1.5.1

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No, this looks like a regression in v1.5.1

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kubelet, metrics, cAdvisor


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release): os_image="Container Linux by CoreOS 1235.5.0 (Ladybug)"
  • Kernel (e.g. uname -a): Linux ip-10-72-161-5 4.7.3-coreos-r2 #1 SMP Sun Jan 8 00:32:25 UTC 2017 x86_64 Intel® Xeon® CPU E5-2680 v2 @ 2.80GHz GenuineIntel GNU/Linux
  • Install tools:
  • Others:

What happened: after upgrading to v1.5.1, cAdvisor does not show any subcontainers, so ALL container system metrics are missing. For example:

core@ip-10-72-161-5 ~ $ curl localhost:10255/metrics | grep container_cpu_user_seconds_total
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44371  100 44371    0     0  5264k      0 --:--:-- --:--:-- --:--:-- 6190k
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 0

What you expected to happen: cAdvisor reports subcontainers and their metrics, like so:

core@ip-10-72-6-143 ~ $ curl localhost:10255/metrics | grep container_cpu_user_seconds_total | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 2.74665981e+06
container_cpu_user_seconds_total{id="/docker"} 2.33384234e+06
container_cpu_user_seconds_total{id="/init.scope"} 6.26

How to reproduce it (as minimally and precisely as possible): upgrade the kubelet to v1.5.1 and check the metrics, either via the metrics endpoint (curl localhost:10255/metrics) or via the cAdvisor UI (http://10.72.20.134:4194/containers/).
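The check can also be scripted. The following is a minimal POSIX shell sketch, assuming the kubelet serves the Prometheus text format shown in the transcripts here. The function names list_container_ids and has_subcontainers are hypothetical helpers for illustration, not part of the kubelet or cAdvisor. Passing curl -s (silent) suppresses the progress meter that otherwise gets interleaved with the metric lines.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct cgroup ids that carry
# container_cpu_user_seconds_total, reading Prometheus text format on stdin.
list_container_ids() {
  grep '^container_cpu_user_seconds_total{' |
    sed -n 's/.*[{,]id="\([^"]*\)".*/\1/p' |
    LC_ALL=C sort -u
}

# Hypothetical helper: exit 0 if at least one id other than the root "/"
# is reported, i.e. cAdvisor is exposing subcontainers; exit 1 otherwise.
has_subcontainers() {
  list_container_ids | grep -qv '^/$'
}
```

Example usage on a node: curl -s localhost:10255/metrics | has_subcontainers || echo "no subcontainers reported". On an affected v1.5.1 node, piping the same curl output into list_container_ids should print only /, while a healthy node also lists entries such as /docker and /init.scope.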

Anything else we need to know:

docker version:

docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 44 (26 by maintainers)

Most upvoted comments

@dashpole we should cherry-pick the cAdvisor fix into the release-v0.24 branch so that it can be cherry-picked into the k8s 1.4/1.5 branches.

Actually, never mind… apparently cAdvisor's behavior changed in 1.7. I was able to get Prometheus to scrape pod metrics by following the setup described here: https://raw.githubusercontent.com/prometheus/prometheus/master/documentation/examples/prometheus-kubernetes.yml

cherrypick to 1.5: #43113

@dashpole any chance we can get a vendor update for cAdvisor into the next release?

I'll check it out.

I'm seeing this issue as well, and it's blocking HPA from doing any scaling in my production clusters. I would like to avoid hacks such as periodically restarting the kubelet if possible. Is anyone currently working on a fix for this? If not, I don't mind digging into it a bit; if someone can point me in the right direction for where to start, that would be great. Would the cAdvisor code in the kubelet be a good place to start?