kubernetes: Restarting containerd (and kubelet) does not impact "runtime" startTime in summary API

What happened: Containerd (and optionally the kubelet) was restarted, but the "runtime" startTime reported by the summary API did not change.

What you expected to happen: runtime’s startTime should be the restart time.

$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2
    "name": "runtime",
    "startTime": "2021-01-22T01:29:09Z",
    "cpu": {

How to reproduce it (as minimally and precisely as possible):

# Note the current startTime
$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2

# Restart containerd
$ sudo systemctl restart containerd

# Note the startTime again; it is unchanged
$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2
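
To make the comparison explicit, here is a small helper that pulls just the runtime startTime out of the summary output. It is only a sketch: the function name runtime_start_time is made up for illustration, and it assumes the pretty-printed layout shown above, where "startTime" follows directly after "name": "runtime".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read summary API JSON on stdin and print the
# startTime of the "runtime" system container.
# Assumes pretty-printed JSON with "startTime" within two lines of "name".
runtime_start_time() {
  grep -A 2 '"name": "runtime"' \
    | sed -n 's/.*"startTime": "\([^"]*\)".*/\1/p'
}

# Usage on a node (not run here):
#   curl -s localhost:10255/stats/summary | runtime_start_time
#   systemctl show containerd --property=ActiveEnterTimestamp
```

The second usage line prints when systemd last started the containerd unit, which is the time I would expect startTime to track.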

I would expect the time to change. It does not, because cAdvisor derives the creation time from the cgroup filesystem: for each supported subsystem it stats /sys/fs/cgroup/$subsystem/system.slice/containerd.service/cgroup.clone_children and takes the oldest modification time across all of them. Restarting containerd does not touch that file, so the reported time stays the same.

https://github.com/google/cadvisor/blob/730e7df6dbddf323b4cdd54cc91156cfdd9cf127/container/common/helpers.go#L74-L91

In this code, for the runtime:

  • cgroupPaths is a map from subsystem to cgroup path (e.g. { cpu: /sys/fs/cgroup/cpu/system.slice/containerd.service/, memory: /sys/fs/cgroup/memory/system.slice/containerd.service/ })
  • spec.CreationTime is the value the summary API reports as startTime
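
For reference, the lookup described above can be approximated from a shell on the node. This is a rough sketch, not the cAdvisor code itself: it assumes a cgroup v1 layout and GNU stat and date, and the function name oldest_clone_children_mtime is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the oldest mtime of cgroup.clone_children across all
# cgroup v1 subsystems for a single cgroup.
# Assumptions: GNU coreutils (stat -c, date -d); hypothetical function name.
oldest_clone_children_mtime() {
  local cgroup_root="$1"   # e.g. /sys/fs/cgroup
  local cgroup_name="$2"   # e.g. system.slice/containerd.service
  local oldest="" file mtime

  for file in "$cgroup_root"/*/"$cgroup_name"/cgroup.clone_children; do
    # Skip subsystems where this cgroup does not exist (or an unmatched glob).
    [ -e "$file" ] || continue
    mtime=$(stat -c %Y "$file") || continue
    # Keep the smallest (oldest) epoch timestamp seen so far.
    if [ -z "$oldest" ] || [ "$mtime" -lt "$oldest" ]; then
      oldest="$mtime"
    fi
  done

  # No file found in any subsystem: nothing to report.
  [ -n "$oldest" ] || return 1

  # Print in the same UTC timestamp format the summary API uses.
  date -u -d "@$oldest" +%Y-%m-%dT%H:%M:%SZ
}

# Usage on a node (not run here):
#   oldest_clone_children_mtime /sys/fs/cgroup system.slice/containerd.service
```

If the explanation above is right, running this before and after restarting containerd should print the same timestamp both times, matching the unchanged startTime.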

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.16 and later, based on reading the code and on experiments.
  • Cloud provider or hardware configuration: GKE
  • OS (e.g: cat /etc/os-release): Container-Optimized OS, though this seems OS-agnostic
  • Kernel (e.g. uname -a): 5.4.49+, though the kernel shouldn’t matter
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

Ah! My bad.

This was fixed in cAdvisor v0.39.0, which landed in https://github.com/kubernetes/kubernetes/pull/99875, so I think this can actually be closed.

/close

The runtime metrics are supposed to be metrics for the container runtime itself. This is about being able to monitor the health of the container runtime; it does not imply that all containers should have been restarted.

/triage accepted