kubernetes: Restarting containerd (and kubelet) does not impact "runtime" startTime in summary API
What happened: Containerd (and optionally the kubelet) was restarted, but the runtime startTime reported by the summary API did not change.
What you expected to happen: runtime’s startTime should be the restart time.
$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2
"name": "runtime",
"startTime": "2021-01-22T01:29:09Z",
"cpu": {
How to reproduce it (as minimally and precisely as possible):
# Note the current startTime
$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2
# Restart containerd
$ sudo systemctl restart containerd
# Check the startTime again
$ curl -s localhost:10255/stats/summary | grep -i runtime -A 2
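The reproduction steps above can be scripted roughly as follows. This is a sketch under assumptions the issue does not state: the kubelet read-only port 10255 is reachable on localhost (as in the commands above), the response is pretty-printed so that "startTime" follows the "name": "runtime" line, and a 30-second wait is long enough for the stats to refresh. The function names runtime_start_time and reproduce are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: automates the reproduction steps from this issue.
# Assumes the kubelet read-only port (10255) is reachable on localhost and
# that the summary API response is pretty-printed as in the output above.

# runtime_start_time (hypothetical): read a summary API response on stdin and
# print the startTime that follows the "name": "runtime" line.
runtime_start_time() {
  grep -A 2 '"name": "runtime"' \
    | sed -n 's/.*"startTime": "\([^"]*\)".*/\1/p' \
    | head -n 1
}

# reproduce (hypothetical): record startTime, restart containerd,
# wait, and record startTime again.
reproduce() {
  local url="localhost:10255/stats/summary"
  local before after
  before=$(curl -s "$url" | runtime_start_time)
  sudo systemctl restart containerd
  sleep 30  # assumption: long enough for the stats to be refreshed
  after=$(curl -s "$url" | runtime_start_time)
  echo "before: $before"
  echo "after:  $after"
  if [ "$before" = "$after" ]; then
    echo "runtime startTime unchanged after restart"
  else
    echo "runtime startTime changed after restart"
  fi
}
```

Source the script on the node and call reproduce. On an affected version, the behavior reported here is that the "before" and "after" values match. The grep pattern matches the exact "name": "runtime" line rather than any line containing "runtime", so only the runtime system container's startTime is picked up.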
I would expect the time to change. It does not because cAdvisor looks in the cgroup filesystem and takes the oldest modification time of cgroup.clone_children across all subsystems. In other words, for each supported subsystem it reads the mtime of /sys/fs/cgroup/$subsystem/system.slice/containerd.service/cgroup.clone_children and keeps the oldest one. Restarting containerd leaves that file unchanged, so the reported startTime stays the same.
In this code, for the runtime:
- cgroupPaths is a map (e.g. { cpu: /sys/fs/cgroup/cpu/system.slice/containerd.service/, memory: /sys/fs/cgroup/memory/system.slice/containerd.service/ })
- spec.CreationTime is set as the startTime in the summary API
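The mechanism described above can be approximated from the shell. This is a simplified sketch that follows the description in this issue, not cAdvisor's actual Go code. It assumes cgroup v1 paths like the ones in the cgroupPaths example, takes the per-subsystem directories (the values of cgroupPaths) as arguments, prints the oldest cgroup.clone_children mtime, and formats it like startTime. The function names are hypothetical, and the sketch relies on GNU stat and date.

```shell
#!/usr/bin/env bash
# Simplified illustration of the behavior described in this issue,
# not cAdvisor's implementation. Requires GNU coreutils (stat -c, date -d).

# oldest_clone_children_mtime DIR... (hypothetical): print the oldest mtime,
# in Unix seconds, of DIR/cgroup.clone_children among the given directories.
# Directories without the file are skipped; prints nothing if none have it.
oldest_clone_children_mtime() {
  local oldest="" dir mtime
  for dir in "$@"; do
    [ -e "$dir/cgroup.clone_children" ] || continue
    mtime=$(stat -c %Y "$dir/cgroup.clone_children")
    if [ -z "$oldest" ] || [ "$mtime" -lt "$oldest" ]; then
      oldest=$mtime
    fi
  done
  if [ -n "$oldest" ]; then
    echo "$oldest"
  fi
}

# to_rfc3339_utc SECONDS (hypothetical): format Unix seconds the way
# startTime appears in the summary API output above.
to_rfc3339_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, on a cgroup v1 node: to_rfc3339_utc "$(oldest_clone_children_mtime /sys/fs/cgroup/cpu/system.slice/containerd.service /sys/fs/cgroup/memory/system.slice/containerd.service)". If the explanation above is right, running this before and after systemctl restart containerd should print the same timestamp.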
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): At least 1.16+, based on reading the code and on experiments.
- Cloud provider or hardware configuration: GKE
- OS (e.g. cat /etc/os-release): Container-Optimized OS, though this seems OS-agnostic
- Kernel (e.g. uname -a): 5.4.49+, though the kernel shouldn't matter
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (12 by maintainers)
Ah! My bad.
This was fixed in cAdvisor v0.39.0, which landed in https://github.com/kubernetes/kubernetes/pull/99875, so I think this can actually be closed.
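To check whether a particular Kubernetes source tree already requires cAdvisor v0.39.0 or later (the release named in the comment above), a small helper can read the version from go.mod. This is a hypothetical sketch: it assumes go.mod lists github.com/google/cadvisor either in a require block or as a single-line require, and it uses GNU sort -V for the version comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of any Kubernetes tooling.

# cadvisor_version GOMOD: print the github.com/google/cadvisor version
# required by the given go.mod (first match wins; replace lines with "=>"
# are skipped because their second field is not a version).
cadvisor_version() {
  awk '$1 == "require" && $2 == "github.com/google/cadvisor" { print $3; exit }
       $1 == "github.com/google/cadvisor" && $2 ~ /^v/ { print $2; exit }' "$1"
}

# version_at_least HAVE WANT: succeed if HAVE >= WANT in version-sort order.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, from the root of a checkout: version_at_least "$(cadvisor_version go.mod)" v0.39.0 && echo "cAdvisor fix included". Version sort is used instead of a plain string comparison so that, for instance, v0.9.0 is correctly treated as older than v0.39.0.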
/close
The runtime metrics are supposed to be metrics for the container runtime itself. This is about being able to monitor container runtime health; it is not saying that all containers should have been restarted.
/triage accepted