kepler: Kubelet update is blocked by network IO?

When I tried to add a benchmark for updateKubeletMetrics, I found that it appears to block on network IO.

Here is the call stack I traced:

https://github.com/sustainable-computing-io/kepler/blob/3f117b3c8b198806a7f26cc0e46bb18830bb0789/pkg/collector/container_cgroup_collector.go#L45

func GetContainerMetrics() (containerCPU, containerMem map[string]float64, nodeCPU, nodeMem float64, retErr error) {
	return podLister.ListMetrics()
}
// ListMetrics accesses Kubelet's metrics and obtains pod and node metrics
func (k *KubeletPodLister) ListMetrics() (containerCPU, containerMem map[string]float64, nodeCPU, nodeMem float64, retErr error) {
	resp, err := httpGet(metricsURL)
...

Hence, every call to updateKubeletMetrics performs network IO to fetch and parse the kubelet response before it even starts looping over containers/pods. That IO is blocking and dominates the time cost, which makes a per-container benchmark meaningless. My suggestion: decouple this network IO from the rest of the call stack, for example as sketched below.
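A minimal sketch of that decoupling, with hypothetical names (MetricsFetcher, parseMetrics, the testdata path are illustrative, not Kepler's actual API): put the HTTP call behind a small fetcher interface, so ListMetrics only parses the bytes it is handed and a benchmark can replay a recorded payload without touching the network.

package kubelet

import (
	"bytes"
	"io"
	"net/http"
	"os"
	"testing"
)

// MetricsFetcher abstracts where the raw /metrics payload comes from.
type MetricsFetcher interface {
	Fetch() (io.ReadCloser, error)
}

// HTTPFetcher performs the real (blocking) network IO against the kubelet endpoint.
type HTTPFetcher struct{ URL string }

func (f HTTPFetcher) Fetch() (io.ReadCloser, error) {
	resp, err := http.Get(f.URL)
	if err != nil {
		return nil, err
	}
	return resp.Body, nil
}

// StubFetcher replays a recorded payload, so benchmarks never hit the network.
type StubFetcher struct{ Payload []byte }

func (f StubFetcher) Fetch() (io.ReadCloser, error) {
	return io.NopCloser(bytes.NewReader(f.Payload)), nil
}

// parseMetrics is a hypothetical stand-in for the parsing/aggregation work
// that ListMetrics performs after the HTTP call.
func parseMetrics(r io.Reader) {
	io.Copy(io.Discard, r)
}

// BenchmarkParseMetrics then measures only the per-container work, not the fetch.
func BenchmarkParseMetrics(b *testing.B) {
	payload, err := os.ReadFile("testdata/kubelet_metrics.txt") // recorded once from a live kubelet
	if err != nil {
		b.Fatal(err)
	}
	f := StubFetcher{Payload: payload}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		body, _ := f.Fetch()
		parseMetrics(body)
		body.Close()
	}
}

With this split, the production path wires in HTTPFetcher while the benchmark wires in StubFetcher, so the measured numbers reflect the container loop rather than the round trip to the kubelet.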

And by the way, why do we rebuild these constants for each container?

	cpuMetricName := collector_metric.AvailableKubeletMetrics[0]
	memMetricName := collector_metric.AvailableKubeletMetrics[1]
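If those are effectively constants, they could be hoisted out of the per-container loop. A rough fragment (the loop shape and helper name are made up for illustration; the real loop lives in updateKubeletMetrics):

	// Resolve the metric names once, before iterating containers,
	// instead of re-reading AvailableKubeletMetrics on every iteration.
	cpuMetricName := collector_metric.AvailableKubeletMetrics[0]
	memMetricName := collector_metric.AvailableKubeletMetrics[1]
	for _, c := range containers { // hypothetical loop variable
		updateContainerMetrics(c, cpuMetricName, memMetricName) // hypothetical helper
	}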

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (1 by maintainers)

Most upvoted comments

Thanks @rootfs, it makes sense.

So this also relates to the discussion in #605 and #558 about using an external library to collect cgroup metrics. The external library should support both cgroup v1 and v2; then we would not need the kubelet metrics anymore.

The following is my mental map. For cgroup v1, kubelet metrics are the way to go.

Models        Features                         Use Case
CounterOnly   HW Perf Counters                 Baremetal servers
CGroupOnly    Cgroup stats (CPU, memory, IO)   CGroup V2 on VMs
KubeletOnly   Kubelet stats from cAdvisor      CGroup V1 Kubernetes on VM
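A purely illustrative sketch of the selection logic this table implies (chooseModel and its inputs are hypothetical, not Kepler's actual configuration code):

package main

import "fmt"

// chooseModel mirrors the table above: prefer HW perf counters where
// available, fall back to cgroup v2 stats on VMs, and use kubelet/cAdvisor
// stats for cgroup v1 Kubernetes on VMs.
func chooseModel(hasPerfCounters, cgroupV2 bool) string {
	switch {
	case hasPerfCounters:
		return "CounterOnly"
	case cgroupV2:
		return "CGroupOnly"
	default:
		return "KubeletOnly"
	}
}

func main() {
	fmt.Println(chooseModel(true, false))  // baremetal server: CounterOnly
	fmt.Println(chooseModel(false, true))  // cgroup v2 on a VM: CGroupOnly
	fmt.Println(chooseModel(false, false)) // cgroup v1 Kubernetes on a VM: KubeletOnly
}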