kepler: Kubelet update is blocked by network IO?

When I tried to add a benchmark for updateKubeletMetrics, I found that it appears to block on network IO.

Here is the call stack I traced:

https://github.com/sustainable-computing-io/kepler/blob/3f117b3c8b198806a7f26cc0e46bb18830bb0789/pkg/collector/container_cgroup_collector.go#L45

func GetContainerMetrics() (containerCPU, containerMem map[string]float64, nodeCPU, nodeMem float64, retErr error) {
	return podLister.ListMetrics()
}
// ListMetrics accesses Kubelet's metrics and obtains pod and node metrics
func (k *KubeletPodLister) ListMetrics() (containerCPU, containerMem map[string]float64, nodeCPU, nodeMem float64, retErr error) {
	resp, err := httpGet(metricsURL)
...

Hence, every call to updateKubeletMetrics performs network IO to fetch and parse the kubelet response before it even starts looping over containers/pods. That IO is blocking and dominates the time cost, which makes a per-container benchmark meaningless. My suggestion: decouple this network IO from the rest of the call stack, for example as sketched below.
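A minimal sketch of that decoupling, with hypothetical names (MetricsFetcher, parseMetrics, the testdata path are illustrative, not Kepler's actual API): put the HTTP call behind a small fetcher interface, so ListMetrics only parses the bytes it is handed and a benchmark can replay a recorded payload without touching the network.

package kubelet

import (
	"bytes"
	"io"
	"net/http"
	"os"
	"testing"
)

// MetricsFetcher abstracts where the raw /metrics payload comes from.
type MetricsFetcher interface {
	Fetch() (io.ReadCloser, error)
}

// HTTPFetcher performs the real (blocking) network IO against the kubelet endpoint.
type HTTPFetcher struct{ URL string }

func (f HTTPFetcher) Fetch() (io.ReadCloser, error) {
	resp, err := http.Get(f.URL)
	if err != nil {
		return nil, err
	}
	return resp.Body, nil
}

// StubFetcher replays a recorded payload, so benchmarks never hit the network.
type StubFetcher struct{ Payload []byte }

func (f StubFetcher) Fetch() (io.ReadCloser, error) {
	return io.NopCloser(bytes.NewReader(f.Payload)), nil
}

// parseMetrics is a hypothetical stand-in for the parsing/aggregation work
// that ListMetrics performs after the HTTP call.
func parseMetrics(r io.Reader) {
	io.Copy(io.Discard, r)
}

// BenchmarkParseMetrics then measures only the per-container work, not the fetch.
func BenchmarkParseMetrics(b *testing.B) {
	payload, err := os.ReadFile("testdata/kubelet_metrics.txt") // recorded once from a live kubelet
	if err != nil {
		b.Fatal(err)
	}
	f := StubFetcher{Payload: payload}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		body, _ := f.Fetch()
		parseMetrics(body)
		body.Close()
	}
}

With this split, the production path wires in HTTPFetcher while the benchmark wires in StubFetcher, so the measured numbers reflect the container loop rather than the round trip to the kubelet.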

And by the way, why do we rebuild these constants for each container?

	cpuMetricName := collector_metric.AvailableKubeletMetrics[0]
	memMetricName := collector_metric.AvailableKubeletMetrics[1]
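If those are effectively constants, they could be hoisted out of the per-container loop. A rough fragment (the loop shape and helper name are made up for illustration; the real loop lives in updateKubeletMetrics):

	// Resolve the metric names once, before iterating containers,
	// instead of re-reading AvailableKubeletMetrics on every iteration.
	cpuMetricName := collector_metric.AvailableKubeletMetrics[0]
	memMetricName := collector_metric.AvailableKubeletMetrics[1]
	for _, c := range containers { // hypothetical loop variable
		updateContainerMetrics(c, cpuMetricName, memMetricName) // hypothetical helper
	}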

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (1 by maintainers)

Most upvoted comments

Thanks @rootfs, it makes sense.

So this also relates to the discussion in #605 and #558 about using an external library to collect cgroup metrics. The external library should support both cgroup v1 and v2; then we would not need the kubelet metrics anymore.

The following is my mental map. For cgroup v1, kubelet metrics are the way to go.

Models        Features                         Use Case
CounterOnly   HW Perf Counters                 Baremetal servers
CGroupOnly    Cgroup stats (CPU, memory, IO)   CGroup V2 on VMs
KubeletOnly   Kubelet stats from cAdvisor      CGroup V1 Kubernetes on VM
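A purely illustrative sketch of the selection logic this table implies (chooseModel and its inputs are hypothetical, not Kepler's actual configuration code):

package main

import "fmt"

// chooseModel mirrors the table above: prefer HW perf counters where
// available, fall back to cgroup v2 stats on VMs, and use kubelet/cAdvisor
// stats for cgroup v1 Kubernetes on VMs.
func chooseModel(hasPerfCounters, cgroupV2 bool) string {
	switch {
	case hasPerfCounters:
		return "CounterOnly"
	case cgroupV2:
		return "CGroupOnly"
	default:
		return "KubeletOnly"
	}
}

func main() {
	fmt.Println(chooseModel(true, false))  // baremetal server: CounterOnly
	fmt.Println(chooseModel(false, true))  // cgroup v2 on a VM: CGroupOnly
	fmt.Println(chooseModel(false, false)) // cgroup v1 Kubernetes on a VM: KubeletOnly
}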