kepler: dial error: dial unix /tmp/estimator.sock: connect: no such file or directory

Describe the bug

After rolling the daemonset over to the latest image on the quay.io registry (sha256:01a86339a8acb566ddcee848640ed4419ad0bffac98529e9b489a3dcb1e671f5), the message from the title is logged constantly. Example output:

2022/08/25 12:30:53 Kubelet Read: map[<pod-list-trimmed>]
2022/08/25 12:30:53 dial error: dial unix /tmp/estimator.sock: connect: no such file or directory
energy from pod (0 processes): name: <some-pod> namespace: <some-namespace>

Is estimator.sock expected to be missing in the current state of the project?

Every node reports the same error. As a side note, since then the nodes have not been logging any new kepler metrics to Prometheus. I can't say whether the two are connected; the missing metrics might be a separate local issue, but I am mentioning it just in case.

To Reproduce

Steps to reproduce the behavior:

  1. Run kepler on OpenShift 4.11
  2. Check kepler-exporter container logs for presence of ‘/tmp/estimator.sock: connect: no such file or directory’

Expected behavior

The /tmp/estimator.sock error is not reported.

Desktop (please complete the following information):

  • OS: RedHat CoreOS 4.11

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Hello sunya-ch and thanks for giving this some attention!

  • Pod metrics are reported correctly, for the expected pods on the expected nodes
  • I can see pod_energy_stat and other expected metrics in Prometheus, which confirms that they are being sent
  • Prometheus address is set to 0.0.0.0:9102

I think it boils down to the “grafana dashboard should be updated” note. The dashboards use the “pod_cpu_energy_total”, “pod_dram_energy_total” and “pod_energy_total” metrics (and others that differ from the list above), which I can also find in Prometheus. Both the Grafana-defined names and the new ones can be found there; the new ones are the ones being reported to Prometheus.

Is my understanding correct that the metric names have changed in the meantime, and so the Grafana dashboards found in grafana-dashboards are incompatible with the new metric names?

If that is so, thanks for getting this sorted out, I mean, thanks for helping flesh out the issue 😃

@Feelas thanks for the detailed test! If you can submit a PR for the Grafana name change, that would be great.

@Feelas the message dial unix /tmp/estimator.sock: connect: no such file or directory is benign. In short, the estimator sidecar is not started yet (this is being worked on in #104 and the estimator repo). Upcoming PRs will start the estimator sidecar and create the socket.

Thanks for testing!