kepler: error loading BPF program: invalid argument
What happened?
Hello, I’m trying to use Kepler on machines that have access to the counters, but it does not seem to be working. On my VMs I can see it working with the estimations, but now that I’m deploying it on these new machines, all the measurements are 0.
I tried to install Kepler with the Helm chart, and also by building it and applying the deployment file afterwards, but neither worked.
When I install it with helm, I can see the following logs:
> kubectl logs kepler-wdsjr -n monitoring
I0713 14:19:44.756647 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756672 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756696 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756735 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756764 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756791 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756817 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756846 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756873 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756911 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756939 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756965 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.756990 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757054 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757081 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757113 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757140 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757167 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757196 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757225 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757255 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757283 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757322 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757348 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757373 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757399 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
I0713 14:19:44.757428 1 container_hc_collector.go:164] could not delete bpf table elements, err: Table.Delete: key 0xd6: no such file or directory
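Since the same message repeats with only the timestamp changing, collapsing the log can make it easier to see how many distinct messages there really are. The following is a minimal sketch, assuming the klog-style prefix seen above (severity letter plus date, time, thread ID); summarize_klog is a hypothetical helper name, not part of Kepler:

```shell
# summarize_klog (hypothetical helper): read klog-style lines on stdin,
# strip the "I0713 14:19:44.756647 1 " prefix, and count identical remainders.
summarize_klog() {
    sed -E 's/^[IWEF][0-9]{4} [0-9:.]+ +[0-9]+ //' | sort | uniq -c | sort -rn
}
```

It could be used as: kubectl logs kepler-wdsjr -n monitoring | summarize_klog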
Then, when I query:
> kubectl exec -ti -n monitoring daemonset/kepler -- bash -c "curl localhost:9102/metrics" | grep kepler_container_core_joules_total | grep wskow
# HELP kepler_container_core_joules_total Aggregated RAPL value in core in joules
# TYPE kepler_container_core_joules_total counter
kepler_container_core_joules_total{command="",container_id="13760f6f9db879378c267c91f6a6cec71a3111f8c9a73a39e457756702516919",container_name="user-action",container_namespace="openwhisk",mode="dynamic",pod_name="wskow-invoker-00-1-prewarm-nodejs10"} 0
kepler_container_core_joules_total{command="",container_id="13760f6f9db879378c267c91f6a6cec71a3111f8c9a73a39e457756702516919",container_name="user-action",container_namespace="openwhisk",mode="idle",pod_name="wskow-invoker-00-1-prewarm-nodejs10"} 0
kepler_container_core_joules_total{command="",container_id="32b2e453cae162b208d13859bdfc4b4726d22186de5fefe949759c6b4ee6b4af",container_name="user-action",container_namespace="openwhisk",mode="dynamic",pod_name="wskow-invoker-00-2-prewarm-nodejs10"} 0
kepler_container_core_joules_total{command="",container_id="32b2e453cae162b208d13859bdfc4b4726d22186de5fefe949759c6b4ee6b4af",container_name="user-action",container_namespace="openwhisk",mode="idle",pod_name="wskow-invoker-00-2-prewarm-nodejs10"} 0
kepler_container_core_joules_total{command="",container_id="808040772cbb31c91c4f4084f1a680c97c237879c83288831ed9614b05f1cb7c",container_name="user-action",container_namespace="openwhisk",mode="dynamic",pod_name="wskow-invoker-00-9-guest-linpack"} 0
kepler_container_core_joules_total{command="",container_id="808040772cbb31c91c4f4084f1a680c97c237879c83288831ed9614b05f1cb7c",container_name="user-action",container_namespace="openwhisk",mode="idle",pod_name="wskow-invoker-00-9-guest-linpack"} 0
kepler_container_core_joules_total{command="",container_id="99ac37763d4a96b89c29e2e079796876fc2cb2d08d3febf54346ebafce0d6d96",container_name="user-action",container_namespace="openwhisk",mode="dynamic",pod_name="wskow-invoker-00-8-whisksystem-invokerhealthtestaction0"} 0
kepler_container_core_joules_total{command="",container_id="99ac37763d4a96b89c29e2e079796876fc2cb2d08d3febf54346ebafce0d6d96",container_name="user-action",container_namespace="openwhisk",mode="idle",pod_name="wskow-invoker-00-8-whisksystem-invokerhealthtestaction0"} 0
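To check every series of a metric rather than only the ones matched by grep, the non-zero samples can be counted. This is a rough sketch, assuming the Prometheus text format shown above with the sample value as the last field on each line (no trailing timestamp); count_nonzero is a hypothetical helper name:

```shell
# count_nonzero (hypothetical helper): read Prometheus text-format metrics on
# stdin and print how many samples of the metric named in $1 are non-zero.
count_nonzero() {
    awk -v m="$1" '
        # Keep only sample lines that start with "<metric>{"; this skips
        # the "# HELP" and "# TYPE" comment lines and other metrics.
        index($0, m "{") == 1 && $NF + 0 != 0 { n++ }
        END { print n + 0 }
    '
}
```

For example: kubectl exec -n monitoring daemonset/kepler -- bash -c "curl -s localhost:9102/metrics" | count_nonzero kepler_container_core_joules_total. A result of 0 would mean that every series of that metric is zero, not just the wskow ones.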
When I build the manifests myself using make build-manifest OPTS="PROMETHEUS_DEPLOY" and apply them, I can see the following in the logs:
> kubectl logs kepler-exporter-vrvmg -n monitoring
I0713 10:29:47.059485 1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0713 10:29:47.065771 1 acpi.go:67] Could not find any ACPI power meter path. Is it a VM?
I0713 10:29:47.074301 1 exporter.go:151] Kepler running on version: 22f2c84
I0713 10:29:47.074322 1 config.go:212] using gCgroup ID in the BPF program: true
I0713 10:29:47.074352 1 config.go:214] kernel version: 4.19
I0713 10:29:47.074402 1 config.go:174] kernel source dir is set to /usr/share/kepler/kernel_sources
I0713 10:29:47.074449 1 exporter.go:171] EnabledBPFBatchDelete: true
I0713 10:29:47.074505 1 power.go:53] use sysfs to obtain power
I0713 10:29:47.604160 1 exporter.go:184] Initializing the GPU collector
I0713 10:29:47.604444 1 watcher.go:67] Using in cluster k8s config
modprobe: FATAL: Module kheaders not found in directory /lib/modules/4.19.0-24-amd64
chdir(/lib/modules/4.19.0-24-amd64/build): No such file or directory
I0713 10:29:47.710306 1 bcc_attacher.go:74] failed to attach the bpf program: <nil>
I0713 10:29:47.710331 1 bcc_attacher.go:143] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=32]: failed to attach the bpf program: <nil>, from default kernel source.
I0713 10:29:47.710357 1 bcc_attacher.go:146] trying to load eBPF module with kernel source dir /usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64
bpf: Failed to load program: Invalid argument
I0713 10:29:48.571949 1 bcc_attacher.go:150] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=32]: failed to load kprobe__finish_task_switch: error loading BPF program: invalid argument, from kernel source "/usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64"
I0713 10:29:48.571986 1 bcc_attacher.go:146] trying to load eBPF module with kernel source dir /usr/share/kepler/kernel_sources/5.14.0-284.11.1.el9_2.x86_64
bpf: Failed to load program: Invalid argument
I0713 10:29:49.392366 1 bcc_attacher.go:150] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=32]: failed to load kprobe__finish_task_switch: error loading BPF program: invalid argument, from kernel source "/usr/share/kepler/kernel_sources/5.14.0-284.11.1.el9_2.x86_64"
I0713 10:29:49.392431 1 bcc_attacher.go:158] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=32]: failed to load kprobe__finish_task_switch: error loading BPF program: invalid argument, not able to load eBPF modules
I0713 10:29:49.392483 1 exporter.go:201] failed to start : failed to attach bpf assets: failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=32]: failed to load kprobe__finish_task_switch: error loading BPF program: invalid argument, not able to load eBPF modules
I0713 10:29:49.392628 1 exporter.go:228] Started Kepler in 2.318348644s
What is weird is that it complains about /lib/modules, which is present on both of the machines that I’m using:
> kubectl exec -ti debug-9trwq bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
root@paravance-49:/# nsenter --mount=/proc/1/ns/mnt -- sh -s
# ls /lib/modules
4.19.0-24-amd64
# ls /usr/lib/modules
4.19.0-24-amd64
> kubectl exec -ti debug-x6wlt bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
root@paravance-40:/# nsenter --mount=/proc/1/ns/mnt -- sh -s
# ls /lib/modules
4.19.0-24-amd64
# ls /usr/lib/modules
4.19.0-24-amd64
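Note that the chdir error in the log refers to /lib/modules/4.19.0-24-amd64/build, not to /lib/modules itself, so listing the parent directory does not show whether the kernel headers are there. A more specific check might look like the sketch below, to be run on the host (for example inside the nsenter shell above); check_kernel_headers is a hypothetical helper name, and the defaults are assumptions:

```shell
# check_kernel_headers (hypothetical helper): report whether the "build"
# directory (usually a symlink to the kernel headers) exists for a kernel.
# $1: modules root, default /lib/modules
# $2: kernel release, default the running kernel (uname -r)
check_kernel_headers() {
    root="${1:-/lib/modules}"
    release="${2:-$(uname -r)}"
    if [ -d "$root/$release/build" ]; then
        echo "headers found: $root/$release/build"
    else
        echo "headers missing: $root/$release/build"
        return 1
    fi
}
```

Calling check_kernel_headers with no arguments checks the running kernel. If it reports the headers as missing, installing the headers package that matches the running kernel (on Debian, typically linux-headers-$(uname -r)) may be what the BPF loader needs, although that is not confirmed here.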
And finally, the result of the query is the same as above.
Can you help me, please?
PS: In fact, my goal is not to get the measurements from the real counters. I want to validate Kepler’s estimations by comparing them against the power meters that are installed in these machines. So, if possible, I would like to keep using the estimations, but I don’t know how to specify that.
Can you help me to solve both issues (the main one and the PS), please?
Thank you very much!!
What did you expect to happen?
To get the estimations from Kepler.
How can we reproduce it (as minimally and precisely as possible)?
By following the commands shown above.
Anything else we need to know?
No response
Kubernetes version
> kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
About this issue
- State: closed
- Created a year ago
- Comments: 31 (9 by maintainers)
@sunya-ch, I just switched back to my original YAML to run the checks you suggested. Here are the results:
clusterrolebinding:
Authorization:
The Kepler logs no longer show the message ‘cannot list resource “pods”’. Here are the full logs:
@sunya-ch, here is the output of the command: