node_exporter: ERROR: cpufreq collector failed after 1.499322s: strconv.ParseUint: parsing \"\": invalid syntax" source="collector.go:132"

Host operating system: output of `uname -a`

Linux EM-4V8NH42 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of `node_exporter --version`

node_exporter, version 0.18.1 (branch: HEAD, revision: 3db77732e925c08f675d7404a8c46466b2ece83e) build user: root@b50852a1acba build date: 20190604-16:41:18 go version: go1.12.5

node_exporter command line flags

/opt/node_exporter-0.18.1.linux-amd64/node_exporter --web.listen-address=":9111"

Are you running node_exporter in Docker?

How can I solve or avoid this problem? Thank you.

About this issue

State: open
Created 4 years ago
Comments: 15 (11 by maintainers)

Commits related to this issue

internal/util: Read(U)intFromFile: return file name in errors This improves error messages by adding the affected file name. This helps downstream users such as node_exporter to locate the source of ... — committed to hoffie/procfs by hoffie 4 years ago
internal/util: Read(U)intFromFile: return file name in errors This improves error messages by adding the affected file name. This helps downstream users to locate the source of errors which is useful... — committed to hoffie/procfs by hoffie 4 years ago

Most upvoted comments

On the affected systems, the following loop demonstrates the problem outside of node_exporter after some time:

$ while true; do sudo grep unknown /sys/devices/system/cpu/cpu*/cpufreq/*; sleep 1; done
/sys/devices/system/cpu/cpu23/cpufreq/cpuinfo_cur_freq:<unknown>

We have opened a support case with Red Hat, and at least on our HPE machines running RHEL7 this issue can be explained by a broken cpufreq driver. According to Red Hat and HPE, the pcc-cpufreq driver is selected on such machines if the Power Regulator setting in the BIOS is not set to OS Control Mode. This driver is considered broken and is no longer developed. Both HPE and Red Hat advise changing the BIOS setting. This will load the intel_pstate driver instead, which should not have this problem.

We have confirmed that changing the BIOS setting makes Linux load the intel_pstate driver. We have not yet run any larger tests to confirm that this fixes the problem described in this GitHub issue, but I’m optimistic. 😃

So, the generic answer is as expected: This is a problem in the cpufreq driver and nothing that node_exporter can fix. It also seems that this cosmetic issue is one of the less important symptoms of the buggy driver (which, it appears, can lead to unbootable or performance-degraded systems).

References:
https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-c04704148
https://patchwork.kernel.org/patch/10528797/
https://access.redhat.com/solutions/3421421

I do think the error message could be improved, to make it easier to see exactly what CPU/metric failed.

@SuperQ I had a quick look regarding improving the error messages. node_exporter uses procfs’ sysfs package and its parseCpufreqCpuinfo function. That function uses ReadUintFromFile. So I assume this is where the error should be wrapped with a hint about the affected filename. If this is wanted, I assume that it should be done with all those helper functions which take a path name, right?

hoffie on Aug 12, 2020

Just a quick update from my side: As expected, this issue went away for us by fixing the frequency scaling driver as recommended by the vendors (i.e. moving away from pcc-cpufreq). So, it was not a node_exporter issue, but a kernel (driver) bug in our case.

hoffie on May 17, 2021

Thanks @hoffie. If this was a consistent problem where it always returned <unknown>, it would be something to make an effort to patch. But if it’s a random/occasional error that affects only a specific set of platform combinations, I think we should leave it as is.

It’s an indicator of a real problem. Covering it up is more dangerous than the minor annoyance of some missed data and soft errors in the scrape. We already handle the collector failing in a soft way.