node_exporter: ARP collector error: rtnetlink NeighMessage has a wrong attribute data length
Host operating system: output of uname -a
Linux 5.14.21-150500.55.36-default #1 SMP PREEMPT_DYNAMIC Tue Oct 31 08:37:43 UTC 2023 (e7a2e23) x86_64 x86_64 x86_64 GNU/Linux
openSUSE Leap 15.5
node_exporter version: output of node_exporter --version
node_exporter, version 1.7.0 (branch: master, revision: 78af952e638b5e0d00640fbdeefd096df4a51dc2)
build user: ~~~
build date: ~~~
go version: go1.21.4.1
platform: linux/amd64
tags: netgo osusergo static_build
node_exporter command line flags
defaults
node_exporter log output
ts=2023-11-17T14:15:00.958Z caller=collector.go:169 level=error msg="collector failed" name=arp duration_seconds=0.000270392 err="could not get ARP entries: rtnetlink NeighMessage has a wrong attribute data length"
Are you running node_exporter in Docker?
no
What did you do that produced an error?
upgraded from version 1.6.0
What did you expect to see?
node_arp_entries
node_scrape_collector_success{collector="arp"} 1
What did you see instead?
node_scrape_collector_success{collector="arp"} 0
About this issue
- State: open
- Created 7 months ago
- Reactions: 5
- Comments: 24 (13 by maintainers)
Commits related to this issue
- gomod: update rtnentlink to fix ARP table issue See https://github.com/prometheus/node_exporter/issues/2849 and https://github.com/jsimonetti/rtnetlink/releases/tag/v1.4.0 for discussion. Change-Id:... — committed to monogon-dev/monogon by lorenz 7 months ago
- reverted due to https://github.com/prometheus/node_exporter/issues/2849 for latest release — committed to platform9/pf9-kube-prometheus by Ausfdes 6 months ago
Updated and released https://github.com/jsimonetti/rtnetlink/releases/tag/v1.4.0
The `ll_addr_n2a` function in the iproute2 source code sheds some light on how it handles NDA_LLADDRs of various lengths:

The `type` is taken from the interface with which the link-layer address is associated. Some of these types (ARPHRD_NETROM, ARPHRD_AX25) are quite archaic and unlikely to be encountered in most environments. However, several of them are still in widespread use, e.g. the various tunnel types.

cf. (abridged):
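To make the length-and-type dispatch concrete, here is a loose Go analogue. It is an illustration only, not a copy of the iproute2 source: `formatLLAddr` and the constant names are hypothetical, the ARPHRD values are the ones I understand `linux/if_arp.h` to define, and the archaic AX.25, NET/ROM and ROSE cases are left out. Unlike the C function, this sketch checks for a zero-length address before touching the first byte.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ARPHRD_* hardware types, assumed to match the values in linux/if_arp.h.
const (
	arphrdTunnel  = 768 // IPv4-in-IPv4 tunnel
	arphrdTunnel6 = 769 // IPv6 tunnel
	arphrdSIT     = 776 // IPv6-in-IPv4 tunnel
	arphrdIPGRE   = 778 // GRE over IPv4
	arphrdIP6GRE  = 823 // GRE over IPv6
)

// formatLLAddr is a hypothetical helper that renders a link-layer address
// according to its length and the hardware type of the owning interface.
func formatLLAddr(addr []byte, hwType uint16) string {
	switch {
	case len(addr) == 0:
		// Zero-length NDA_LLADDR: nothing to format.
		return ""
	case len(addr) == 4 && (hwType == arphrdTunnel || hwType == arphrdSIT || hwType == arphrdIPGRE):
		// IPv4 tunnel endpoints use a 4-byte IPv4 address as their "hardware" address.
		return net.IP(addr).String()
	case len(addr) == 16 && (hwType == arphrdTunnel6 || hwType == arphrdIP6GRE):
		// IPv6 tunnel endpoints use a 16-byte IPv6 address.
		return net.IP(addr).String()
	}
	// Everything else: colon-separated hex bytes, whatever the length.
	parts := make([]string, len(addr))
	for i, b := range addr {
		parts[i] = fmt.Sprintf("%02x", b)
	}
	return strings.Join(parts, ":")
}

func main() {
	fmt.Println(formatLLAddr([]byte{0x52, 0x54, 0x00, 0x12, 0x34, 0x56}, 1)) // 1 is ARPHRD_ETHER
	fmt.Println(formatLLAddr([]byte{192, 0, 2, 1}, arphrdSIT))
	fmt.Printf("%q\n", formatLLAddr(nil, 1))
}
```

The point is that a parser which only accepts 6-byte (Ethernet-style) addresses will reject entries that the kernel considers valid.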
Technically, the `ll_addr_n2a` function will handle an NDA_LLADDR with zero-length `alen`:

Since this code would still expect at least one byte of valid data in `addr[0]`, I suspect that elsewhere in the iproute2 source code they avoid calling this function with zero-length `alen`.

This looks like it is the issue currently. It appears `iproute2` silently ignores this entry.

I think the correct course of action would be to allow zero-length `NDA_LLADDR` attributes on a neighbor entry, as clearly the kernel thinks it’s OK. This could have some effects on consumers of the module, as they would now have to manually filter out these entries.