telegraf: telegraf 1.21.3 crashing: &GoSNMP.Conn is missing
Relevant telegraf.conf
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "60s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
## This buffer only fills when writes fail to output plugin(s).
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. You shouldn't set this below
## interval. Maximum flush_interval will be flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Logging configuration:
## Run telegraf with debug log messages.
debug = false
## Run telegraf in quiet mode (error log messages only).
quiet = false
## Specify the log file name. The empty string means to log to stderr.
logfile = ""
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
## The full HTTP or UDP URL for your InfluxDB instance.
##
## Multiple urls can be specified as part of the same cluster,
## this means that only ONE of the urls will be written to each interval.
# urls = ["udp://127.0.0.1:8089"] # UDP endpoint example
urls = ["http://localhost:8086"] # required
## The target database for metrics (telegraf will create it if not exists).
database = "snmpdb" # required
###############################################################################
# INPUT #
###############################################################################
# # Retrieves SNMP values from remote agents
[[inputs.snmp]]
agents = [ "192.168.5.10:161" ]
# ## Timeout for each SNMP query.
timeout = "30s"
# ## Interval for each SNMP query.
interval = "15s"
# ## Number of retries to attempt within timeout.
retries = 3
# ## SNMP version, values can be 1, 2, or 3
version = 2
#
# ## SNMP community string.
community = "public"
#
# ## The GETBULK max-repetitions parameter
max_repetitions = 10
#
# ## SNMPv3 auth parameters
# #sec_name = "myuser"
# #auth_protocol = "md5" # Values: "MD5", "SHA", ""
# #auth_password = "pass"
# #sec_level = "authNoPriv" # Values: "noAuthNoPriv", "authNoPriv", "authPriv"
# #context_name = ""
# #priv_protocol = "" # Values: "DES", "AES", ""
# #priv_password = ""
#
# ## measurement name
# name = "QNAP"
#[[inputs.snmp.table]]
#name = "remote_servers"
# QNAP System
[[inputs.snmp.field]]
name = "CPU_Load"
oid = "1.3.6.1.4.1.24681.1.3.1.0"
#
[[inputs.snmp.field]]
name = "CPU1"
oid = "1.3.6.1.2.1.25.3.3.1.2.196608"
#
[[inputs.snmp.field]]
name = "CPU2"
oid = "1.3.6.1.2.1.25.3.3.1.2.196609"
#
[[inputs.snmp.field]]
name = "CPU3"
oid = "1.3.6.1.2.1.25.3.3.1.2.196610"
#
[[inputs.snmp.field]]
name = "CPU4"
oid = "1.3.6.1.2.1.25.3.3.1.2.196611"
#
[[inputs.snmp.field]]
name = "CPU5"
oid = "1.3.6.1.2.1.25.3.3.1.2.196612"
#
[[inputs.snmp.field]]
name = "CPU6"
oid = "1.3.6.1.2.1.25.3.3.1.2.196613"
#
[[inputs.snmp.field]]
name = "CPU7"
oid = "1.3.6.1.2.1.25.3.3.1.2.196614"
#
[[inputs.snmp.field]]
name = "CPU8"
oid = "1.3.6.1.2.1.25.3.3.1.2.196615"
#
[[inputs.snmp.field]]
name = "Total_RAM"
oid = "1.3.6.1.4.1.24681.1.3.2.0"
#
[[inputs.snmp.field]]
name = "Free_RAM"
oid = "1.3.6.1.4.1.24681.1.3.3.0"
#
[[inputs.snmp.field]]
name = "Fan1_Speed"
oid = "1.3.6.1.4.1.24681.1.3.15.1.3.1"
#
[[inputs.snmp.field]]
name = "Fan2_Speed"
oid = "1.3.6.1.4.1.24681.1.3.15.1.3.2"
#
[[inputs.snmp.field]]
name = "CPU_Temp"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.1.4.2.0"
#
[[inputs.snmp.field]]
name = "System_Temp"
oid = "1.3.6.1.4.1.24681.1.3.6.0"
# Input Hostnames
[[inputs.snmp.field]]
name = "hostname"
oid = "1.3.6.1.2.1.1.5.0"
is_tag = true
# Networking All Interfaces QNAP + SWITCHES
[[inputs.snmp.table]]
#ifTable,1.3.6.1.2.1.2.2,IF-MIB,OBJECT
name = "ifTable"
inherit_tags = [ "hostname" ]
#
#
[[inputs.snmp.table.field]]
#ifDescr,1.3.6.1.2.1.2.2.1.2,IF-MIB,OBJECT
name = "Interface"
oid = "1.3.6.1.2.1.2.2.1.2"
is_tag = true
#
[[inputs.snmp.table.field]]
#ifInOctets,1.3.6.1.2.1.2.2.1.10,IF-MIB,OBJECT
name = "RXBytes"
oid = "1.3.6.1.2.1.2.2.1.10"
#
[[inputs.snmp.table.field]]
#ifOutOctets,1.3.6.1.2.1.2.2.1.16,IF-MIB,OBJECT
#
name = "TXBytes"
oid = "1.3.6.1.2.1.2.2.1.16"
#
# End Networking Interfaces
# QNAP DISK Section
[[inputs.snmp]]
agents = [ "192.168.5.10:161"]
# ## Timeout for each SNMP query.
timeout = "30s"
interval = "10m"
# ## Number of retries to attempt within timeout.
retries = 3
# ## SNMP version, values can be 1, 2, or 3
version = 2
#
# ## SNMP community string.
community = "public"
#
# ## The GETBULK max-repetitions parameter
max_repetitions = 10
# QNAP HDD Table
[[inputs.snmp.table]]
#systemHdTableEX,1.3.6.1.4.1.24681.1.3.11,NAS-MIB,OBJECT-TYPE
name = "HDDTable"
inherit_tags = [ "hostname" ]
#
[[inputs.snmp.table.field]]
#hdDescrEX,1.3.6.1.4.1.24681.1.3.11.1.2,NAS-MIB,OBJECT-TYPE
#
name = "HDDDescription"
oid = "1.3.6.1.4.1.24681.1.3.11.1.2"
is_tag = true
#
[[inputs.snmp.table.field]]
#hdTemperatureEX,1.3.6.1.4.1.24681.1.3.11.1.3,NAS-MIB,OBJECT-TYPE
name = "Temperature"
oid = "1.3.6.1.4.1.24681.1.3.11.1.3"
#
[[inputs.snmp.table.field]]
#hdStatusEX,1.3.6.1.4.1.24681.1.3.11.1.4,NAS-MIB,OBJECT-TYPE
name = "Status"
oid = "1.3.6.1.4.1.24681.1.3.11.1.4"
#
[[inputs.snmp.table.field]]
#hdSmartInfoEX,1.3.6.1.4.1.24681.1.3.11.1.7,NAS-MIB,OBJECT-TYPE
name = "S.M.A.R.T."
oid = "1.3.6.1.4.1.24681.1.3.11.1.7"
# QNAP DISKPOOL Table
[[inputs.snmp.table]]
#poolTable,1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2,NAS-MIB,OBJECT-TYPE
name = "poolTable"
inherit_tags = [ "hostname" ]
#
[[inputs.snmp.table.field]]
#poolID,1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.2,NAS-MIB,OBJECT-TYPE
name = "poolID"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.2"
is_tag = true
#
[[inputs.snmp.table.field]]
#poolCapacity,1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.3,NAS-MIB,OBJECT-TYPE
name = "poolCapacity"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.3"
#
[[inputs.snmp.table.field]]
#poolFreeSize,1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.4,NAS-MIB,OBJECT-TYPE
name = "poolFreeSize"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.4"
[[inputs.snmp.table.field]]
#poolStatus,1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.5,NAS-MIB,OBJECT-TYPE
name = "poolStatus"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.2.2.1.5"
# QNAP VOLUME Table
[[inputs.snmp.table]]
#volumeTable,1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2,NAS-MIB,OBJECT-TYPE
name = "volumeTable"
inherit_tags = [ "hostname" ]
#
[[inputs.snmp.table.field]]
#volumeName,1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.8,NAS-MIB,OBJECT-TYPE
name = "volumeName"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.8"
is_tag = true
#
[[inputs.snmp.table.field]]
#volumeCapacity,1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.3,NAS-MIB,OBJECT-TYPE
name = "volumeCapacity"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.3"
#
[[inputs.snmp.table.field]]
#volumeFreeSize,1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.4,NAS-MIB,OBJECT-TYPE
name = "volumeFreeSize"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.4"
#
[[inputs.snmp.table.field]]
#volumeStatus,1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.5,NAS-MIB,OBJECT-TYPE
name = "volumeStatus"
oid = "1.3.6.1.4.1.24681.1.4.1.1.1.2.3.2.1.5"
Logs from Telegraf
Feb 1 16:47:15 snmpdb telegraf[122]: 2022-02-01T15:47:15Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: performing get on field CPU_Load: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:47:15 snmpdb telegraf[122]: 2022-02-01T15:47:15Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: gathering table ifTable: performing bulk walk for field Interface: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:47:30 snmpdb telegraf[122]: 2022-02-01T15:47:30Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: performing get on field CPU_Load: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:47:30 snmpdb telegraf[122]: 2022-02-01T15:47:30Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: gathering table ifTable: performing bulk walk for field Interface: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:47:45 snmpdb telegraf[122]: 2022-02-01T15:47:45Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: performing get on field CPU_Load: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:47:45 snmpdb telegraf[122]: 2022-02-01T15:47:45Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: gathering table ifTable: performing bulk walk for field Interface: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:48:00 snmpdb telegraf[122]: 2022-02-01T15:48:00Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: performing get on field CPU_Load: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Feb 1 16:48:00 snmpdb telegraf[122]: 2022-02-01T15:48:00Z E! [inputs.snmp] Error in plugin: agent 192.168.5.10:161: gathering table ifTable: performing bulk walk for field Interface: &GoSNMP.Conn is missing. Provide a connection or use Connect()
System info
Ubuntu 20.04LTS (LXD), Telegraf 1.21.3 affected, 1.20.4 is the last working release
Docker
No response
Steps to reproduce
- Upgrade to the latest Telegraf release (1.21.3).
- The workaround is downgrading to 1.20.4, which works immediately.
Expected behavior
Collect snmp metrics
Actual behavior
Telegraf stops working with the latest release and is unable to collect data via SNMP.
Additional info
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 43 (5 by maintainers)
I’m going to reopen your original issue #10890, as this issue has a different root cause and is actually fixed.
Hi @Hipska, I can confirm: version 1.22.4 fixes the issue! https://github.com/influxdata/telegraf/releases/tag/v1.22.4
Thanks for your help!! 👍
BR/JO!
@jostrasser I think the following might help with your issue as well: #11042. You could try the latest nightly build.
I just downgraded to 1.20.4, which is the last good version for me too. This also made my issue with one SNMP client go away.
Hi @bondskin,
nope, but the messages are different now. Without the
ExecStartPre=/bin/sleep 60
the service isn’t coming up:
I would report that in the respective PR…
Same here. Stuck between a rock and a hard place… I still get the "&GoSNMP.Conn is missing" error with 1.22.1 on some OIDs, but if I roll back, I cannot resolve some other OIDs in 1.21.x that work properly in 1.22.x…
Same here. Telegraf version 1.22, running on CentOS 8 Stream. A lot of different SNMP input plugins stopped working after the update to 1.22 with the same error:
2022-04-06T08:10:03Z E! [inputs.snmp::Inventory] Error in plugin: agent 192.168.25.52:161: gathering table VRF_interface: performing bulk walk for field vrfIntType: &GoSNMP.Conn is missing. Provide a connection or use Connect()
Restarting Telegraf does not help.
Yes, @jostrasser, confirmed. Stopping and starting again does not fix the “&GoSNMP.Conn is missing. Provide a connection or use Connect()” error. At least Telegraf keeps running, which was not the case with 1.21.x.
Yes, that’s correct. I have to manually restart the telegraf service afterward (or delay the start). It is possible that the network takes some time to come up, but it is strange that this only occurs on versions after 1.20.4.
Thanks for your feedback and support! If you need additional information or if I can do anything, let me know.
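For anyone who wants to try the delayed start mentioned above, here is a minimal sketch of a systemd drop-in. It assumes the service unit is named telegraf.service and that the drop-in lives at the path in the first comment; both are assumptions, not something confirmed in this thread. The 60-second sleep is the value quoted earlier. The network-online.target ordering is an extra suggestion that only helps if a wait-online service is enabled on the host.

```ini
# /etc/systemd/system/telegraf.service.d/override.conf  (assumed drop-in path)
[Unit]
# Assumption: start Telegraf only after the network is reported online.
Wants=network-online.target
After=network-online.target

[Service]
# Delay quoted in this thread; this masks the startup race rather than fixing the cause.
ExecStartPre=/bin/sleep 60
```

After creating the file, running `systemctl daemon-reload` followed by `systemctl restart telegraf` should apply it. This is only a stopgap; per the comments above, upgrading to 1.22.4 is what resolved this issue.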