telegraf: OPCUA input plugin crashes Telegraf
Relevant telegraf.conf
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "1s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "1s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Log at debug level.
# debug = false
## Log only error level messages.
# quiet = false
## Log target controls the destination for logs and can be one of "file",
## "stderr" or, on Windows, "eventlog". When set to "file", the output file
## is determined by the "logfile" setting.
# logtarget = "file"
## Name of the file to be logged to when using the "file" logtarget. If set to
## the empty string then logs are written to stderr.
# logfile = ""
## The logfile will be rotated after the time interval specified. When set
## to 0 no time based rotation is performed. Logs are rotated only when
## written to, if there is no log activity rotation may be delayed.
# logfile_rotation_interval = "0d"
## The logfile will be rotated when it becomes larger than the specified
## size. When set to 0 no size based rotation is performed.
# logfile_rotation_max_size = "0MB"
## Maximum number of rotated archives to keep, any older logs are deleted.
## If set to -1, no archives are removed.
# logfile_rotation_max_archives = 5
## Pick a timezone to use when logging or type 'local' for local time.
## Example: America/Chicago
# log_with_timezone = ""
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://localhost:8086"]
## Token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "nondisclosed"
## Destination bucket to write into.
bucket = "alsonotdisclosed"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
# timeout = "5s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
# Read metrics from MQTT topic(s)
[[inputs.mqtt_consumer]]
## Broker URLs for the MQTT server or cluster. To connect to multiple
## clusters or standalone servers, use a separate plugin instance.
## example: servers = ["tcp://localhost:1883"]
## servers = ["ssl://localhost:1883"]
## servers = ["ws://localhost:1883"]
servers = ["tcp://127.0.0.1:1883"]
## Topics that will be subscribed to.
topics = [
"telegraf/host01/cpu",
"telegraf/+/mem",
"sensors/#",
]
## The message topic will be stored in a tag specified by this value. If set
## to the empty string no topic tag will be created.
topic_tag = ""
## QoS policy for messages
## 0 = at most once
## 1 = at least once
## 2 = exactly once
##
## When using a QoS of 1 or 2, you should enable persistent_session to allow
## resuming unacknowledged messages.
# qos = 0
## Connection timeout for initial connection in seconds
# connection_timeout = "30s"
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
## Persistent session disables clearing of the client session on connection.
## In order for this option to work you must also set client_id to identify
## the client. To receive messages that arrived while the client is offline,
## also set the qos option to 1 or 2 and don't forget to also set the QoS when
## publishing.
# persistent_session = false
## If unset, a random client ID will be generated.
# client_id = ""
## Username and password to connect MQTT server.
# username = "telegraf"
# password = "metricsmetricsmetricsmetrics"
## Optional TLS Config
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "influx"
## Enable extracting tag values from MQTT topics
## _ denotes an ignored entry in the topic path
[[inputs.mqtt_consumer.topic_parsing]]
topic = "sensors/+"
measurement = "measurement/_"
tags = "_/sensor_id"
# fields = ""
## Value supported is int, float, uint
# [[inputs.mqtt_consumer.topic.types]]
# key = type
# Read metrics about cpu usage
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states
report_active = false
## If true and the info is available then add core_id and physical_id tags
core_tags = false
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default stats will be gathered for all mount points.
## Set mount_points will restrict the stats to only the specified mount points.
# mount_points = ["/"]
## Ignore mount points by filesystem type.
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
## Ignore mount points by mount options.
## The 'mount' command reports options of all mounts in parentheses.
## Bind mounts can be ignored with the special 'bind' option.
# ignore_mount_opts = []
# Read metrics about disk IO by device
# This plugin ONLY supports Linux
[[inputs.diskio]]
## By default, telegraf will gather stats for all devices including
## disk partitions.
## Setting devices will restrict the stats to the specified devices.
## NOTE: Globbing expressions (e.g. asterisk) are not supported for
## disk synonyms like '/dev/disk/by-id'.
# devices = ["sda", "sdb", "vd*", "/dev/disk/by-id/nvme-eui.00123deadc0de123"]
## Uncomment the following line if you need disk serial numbers.
# skip_serial_number = false
#
## On systems which support it, device metadata can be added in the form of
## tags.
## Currently only Linux is supported via udev properties. You can view
## available properties for a device by running:
## 'udevadm info -q property -n /dev/sda'
## Note: Most, but not all, udev properties can be accessed this way. Properties
## that are currently inaccessible include DEVTYPE, DEVNAME, and DEVPATH.
# device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"]
#
## Using the same metadata source as device_tags, you can also customize the
## name of the device via templates.
## The 'name_templates' parameter is a list of templates to try and apply to
## the device. The template may contain variables in the form of '$PROPERTY' or
## '${PROPERTY}'. The first template which does not contain any variables not
## present for the device is used as the device name tag.
## The typical use case is for LVM volumes, to get the VG/LV name instead of
## the near-meaningless DM-0 name.
# name_templates = ["$ID_FS_LABEL","$DM_VG_NAME/$DM_LV_NAME"]
# Get kernel statistics from /proc/stat
# This plugin ONLY supports Linux
[[inputs.kernel]]
# no configuration
# Read metrics about memory usage
[[inputs.mem]]
# no configuration
# Get the number of processes and group them by status
# This plugin ONLY supports non-Windows
[[inputs.processes]]
## Use sudo to run ps command on *BSD systems. Linux systems will read
## /proc, so this does not apply there.
#use_sudo = false
# Read metrics about swap memory usage
[[inputs.swap]]
# no configuration
# Read metrics about system load & uptime
[[inputs.system]]
# no configuration
[[inputs.opcua]]
name = "opcua"
endpoint = "opc.tcp://172.16.90.11:4840"
connect_timeout = "10s"
request_timeout = "1s"
security_policy = "None"
security_mode = "None"
auth_method = "Anonymous"
username = "admin"
password = "wago"
[[inputs.opcua.group]]
namespace ="3"
identifier_type ="s"
nodes = [
{name="Test01", identifier='"DB100"."Test01"'},
]
# Configuration for MQTT server to send metrics to
[[outputs.mqtt]]
## MQTT Brokers
## The list of brokers should only include the hostname or IP address and the
## port to the broker. This should follow the format `[{scheme}://]{host}:{port}`. For
## example, `localhost:1883` or `mqtt://localhost:1883`.
## Scheme can be any of the following: tcp://, mqtt://, tls://, mqtts://
## non-TLS and TLS servers can not be mix-and-matched.
servers = ["localhost:1883", ] # or ["mqtts://tls.example.com:1883"]
## Protocol can be `3.1.1` or `5`. Default is `3.1.1`
# protocol = "3.1.1"
## MQTT Topic for Producer Messages
## MQTT outputs send metrics to this topic format:
## {{ .TopicPrefix }}/{{ .Hostname }}/{{ .PluginName }}/{{ .Tag "tag_key" }}
## (e.g. prefix/web01.example.com/mem/some_tag_value)
## Each path segment accepts either a template placeholder, an environment variable, or a tag key
## of the form `{{.Tag "tag_key_name"}}`. Empty path elements as well as special MQTT characters
## (such as `+` or `#`) are invalid to form the topic name and will lead to an error.
## In case a tag is missing in the metric, that path segment is omitted for the final topic.
topic = 'telegraf/{{ .Hostname }}/{{ .PluginName }}/{{ .Tag "opcua" }}'
layout = "field"
## QoS policy for messages
## The mqtt QoS policy for sending messages.
## See https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.0.0/com.ibm.mq.dev.doc/q029090_.htm
## 0 = at most once
## 1 = at least once
## 2 = exactly once
# qos = 2
## Keep Alive
## Defines the maximum length of time that the broker and client may not
## communicate. Defaults to 0 which turns the feature off.
##
## For version v2.0.12 and later mosquitto there is a bug
## (see https://github.com/eclipse/mosquitto/issues/2117), which requires
## this to be non-zero. As a reference eclipse/paho.mqtt.golang defaults to 30.
# keep_alive = 0
## username and password to connect MQTT server.
# username = "telegraf"
# password = "metricsmetricsmetricsmetrics"
## client ID
## The unique client id to connect MQTT server. If this parameter is not set
## then a random ID is generated.
client_id = "Test"
## Timeout for write operations. default: 5s
# timeout = "5s"
## Optional TLS Config
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## When true, metrics will be sent in one MQTT message per flush. Otherwise,
## metrics are written one metric per MQTT message.
# batch = false
## When true, metric will have RETAIN flag set, making broker cache entries until someone
## actually reads it
# retain = false
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
data_format = "influx"
Logs from Telegraf
C:\v\InfluxData\telegraf\telegraf-1.26.1>telegraf.exe --config myconfig.conf
2023-05-10T07:39:31Z I! Loading config file: myconfig.conf
2023-05-10T07:39:31Z I! Starting Telegraf 1.26.1
2023-05-10T07:39:31Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
2023-05-10T07:39:31Z I! Loaded inputs: cpu disk diskio kernel mem mqtt_consumer opcua processes swap system
2023-05-10T07:39:31Z I! Loaded aggregators:
2023-05-10T07:39:31Z I! Loaded processors:
2023-05-10T07:39:31Z I! Loaded secretstores:
2023-05-10T07:39:31Z I! Loaded outputs: influxdb_v2 mqtt
2023-05-10T07:39:31Z I! Tags enabled: host=myhost
2023-05-10T07:39:31Z I! [agent] Config: Interval:1s, Quiet:false, Hostname:"myhost", Flush Interval:1s
2023-05-10T07:39:31Z W! [inputs.kernel] current platform is not supported
2023-05-10T07:39:31Z W! [inputs.processes] Current platform is not supported
2023-05-10T07:39:31Z I! [inputs.mqtt_consumer] Connected [tcp://127.0.0.1:1883]
2023-05-10T07:39:32Z W! [inputs.opcua] Failed to load certificate: open /etc/telegraf/cert.pem: The system cannot find the path specified.
--- network cable disconnected here
2023-05-10T07:40:15Z E! [inputs.opcua] Error in plugin: RegisterNodes Read failed: The operation could not complete because the client is not connected to the server. StatusBadServerNotConnected (0x800D0000)
2023-05-10T07:40:17Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:18Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:19Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:20Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:21Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
--- network cable plugged back in somewhere around here
2023-05-10T07:40:22Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:23Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:24Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:25Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:26Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2023-05-10T07:40:26Z E! [inputs.opcua] Error in plugin: dial tcp 172.16.90.11:4840: i/o timeout
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x10 pc=0x36b3481]
goroutine 879 [running]:
github.com/gopcua/opcua.(*Client).SendWithContext(0x0, {0x78f9420, 0xc00006c100}, {0x78d07a8, 0xc0006e8150}, 0x134d47f?)
/go/pkg/mod/github.com/gopcua/opcua@v0.3.7/client.go:903 +0x61
github.com/gopcua/opcua.(*Client).ReadWithContext(0x5?, {0x78f9420, 0xc00006c100}, 0xc000772a20)
/go/pkg/mod/github.com/gopcua/opcua@v0.3.7/client.go:992 +0x29c
github.com/gopcua/opcua.(*Client).Read(0xffffffffffffffff?, 0xc00077e770?)
/go/pkg/mod/github.com/gopcua/opcua@v0.3.7/client.go:979 +0x2a
github.com/influxdata/telegraf/plugins/inputs/opcua.(*ReadClient).read(0xc000b05a00)
/go/src/github.com/influxdata/telegraf/plugins/inputs/opcua/read_client.go:135 +0x33
github.com/influxdata/telegraf/plugins/inputs/opcua.(*ReadClient).CurrentValues(0xc000b05a00)
/go/src/github.com/influxdata/telegraf/plugins/inputs/opcua/read_client.go:109 +0x71
github.com/influxdata/telegraf/plugins/inputs/opcua.(*OpcUA).Gather(0x19?, {0x7915a60, 0xc000164320})
/go/src/github.com/influxdata/telegraf/plugins/inputs/opcua/opcua.go:38 +0x2e
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0xc000395810, {0x7915a60, 0xc000164320})
/go/src/github.com/influxdata/telegraf/models/running_input.go:126 +0x5a
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
/go/src/github.com/influxdata/telegraf/agent/agent.go:576 +0x2e
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
/go/src/github.com/influxdata/telegraf/agent/agent.go:575 +0x12a
C:\v\InfluxData\telegraf\telegraf-1.26.1>
System info
Telegraf 1.26.1, Windows 10 Enterprise 22H2 Build 19045.2846
Docker
No response
Steps to reproduce
- Set up Telegraf with the OPC UA input connected (started from the command line, not elevated).
- Once it is connected, pull the network cable.
- When the warning "W! [inputs.opcua] Collection took longer than expected" appears, wait a few seconds and plug the network cable back in.
- Telegraf crashes and drops back to the command prompt.
Expected behavior
- no crash, ever
- reconnect opcua automatically
Actual behavior
Telegraf crashes and drops back to the command prompt. IMHO that should never happen.
Additional info
No response
About this issue
- State: closed
- Created a year ago
- Comments: 24 (9 by maintainers)
I ran the new version for a few hours today and it looks good. When the connection to one of the two OPC UA servers is lost, Telegraf continues to run and tries to re-establish the connection.
When the OPC UA server comes back online, the connection is re-established, and it takes some time until all nodes are reachable again.
@srebhan Great work! I tested it with one OPC UA server active and one turned off. No problems occurred with either.
If you want, I can test tomorrow what happens when an active connection is interrupted.
Please check out the update in #13514 once it’s built…
On the road now, but I can have a look on the weekend.