telegraf: Outputs.Stackdriver `tags_as_resource_labels` config option doesn't appear to be working as expected
### Relevant telegraf.conf

```toml
## Relevant Agent Configuration
[inputs.mem]
  [inputs.mem.tags]
    job = "inputs.mem"

[inputs.processes]
  [inputs.processes.tags]
    job = "inputs.processes"

[[outputs.opentelemetry]]
  service_address = "server:4317"
```
```toml
## Relevant Server Config
## Server (Values that are {{}} are hydrated by a key value store when
## Telegraf is deployed):
[global_tags]

# Configuration for telegraf agent
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 199
  metric_buffer_limit = {{.METRIC_BUFFER}}
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = {{.DEBUG}}
  quiet = false

# Receive OpenTelemetry traces, metrics, and logs over gRPC
[[inputs.opentelemetry]]
  ## Override the default (0.0.0.0:4317) destination OpenTelemetry gRPC service
  ## address:port
  service_address = "0.0.0.0:4317"

[[processors.regex]]
  # Rename metric names (measurement names) that don't match Prometheus requirements
  [[processors.regex.metric_rename]]
    pattern = "[^a-zA-Z0-9_]+"
    replacement = "_"
  # Rename tag keys that don't match Prometheus requirements
  [[processors.regex.tag_rename]]
    pattern = "[^a-zA-Z0-9_]+"
    replacement = "_"
  # Rename field keys that don't match Prometheus requirements
  [[processors.regex.field_rename]]
    pattern = "[^a-zA-Z0-9_]+"
    replacement = "_"

# Configuration for sending metrics to GMP
[[outputs.stackdriver]]
  project = "{{.PROJECT_ID}}"
  resource_type = "prometheus_target"
  metric_name_format = "official"
  metric_data_type = "double"
  metric_type_prefix = "prometheus.googleapis.com"
  tags_as_resource_label = ["instance", "job"]
  # Ignore metrics from inputs.internal
  namedrop = ["internal_*"]
  [outputs.stackdriver.resource_labels]
    cluster = "{{.CLUSTER_NAME}}"
    job = "Telegraf"
    instance = "{{.CLUSTER_NAME}}"
    location = "{{.LOCATION}}"
    namespace = "{{.NAMESPACE_LABEL}}"
```
### Logs from Telegraf

The logs are unremarkable, but when I dump what the agent is sending to `outputs.stackdriver` using `outputs.file`, I can see that the tags are set correctly.

Example:
- Memory metric has `job=inputs.mem`
- Processes metric has `job=inputs.processes`

```text
mem,env=Production_MacOS,host=hostname,instance=hostname,job=inputs.mem active=965160960i,available=1003286528i,used_percent=94.1601037979126,available_percent=5.839896202087402,inactive=954462208i,wired=693956608i,total=17179869184i,used=16176582656i,free=48824320i 1694460660000000000
processes,env=Production_MacOS,host=hostname,instance=hostname,job=inputs.processes blocked=0i,zombies=1i,stopped=0i,running=3i,sleeping=428i,total=432i,unknown=0i,idle=0i 1694460660000000000
```
But when I look at the metrics in Google or Grafana, I notice that metrics are sometimes tagged with another input's tag, e.g. `job=inputs.mem` on a processes metric. From what I can tell it's usually the busiest metric. Additionally, it appears that the tag from the last metric sent is the one applied to the entire batch of metrics, including metrics that don't carry a `job` tag at all. In other words, the default value isn't applied; whatever the last value for the tag was gets used instead.
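The symptom (one metric's tag value showing up across a whole batch) is exactly what you would see if the output built every time series against a single shared labels map instead of a per-metric copy. A minimal sketch of that failure mode in Go — this is purely hypothetical illustration, not the plugin's actual code; `buggyLabels` and `fixedLabels` are invented names:

```go
package main

import "fmt"

type metric struct {
	name string
	tags map[string]string
}

// buggyLabels mimics reusing one shared labels map for a whole batch:
// the last metric's tag value overwrites what every earlier entry sees,
// because all entries alias the same map.
func buggyLabels(metrics []metric) []map[string]string {
	shared := map[string]string{"job": "Telegraf"} // configured default
	out := make([]map[string]string, 0, len(metrics))
	for _, m := range metrics {
		if v, ok := m.tags["job"]; ok {
			shared["job"] = v // mutates the map earlier entries point at
		}
		out = append(out, shared)
	}
	return out
}

// fixedLabels copies the labels per metric, so each time series keeps
// the tag value it actually carried.
func fixedLabels(metrics []metric) []map[string]string {
	out := make([]map[string]string, 0, len(metrics))
	for _, m := range metrics {
		labels := map[string]string{"job": "Telegraf"}
		if v, ok := m.tags["job"]; ok {
			labels["job"] = v
		}
		out = append(out, labels)
	}
	return out
}

func main() {
	batch := []metric{
		{"mem", map[string]string{"job": "inputs.mem"}},
		{"processes", map[string]string{"job": "inputs.processes"}},
	}
	fmt.Println(buggyLabels(batch)[0]["job"]) // inputs.processes (wrong: mem got processes' tag)
	fmt.Println(fixedLabels(batch)[0]["job"]) // inputs.mem
}
```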
### System info
1.27.4
### Docker
_No response_
### Steps to reproduce

- Send metrics from a Telegraf client, with a tag called `job` configured for each input plugin, to a Telegraf server
- Configure the Telegraf server such that it uses `tags_as_resource_labels` for the `job` tag
- Send metrics to Google …
### Expected behavior

Each metric will have its `job` tag (or any other applicable tag) applied as a resource label.
### Actual behavior

The tag appears to change depending on which tag was sent last. As you can see in the screenshot, the `job` tag changed between `inputs.mem` and `inputs.processes`, even though the sending agent didn't change. Here's an example of it applying a mismatched `job` label to two different inputs:
What also seems weird is that you can stop sending a tag altogether, and the last tag sent continues to be applied instead of the default value:

```toml
[inputs.mem]
  # [inputs.mem.tags]
  #   job = "inputs.mem"

[inputs.processes]
  # [inputs.processes.tags]
  #   job = "inputs.processes"
```

Telegraf client `outputs.file` output showing no `job` tag:

```text
mem,env=Production_MacOS,host=hostname,instance=hostname total=17179869184i,available_percent=5.273199081420898,active=885673984i,free=24256512i,wired=659767296i,available=905928704i,used=16273940480i,used_percent=94.7268009185791,inactive=881672192i 1694532100000000000
```
### Additional info

_No response_
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 41 (41 by maintainers)
I lied, I couldn’t wait, lol.
Initial testing looks good. I will have to do a deep dive of the metrics, but so far so good!
Well the debugger might be our savior here. Makes me wonder who is actually using this plugin besides you 😉
yep!
heh the best type of debugging 😃
I wondered, but nothing catches my eye. Once we create the time series, we add it to a bucket for each metric. Then we split the buckets into 200 at a time and send them, see here.
Let me know what you hear.
See #13912 artifacts in 20-30mins