kapacitor: Simple TICKscript that will always error
Duplicate of https://github.com/influxdata/telegraf/issues/2444
Bug report
Hey. For some reason, the system measurement (load1, load5, load15, uptime, etc.) is sent by Telegraf as two distinct lines.
system,host=TICKAlerta load1=0,load5=0.03,load15=0.03,n_users=1i,n_cpus=2i 1487579590000000000
system,host=TICKAlerta uptime_format=" 0:13",uptime=807i 1487579590000000000
This essentially breaks Kapacitor stream processing of the system measurement, as the relevant field is missing 50% of the time.
E! error evaluating expression for level CRITICAL: no field or tag exists for load1
This can be verified by looking at the data points as seen by Kapacitor:
{
  "Name": "system",
  "Database": "telegraf",
  "RetentionPolicy": "default",
  "Group": "host=host1.ex",
  "Dimensions": { "ByName": false, "TagNames": [ "host" ] },
  "Tags": { "environment": "offsite", "host": "host1.ex", "osname": "Ubuntu", "virtual": "physical" },
  "Fields": { "load1": 0, "load15": 0.05, "load5": 0.01, "n_cpus": 4, "n_users": 0 },
  "Time": "2017-02-14T12:38:40Z"
}
{
  "Name": "system",
  "Database": "telegraf",
  "RetentionPolicy": "default",
  "Group": "host=host1.ex",
  "Dimensions": { "ByName": false, "TagNames": [ "host" ] },
  "Tags": { "environment": "offsite", "host": "host1.ex", "osname": "Ubuntu", "virtual": "physical" },
  "Fields": { "uptime": 5278035, "uptime_format": "61 days, 2:07" },
  "Time": "2017-02-14T12:38:40Z"
}
Relevant telegraf.conf:
[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false

[[outputs.file]]
  files = ["stdout", "/tmp/metrics.out"]
  data_format = "influx"

[[inputs.system]]
  # no configuration

System info:
As far as I know, present in telegraf versions 1.0, 1.1 and 1.2. Tested on Ubuntu and Debian LTS versions (precise, trusty, xenial, jessie).
Steps to reproduce:
Telegraf
Use the included telegraf config file.
telegraf --config telegraf.conf --debug
cat /tmp/metrics.out
Kapacitor
var warn_threshold = 4
var crit_threshold = 10
var period = 1h
var every = 1m
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('default')
        .measurement('system')
        .groupBy('host')
    |log()
    |window()
        .period(period)
        .every(every)
    |last('load1')
        .as('stat')
grep load1 /var/log/kapacitor/kapacitor.log
Expected behavior:
Single line for system measurement.
Actual behavior:
Two distinct lines for system measurement
Since the data for a series is split into two lines, InfluxDB forwards it on to Kapacitor as two lines, and therefore it's possible to write a TICKscript that will perpetually fail:
var data = stream
    |from()
        .measurement('system')
    |eval(lambda: "load1" / "uptime")
        .as('ratio')
with
kapacitor define example -tick example.tick -dbrp telegraf.autogen -type stream
kapacitor enable example
will perpetually error out, even though all of the data would exist.
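To make the failure mode concrete, here is a minimal Python sketch (not Kapacitor code) that mimics evaluating the expression once per incoming point. The `eval_ratio` helper and the dict layout are hypothetical stand-ins for the eval node and Kapacitor's point structure; the field values come from the line protocol sample earlier in this report.

```python
# Two points as a stream task would receive them: same series, same timestamp,
# but disjoint field sets (values taken from the line protocol sample).
points = [
    {"time": 1487579590000000000,
     "fields": {"load1": 0.0, "load5": 0.03, "load15": 0.03}},
    {"time": 1487579590000000000,
     "fields": {"uptime": 807}},
]


def eval_ratio(point):
    """Hypothetical stand-in for eval(lambda: "load1" / "uptime")."""
    fields = point["fields"]
    for name in ("load1", "uptime"):
        if name not in fields:
            raise KeyError(f"no field or tag exists for {name}")
    return fields["load1"] / fields["uptime"]


errors = []
for point in points:
    try:
        eval_ratio(point)
    except KeyError as exc:
        errors.append(exc.args[0])

# Every point fails: the first lacks uptime, the second lacks load1.
print(errors)
```

Neither point ever carries both fields, so the expression cannot succeed on any single point, no matter how long the task runs.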
As I see it, there are four ways to solve this problem.
- Kapacitor can do deduplication of incoming lines with common series key and timestamp.
- InfluxDB can do deduplication of data it reports in subscriptions for lines with common series key and timestamp.
- Telegraf can report data as a single line.
- We can provide documentation that notes this as a rough edge and gives users a workaround.
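The first three options all come down to the same operation: merging points that share a series key and timestamp. A rough Python sketch of that merge, assuming points are plain dicts with `name`, `tags`, `time` and `fields` keys (a hypothetical layout, not the actual Kapacitor, InfluxDB or Telegraf data model):

```python
def series_key(point):
    """Measurement name plus sorted tag set identifies a series."""
    tags = ",".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
    return f'{point["name"]},{tags}'


def merge_points(points):
    """Merge points that share a series key and timestamp into one point."""
    merged = {}
    for p in points:
        key = (series_key(p), p["time"])
        if key not in merged:
            merged[key] = {"name": p["name"], "tags": dict(p["tags"]),
                           "time": p["time"], "fields": {}}
        # Assumption: on a field name collision, the later point wins.
        merged[key]["fields"].update(p["fields"])
    return list(merged.values())


split = [
    {"name": "system", "tags": {"host": "TICKAlerta"},
     "time": 1487579590000000000, "fields": {"load1": 0.0, "load5": 0.03}},
    {"name": "system", "tags": {"host": "TICKAlerta"},
     "time": 1487579590000000000, "fields": {"uptime": 807}},
]
merged = merge_points(split)
print(merged[0]["fields"])
```

After the merge there is a single point carrying both `load1` and `uptime`, which is what the expression needs.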
I had initially directed @markuskont to open an issue on Telegraf, but there’s definitely more than one way to solve this issue.
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 4
- Comments: 21 (10 by maintainers)
I think there’s a larger question here about how/where we should talk about these kinds of issues that have cross-platform implications.
IMO buffering data like this should not be the default functionality. It should be opt-in: we could add a dedupe node that wants a stream and provides a stream, with a specific window of time during which the data is buffered.
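As a rough illustration of what such a node might do, here is a Python generator that takes a stream of points and yields a stream of merged points. The name `dedupe`, the `window_ns` parameter and the dict layout are all hypothetical; no such node exists in Kapacitor as far as this thread shows. This sketch flushes on event time only (a buffered group is released once a later point shows the window has passed); a real node would presumably also need a wall-clock timer.

```python
def dedupe(stream, window_ns):
    """Buffer points for window_ns, merging those with equal series and time."""
    pending = {}  # (name, tags, time) -> merged point, in arrival order
    for point in stream:
        # Release groups whose buffering window has elapsed in event time.
        expired = [key for key, buffered in pending.items()
                   if buffered["time"] + window_ns <= point["time"]]
        for key in expired:
            yield pending.pop(key)

        key = (point["name"], tuple(sorted(point["tags"].items())), point["time"])
        if key in pending:
            pending[key]["fields"].update(point["fields"])
        else:
            pending[key] = {"name": point["name"], "tags": dict(point["tags"]),
                            "time": point["time"],
                            "fields": dict(point["fields"])}
    # End of input: flush whatever is still buffered.
    yield from pending.values()


t0 = 1487579590000000000
t1 = t0 + 10_000_000_000  # next 10s collection interval
incoming = [
    {"name": "system", "tags": {"host": "TICKAlerta"}, "time": t0,
     "fields": {"load1": 0.0}},
    {"name": "system", "tags": {"host": "TICKAlerta"}, "time": t0,
     "fields": {"uptime": 807}},
    {"name": "system", "tags": {"host": "TICKAlerta"}, "time": t1,
     "fields": {"load1": 0.01}},
    {"name": "system", "tags": {"host": "TICKAlerta"}, "time": t1,
     "fields": {"uptime": 817}},
]
out = list(dedupe(iter(incoming), window_ns=1_000_000_000))
for point in out:
    print(point["time"], point["fields"])
```

The trade-off is latency: every point is delayed by up to the window, which is a good reason to keep this opt-in rather than the default.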