influxdb: CQ or Backfill using Top() with tag does not create tag in result
System info: 1.0.0-beta3, official docker image
Steps to reproduce:
- Create a measurement with some fields and tags
- Create a continuous query or a backfill query with INTO clause while doing a select that looks like this: select top(summary_field, tag1, 100) as total, tag2, tag3 INTO tops FROM previous_measurement where time < now() GROUP BY tag4, time(1d) fill(none);
Expected behavior: Output should mimic this: total, tag2, tag3 should be become fields, due to the way they’re selected. tag1 and tag4 should be tags.
Actual behavior: Tag1 is created as a field, not a tag.
This behavior is irrelevant if you’re just selecting, however when used as a select INTO, because it is not created as a tag, you don’t get the data you should.
Observe:
> select... INTO...:
name: result
------------
time written
0 5181
> select count(total) from tops;
name: tops
---------------------------
time count
0 100
> show tag keys from tops
name: tops
---------------------------
tagKey
tag4
Because it did not create tag1, the series was not unique enough to accommodate all the data, and the points were considered duplicates, and overwrote one another. This is why there were 5181 points written, but only 100 are available in the destination measurement of the INTO.
I don’t believe this issue is specific to my config. It appears as though this section of code hasn’t been touched much since Feb 2016, and this situation is not covered by any tests in the code or documentation, or google group.
Let me know if I can provide additional information.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 21 (13 by maintainers)
The optimization I described above is implemented in #8394.
@kostasb seems to have hit the nail on the head. I had found exactly what he documented in my endeavors to similarly render a “top X by Y over time” graph.
select top(field, tag, number) from measurement where timeexpr group by time(interval)will indeed return the top “number” (one per series) per interval, but the results themselves are a single series. You would then try to add the appropriate tag to the group by:
select top(field, tag, number) from measurement where timeexpr group by time(interval),tagHowever rather than getting the former results broken down into “number” buckets (one per tag) per interval the result is that every tag is returned and not just those resulting from the original top(). Instead of the one-value-per-series up to “number” of series, the results become the top “number” of values per series
There likely isn’t much that Grafana could do with this since there doesn’t see to be a way to get the needed data properly presented from Influxdb.
I’ve done quite a bit of searching and found that there are many that have tried to do similar, and there have been various attempted alternatives/workarounds not generally with success. There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.
I too am curious if there has been any further discussion on available or planned solutions for this.
+1000
Not being able to do this in Grafana/InfluxDB is killing us right now =>
There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.
Does anyone have any type of workaround for this? I really don’t want to go back to RDBMS/roll-our-own -viz solution - let me know if we can help somehow.