influxdb: CQ or Backfill using Top() with tag does not create tag in result

System info: 1.0.0-beta3, official docker image

Steps to reproduce:

Create a measurement with some fields and tags
Create a continuous query or a backfill query with INTO clause while doing a select that looks like this: select top(summary_field, tag1, 100) as total, tag2, tag3 INTO tops FROM previous_measurement where time < now() GROUP BY tag4, time(1d) fill(none);

Expected behavior: Output should mimic this: total, tag2, tag3 should be become fields, due to the way they’re selected. tag1 and tag4 should be tags.

Actual behavior: Tag1 is created as a field, not a tag.

This behavior is irrelevant if you’re just selecting, however when used as a select INTO, because it is not created as a tag, you don’t get the data you should.

Observe:

> select... INTO...:
name: result
------------
time    written
0       5181

> select count(total) from tops;
name: tops
---------------------------
time    count
0       100

> show tag keys from tops
name: tops
---------------------------
tagKey
tag4

Because it did not create tag1, the series was not unique enough to accommodate all the data, and the points were considered duplicates, and overwrote one another. This is why there were 5181 points written, but only 100 are available in the destination measurement of the INTO.

I don’t believe this issue is specific to my config. It appears as though this section of code hasn’t been touched much since Feb 2016, and this situation is not covered by any tests in the code or documentation, or google group.

Let me know if I can provide additional information.

About this issue

Original URL
State: closed
Created 8 years ago
Reactions: 1
Comments: 21 (13 by maintainers)

Most upvoted comments

The optimization I described above is implemented in #8394.

jsternberg on May 16, 2017

@kostasb seems to have hit the nail on the head. I had found exactly what he documented in my endeavors to similarly render a “top X by Y over time” graph.

select top(field, tag, number) from measurement where timeexpr group by time(interval)

will indeed return the top “number” (one per series) per interval, but the results themselves are a single series. You would then try to add the appropriate tag to the group by:

select top(field, tag, number) from measurement where timeexpr group by time(interval),tag

However rather than getting the former results broken down into “number” buckets (one per tag) per interval the result is that every tag is returned and not just those resulting from the original top(). Instead of the one-value-per-series up to “number” of series, the results become the top “number” of values per series

There likely isn’t much that Grafana could do with this since there doesn’t see to be a way to get the needed data properly presented from Influxdb.

I’ve done quite a bit of searching and found that there are many that have tried to do similar, and there have been various attempted alternatives/workarounds not generally with success. There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.

I too am curious if there has been any further discussion on available or planned solutions for this.

WaterByWind on Mar 6, 2017

+1000

Not being able to do this in Grafana/InfluxDB is killing us right now =>

There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.

Does anyone have any type of workaround for this? I really don’t want to go back to RDBMS/roll-our-own -viz solution - let me know if we can help somehow.

atulc on Mar 11, 2017