kapacitor: unbounded memory growth to OOM with relatively simple scripts on version 1.5.7

Overview

I’m using Kapacitor 1.5.7 on Linux/amd64 for context-aware traffic alerting (traffic spikes, etc.) at the ingress points of our multi-tenant commerce system. Typically this means collecting data on two fields, optionally three (in this example the third is sent but not grouped on):

  1. The IP of the requester
  2. The ID of the store for which the request is destined
  3. (optionally) The URI

We collect and report on the data in 1-2 minute windows and don’t care about any data outside that window: once a point is over 2 minutes old, it should be expired and expunged. Kapacitor runs standalone for this use case – there is no InfluxDB instance for this data, no retention policies, etc. Data is transmitted to Kapacitor via the UDP listener.

The format of the message is the following:

combined,uri=/path/to/some/product,id=123456789,ip=127.0.0.1,role=ingresstype count=1
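
For completeness, here is a minimal Python sketch of how such a point can be delivered – our real emitters live in the request path, but the mechanics are the same: one datagram per request, one point per datagram. The target port below is an assumption; the actual value comes from the [[udp]] section of kapacitor.conf.

import socket

# One line-protocol point, exactly as shown above.
point = b"combined,uri=/path/to/some/product,id=123456789,ip=127.0.0.1,role=ingresstype count=1"

# Fire-and-forget UDP send to the Kapacitor listener (port is an assumption).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(point, ("127.0.0.1", 9100))
sock.close()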

As shown in the singled-out stream stats detailed later in this issue, the cardinality of each of the aforementioned fields is roughly:

  • uri - unknown, medium
  • id - 20-30k within a minute window, avg
  • ip - 40-60k within a minute window, avg
  • role - at most 3

The count field exists so that we can run a sum operation on the data in the pipe. Strictly speaking it is redundant: each request always generates its own message and its own point, so summing count is the same as counting points.

We found Kapacitor struggling with unbounded memory growth in our production systems, something we did not observe in other (non-live-traffic) environments. Our initial response to these uncontrollable runaway memory situations was to examine and reduce the cardinality of sets, particularly group-by operations on streams. We initially tried reporting on the IP address, the store ID, and the URI together. These are all relatively high-cardinality fields, and grouping on all three clearly wasn’t keeping memory use efficient or bounded. So we pared things back to the following TICKscript, where the URI is dropped from the equation:

dbrp "toptraffic"."autogen"

var streamCounts = stream
    |from()
        .groupBy('ip', 'id')
        .measurement('combined')
    |barrier()
        .period(1m)
        .delete(TRUE)
    |window()
        .period(1m)
        .every(5s)
        .align()
    |sum('count')
        .as('totalCount')

streamCounts
    |alert()
        .flapping(0.25, 0.5)
        .history(21)
        .warn(lambda: "totalCount" > 17500)
        .crit(lambda: "totalCount" > 22500)
        .message('''Observed  {{ index .Fields "totalCount" }} requests to Production Store ID {{ index .Tags "id" }} for IP {{ index .Tags "ip" }} within the last minute.''')
        .noRecoveries()
        .stateChangesOnly(5m)
        .slack()
        .channel('#ops-noise')

streamCounts
    |alert()
        .flapping(0.25, 0.5)
        .history(21)
        .warn(lambda: "totalCount" > 17500)
        .crit(lambda: "totalCount" > 22500)
        .message('''Observed  {{ index .Fields "totalCount" }} requests to Production Store ID {{ index .Tags "id" }} for IP {{ index .Tags "ip" }} within the last minute.''')
        .stateChangesOnly(5m)
        .exec('/usr/bin/kapacitor_pubsub_stdin_invoker.sh')
        .log('/var/log/kapacitor/alerts.log')

The script is straightforward enough: we group the stream by ip, then id, from the combined measurement; a barrier deletes each group’s data after one minute; and the result is assigned to a stream variable that feeds two alert nodes doing different things at the same thresholds.
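
To be explicit about what we believe the pipeline should hold in memory, here is a minimal Python model of the intended semantics (an illustration of our expectations, not Kapacitor’s actual implementation): per-(ip, id) point counts over a sliding one-minute window, with a group and all its state discarded as soon as its data ages out. Under these semantics, steady-state memory is bounded by the number of distinct (ip, id) pairs seen in any one-minute span.

import time
from collections import defaultdict, deque

WINDOW = 60.0  # seconds, mirroring barrier().period(1m)

# (ip, id) -> timestamps of points currently inside the window
windows = defaultdict(deque)

def collect(ip, store_id, now=None):
    """Ingest one point, then expire anything older than WINDOW."""
    now = time.time() if now is None else now
    windows[(ip, store_id)].append(now)
    for key in list(windows):
        q = windows[key]
        while q and now - q[0] > WINDOW:
            q.popleft()
        if not q:
            del windows[key]  # the analogue of barrier().delete(TRUE)

def total_count(ip, store_id):
    """The analogue of |sum('count') when every point carries count=1."""
    return len(windows.get((ip, store_id), ()))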

The DOT graph and sample stats of that TICKscript while running (as reported by kapacitor show top_combined) render:

DOT:
digraph top_combined {
graph [throughput="14278.31 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="518885257"];

from1 [avg_exec_time_ns="43.931µs" errors="0" working_cardinality="0" ];
from1 -> barrier2 [processed="518885257"];

barrier2 [avg_exec_time_ns="65.018µs" errors="0" working_cardinality="93843" ];
barrier2 -> window3 [processed="518865870"];

window3 [avg_exec_time_ns="107.889µs" errors="0" working_cardinality="93843" ];
window3 -> sum4 [processed="145863177"];

sum4 [avg_exec_time_ns="148.928µs" errors="0" working_cardinality="33327" ];
sum4 -> alert6 [processed="145863177"];
sum4 -> alert5 [processed="145863177"];

alert6 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="53.506µs" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="33327" ];

alert5 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="63.915µs" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="33327" ];
}

As previously mentioned, what we saw with this is that over time (pretty quickly) we ran out of memory. As the following graph shows, various tweaks to the aforementioned script – changing things like the window and barrier periods – didn’t seem to make any difference to how fast the script/pipeline/Kapacitor consumed memory.

(Screenshot: Kapacitor memory usage over time, 2021-01-27 3:25 PM)

The various spikes in memory correspond to me altering the TICKscript: removing the window, removing the barrier, changing the barrier from idle to periodic, changing the barrier/window periods, etc. During these iterations I collected data. The data below is from the aforementioned TICKscript as written, with only changes to the window and barrier periods.
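
For reference, the dumps below were pulled from Kapacitor’s pprof endpoints, roughly as in this sketch (both the /kapacitor/v1/debug/pprof path and the localhost:9092 bind address are assumptions based on a stock install; adjust for your config):

import urllib.request

BASE = "http://localhost:9092/kapacitor/v1/debug/pprof"

# Save a text heap profile and a 30s CPU profile for `go tool pprof`.
for name, query in (("heap", "?debug=1"), ("profile", "?seconds=30")):
    with urllib.request.urlopen(BASE + "/" + name + query) as resp, \
         open(name, "wb") as out:
        out.write(resp.read())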

Heap dumps show the following for in-use objects (note that the dominant allocations – time.NewTicker, reached via newPeriodicBarrier – suggest one ticker per group that is never released):

go tool pprof -inuse_objects --text kapacitord top_combined/heap\?debug=1
File: kapacitord
Type: inuse_objects
Showing nodes accounting for 255642703, 97.39% of 262504092 total
Dropped 98 nodes (cum <= 1312520)
      flat  flat%   sum%        cum   cum%
  47093992 17.94% 17.94%   47094003 17.94%  time.NewTicker
  35101126 13.37% 31.31%   35101126 13.37%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
  34180645 13.02% 44.33%   34180645 13.02%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
  31006029 11.81% 56.14%   78231108 29.80%  github.com/influxdata/kapacitor.newPeriodicBarrier
  17670196  6.73% 62.88%   17670196  6.73%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
  15859954  6.04% 68.92%   94091062 35.84%  github.com/influxdata/kapacitor.(*BarrierNode).newBarrier
  15840535  6.03% 74.95%   15840535  6.03%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
  15281270  5.82% 80.77%   22687063  8.64%  github.com/influxdata/kapacitor/models.ToGroupID
  11141290  4.24% 85.02%   11141290  4.24%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Name
   7405793  2.82% 87.84%    7405793  2.82%  strings.(*Builder).WriteRune
   5028186  1.92% 89.75%    5028186  1.92%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
   3452307  1.32% 91.07%    8480493  3.23%  github.com/influxdata/kapacitor.convertFloatPoint
   1977093  0.75% 91.82%    3763058  1.43%  net.(*UDPConn).readFrom
   1835036   0.7% 92.52%    4403938  1.68%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
   1818726  0.69% 93.21%    5581784  2.13%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
   1785965  0.68% 93.89%    1785965  0.68%  syscall.anyToSockaddr
   1766316  0.67% 94.57%    2568902  0.98%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
   1758629  0.67% 95.24%    1758629  0.67%  github.com/influxdata/kapacitor/edge.BatchPointFromPoint
   1729659  0.66% 95.90%    1942655  0.74%  github.com/influxdata/kapacitor/edge.NewPointMessage
   1682284  0.64% 96.54%    1682284  0.64%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
    769871  0.29% 96.83%    1687514  0.64%  github.com/influxdata/kapacitor/edge.(*statsEdge).incCollected
    766201  0.29% 97.12%    2120657  0.81%  github.com/influxdata/kapacitor.(*AlertNode).renderID
    163847 0.062% 97.19%    1429628  0.54%  github.com/influxdata/kapacitor.(*AlertNode).NewGroup
    158393  0.06% 97.25%   15998928  6.09%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
    103541 0.039% 97.28%    1862170  0.71%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
     98309 0.037% 97.32%    2690768  1.03%  github.com/influxdata/kapacitor.(*windowByTime).batch
     85587 0.033% 97.35%   97154573 37.01%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
     81923 0.031% 97.39%   94522520 36.01%  github.com/influxdata/kapacitor.(*BarrierNode).NewGroup
         0     0% 97.39%    3412833  1.30%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 97.39%  126742493 48.28%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 97.39%   23164149  8.82%  github.com/influxdata/kapacitor.(*FromNode).Point
         0     0% 97.39%   23619547  9.00%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 97.39%   10632284  4.05%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 97.39%   67208031 25.60%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
         0     0% 97.39%    4753986  1.81%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 97.39%    1420015  0.54%  github.com/influxdata/kapacitor.(*alertState).Point
         0     0% 97.39%    8480493  3.23%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
         0     0% 97.39%    8824603  3.36%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
         0     0% 97.39%  169161143 64.44%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 97.39%    2230930  0.85%  github.com/influxdata/kapacitor.(*windowByTime).Point
         0     0% 97.39%  169160555 64.44%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
         0     0% 97.39%    8824603  3.36%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
         0     0% 97.39%    1360800  0.52%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).EndBatch
         0     0% 97.39%   27898115 10.63%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
         0     0% 97.39%    1687514  0.64%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).forward
         0     0% 97.39%   10632284  4.05%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
         0     0% 97.39%  145541008 55.44%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 97.39%  134249262 51.14%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
         0     0% 97.39%   22129995  8.43%  github.com/influxdata/kapacitor/edge.(*pointMessage).SetDimensions
         0     0% 97.39%    1623485  0.62%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Collect
         0     0% 97.39%    8824603  3.36%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
         0     0% 97.39%   26815094 10.22%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
         0     0% 97.39%   10632284  4.05%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
         0     0% 97.39%   71611969 27.28%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
         0     0% 97.39%   17670196  6.73%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
         0     0% 97.39%    1682284  0.64%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
         0     0% 97.39%    4403938  1.68%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
         0     0% 97.39%    1785965  0.68%  internal/poll.(*FD).ReadFrom
         0     0% 97.39%    3763058  1.43%  net.(*UDPConn).ReadFromUDP
         0     0% 97.39%    1785965  0.68%  net.(*netFD).readFrom
         0     0% 97.39%  262393459   100%  runtime.goexit
         0     0% 97.39%    1785965  0.68%  syscall.Recvfrom

and for in-use space:

go tool pprof --text kapacitord top_combined/heap\?debug=1 
File: kapacitord
Type: inuse_space
Showing nodes accounting for 19557.32MB, 97.33% of 20093.60MB total
Dropped 99 nodes (cum <= 100.47MB)
      flat  flat%   sum%        cum   cum%
 5447.82MB 27.11% 27.11%  5447.82MB 27.11%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
 4027.07MB 20.04% 47.15%  7133.04MB 35.50%  github.com/influxdata/kapacitor.newPeriodicBarrier
 3100.24MB 15.43% 62.58%  3101.97MB 15.44%  time.NewTicker
 1270.70MB  6.32% 68.91%  1270.70MB  6.32%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
  964.19MB  4.80% 73.71%   964.19MB  4.80%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
  886.50MB  4.41% 78.12%  8272.54MB 41.17%  github.com/influxdata/kapacitor.(*BarrierNode).NewGroup
  485.13MB  2.41% 80.53%   546.14MB  2.72%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
  485.01MB  2.41% 82.95%   485.01MB  2.41%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
  460.51MB  2.29% 85.24%   686.52MB  3.42%  github.com/influxdata/kapacitor/models.ToGroupID
  305.03MB  1.52% 86.75%   602.55MB  3.00%  github.com/influxdata/kapacitor.convertFloatPoint
  297.52MB  1.48% 88.24%   297.52MB  1.48%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
     242MB  1.20% 89.44%  7375.04MB 36.70%  github.com/influxdata/kapacitor.(*BarrierNode).newBarrier
  237.53MB  1.18% 90.62%   242.53MB  1.21%  github.com/influxdata/kapacitor/edge.NewPointMessage
  226.01MB  1.12% 91.75%   226.01MB  1.12%  strings.(*Builder).WriteRune
  194.03MB  0.97% 92.71%   194.03MB  0.97%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
     170MB  0.85% 93.56%      170MB  0.85%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Name
  142.02MB  0.71% 94.27%   142.02MB  0.71%  github.com/influxdata/kapacitor/edge.(*pointMessage).ShallowCopy
  119.01MB  0.59% 94.86%   318.52MB  1.59%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
  109.01MB  0.54% 95.40%   109.01MB  0.54%  syscall.anyToSockaddr
   97.69MB  0.49% 95.89%   239.72MB  1.19%  github.com/influxdata/kapacitor/edge.(*statsEdge).incCollected
   90.50MB  0.45% 96.34%   199.51MB  0.99%  net.(*UDPConn).readFrom
   56.50MB  0.28% 96.62%   106.51MB  0.53%  github.com/influxdata/kapacitor.(*AlertNode).renderID
   55.12MB  0.27% 96.89%  8495.18MB 42.28%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
   32.18MB  0.16% 97.05%   112.69MB  0.56%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
      28MB  0.14% 97.19%   574.14MB  2.86%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
   14.50MB 0.072% 97.26%   499.52MB  2.49%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
    7.50MB 0.037% 97.30%   101.01MB   0.5%  github.com/influxdata/kapacitor.(*AlertNode).NewGroup
       6MB  0.03% 97.33%   152.19MB  0.76%  github.com/influxdata/kapacitor.(*windowByTime).batch
         0     0% 97.33%   271.99MB  1.35%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 97.33% 13426.72MB 66.82%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 97.33%   815.04MB  4.06%  github.com/influxdata/kapacitor.(*FromNode).Point
         0     0% 97.33%   884.96MB  4.40%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 97.33%   840.02MB  4.18%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 97.33%  2820.44MB 14.04%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
         0     0% 97.33%   411.47MB  2.05%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 97.33%   602.55MB  3.00%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
         0     0% 97.33%   649.56MB  3.23%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
         0     0% 97.33% 15835.15MB 78.81%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 97.33%   150.93MB  0.75%  github.com/influxdata/kapacitor.(*windowByTime).Point
         0     0% 97.33% 15829.73MB 78.78%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
         0     0% 97.33%   649.56MB  3.23%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
         0     0% 97.33%   153.94MB  0.77%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).EndBatch
         0     0% 97.33%  1192.77MB  5.94%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
         0     0% 97.33%   239.72MB  1.19%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).forward
         0     0% 97.33%   840.02MB  4.18%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
         0     0% 97.33% 14944.77MB 74.38%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 97.33% 14072.25MB 70.03%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
         0     0% 97.33%   673.02MB  3.35%  github.com/influxdata/kapacitor/edge.(*pointMessage).SetDimensions
         0     0% 97.33%   230.77MB  1.15%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Collect
         0     0% 97.33%   649.56MB  3.23%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
         0     0% 97.33%  1037.97MB  5.17%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
         0     0% 97.33%   840.02MB  4.18%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
         0     0% 97.33%  3394.58MB 16.89%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
         0     0% 97.33%   964.19MB  4.80%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
         0     0% 97.33%   194.03MB  0.97%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
         0     0% 97.33%   574.14MB  2.86%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
         0     0% 97.33%   109.01MB  0.54%  internal/poll.(*FD).ReadFrom
         0     0% 97.33%   199.51MB  0.99%  net.(*UDPConn).ReadFromUDP
         0     0% 97.33%   109.01MB  0.54%  net.(*netFD).readFrom
         0     0% 97.33% 20051.80MB 99.79%  runtime.goexit
         0     0% 97.33%   109.01MB  0.54%  syscall.Recvfrom

A CPU profile taken at roughly the same time shows:

go tool pprof --text kapacitord top_combined/profile      
File: kapacitord
Type: cpu
Time: Jan 25, 2021 at 7:32pm (PST)
Duration: 30.17s, Total samples = 43.72s (144.91%)
Showing nodes accounting for 34.25s, 78.34% of 43.72s total
Dropped 359 nodes (cum <= 0.22s)
      flat  flat%   sum%        cum   cum%
     3.20s  7.32%  7.32%      3.60s  8.23%  syscall.Syscall6
     2.96s  6.77% 14.09%      2.96s  6.77%  runtime.futex
     2.45s  5.60% 19.69%      2.45s  5.60%  runtime.epollwait
     1.87s  4.28% 23.97%      1.87s  4.28%  runtime.usleep
     1.54s  3.52% 27.49%      2.13s  4.87%  runtime.mapaccess2_faststr
     1.30s  2.97% 30.47%      6.20s 14.18%  runtime.mallocgc
     1.28s  2.93% 33.39%      1.28s  2.93%  runtime.nextFreeFast
     0.94s  2.15% 35.54%      1.18s  2.70%  runtime.heapBitsSetType
     0.94s  2.15% 37.69%      0.94s  2.15%  runtime.memclrNoHeapPointers
     0.88s  2.01% 39.71%      0.88s  2.01%  runtime.memmove
     0.86s  1.97% 41.67%      0.91s  2.08%  runtime.lock
     0.80s  1.83% 43.50%      3.64s  8.33%  runtime.selectgo
     0.69s  1.58% 45.08%      0.69s  1.58%  memeqbody
     0.64s  1.46% 46.55%      0.67s  1.53%  runtime.unlock
     0.60s  1.37% 47.92%      0.61s  1.40%  runtime.(*itabTableType).find
     0.50s  1.14% 49.06%      6.86s 15.69%  runtime.findrunnable
     0.46s  1.05% 50.11%      0.46s  1.05%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanLine
     0.45s  1.03% 51.14%      0.46s  1.05%  time.now
     0.42s  0.96% 52.10%      1.62s  3.71%  runtime.mapassign_faststr
     0.37s  0.85% 52.95%      0.37s  0.85%  aeshashbody
     0.33s  0.75% 53.71%      0.33s  0.75%  github.com/influxdata/kapacitor/edge.(*pointMessage).Fields
     0.33s  0.75% 54.46%      0.33s  0.75%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanTo
     0.32s  0.73% 55.19%      0.38s  0.87%  runtime.mapiternext
     0.29s  0.66% 55.86%      0.29s  0.66%  runtime.(*waitq).dequeue
     0.26s  0.59% 56.45%      3.93s  8.99%  runtime.newobject
     0.24s  0.55% 57.00%      3.44s  7.87%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Collect
     0.24s  0.55% 57.55%      1.06s  2.42%  runtime.(*mcentral).cacheSpan
     0.24s  0.55% 58.10%      0.24s  0.55%  runtime.casgstatus
     0.24s  0.55% 58.65%      0.24s  0.55%  sync.(*RWMutex).RLock
     0.23s  0.53% 59.17%      0.23s  0.53%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanTagsValue
     0.23s  0.53% 59.70%      0.84s  1.92%  runtime.getitab
     0.23s  0.53% 60.22%      1.92s  4.39%  runtime.runqgrab
     0.22s   0.5% 60.73%      5.38s 12.31%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
     0.21s  0.48% 61.21%      1.80s  4.12%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).forward
     0.21s  0.48% 61.69%      0.52s  1.19%  runtime.mapaccess1
     0.21s  0.48% 62.17%      2.73s  6.24%  runtime.netpoll
     0.20s  0.46% 62.63%      3.81s  8.71%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
     0.20s  0.46% 63.08%      1.56s  3.57%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
     0.20s  0.46% 63.54%      0.27s  0.62%  sync.(*RWMutex).Unlock
     0.19s  0.43% 63.98%      0.27s  0.62%  runtime.mapaccess1_faststr
     0.18s  0.41% 64.39%      1.44s  3.29%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanKey
     0.18s  0.41% 64.80%      0.97s  2.22%  runtime.sellock
     0.15s  0.34% 65.14%         3s  6.86%  github.com/influxdata/kapacitor.convertFloatPoint
     0.15s  0.34% 65.48%      1.17s  2.68%  github.com/influxdata/kapacitor/edge.(*statsEdge).incCollected
     0.15s  0.34% 65.83%      0.77s  1.76%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
     0.14s  0.32% 66.15%      0.37s  0.85%  syscall.anyToSockaddr
     0.13s   0.3% 66.45%      0.72s  1.65%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
     0.13s   0.3% 66.74%      3.07s  7.02%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
     0.13s   0.3% 67.04%      0.71s  1.62%  runtime.assertI2I2
     0.13s   0.3% 67.34%      0.92s  2.10%  runtime.slicebytetostring
     0.12s  0.27% 67.61%      0.54s  1.24%  github.com/influxdata/kapacitor/edge.NewPointMessage
     0.12s  0.27% 67.89%      1.54s  3.52%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
     0.12s  0.27% 68.16%      0.42s  0.96%  runtime.makemap
     0.12s  0.27% 68.44%      0.40s  0.91%  runtime.mapiterinit
     0.12s  0.27% 68.71%      0.33s  0.75%  runtime.memhash
     0.11s  0.25% 68.96%      5.96s 13.63%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
     0.11s  0.25% 69.21%      1.52s  3.48%  github.com/influxdata/kapacitor/edge.(*channelEdge).Emit
     0.10s  0.23% 69.44%      2.22s  5.08%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Emit
     0.10s  0.23% 69.67%      0.33s  0.75%  github.com/influxdata/kapacitor/expvar.(*Map).Add
     0.10s  0.23% 69.90%      9.62s 22.00%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
     0.10s  0.23% 70.13%      0.42s  0.96%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanTags
     0.09s  0.21% 70.33%      6.89s 15.76%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
     0.09s  0.21% 70.54%      5.10s 11.67%  net.(*UDPConn).readFrom
     0.09s  0.21% 70.75%      7.64s 17.47%  runtime.schedule
     0.08s  0.18% 70.93%      1.28s  2.93%  github.com/influxdata/kapacitor.(*streamEdge).CollectPoint
     0.08s  0.18% 71.11%      6.38s 14.59%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
     0.08s  0.18% 71.29%      0.34s  0.78%  github.com/influxdata/kapacitor/edge.(*statsEdge).incEmitted
     0.08s  0.18% 71.48%      0.45s  1.03%  github.com/influxdata/kapacitor/models.ToGroupID
     0.08s  0.18% 71.66%      0.25s  0.57%  runtime.chanrecv
     0.08s  0.18% 71.84%      1.05s  2.40%  runtime.chansend
     0.08s  0.18% 72.03%      1.56s  3.57%  runtime.makeslice
     0.07s  0.16% 72.19%      1.13s  2.58%  github.com/influxdata/kapacitor.(*windowByTime).Point
     0.07s  0.16% 72.35%     14.28s 32.66%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
     0.07s  0.16% 72.51%      0.24s  0.55%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Next
     0.07s  0.16% 72.67%      0.84s  1.92%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
     0.07s  0.16% 72.83%      0.24s  0.55%  math/rand.(*Rand).Int63
     0.06s  0.14% 72.96%      0.70s  1.60%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
     0.06s  0.14% 73.10%      0.33s  0.75%  github.com/influxdata/kapacitor/timer.(*timer).Start
     0.06s  0.14% 73.24%      4.63s 10.59%  internal/poll.(*FD).ReadFrom
     0.06s  0.14% 73.38%      0.25s  0.57%  runtime.mapaccess2
     0.06s  0.14% 73.51%      0.95s  2.17%  runtime.notesleep
     0.05s  0.11% 73.63%      0.25s  0.57%  github.com/influxdata/kapacitor.(*AlertNode).serverInfo
     0.05s  0.11% 73.74%      0.55s  1.26%  github.com/influxdata/kapacitor.(*StreamNode).runSourceStream
     0.05s  0.11% 73.86%      1.29s  2.95%  github.com/influxdata/kapacitor.(*TaskMaster).forkPoint
     0.05s  0.11% 73.97%      0.88s  2.01%  github.com/influxdata/kapacitor.(*streamEdge).EmitPoint
     0.05s  0.11% 74.09%      2.12s  4.85%  github.com/influxdata/kapacitor/edge.(*channelEdge).Collect
     0.05s  0.11% 74.20%      5.15s 11.78%  net.(*UDPConn).ReadFromUDP
     0.05s  0.11% 74.31%      0.31s  0.71%  runtime.convI2I
     0.05s  0.11% 74.43%      0.60s  1.37%  runtime.resetspinning
     0.05s  0.11% 74.54%      1.97s  4.51%  runtime.runqsteal
     0.05s  0.11% 74.66%      1.52s  3.48%  runtime.send
     0.05s  0.11% 74.77%      0.37s  0.85%  runtime.strhash
     0.05s  0.11% 74.89%      0.24s  0.55%  runtime.typedmemmove
     0.05s  0.11% 75.00%      3.65s  8.35%  syscall.recvfrom
     0.05s  0.11% 75.11%      0.36s  0.82%  text/template.(*state).evalField
     0.04s 0.091% 75.21%      1.59s  3.64%  github.com/influxdata/kapacitor.(*alertState).Point
     0.04s 0.091% 75.30%      0.53s  1.21%  github.com/influxdata/kapacitor.EvalPredicate
     0.04s 0.091% 75.39%      0.45s  1.03%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Barrier
     0.04s 0.091% 75.48%      1.42s  3.25%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
     0.04s 0.091% 75.57%      2.16s  4.94%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
     0.04s 0.091% 75.66%      4.67s 10.68%  net.(*netFD).readFrom
     0.04s 0.091% 75.75%      0.42s  0.96%  runtime.selunlock
     0.04s 0.091% 75.85%      0.36s  0.82%  runtime.sysmon
     0.04s 0.091% 75.94%      0.32s  0.73%  sort.Strings
     0.04s 0.091% 76.03%      4.09s  9.35%  syscall.Recvfrom
     0.04s 0.091% 76.12%      0.77s  1.76%  text/template.(*state).walk
     0.03s 0.069% 76.19%      0.47s  1.08%  github.com/influxdata/kapacitor.(*FromNode).Point
     0.03s 0.069% 76.26%      1.09s  2.49%  github.com/influxdata/kapacitor.(*windowByTime).batch
     0.03s 0.069% 76.33%      1.57s  3.59%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
     0.03s 0.069% 76.40%      0.59s  1.35%  github.com/influxdata/kapacitor/edge.BatchPointFromPoint
     0.03s 0.069% 76.46%      4.16s  9.52%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
     0.03s 0.069% 76.53%      1.09s  2.49%  runtime.(*mcache).refill
     0.03s 0.069% 76.60%      0.22s   0.5%  runtime.entersyscall
     0.03s 0.069% 76.67%      0.29s  0.66%  runtime.gentraceback
     0.03s 0.069% 76.74%      7.87s 18.00%  runtime.mcall
     0.03s 0.069% 76.81%      0.27s  0.62%  runtime.notetsleep_internal
     0.03s 0.069% 76.88%      1.88s  4.30%  runtime.startm
     0.03s 0.069% 76.94%         2s  4.57%  runtime.systemstack
     0.03s 0.069% 77.01%      0.26s  0.59%  strconv.ParseFloat
     0.03s 0.069% 77.08%      0.42s  0.96%  text/template.(*state).evalCommand
     0.03s 0.069% 77.15%      0.39s  0.89%  text/template.(*state).evalFieldChain
     0.02s 0.046% 77.20%      0.56s  1.28%  github.com/influxdata/kapacitor.(*AlertNode).determineLevel
     0.02s 0.046% 77.24%      2.19s  5.01%  github.com/influxdata/kapacitor.(*TaskMaster).runForking
     0.02s 0.046% 77.29%      3.06s  7.00%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
     0.02s 0.046% 77.33%      0.51s  1.17%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
     0.02s 0.046% 77.38%      0.72s  1.65%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
     0.02s 0.046% 77.42%      3.36s  7.69%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
     0.02s 0.046% 77.47%      4.16s  9.52%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
     0.02s 0.046% 77.52%      1.44s  3.29%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
     0.02s 0.046% 77.56%      0.29s  0.66%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseFloatBytes
     0.02s 0.046% 77.61%      0.26s  0.59%  math/rand.(*Rand).Float64
     0.02s 0.046% 77.65%      1.24s  2.84%  runtime.(*mcache).nextFree
     0.02s 0.046% 77.70%      0.28s  0.64%  runtime.(*mheap).alloc_m
     0.02s 0.046% 77.74%      1.01s  2.31%  runtime.chansend1
     0.02s 0.046% 77.79%      1.30s  2.97%  runtime.goready
     0.02s 0.046% 77.84%      7.83s 17.91%  runtime.park_m
     0.02s 0.046% 77.88%      1.01s  2.31%  runtime.stopm
     0.02s 0.046% 77.93%      0.85s  1.94%  text/template.(*Template).execute
     0.01s 0.023% 77.95%      0.54s  1.24%  github.com/influxdata/kapacitor.(*AlertNode).findFirstMatchLevel
     0.01s 0.023% 77.97%      0.22s   0.5%  github.com/influxdata/kapacitor/edge.(*batchStatsEdge).Collect
     0.01s 0.023% 78.00%      0.29s  0.66%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).DeleteGroup
     0.01s 0.023% 78.02%      0.27s  0.62%  github.com/influxdata/kapacitor/edge.(*pointMessage).SetDimensions
     0.01s 0.023% 78.04%      0.32s  0.73%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Barrier
     0.01s 0.023% 78.06%      0.38s  0.87%  github.com/influxdata/kapacitor/edge.Forward
     0.01s 0.023% 78.09%      0.22s   0.5%  github.com/influxdata/kapacitor/tick/stateful.(*expression).EvalBool
     0.01s 0.023% 78.11%      3.14s  7.18%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
     0.01s 0.023% 78.13%      0.43s  0.98%  runtime.(*mheap).alloc
     0.01s 0.023% 78.16%      0.24s  0.55%  runtime.(*mheap).allocSpanLocked
     0.01s 0.023% 78.18%      1.84s  4.21%  runtime.futexwakeup
     0.01s 0.023% 78.20%      1.28s  2.93%  runtime.goready.func1
     0.01s 0.023% 78.23%      0.23s  0.53%  runtime.makemap_small
     0.01s 0.023% 78.25%      1.82s  4.16%  runtime.notewakeup
     0.01s 0.023% 78.27%      1.27s  2.90%  runtime.ready
     0.01s 0.023% 78.29%      1.66s  3.80%  runtime.wakep
     0.01s 0.023% 78.32%      0.22s   0.5%  sort.Sort
     0.01s 0.023% 78.34%      0.30s  0.69%  text/template.(*state).printValue
         0     0% 78.34%      0.65s  1.49%  github.com/influxdata/kapacitor.(*AlertNode).NewGroup
         0     0% 78.34%      1.26s  2.88%  github.com/influxdata/kapacitor.(*AlertNode).renderID
         0     0% 78.34%      3.07s  7.02%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 78.34%      0.39s  0.89%  github.com/influxdata/kapacitor.(*BarrierNode).NewGroup
         0     0% 78.34%      0.30s  0.69%  github.com/influxdata/kapacitor.(*BarrierNode).newBarrier
         0     0% 78.34%      2.41s  5.51%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 78.34%      1.50s  3.43%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 78.34%      4.33s  9.90%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 78.34%      2.19s  5.01%  github.com/influxdata/kapacitor.(*TaskMaster).stream.func1
         0     0% 78.34%      2.97s  6.79%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 78.34%      3.29s  7.53%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
         0     0% 78.34%      0.22s   0.5%  github.com/influxdata/kapacitor.(*influxqlGroup).realizeReduceContextFromFields
         0     0% 78.34%     14.83s 33.92%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 78.34%      0.30s  0.69%  github.com/influxdata/kapacitor.(*windowByTime).Barrier
         0     0% 78.34%      0.29s  0.66%  github.com/influxdata/kapacitor.newPeriodicBarrier
         0     0% 78.34%      0.51s  1.17%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).EndBatch
         0     0% 78.34%      0.46s  1.05%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Barrier
         0     0% 78.34%     12.78s 29.23%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 78.34%      3.33s  7.62%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
         0     0% 78.34%      0.32s  0.73%  github.com/influxdata/kapacitor/edge.NewBeginBatchMessage
         0     0% 78.34%      1.71s  3.91%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.NewTags
         0     0% 78.34%      0.29s  0.66%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).FloatValue
         0     0% 78.34%      0.46s  1.05%  runtime.(*mcentral).grow
         0     0% 78.34%      0.28s  0.64%  runtime.(*mheap).alloc.func1
         0     0% 78.34%      0.22s   0.5%  runtime.chanrecv2
         0     0% 78.34%      0.28s  0.64%  runtime.copystack
         0     0% 78.34%      0.24s  0.55%  runtime.entersyscallblock
         0     0% 78.34%      0.24s  0.55%  runtime.entersyscallblock_handoff
         0     0% 78.34%      1.13s  2.58%  runtime.futexsleep
         0     0% 78.34%      0.24s  0.55%  runtime.handoffp
         0     0% 78.34%      0.34s  0.78%  runtime.mstart
         0     0% 78.34%      0.34s  0.78%  runtime.mstart1
         0     0% 78.34%      0.29s  0.66%  runtime.newstack
         0     0% 78.34%      0.48s  1.10%  runtime.notetsleepg
         0     0% 78.34%      0.66s  1.51%  runtime.timerproc
         0     0% 78.34%      0.85s  1.94%  text/template.(*Template).Execute
         0     0% 78.34%      0.39s  0.89%  text/template.(*state).evalFieldNode
         0     0% 78.34%      0.42s  0.96%  text/template.(*state).evalPipeline

Perplexed, I decided to chop things up and create two TICKscripts instead, monitoring each of those metrics independently. The first, top_ips, does no variable assignment; everything is piped together in a single flow. The second, top_stores, uses assignment and piping so that data streams to two alerts that do slightly different things with those triggers, like the aforementioned combined script.

Data sent to the ips measurement looks like:

ips,ip=127.0.0.1,role=ingresstype count=1

Here’s the show output for top_ips:

ID: top_ips
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 22 Jan 21 22:25 UTC
Modified: 26 Jan 21 06:52 UTC
LastEnabled: 26 Jan 21 06:52 UTC
Databases Retention Policies: ["toptraffic"."autogen"]
TICKscript:
dbrp "toptraffic"."autogen"

stream
    |from()
        .groupBy('ip')
        .measurement('ips')
    |barrier()
        .period(1m)
        .delete(TRUE)
    |window()
        .period(1m)
        .every(5s)
        .align()
    |sum('count')
        .as('totalCount')
    |alert()
        .flapping(0.25, 0.5)
        .history(21)
        .warn(lambda: "totalCount" > 17500)
        .crit(lambda: "totalCount" > 22500)
        .message('''Observed  {{ index .Fields "totalCount" }} requests to Production IP {{ index .Tags "ip" }} within the last 1 minute.''')
        .stateChangesOnly(5m)
        .slack()
        .channel('#ops-noise')
        .exec('/usr/bin/kapacitor_pubsub_stdin_invoker.sh')
        .log('/var/log/kapacitor/alerts.log')

DOT:
digraph top_ips {
graph [throughput="18076.66 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="1891796886"];

from1 [avg_exec_time_ns="45.655µs" errors="0" working_cardinality="0" ];
from1 -> barrier2 [processed="1891796886"];

barrier2 [avg_exec_time_ns="22.507µs" errors="0" working_cardinality="58218" ];
barrier2 -> window3 [processed="1891721166"];

window3 [avg_exec_time_ns="251.156µs" errors="0" working_cardinality="58218" ];
window3 -> sum4 [processed="376239068"];

sum4 [avg_exec_time_ns="101.993µs" errors="0" working_cardinality="34455" ];
sum4 -> alert5 [processed="376239068"];

alert5 [alerts_inhibited="0" alerts_triggered="835" avg_exec_time_ns="58.686µs" crits_triggered="101" errors="0" infos_triggered="0" oks_triggered="367" warns_triggered="367" working_cardinality="34455" ];
}

… and for top_stores the data looks like:

stores,id=123456789,role=ingresstype count=1

with an evaluated script like:

ID: top_stores
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 22 Jan 21 22:30 UTC
Modified: 26 Jan 21 06:05 UTC
LastEnabled: 26 Jan 21 06:05 UTC
Databases Retention Policies: ["toptraffic"."autogen"]
TICKscript:
dbrp "toptraffic"."autogen"

var stores = stream
    |from()
        .groupBy('id')
        .measurement('stores')
    |barrier()
        .period(1m)
        .delete(TRUE)
    |window()
        .period(1m)
        .every(5s)
        .align()
    |sum('count')
        .as('totalCount')

stores
    |alert()
        .flapping(0.25, 0.5)
        .history(21)
        .warn(lambda: "totalCount" > 17500)
        .crit(lambda: "totalCount" > 22500)
        .message('''Observed  {{ index .Fields "totalCount" }} requests to Production Store ID {{ index .Tags "id" }} within the last minute.''')
        .noRecoveries()
        .stateChangesOnly(5m)
        .slack()
        .channel('#ops-noise')

stores
    |alert()
        .flapping(0.25, 0.5)
        .history(21)
        .warn(lambda: "totalCount" > 17500)
        .crit(lambda: "totalCount" > 22500)
        .message('''Observed  {{ index .Fields "totalCount" }} requests to Production Store ID {{ index .Tags "id" }} within the last minute.''')
        .stateChangesOnly(5m)
        .exec('/usr/bin/kapacitor_pubsub_stdin_invoker.sh')
        .log('/var/log/kapacitor/alerts.log')

DOT:
digraph top_stores {
graph [throughput="15742.66 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="1816122105"];

from1 [avg_exec_time_ns="25.922µs" errors="0" working_cardinality="0" ];
from1 -> barrier2 [processed="1816122105"];

barrier2 [avg_exec_time_ns="16.675µs" errors="0" working_cardinality="19560" ];
barrier2 -> window3 [processed="1816013270"];

window3 [avg_exec_time_ns="164.84µs" errors="0" working_cardinality="19560" ];
window3 -> sum4 [processed="191193880"];

sum4 [avg_exec_time_ns="592.492µs" errors="0" working_cardinality="12185" ];
sum4 -> alert6 [processed="191193879"];
sum4 -> alert5 [processed="191193879"];

alert6 [alerts_inhibited="0" alerts_triggered="1375" avg_exec_time_ns="93.317µs" crits_triggered="206" errors="0" infos_triggered="0" oks_triggered="586" warns_triggered="583" working_cardinality="12185" ];

alert5 [alerts_inhibited="0" alerts_triggered="789" avg_exec_time_ns="229.573µs" crits_triggered="206" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="583" working_cardinality="12185" ];
}

Note the cardinality of these, at least at the time sampled, was ~12k for store IDs and ~34k for IPs. On their own these seem like small potatoes, and even in the combined script – where the group-by splits first by IP, then by store – they shouldn’t amount to much data for a one or two minute window.
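
A quick back-of-envelope supports that intuition. Even charging each group a generous fixed overhead (the 4 KiB figure below is an assumption, not a measurement), the working set implied by the observed cardinality is a few hundred MB at most – nowhere near the ~20 GB heaps shown above:

# Cardinality taken from the DOT output of the combined task above;
# the per-group cost is an assumed, deliberately generous figure.
groups = 93_843             # barrier2 working_cardinality, top_combined
bytes_per_group = 4 * 1024  # assumed overhead: tags, window buffer, ticker
print("%.0f MiB" % (groups * bytes_per_group / 2**20))  # -> 367 MiB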

At first this seemed to be a more stable approach: memory didn’t grow as fast, and I thought we’d level off. Unfortunately, as the graph below shows, we did not.

(Screenshot: Kapacitor memory usage over time, 2021-01-27 3:45 PM)

Heap dumps show the following for in-use objects:

go tool pprof -inuse_objects --text kapacitord top_ip_and_store_id_last/heap\?debug=1 
File: kapacitord
Type: inuse_objects
Showing nodes accounting for 247823085, 97.68% of 253717036 total
Dropped 130 nodes (cum <= 1268585)
      flat  flat%   sum%        cum   cum%
  43042527 16.96% 16.96%   43042532 16.96%  time.NewTicker
  33785532 13.32% 30.28%   33785532 13.32%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
  29265618 11.53% 41.82%   72471995 28.56%  github.com/influxdata/kapacitor.newPeriodicBarrier
  28924029 11.40% 53.22%   28924029 11.40%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
  26395161 10.40% 63.62%   26395161 10.40%  github.com/influxdata/kapacitor/models.ToGroupID
  14090455  5.55% 69.17%   86562450 34.12%  github.com/influxdata/kapacitor.(*BarrierNode).newBarrier
  13910441  5.48% 74.66%   13910441  5.48%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
  13590942  5.36% 80.01%   13590942  5.36%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
   9683888  3.82% 83.83%    9683888  3.82%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
   9202897  3.63% 87.46%   22793839  8.98%  github.com/influxdata/kapacitor.convertFloatPoint
   7045227  2.78% 90.23%    7045227  2.78%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Name
   4871732  1.92% 92.15%    4871732  1.92%  github.com/influxdata/kapacitor/edge.BatchPointFromPoint
   1784280   0.7% 92.86%    1784280   0.7%  github.com/influxdata/kapacitor/edge.(*pointMessage).ShallowCopy
   1693245  0.67% 93.52%    2004546  0.79%  github.com/influxdata/kapacitor/edge.NewPointMessage
   1658880  0.65% 94.18%    2491189  0.98%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
   1656757  0.65% 94.83%    4679768  1.84%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
   1646692  0.65% 95.48%    1646692  0.65%  syscall.anyToSockaddr
   1627118  0.64% 96.12%    1640457  0.65%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
   1376319  0.54% 96.66%    3023011  1.19%  net.(*UDPConn).readFrom
   1376277  0.54% 97.21%    3867466  1.52%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
    617947  0.24% 97.45%    2016082  0.79%  github.com/influxdata/kapacitor.(*AlertNode).renderID
    163847 0.065% 97.51%   86966602 34.28%  github.com/influxdata/kapacitor.(*BarrierNode).NewGroup
    152931  0.06% 97.57%   14063372  5.54%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
    108311 0.043% 97.62%    4980043  1.96%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
    106502 0.042% 97.66%    5704480  2.25%  github.com/influxdata/kapacitor.(*windowByTime).batch
     45530 0.018% 97.68%   88450861 34.86%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
         0     0% 97.68%    2851193  1.12%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 97.68%  118282743 46.62%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 97.68%   27556840 10.86%  github.com/influxdata/kapacitor.(*FromNode).Point
         0     0% 97.68%   27968454 11.02%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 97.68%   24799627  9.77%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 97.68%   48848474 19.25%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
         0     0% 97.68%    8255397  3.25%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 97.68%    1802300  0.71%  github.com/influxdata/kapacitor.(*alertState).Point
         0     0% 97.68%   22793839  8.98%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
         0     0% 97.68%   23395427  9.22%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
         0     0% 97.68%  182157414 71.80%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 97.68%    5456825  2.15%  github.com/influxdata/kapacitor.(*windowByTime).Point
         0     0% 97.68%  182157408 71.80%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
         0     0% 97.68%   23395427  9.22%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
         0     0% 97.68%   35505282 13.99%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
         0     0% 97.68%   24799627  9.77%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
         0     0% 97.68%  154188954 60.77%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 97.68%  128939059 50.82%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
         0     0% 97.68%   25772560 10.16%  github.com/influxdata/kapacitor/edge.(*pointMessage).SetDimensions
         0     0% 97.68%   23395427  9.22%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
         0     0% 97.68%   34815965 13.72%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
         0     0% 97.68%   24799627  9.77%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
         0     0% 97.68%   52715940 20.78%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
         0     0% 97.68%    9683888  3.82%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
         0     0% 97.68%    1640457  0.65%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
         0     0% 97.68%    3867466  1.52%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
         0     0% 97.68%    1646692  0.65%  internal/poll.(*FD).ReadFrom
         0     0% 97.68%    3023011  1.19%  net.(*UDPConn).ReadFromUDP
         0     0% 97.68%    1646692  0.65%  net.(*netFD).readFrom
         0     0% 97.68%  253634506   100%  runtime.goexit
         0     0% 97.68%    1646692  0.65%  syscall.Recvfrom

and for in-use space:

go tool pprof --text kapacitord top_ip_and_store_id_last/heap\?debug=1 
File: kapacitord
Type: inuse_space
Showing nodes accounting for 19161.22MB, 97.53% of 19646.18MB total
Dropped 128 nodes (cum <= 98.23MB)
      flat  flat%   sum%        cum   cum%
 5369.30MB 27.33% 27.33%  5369.30MB 27.33%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
 3794.53MB 19.31% 46.64%  6652.30MB 33.86%  github.com/influxdata/kapacitor.newPeriodicBarrier
 2852.22MB 14.52% 61.16%  2852.77MB 14.52%  time.NewTicker
 1230.21MB  6.26% 67.42%  1230.21MB  6.26%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
  945.72MB  4.81% 72.24%   945.72MB  4.81%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
  892.46MB  4.54% 76.78%  7766.76MB 39.53%  github.com/influxdata/kapacitor.(*BarrierNode).NewGroup
  551.54MB  2.81% 79.59%   966.55MB  4.92%  github.com/influxdata/kapacitor.convertFloatPoint
  541.01MB  2.75% 82.34%   541.01MB  2.75%  github.com/influxdata/kapacitor/models.ToGroupID
  455.63MB  2.32% 84.66%   521.63MB  2.66%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
  426.01MB  2.17% 86.83%   426.01MB  2.17%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
  415.01MB  2.11% 88.94%   415.01MB  2.11%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
  245.03MB  1.25% 90.19%   245.03MB  1.25%  github.com/influxdata/kapacitor/edge.(*pointMessage).ShallowCopy
  232.53MB  1.18% 91.37%   238.03MB  1.21%  github.com/influxdata/kapacitor/edge.NewPointMessage
  223.01MB  1.14% 92.51%   223.01MB  1.14%  github.com/influxdata/kapacitor/edge.BatchPointFromPoint
     215MB  1.09% 93.60%  6867.30MB 34.95%  github.com/influxdata/kapacitor.(*BarrierNode).newBarrier
  188.02MB  0.96% 94.56%   190.53MB  0.97%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
  108.51MB  0.55% 95.11%   272.02MB  1.38%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
  107.50MB  0.55% 95.66%   107.50MB  0.55%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Name
  100.51MB  0.51% 96.17%   100.51MB  0.51%  syscall.anyToSockaddr
   83.84MB  0.43% 96.60%   306.85MB  1.56%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
      63MB  0.32% 96.92%   163.51MB  0.83%  net.(*UDPConn).readFrom
   46.70MB  0.24% 97.16%   146.72MB  0.75%  github.com/influxdata/kapacitor/edge.(*statsEdge).incCollected
   32.42MB  0.17% 97.32%  7904.19MB 40.23%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
      21MB  0.11% 97.43%   542.63MB  2.76%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
      14MB 0.071% 97.50%   440.01MB  2.24%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
    6.50MB 0.033% 97.53%   340.85MB  1.73%  github.com/influxdata/kapacitor.(*windowByTime).batch
         0     0% 97.53%   192.18MB  0.98%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 97.53% 12749.75MB 64.90%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 97.53%   774.55MB  3.94%  github.com/influxdata/kapacitor.(*FromNode).Point
         0     0% 97.53%   832.51MB  4.24%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 97.53%  1171.02MB  5.96%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 97.53%  2687.48MB 13.68%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
         0     0% 97.53%      724MB  3.69%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 97.53%   966.55MB  4.92%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
         0     0% 97.53%  1028.06MB  5.23%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
         0     0% 97.53% 15669.47MB 79.76%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 97.53%   370.80MB  1.89%  github.com/influxdata/kapacitor.(*windowByTime).Point
         0     0% 97.53% 15659.62MB 79.71%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
         0     0% 97.53%  1028.06MB  5.23%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
         0     0% 97.53%   118.41MB   0.6%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).EndBatch
         0     0% 97.53%  1316.67MB  6.70%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
         0     0% 97.53%   147.22MB  0.75%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).forward
         0     0% 97.53%  1171.02MB  5.96%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
         0     0% 97.53% 14827.11MB 75.47%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 97.53% 13633.07MB 69.39%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
         0     0% 97.53%   529.51MB  2.70%  github.com/influxdata/kapacitor/edge.(*pointMessage).SetDimensions
         0     0% 97.53%   140.38MB  0.71%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Collect
         0     0% 97.53%  1028.06MB  5.23%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
         0     0% 97.53%  1208.85MB  6.15%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
         0     0% 97.53%  1171.02MB  5.96%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
         0     0% 97.53%  3230.11MB 16.44%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
         0     0% 97.53%   945.72MB  4.81%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
         0     0% 97.53%   190.53MB  0.97%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
         0     0% 97.53%   542.63MB  2.76%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
         0     0% 97.53%   100.51MB  0.51%  internal/poll.(*FD).ReadFrom
         0     0% 97.53%   163.51MB  0.83%  net.(*UDPConn).ReadFromUDP
         0     0% 97.53%   100.51MB  0.51%  net.(*netFD).readFrom
         0     0% 97.53% 19615.16MB 99.84%  runtime.goexit
         0     0% 97.53%   100.51MB  0.51%  syscall.Recvfrom

A CPU profile taken at roughly the same time shows:

go tool pprof --text kapacitord top_ip_and_store_id_last/profile 
File: kapacitord
Type: cpu
Time: Jan 27, 2021 at 2:20pm (PST)
Duration: 30.17s, Total samples = 66.49s (220.40%)
Showing nodes accounting for 56.07s, 84.33% of 66.49s total
Dropped 417 nodes (cum <= 0.33s)
      flat  flat%   sum%        cum   cum%
    11.90s 17.90% 17.90%     12.39s 18.63%  runtime.findObject
     9.26s 13.93% 31.82%     26.13s 39.30%  runtime.scanobject
     5.15s  7.75% 39.57%      5.15s  7.75%  runtime.markBits.isMarked
     2.63s  3.96% 43.53%      2.78s  4.18%  syscall.Syscall6
     2.17s  3.26% 46.79%      2.88s  4.33%  runtime.mapaccess2_faststr
     1.32s  1.99% 48.77%      1.32s  1.99%  runtime.epollwait
     1.26s  1.90% 50.67%      9.32s 14.02%  runtime.mallocgc
     1.25s  1.88% 52.55%      1.25s  1.88%  runtime.futex
     1.17s  1.76% 54.31%      1.19s  1.79%  runtime.pageIndexOf
     0.96s  1.44% 55.75%      1.13s  1.70%  runtime.heapBitsSetType
     0.90s  1.35% 57.11%      0.90s  1.35%  runtime.usleep
     0.84s  1.26% 58.37%      0.84s  1.26%  runtime.nextFreeFast
     0.80s  1.20% 59.57%      0.83s  1.25%  runtime.(*itabTableType).find
     0.74s  1.11% 60.69%      0.74s  1.11%  runtime.memclrNoHeapPointers
     0.73s  1.10% 61.78%      4.10s  6.17%  runtime.selectgo
     0.71s  1.07% 62.85%      3.78s  5.69%  runtime.gcWriteBarrier
     0.70s  1.05% 63.90%      0.70s  1.05%  memeqbody
     0.67s  1.01% 64.91%      0.77s  1.16%  runtime.lock
     0.56s  0.84% 65.75%      0.60s   0.9%  runtime.spanOf
     0.55s  0.83% 66.58%      0.62s  0.93%  runtime.unlock
     0.52s  0.78% 67.36%      0.52s  0.78%  runtime.memmove
     0.48s  0.72% 68.09%      0.48s  0.72%  github.com/influxdata/kapacitor/edge.(*pointMessage).Fields
     0.48s  0.72% 68.81%      1.09s  1.64%  runtime.gcmarknewobject
     0.44s  0.66% 69.47%      0.44s  0.66%  runtime.spanOfUnchecked
     0.42s  0.63% 70.10%      0.42s  0.63%  aeshashbody
     0.39s  0.59% 70.69%      3.51s  5.28%  runtime.wbBufFlush1
     0.36s  0.54% 71.23%      5.38s  8.09%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).Point
     0.35s  0.53% 71.76%         1s  1.50%  runtime.mapiternext
     0.33s   0.5% 72.25%      7.40s 11.13%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Point
     0.33s   0.5% 72.75%      6.15s  9.25%  runtime.greyobject
     0.32s  0.48% 73.23%      1.15s  1.73%  runtime.getitab
     0.31s  0.47% 73.70%      3.36s  5.05%  runtime.findrunnable
     0.30s  0.45% 74.15%      2.87s  4.32%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.encodeTags
     0.29s  0.44% 74.58%      2.07s  3.11%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).forward
     0.29s  0.44% 75.02%      0.58s  0.87%  runtime.mapaccess1
     0.29s  0.44% 75.45%      1.96s  2.95%  runtime.mapassign_faststr
     0.25s  0.38% 75.83%      0.61s  0.92%  runtime.markBitsForAddr
     0.20s   0.3% 76.13%      5.48s  8.24%  runtime.newobject
     0.20s   0.3% 76.43%      1.06s  1.59%  runtime.runqgrab
     0.19s  0.29% 76.72%      1.21s  1.82%  github.com/influxdata/kapacitor/edge.(*statsEdge).incCollected
     0.19s  0.29% 77.00%      0.84s  1.26%  runtime.bulkBarrierPreWrite
     0.18s  0.27% 77.27%      3.34s  5.02%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Collect
     0.15s  0.23% 77.50%     19.80s 29.78%  github.com/influxdata/kapacitor/edge.(*consumer).Consume
     0.15s  0.23% 77.73%      3.20s  4.81%  github.com/influxdata/kapacitor/edge.(*streamStatsEdge).Emit
     0.15s  0.23% 77.95%      0.96s  1.44%  runtime.sellock
     0.14s  0.21% 78.16%      0.35s  0.53%  github.com/influxdata/kapacitor.(*windowTimeBuffer).insert
     0.14s  0.21% 78.37%      0.52s  0.78%  runtime.selunlock
     0.13s   0.2% 78.57%      0.95s  1.43%  runtime.slicebytetostring
     0.13s   0.2% 78.76%      0.96s  1.44%  runtime.typedmemmove
     0.12s  0.18% 78.94%     23.22s 34.92%  runtime.gcDrain
     0.12s  0.18% 79.12%      3.90s  5.87%  runtime.schedule
     0.11s  0.17% 79.29%      2.24s  3.37%  github.com/influxdata/kapacitor.(*windowByTime).Point
     0.11s  0.17% 79.46%      1.44s  2.17%  github.com/influxdata/kapacitor/edge.(*pointMessage).GroupInfo
     0.10s  0.15% 79.61%      1.80s  2.71%  github.com/influxdata/kapacitor.(*TaskMaster).forkPoint
     0.10s  0.15% 79.76%      0.80s  1.20%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.scanKey
     0.10s  0.15% 79.91%      4.20s  6.32%  net.(*UDPConn).readFrom
     0.10s  0.15% 80.06%      0.97s  1.46%  runtime.assertI2I2
     0.10s  0.15% 80.21%      1.70s  2.56%  runtime.makeslice
     0.09s  0.14% 80.34%      0.83s  1.25%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parseTags
     0.09s  0.14% 80.48%      0.34s  0.51%  runtime.memhash
     0.09s  0.14% 80.61%      1.45s  2.18%  runtime.netpoll
     0.08s  0.12% 80.73%      5.77s  8.68%  github.com/influxdata/kapacitor.(*influxqlGroup).BatchPoint
     0.08s  0.12% 80.85%      1.32s  1.99%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).getOrCreateGroup
     0.08s  0.12% 80.97%      3.42s  5.14%  runtime.gcDrainN
     0.08s  0.12% 81.09%      0.90s  1.35%  runtime.mapiterinit
     0.07s  0.11% 81.20%      1.48s  2.23%  github.com/influxdata/kapacitor.(*alertState).Point
     0.07s  0.11% 81.31%      2.06s  3.10%  github.com/influxdata/kapacitor/edge.(*channelEdge).Collect
     0.07s  0.11% 81.41%      2.02s  3.04%  github.com/influxdata/kapacitor/edge.(*channelEdge).Emit
     0.06s  0.09% 81.50%      1.09s  1.64%  github.com/influxdata/kapacitor.(*StreamNode).runSourceStream
     0.06s  0.09% 81.59%      0.61s  0.92%  github.com/influxdata/kapacitor.(*streamEdge).CollectPoint
     0.06s  0.09% 81.68%      5.32s  8.00%  github.com/influxdata/kapacitor.convertFloatPoint
     0.06s  0.09% 81.77%      0.67s  1.01%  github.com/influxdata/kapacitor/edge.(*statsEdge).incEmitted
     0.06s  0.09% 81.86%      5.14s  7.73%  github.com/influxdata/kapacitor/services/udp.(*Service).serve
     0.06s  0.09% 81.95%      3.62s  5.44%  internal/poll.(*FD).ReadFrom
     0.06s  0.09% 82.04%      4.26s  6.41%  net.(*UDPConn).ReadFromUDP
     0.06s  0.09% 82.13%      0.53s   0.8%  runtime.makemap
     0.05s 0.075% 82.21%      2.83s  4.26%  github.com/influxdata/kapacitor.(*TaskMaster).runForking
     0.05s 0.075% 82.28%      0.90s  1.35%  github.com/influxdata/kapacitor/edge.NewBatchPointMessage
     0.05s 0.075% 82.36%      0.38s  0.57%  github.com/influxdata/kapacitor/models.ToGroupID
     0.05s 0.075% 82.43%      2.11s  3.17%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePointsWithPrecision
     0.05s 0.075% 82.51%      0.64s  0.96%  runtime.(*mcentral).cacheSpan
     0.05s 0.075% 82.58%      0.39s  0.59%  runtime.strhash
     0.05s 0.075% 82.66%      0.63s  0.95%  sort.Strings
     0.04s  0.06% 82.72%      4.62s  6.95%  github.com/influxdata/kapacitor.(*TaskMaster).WritePoints
     0.04s  0.06% 82.78%      1.66s  2.50%  github.com/influxdata/kapacitor.(*windowTimeBuffer).points
     0.04s  0.06% 82.84%      7.16s 10.77%  github.com/influxdata/kapacitor/services/udp.(*Service).processPackets
     0.04s  0.06% 82.90%      1.24s  1.86%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.Tags.Map
     0.04s  0.06% 82.96%      1.60s  2.41%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.parsePoint
     0.04s  0.06% 83.02%      0.44s  0.66%  runtime.chansend1
     0.04s  0.06% 83.08%     30.96s 46.56%  runtime.systemstack
     0.04s  0.06% 83.14%      0.34s  0.51%  syscall.anyToSockaddr
     0.04s  0.06% 83.20%      2.82s  4.24%  syscall.recvfrom
     0.03s 0.045% 83.25%      0.80s  1.20%  github.com/influxdata/kapacitor.(*FromNode).Point
     0.03s 0.045% 83.29%      0.44s  0.66%  github.com/influxdata/kapacitor.EvalPredicate
     0.03s 0.045% 83.34%      8.25s 12.41%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Point
     0.03s 0.045% 83.38%      0.38s  0.57%  github.com/influxdata/kapacitor/edge.Forward
     0.03s 0.045% 83.43%      0.67s  1.01%  runtime.(*mcache).refill
     0.03s 0.045% 83.47%      1.09s  1.64%  runtime.runqsteal
     0.03s 0.045% 83.52%      0.64s  0.96%  runtime.send
     0.03s 0.045% 83.56%      0.47s  0.71%  runtime.timerproc
     0.03s 0.045% 83.61%      3.19s  4.80%  syscall.Recvfrom
     0.03s 0.045% 83.65%      0.54s  0.81%  text/template.(*state).walk
     0.02s  0.03% 83.68%      0.98s  1.47%  github.com/influxdata/kapacitor.(*streamEdge).EmitPoint
     0.02s  0.03% 83.71%      1.97s  2.96%  github.com/influxdata/kapacitor.(*windowByTime).batch
     0.02s  0.03% 83.74%      0.36s  0.54%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).Barrier
     0.02s  0.03% 83.77%      5.86s  8.81%  github.com/influxdata/kapacitor/edge.(*timedForwardReceiver).BatchPoint
     0.02s  0.03% 83.80%      1.40s  2.11%  github.com/influxdata/kapacitor/edge.BatchPointFromPoint
     0.02s  0.03% 83.83%      3.10s  4.66%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/influxql.NewTags
     0.02s  0.03% 83.86%      0.87s  1.31%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Tags
     0.02s  0.03% 83.89%      2.16s  3.25%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.ParsePoints
     0.02s  0.03% 83.92%      3.65s  5.49%  net.(*netFD).readFrom
     0.02s  0.03% 83.95%      0.42s  0.63%  runtime.chansend
     0.02s  0.03% 83.98%      4.20s  6.32%  runtime.mcall
     0.02s  0.03% 84.01%         4s  6.02%  runtime.park_m
     0.02s  0.03% 84.04%      0.41s  0.62%  runtime.ready
     0.02s  0.03% 84.07%      0.38s  0.57%  runtime.stopm
     0.02s  0.03% 84.10%      3.54s  5.32%  runtime.wbBufFlush
     0.01s 0.015% 84.12%      0.45s  0.68%  github.com/influxdata/kapacitor.(*AlertNode).findFirstMatchLevel
     0.01s 0.015% 84.13%      5.40s  8.12%  github.com/influxdata/kapacitor.(*floatPointAggregator).AggregatePoint
     0.01s 0.015% 84.15%      1.78s  2.68%  github.com/influxdata/kapacitor.(*periodicBarrier).emitBarrier
     0.01s 0.015% 84.16%         2s  3.01%  github.com/influxdata/kapacitor.(*periodicBarrier).periodicEmitter
     0.01s 0.015% 84.18%      6.59s  9.91%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).BufferedBatch
     0.01s 0.015% 84.19%      0.50s  0.75%  github.com/influxdata/kapacitor/edge.NewPointMessage
     0.01s 0.015% 84.21%      6.62s  9.96%  github.com/influxdata/kapacitor/edge.receiveBufferedBatch
     0.01s 0.015% 84.22%      1.11s  1.67%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).unmarshalBinary
     0.01s 0.015% 84.24%      0.75s  1.13%  runtime.(*mcache).nextFree
     0.01s 0.015% 84.25%      0.36s  0.54%  runtime.convTslice
     0.01s 0.015% 84.27%      0.53s   0.8%  runtime.futexsleep
     0.01s 0.015% 84.28%      0.74s  1.11%  runtime.futexwakeup
     0.01s 0.015% 84.30%      0.34s  0.51%  runtime.notesleep
     0.01s 0.015% 84.31%      0.68s  1.02%  runtime.notewakeup
     0.01s 0.015% 84.33%      3.52s  5.29%  runtime.wbBufFlush.func1
         0     0% 84.33%      0.45s  0.68%  github.com/influxdata/kapacitor.(*AlertNode).determineLevel
         0     0% 84.33%      1.02s  1.53%  github.com/influxdata/kapacitor.(*AlertNode).renderID
         0     0% 84.33%      2.42s  3.64%  github.com/influxdata/kapacitor.(*AlertNode).runAlert
         0     0% 84.33%      3.25s  4.89%  github.com/influxdata/kapacitor.(*BarrierNode).runBarrierEmitter
         0     0% 84.33%      2.35s  3.53%  github.com/influxdata/kapacitor.(*FromNode).runStream
         0     0% 84.33%      6.89s 10.36%  github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQL
         0     0% 84.33%      2.83s  4.26%  github.com/influxdata/kapacitor.(*TaskMaster).stream.func1
         0     0% 84.33%      4.89s  7.35%  github.com/influxdata/kapacitor.(*WindowNode).runWindow
         0     0% 84.33%     20.89s 31.42%  github.com/influxdata/kapacitor.(*node).start.func1
         0     0% 84.33%      5.86s  8.81%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).BatchPoint
         0     0% 84.33%      0.50s  0.75%  github.com/influxdata/kapacitor/edge.(*forwardingReceiver).EndBatch
         0     0% 84.33%      0.37s  0.56%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Barrier
         0     0% 84.33%     17.45s 26.24%  github.com/influxdata/kapacitor/edge.(*groupedConsumer).Consume
         0     0% 84.33%      0.36s  0.54%  github.com/influxdata/kapacitor/edge.(*pointMessage).ShallowCopy
         0     0% 84.33%      1.12s  1.68%  github.com/influxdata/kapacitor/vendor/github.com/influxdata/influxdb/models.(*point).Fields
         0     0% 84.33%      1.31s  1.97%  runtime.convT2E
         0     0% 84.33%      3.42s  5.14%  runtime.gcAssistAlloc
         0     0% 84.33%      3.42s  5.14%  runtime.gcAssistAlloc.func1
         0     0% 84.33%      3.42s  5.14%  runtime.gcAssistAlloc1
         0     0% 84.33%     23.23s 34.94%  runtime.gcBgMarkWorker
         0     0% 84.33%     23.22s 34.92%  runtime.gcBgMarkWorker.func2
         0     0% 84.33%      0.41s  0.62%  runtime.goready
         0     0% 84.33%      0.41s  0.62%  runtime.goready.func1
         0     0% 84.33%      0.67s  1.01%  runtime.startm
         0     0% 84.33%      0.51s  0.77%  runtime.wakep
         0     0% 84.33%      0.64s  0.96%  text/template.(*Template).Execute
         0     0% 84.33%      0.64s  0.96%  text/template.(*Template).execute

So after running through many iterations of changes to the TICK scripts, I’m left scratching my head. Notably, the CPU profile above is dominated by garbage collection (runtime.gcBgMarkWorker alone accounts for roughly a third of samples), which points at heavy allocation churn rather than any one node. The data itself doesn’t appear to be the problem: with all the TICK scripts disabled / inhibited from processing, the same constant flow of points into Kapacitor doesn’t budge memory from about a gigabyte.

A few questions:

  1. Am I configuring Kapacitor correctly for data that is sent directly to it and that we don’t care about at all past the window (again, 1-2 minutes)? As far as I can tell, the only way to purge data once you open a stream and consume from it is to have barrier events delete it, and that seems to add high CPU load and potentially account for the memory growth once the stream exceeds a certain size, even when cardinality is in check.
  2. Configuring the barrier to emit a delete on idle instead causes even quicker growth (a steeper graph), but that is probably because it isn’t the right option for this node: there will essentially never be an idle period in the data Kapacitor receives for these measurements in Production. That said, a lot of time and overhead seems to be spent in the tickers responsible for purging the data.
  3. The cardinality does fluctuate, but it stays within a reasonable size while a TICK script is enabled, as the various aforementioned outputs show. So why does Kapacitor grow until OOM?
  4. Though sum doesn’t seem to be anywhere near the problem with this configuration and the growth issue, is there a cleaner way of counting the points in the stream? Can I drop the count=1 field entirely, given that counting is the only purpose it serves? (See the sketch after this list.)
  5. Retention is moot in this situation, where the data never interacts with InfluxDB in any way: nowhere in Kapacitor is retention honored, as exhibited by the fact that Kapacitor lets you attach retention policies that exist only in InfluxDB to data that never touches InfluxDB. Is that right?
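
On question 4, a minimal sketch of what I have in mind, assuming Kapacitor’s count() InfluxQL node behaves like its InfluxQL namesake (I have not load-tested this variant): summing a constant count=1 per point should be equivalent to simply counting points. Line protocol still requires at least one field per point, so the count field has to stay in the message, but its value would no longer matter:

dbrp "toptraffic"."autogen"

// Hedged sketch: count points instead of summing a constant field.
// Only the final aggregation node changes relative to what we run today.
stream
    |from()
        .groupBy('ip', 'id')
        .measurement('combined')
    |window()
        .period(1m)
        .every(5s)
    // count('count') tallies points carrying the field and ignores its
    // value, so count=1 in the payload becomes a bare placeholder
    |count('count')
        .as('totalCount')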

I’ve attached (top_combined_and_top_ip_and_store_kapacitor_1.5.7_growth.tar.gz) the various dumps / profile captures, etc… from the period each of these configurations was in place; the structure of the archive is:

tree -D    
.
├── top_combined
│   ├── [Jan 25 19:31]  allocs?debug=1
│   ├── [Jan 25 19:31]  goroutine?debug=1
│   ├── [Jan 25 19:33]  goroutine?debug=2
│   ├── [Jan 25 19:31]  heap?debug=1
│   ├── [Jan 25 22:52]  index.html
│   ├── [Jan 25 19:32]  mutex?debug=1
│   ├── [Jan 25 19:32]  profile
│   ├── [Jan 25 19:32]  show_top_combined
│   ├── [Jan 25 19:32]  threadcreate?debug=1
│   └── [Jan 25 19:32]  trace
├── top_ip_and_store_id
│   ├── [Jan 27 11:26]  allocs?debug=1
│   ├── [Jan 27 11:23]  goroutine?debug=1
│   ├── [Jan 27 11:23]  goroutine?debug=2
│   ├── [Jan 27 11:25]  heap?debug=1
│   ├── [Jan 27 11:26]  index.html
│   ├── [Jan 27 11:24]  mutex?debug=1
│   ├── [Jan 27 11:25]  profile
│   ├── [Jan 27 11:27]  show_top_ips
│   ├── [Jan 27 11:27]  show_top_stores
│   ├── [Jan 27 11:26]  threadcreate?debug=1
│   └── [Jan 27 11:24]  trace
└── top_ip_and_store_id_last
    ├── [Jan 27 14:18]  allocs?debug=1
    ├── [Jan 27 14:19]  goroutine?debug=1
    ├── [Jan 27 14:19]  goroutine?debug=2
    ├── [Jan 27 14:19]  heap?debug=1
    ├── [Jan 27 14:24]  index.html
    ├── [Jan 27 14:19]  mutex?debug=1
    ├── [Jan 27 14:20]  profile
    ├── [Jan 27 14:20]  threadcreate?debug=1
    └── [Jan 27 14:20]  trace

3 directories, 30 files

The files in top_ip_and_store_id and top_ip_and_store_id_last were taken a few hours apart, as the timestamps above show.
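
For reference, the captures can be reproduced against a running Kapacitor roughly as follows (a sketch; it assumes pprof-enabled = true under the [http] section of kapacitor.conf and the default HTTP port of 9092):

# Grab the standard Go pprof endpoints that Kapacitor exposes.
curl -o 'heap?debug=1' 'http://localhost:9092/debug/pprof/heap?debug=1'
curl -o 'goroutine?debug=2' 'http://localhost:9092/debug/pprof/goroutine?debug=2'
curl -o profile 'http://localhost:9092/debug/pprof/profile?seconds=30'
curl -o trace 'http://localhost:9092/debug/pprof/trace?seconds=10'

# Inspect a capture with the Go toolchain against the kapacitord binary:
go tool pprof --text kapacitord top_ip_and_store_id_last/profile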

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

Will do. Will try and make some changes in the next few days and report back. Cheers!