opentelemetry-collector-contrib: [exporter/prometheus] Expired metrics are not deleted

Component(s)

exporter/prometheus

What happened?

Description

Hi, I am trying to use the spanmetrics processor and the prometheus exporter to transform spans into metrics. However, I found that some expired metrics seem to reappear whenever new metrics are received, and the memory usage keeps rising. Is this a bug in the prometheus exporter?

Steps to Reproduce

Example: when I posted a span to the collector, the prometheus exporter exposed a metric like this:

calls_total{db_instance="N/A",db_name="name-KSDORKdOKV",db_sql_table="table-IWstkE",db_system="redis",operation="get",service_name="go-project.examples",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 1

After 5 seconds, the metric disappeared. Then I posted another span to the collector, and the prometheus exporter exposed two metrics, including the expired one:

calls_total{db_instance="N/A",db_name="name-KSDORKdOKV",db_sql_table="table-IWstkE",db_system="redis",operation="get",service_name="go-project.examples",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 1
calls_total{db_instance="N/A",db_name="name-QSvMJKDYso",db_sql_table="table-ZHdGvF",db_system="redis",operation="set",service_name="go-project.examples",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 1

Expected Result

Expired metrics should be deleted once metric_expiration has elapsed.

Actual Result

Expired metrics seem to remain in the in-memory cache.

Collector version

2ada50fd4a

Environment information

Environment

OS: macOS 13.0.1
Compiler (if manually compiled): go 1.19.3

OpenTelemetry Collector configuration

receivers:
  # Dummy receiver that's never used, because a pipeline is required to have one.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"

  otlp:
    protocols:
      grpc:
        endpoint: "localhost:55677"

processors:
  batch:
  spanmetrics:
    metrics_exporter: otlp/spanmetrics
    latency_histogram_buckets: [10ms, 100ms]
    dimensions:
      - name: db.system
        default: N/A
      - name: db.name
        default: N/A
      - name: db.sql.table
        default: N/A
      - name: db.instance
        default: N/A
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

exporters:
  logging:
    verbosity: basic

  otlp/spanmetrics:
    endpoint: "localhost:55677"
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"
    metric_expiration: 5s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [logging]

    # The exporter name must match the metrics_exporter name.
    # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [otlp/spanmetrics]

    metrics:
      receivers: [otlp]
      exporters: [prometheus]

Log output

No response

Additional context

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Hi @mfilipe, the spanmetrics connector will continue to emit metrics at a configurable time interval (default every 15s) even when no new spans are sent to it, so I believe this is what's preventing the prometheus exporter's metric_expiration from kicking in.

There are some knobs available to influence the number of metrics the spanmetrics connector emits: the dimensions_cache_size and the histogram buckets config parameters.

The former ensures that no more than this many metric series are stored in memory and emitted downstream.

The latter controls the resolution of histograms, so fewer buckets should reduce the number of metrics exported (see the sketch below).
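
For illustration, a minimal sketch of how those knobs might look when using the spanmetrics connector; the option names metrics_flush_interval, dimensions_cache_size, and histogram.explicit.buckets are assumptions based on the connector documentation and should be verified against the connector README for your collector version:

connectors:
  spanmetrics:
    # Sketch only: the flush interval controls how often accumulated metrics
    # are re-emitted even when no new spans arrive, which keeps series "fresh"
    # from the prometheus exporter's point of view.
    metrics_flush_interval: 15s
    # Caps how many metric series are kept in memory and emitted downstream.
    dimensions_cache_size: 500
    histogram:
      explicit:
        # Fewer buckets means fewer series per histogram metric.
        buckets: [10ms, 100ms]

In other words, metric_expiration on the prometheus exporter can only take effect once the connector stops re-emitting a series, for example after it is evicted from the dimensions cache.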