opentelemetry-collector-contrib: Memory leak problem with OpenTelemetry Collector

Describe the bug Memory leak problem with OpenTelemetry Collector

Steps to reproduce I wasn’t able to reproduce this locally, but I suspect it was triggered when the collector received a very large trace (around 20,000 spans) over OTLP.

What did you expect to see? I expected memory usage to rise and fall with load. Instead, memory usage climbs continuously.
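
To confirm where the growth happens, the collector’s own telemetry metrics (for example otelcol_process_memory_rss) can be scraped; a minimal sketch of the service.telemetry block, assuming the default self-metrics port:

service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888   # exposes the collector's own otelcol_process_* memory metrics for scraping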

What version did you use? opentelemetry-operator:0.37.1 tempo-distributed:1.5.4

What config did you use?

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: opentelemetry
spec:
  config: |
    connectors:
      spanmetrics:
        namespace: span.metrics

    receivers:
      # Data sources: traces, metrics, logs
      otlp:
        protocols:
          grpc:
          http:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
      tail_sampling:
        policies:
          - name: drop_noisy_traces_url
            type: string_attribute
            string_attribute:
              key: http.target
              values:
                - \/health
              enabled_regex_matching: true
              invert_match: true
    exporters:
      otlp:
        endpoint: http://tempo-distributor:4317/
        tls:
          insecure: true
      logging:
        loglevel: debug
      prometheus:
        enable_open_metrics: true
        endpoint: 0.0.0.0:8889
        resource_to_telemetry_conversion:
          enabled: true
      loki:
        endpoint: http://loki-gateway.loki/loki/api/v1/push
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, tail_sampling]
          exporters: [otlp, spanmetrics]
        metrics:
          receivers: [otlp, spanmetrics]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [loki]

Environment OS: Ubuntu Linux on AKS. Runtime: .NET 6.0 with dotnet-autoinstrumentation.

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

@albertteoh yes, it was related to exemplars. The collector pod has not crashed due to memory for 17 hours now. Thanks
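
For anyone hitting this before upgrading, exemplars can also be switched off explicitly on the spanmetrics connector; a minimal sketch, assuming the exemplars option is available in your collector version:

connectors:
  spanmetrics:
    namespace: span.metrics
    exemplars:
      enabled: false   # do not attach exemplars to the generated span metrics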

I am transferring this to contrib since the current theory is that this is related to the spanmetrics connector.

I noticed you’re using the spanmetrics connector; there was a recent merge of a memory leak fix: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/28847

It was just released today: https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.91.0

It might be worth upgrading the opentelemetry-operator once it’s released with collector v0.91.0.
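
Until an operator release bundles collector v0.91.0, the collector image can also be pinned directly on the custom resource; a minimal sketch (the image tag is an assumption about which distribution you run):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: opentelemetry
spec:
  image: otel/opentelemetry-collector-contrib:0.91.0  # overrides the default image the operator would deploy
  config: |
    # ...existing config...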