opentelemetry-operator: TargetAllocator: error loading configuration

Hi, I’ve got these logs at startup:

{"level":"info","ts":"2023-06-05T20:17:18Z","msg":"Starting the Target Allocator"}
{"level":"info","ts":"2023-06-05T20:17:18Z","logger":"allocator","msg":"Unrecognized filter strategy; filtering disabled"}
{"level":"info","ts":"2023-06-05T20:17:18Z","logger":"allocator","msg":"Starting server..."}
{"level":"info","ts":"2023-06-05T20:17:18Z","msg":"Waiting for caches to sync for servicemonitors\n"}
{"level":"info","ts":"2023-06-05T20:17:20Z","msg":"Caches are synced for servicemonitors\n"}
{"level":"info","ts":"2023-06-05T20:17:20Z","msg":"Waiting for caches to sync for podmonitors\n"}
{"level":"info","ts":"2023-06-05T20:17:20Z","msg":"Caches are synced for podmonitors\n"}
{"level":"info","ts":"2023-06-05T20:17:20Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"error","ts":"2023-06-05T20:17:21Z","logger":"setup","msg":"Unable to load configuration","error":"empty duration string","stacktrace":"main.main.func13\n\t/app/main.go:198\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38"}
{"level":"error","ts":"2023-06-05T20:17:21Z","logger":"setup","msg":"Unable to load configuration","error":"empty duration string","stacktrace":"main.main.func13\n\t/app/main.go:198\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38"}
{"level":"error","ts":"2023-06-05T20:17:22Z","logger":"setup","msg":"Unable to load configuration","error":"empty duration string","stacktrace":"main.main.func13\n\t/app/main.go:198\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38"}

with this configuration:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: opentelemetry-collector
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-collector
    app.kubernetes.io/part-of: opentelemetry-collector
    app.kubernetes.io/version: 1.0.0
    argocd.argoproj.io/instance: opentelemetry-collector
    helm.sh/chart: opentelemetry-collector-1.0.0
  name: metrics
  namespace: opentelemetry
spec:
  config: |
    exporters:
      logging:
        verbosity: normal
      prometheus:
        endpoint: 0.0.0.0:9090
        metric_expiration: 180m
        resource_to_telemetry_conversion:
          enabled: true
      prometheusremotewrite/mimir:
        endpoint: http://mimir-nginx.monitoring.svc.cluster.local:80/api/v1/push
    extensions:
      basicauth/grafanacloud:
        client_auth:
          password: ${GRAFANA_CLOUD_METRICS_APIKEY}
          username: ${GRAFANA_CLOUD_METRICS_ID}
      health_check: null
      memory_ballast:
        size_in_percentage: 20
      pprof:
        endpoint: :1888
      zpages:
        endpoint: :55679
    processors:
      batch:
        send_batch_max_size: 1500
        send_batch_size: 1500
        timeout: 15s
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.node.name
          - k8s.pod.start_time
          - k8s.deployment.name
          - k8s.replicaset.name
          - k8s.replicaset.uid
          - k8s.daemonset.name
          - k8s.daemonset.uid
          - k8s.job.name
          - k8s.job.uid
          - k8s.cronjob.name
          - k8s.statefulset.name
          - k8s.statefulset.uid
          - container.image.tag
          - container.image.name
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.name
      memory_limiter:
        check_interval: 5s
        limit_percentage: 90
        spike_limit_percentage: 30
      resource:
        attributes:
        - action: insert
          key: collector.name
          value: ${KUBE_POD_NAME}
    receivers:
      hostmetrics:
        collection_interval: 60s
        scrapers:
          cpu: null
          disk: null
          filesystem: null
          load: null
          memory: null
          network: null
          processes: null
      prometheus:
        config:
          global:
            evaluation_interval: 60s
            scrape_interval: 60s
            scrape_timeout: 60s
        target_allocator:
          collector_id: ${POD_NAME}
          endpoint: http://metrics-targetallocator:80
          http_sd_config:
            refresh_interval: 60s
          interval: 30s
    service:
      extensions:
      - health_check
      - memory_ballast
      - pprof
      - zpages
      pipelines:
        metrics:
          exporters:
          - logging
          - prometheus
          processors:
          - batch
          - memory_limiter
          - k8sattributes
          receivers:
          - hostmetrics
          - prometheus
      telemetry:
        logs:
          encoding: json
          level: info
        metrics:
          address: 0.0.0.0:8888
          level: detailed
  env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  envFrom:
  - secretRef:
      name: opentelemetry-datadog-credentials
  - secretRef:
      name: opentelemetry-lightstep-credentials
  - secretRef:
      name: opentelemetry-grafanacloud-credentials
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.78.0
  ingress:
    route: {}
  mode: statefulset
  ports:
  - name: metrics
    port: 8888
    protocol: TCP
    targetPort: 8888
  replicas: 1
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: "1"
      memory: 2Gi
  serviceAccount: opentelemetry-collector-metrics
  targetAllocator:
    allocationStrategy: consistent-hashing
    enabled: true
    filterStrategy: relabel-config
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.78.0
    prometheusCR:
      enabled: true
    replicas: 1
    serviceAccount: opentelemetry-collector-metrics-targetallocator
  upgradeStrategy: automatic

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Thanks, I will try that…

@matej-g Please go ahead, I’m busy with something else at the moment.

Seeing the same after updating to the latest TA (0.78.0). I think it’s coming from the scrape-time validation logic: the scrape durations now come from the Prometheus object passed to the config generator, and they arrive empty.
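
For context, the "empty duration string" message matches the error that model.ParseDuration from github.com/prometheus/common returns when it is given an empty string, which fits the theory above. Below is a minimal sketch, not the operator’s actual code (parseWithDefault and the 30s fallback are illustrative only), showing how defaulting an empty value before parsing avoids the failure:

package main

import (
    "fmt"

    "github.com/prometheus/common/model"
)

// parseWithDefault substitutes a fallback when the duration coming from the
// Prometheus object is empty, instead of letting model.ParseDuration fail.
func parseWithDefault(s, fallback string) (model.Duration, error) {
    if s == "" {
        s = fallback
    }
    return model.ParseDuration(s)
}

func main() {
    // Reproduces the error seen in the target allocator logs.
    if _, err := model.ParseDuration(""); err != nil {
        fmt.Println(err) // prints: empty duration string
    }

    // Defaulting the empty value (30s chosen arbitrarily here) parses fine.
    d, err := parseWithDefault("", "30s")
    fmt.Println(d, err) // prints: 30s <nil>
}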

I think this is right, @eplightning. I was about to open a PR; would you like to do it instead?