telegraf: Telegraf crashes in version 1.26

Relevant telegraf.conf

[[inputs.prometheus]]
      metric_version = 2
      monitor_kubernetes_pods = false
      pod_scrape_scope = "cluster"
      bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      kubernetes_services = ["https://kubernetes.default.svc/api/v1/nodes/${MY_NODE_NAME}/proxy/metrics/cadvisor"]

Logs from Telegraf

fatal error: concurrent map read and map write

goroutine 30 [running]:
github.com/influxdata/telegraf/plugins/inputs/prometheus.shouldScrapePod(0xc0017c4ee0, 0xc000618d80)
        /usr/src/mariner/BUILD/telegraf-1.26.0/plugins/inputs/prometheus/kubernetes.go:110 +0x1be
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).watchPod.func1({0x62c3c60?, 0xc0017c4ee0?})
        /usr/src/mariner/BUILD/telegraf-1.26.0/plugins/inputs/prometheus/kubernetes.go:150 +0x56
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/controller.go:232
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/shared_informer.go:818 +0x134
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0008c8738?, {0x6fa7c80, 0xc000813620}, 0x1, 0xc0011a43c0)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc0008c8788?)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000645b80?)
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0x85

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc000b9bef0?)

System info

Telegraf version: 1.26, OS: Mariner 2.0.20230407

Docker

No response

Steps to reproduce

  1. Install Telegraf version 1.26.
  2. Add the inputs.prometheus configuration with the kubernetes_services entry for cAdvisor (see the config above).

Expected behavior

Telegraf should not panic. It should stay up and running.

Actual behavior

Telegraf crashes with "fatal error: concurrent map read and map write".

Additional info

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (10 by maintainers)

Most upvoted comments

@samskan, your sample config shows monitor_kubernetes_pods = false, but that setting disables the code path to watchPod, where the problem arises. Judging from the stack trace, there must be multiple instances of the prometheus plugin. Could you provide the full config, please?

@redbaron, based on the crash above, it looks like we may need to add a mutex around access to the pods. Does it make sense to do this in the UpdateFunc when registering a pod?
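
To illustrate the mutex idea, here is a minimal, generic sketch of guarding a shared map with a sync.RWMutex so that informer callbacks and readers cannot race. The podRegistry type, its methods, and the URLs are hypothetical and are not taken from the Telegraf source; where the lock actually belongs in the plugin is still the open question above.

```go
package main

import (
	"fmt"
	"sync"
)

// podRegistry is a hypothetical stand-in for the plugin's pod bookkeeping.
// None of these names come from Telegraf; they exist only for illustration.
type podRegistry struct {
	mu   sync.RWMutex
	pods map[string]string // pod key ("namespace/name") -> scrape URL
}

func newPodRegistry() *podRegistry {
	return &podRegistry{pods: make(map[string]string)}
}

// register adds or updates a pod while holding the write lock.
func (r *podRegistry) register(key, url string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pods[key] = url
}

// unregister removes a pod while holding the write lock.
func (r *podRegistry) unregister(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pods, key)
}

// lookup reads a pod while holding the read lock, so many readers
// can proceed in parallel but never overlap with a writer.
func (r *podRegistry) lookup(key string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	url, ok := r.pods[key]
	return url, ok
}

// size returns the number of registered pods under the read lock.
func (r *podRegistry) size() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.pods)
}

func main() {
	reg := newPodRegistry()
	var wg sync.WaitGroup

	// Simulate event handlers from several goroutines touching the map at once.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("default/pod-%d", i)
			reg.register(key, fmt.Sprintf("http://10.0.0.%d:9102/metrics", i))
			reg.lookup(key)
		}(i)
	}
	wg.Wait()

	fmt.Println(reg.size()) // prints 50
}
```

Without the lock, the same access pattern can trigger the Go runtime's "concurrent map read and map write" fatal error seen in the log. Running a build with the -race flag is a common way to confirm that such a race is gone.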