telegraf: Telegraf crashes in version 1.26
Relevant telegraf.conf
[[inputs.prometheus]]
metric_version = 2
monitor_kubernetes_pods = false
pod_scrape_scope = "cluster"
bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
kubernetes_services = ["https://kubernetes.default.svc/api/v1/nodes/${MY_NODE_NAME}/proxy/metrics/cadvisor"]
Logs from Telegraf
fatal error: concurrent map read and map write
goroutine 30 [running]:
github.com/influxdata/telegraf/plugins/inputs/prometheus.shouldScrapePod(0xc0017c4ee0, 0xc000618d80)
/usr/src/mariner/BUILD/telegraf-1.26.0/plugins/inputs/prometheus/kubernetes.go:110 +0x1be
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).watchPod.func1({0x62c3c60?, 0xc0017c4ee0?})
/usr/src/mariner/BUILD/telegraf-1.26.0/plugins/inputs/prometheus/kubernetes.go:150 +0x56
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/controller.go:232
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/shared_informer.go:818 +0x134
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0008c8738?, {0x6fa7c80, 0xc000813620}, 0x1, 0xc0011a43c0)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc0008c8788?)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000645b80?)
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/client-go/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/usr/src/mariner/BUILD/telegraf-1.26.0/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0x85
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc000b9bef0?)
System info
Telegraf version: 1.26, OS: Mariner 2.0.20230407
Docker
No response
Steps to reproduce
1. Install Telegraf version 1.26.
2. Add a kubernetes_services input for cAdvisor.
…
Expected behavior
Telegraf should not panic; it should stay up and running.
Actual behavior
Telegraf crashes with "fatal error: concurrent map read and map write".
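For context, "concurrent map read and map write" is a fatal error raised by the Go runtime when one goroutine reads a built-in map while another writes to it without synchronization; it cannot be recovered, so the whole process exits. The sketch below is illustrative only and is not Telegraf code: it uses sync.Map (one standard-library option for concurrent access) with a hypothetical storeAndCount helper, where writer goroutines stand in for informer callbacks and reader goroutines stand in for a scrape-time check such as shouldScrapePod.

```go
package main

import (
	"fmt"
	"sync"
)

// storeAndCount is a hypothetical helper: n writer goroutines store keys
// while n reader goroutines load them at the same time. Because sync.Map
// is safe for concurrent use, this does not trip the runtime's
// "concurrent map read and map write" check that a plain map would.
func storeAndCount(n int) int {
	var pods sync.Map
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		key := fmt.Sprintf("default/pod-%d", i)
		wg.Add(2)
		// Writer: stands in for an informer add/update callback.
		go func() {
			defer wg.Done()
			pods.Store(key, true)
		}()
		// Reader: stands in for a scrape-time lookup.
		go func() {
			defer wg.Done()
			pods.Load(key)
		}()
	}
	wg.Wait()

	// Count the stored entries once all goroutines have finished.
	count := 0
	pods.Range(func(_, _ any) bool {
		count++
		return true
	})
	return count
}

func main() {
	fmt.Println(storeAndCount(50))
}
```

A plain map guarded by a mutex works just as well; sync.Map is mainly attractive when keys are written once and read many times.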
Additional info
No response
About this issue
- State: closed
- Created a year ago
- Comments: 17 (10 by maintainers)
@samskan, your sample config shows monitor_kubernetes_pods = false, but that setting disables the code path to watchPod, which is where the problem arises. From the stack traces, there should be multiple instances of the prometheus plugin. Could you provide the full config, please?

@redbaron, based on the crash above, it looks like we may need to add a mutex around access to the pods. Does it make sense to do this in the UpdateFunc when registering a pod?
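A minimal sketch of the mutex idea raised above, assuming the pods are tracked in a map keyed by namespace/name. The podRegistry type and its register, unregister, lookup and size methods are hypothetical names for illustration; they are not Telegraf's actual code and this is not necessarily the fix that was merged. The point is that whatever AddFunc, UpdateFunc and DeleteFunc handlers do to the map happens under a write lock, while scrape-time reads take a read lock.

```go
package main

import (
	"fmt"
	"sync"
)

// podInfo is a hypothetical, trimmed-down record of a scrapable pod.
type podInfo struct {
	Name string
	URL  string
}

// podRegistry guards its map with a RWMutex so informer callbacks (writers)
// and scrape-time lookups (readers) can safely run on different goroutines.
type podRegistry struct {
	mu   sync.RWMutex
	pods map[string]podInfo
}

func newPodRegistry() *podRegistry {
	return &podRegistry{pods: make(map[string]podInfo)}
}

// register would be called from both AddFunc and UpdateFunc handlers.
func (r *podRegistry) register(key string, p podInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pods[key] = p
}

// unregister would be called from a DeleteFunc handler.
func (r *podRegistry) unregister(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pods, key)
}

// lookup only needs a read lock, so concurrent readers do not block each other.
func (r *podRegistry) lookup(key string) (podInfo, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.pods[key]
	return p, ok
}

// size reports how many pods are currently registered.
func (r *podRegistry) size() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.pods)
}

func main() {
	reg := newPodRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		key := fmt.Sprintf("default/pod-%d", i)
		wg.Add(2)
		// Concurrent write and read of the same key, as an informer
		// callback and a scrape might do.
		go func() { defer wg.Done(); reg.register(key, podInfo{Name: key}) }()
		go func() { defer wg.Done(); reg.lookup(key) }()
	}
	wg.Wait()
	reg.unregister("default/pod-0")
	fmt.Println(reg.size()) // 99
}
```

Locking inside the registry, rather than in each handler, keeps the AddFunc and UpdateFunc paths from diverging; it does not by itself protect objects that are shared between several plugin instances, which is why the full config matters here.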