telegraf: Kubernetes input plugin not working (deprecated /stats/summary endpoint?)
Relevant telegraf.conf:
[[inputs.kubernetes]]
url = "https://kubernetes.default.svc"
bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
insecure_skip_verify = true
System info:
Ubuntu 18.04, k3s v1.17.2+k3s1, Telegraf image: telegraf:1.12.2
Steps to reproduce:
Configure the Kubernetes input plugin in a Telegraf container.
Expected behavior:
The plugin should collect the Kubernetes metrics.
Actual behavior:
The Telegraf plugin log shows that the Kubernetes API server returned a 403 Forbidden error. After adding the following rules to the RBAC role bound to the pod's Service Account:
rules:
- nonResourceURLs: ["/stats", "/stats/*"]
  verbs: ["get", "list"]
the error changes to 404. No metrics are being collected.
Additional info:
The kube_inventory input plugin seems to be working just fine, but the kubernetes plugin is not able to obtain any metrics, as described above. Looking at the code, the kubernetes input plugin calls the /stats/summary endpoint on the Kubernetes API server.
The /stats/summary endpoint was planned to be deprecated (https://github.com/kubernetes/kubernetes/issues/68522), but it seems that it has already been removed.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 12
- Comments: 18 (8 by maintainers)
Thanks to all, I found the problem: I was using a Deployment definition instead of a DaemonSet. A related issue when you switch to a DaemonSet, as @alanjcastonguay and @rawkode commented, is that you have to use NODEIP:10250, like this:
So I replaced my YAML with the official Helm chart, as @nsteinmetz recommended, because I would otherwise have had to change or add too many parameters in my YAML. The official chart works: deploy it in the namespace you need and it collects all metrics.
Conclusion: if you need to monitor a Kubernetes cluster, the best option is to deploy the official Helm chart telegraf-ds. It monitors every node in the cluster (it deploys one Telegraf agent per node via a DaemonSet) with a single deployment definition.
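A minimal sketch of that setup, not the commenter's original snippet. It assumes a hypothetical `monitoring` namespace, a `telegraf` Service Account, and an environment variable named `HOSTIP`. The DaemonSet injects each node's IP through the downward API, and the plugin queries the kubelet on port 10250 instead of the API server:

```yaml
# Hypothetical names (telegraf-config, monitoring, telegraf, HOSTIP); adjust to your cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf-config
  namespace: monitoring
data:
  # Only the input section is shown; a real config also needs at least one output plugin.
  telegraf.conf: |
    [[inputs.kubernetes]]
      # Talk to the kubelet on this pod's own node, not to the API server.
      url = "https://$HOSTIP:10250"
      bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      insecure_skip_verify = true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: telegraf
  template:
    metadata:
      labels:
        app: telegraf
    spec:
      serviceAccountName: telegraf
      containers:
        - name: telegraf
          image: telegraf:1.12.2
          env:
            # Downward API: resolves to the IP of the node running this pod.
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: config
              mountPath: /etc/telegraf
      volumes:
        - name: config
          configMap:
            name: telegraf-config
```

With a Deployment, the pods run on only some nodes, so the other kubelets are never scraped. A DaemonSet places one agent on each node.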
https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds
Try creating a Service Account and ClusterRoleBinding for Telegraf using the YAML configuration below. Mind the namespace.
I faced a similar issue; after applying the YAML, Telegraf was able to authenticate with the cluster and scrape the metrics.
We should put together some documentation about what needs to be done to switch to the replacement, and any way we can smooth the transition. I could definitely use some help from the community on this.
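The YAML referred to in that comment is not reproduced here. As a rough sketch of what such a configuration can look like, assuming a hypothetical `telegraf` Service Account in a `monitoring` namespace: the kubelet authorizes requests to `/stats/*` against the `nodes/stats` subresource, so the role grants `get` on it, and `nodes/proxy` covers access through the API server's node proxy.

```yaml
# Hypothetical names (telegraf, monitoring, telegraf-kubelet-stats); the namespace
# must match in both the ServiceAccount and the binding's subject.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegraf
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: telegraf-kubelet-stats
rules:
  - apiGroups: [""]
    # nodes/stats authorizes kubelet /stats/*; nodes/proxy authorizes the node proxy path.
    resources: ["nodes/stats", "nodes/proxy"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: telegraf-kubelet-stats
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: telegraf-kubelet-stats
subjects:
  - kind: ServiceAccount
    name: telegraf
    namespace: monitoring
```

The Telegraf pod then needs `serviceAccountName: telegraf` in its spec so that the mounted token belongs to this account.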
I am assuming similar metrics can be captured with the prometheus input plugin. It would be good to gather a listing of the new metrics because switching over will likely change all metrics and break dashboards/alerts.
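One possible shape for that alternative, offered as an assumption and not a tested replacement: the kubelet exposes Prometheus-format container metrics at `/metrics/cadvisor`, which the prometheus input plugin could scrape per node. The ConfigMap name, namespace, and `HOSTIP` variable below are hypothetical, and the metric names will differ from those produced by the kubernetes plugin.

```yaml
# Hypothetical names (telegraf-prometheus-config, monitoring, HOSTIP).
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf-prometheus-config
  namespace: monitoring
data:
  # Only the input section is shown; a real config also needs at least one output plugin.
  telegraf.conf: |
    [[inputs.prometheus]]
      # Scrape the kubelet's cAdvisor metrics on the node running this pod.
      urls = ["https://$HOSTIP:10250/metrics/cadvisor"]
      bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      insecure_skip_verify = true
```

The kubelet authorizes `/metrics/*` against the `nodes/metrics` subresource, so the Service Account would need `get` on that resource for this to work.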
It also looks like it should be possible to use the --enable-cadvisor-endpoints flag to re-enable the endpoint; it would be good to describe how this can be set as well.