integrations-core: kubernetes.pods.running reporting incorrectly
Output of the info page
==============
Agent (v6.4.2)
==============
Status date: 2018-08-24 00:05:55.602398 UTC
Pid: 352
Python Version: 2.7.15
Logs:
Check Runners: 2
Log Level: WARNING
kubernetes_apiserver
--------------------
Total Runs: 53293
Metric Samples: 0, Total: 0
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 4ms
(a ton of unrelated and possibly sensitive stuff removed)
Additional environment details (Operating System, Cloud provider, etc): GKE - kubernetes 1.10
Steps to reproduce the issue:
Have a k8s cluster monitored by Datadog where at least one pod is in a Failed state (or any phase other than Running).
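For illustration, here is a minimal sketch of one way to get a pod into the Failed phase using the official kubernetes Python client; the pod name, namespace, and image are arbitrary assumptions and not part of the original report:

```python
# Illustrative sketch: create a pod whose only container exits non-zero with
# restartPolicy=Never, so its status.phase eventually becomes Failed.
from kubernetes import client, config

def create_failing_pod(namespace="default"):
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="failing-pod-demo"),  # hypothetical name
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="fail",
                    image="busybox",
                    command=["sh", "-c", "exit 1"],
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)

if __name__ == "__main__":
    create_failing_pod()
```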
Describe the results you received: The metric appears to count pods in all phases, including Failed.
Describe the results you expected:
Simple fix: the metric is filtered to count only pods where status.phase == Running.
Enhancement: the metric is replaced by kubernetes.pods.count with status.phase added as a tag, allowing accurate reporting of pods in e.g. the Failed state. This would enable more useful metrics and reporting (see the sketch after the note below).
Note that a similar metric is exposed by kubernetes_state when it's configured, but that shouldn't excuse the inaccuracy of this one.
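To make the proposed fix and enhancement concrete, a minimal hedged sketch of how a check could emit both metrics from a kubelet pod list. The `check` object is assumed to be a datadog_checks AgentCheck instance and the pod-list shape is assumed to be the kubelet's /pods JSON; this is not the actual integrations-core kubelet check code:

```python
# Illustrative sketch only -- not the actual integrations-core check code.
from collections import Counter

def pod_counts_by_phase(pod_list):
    """pod_list: parsed JSON from the kubelet's /pods endpoint (assumed shape)."""
    return Counter(
        pod.get("status", {}).get("phase", "Unknown")
        for pod in pod_list.get("items", [])
    )

def report_pod_metrics(check, pod_list, tags):
    """check: assumed to be an AgentCheck instance exposing gauge()."""
    counts = pod_counts_by_phase(pod_list)
    # Simple fix: count only pods whose status.phase is Running.
    check.gauge("kubernetes.pods.running", counts.get("Running", 0), tags=tags)
    # Enhancement: a phase-tagged count so Failed/Pending pods are visible too.
    for phase, n in counts.items():
        check.gauge("kubernetes.pods.count", n, tags=tags + ["pod_phase:%s" % phase.lower()])
```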
Additional information you deem important (e.g. issue happens only occasionally):
About this issue
- State: closed
- Created 6 years ago
- Reactions: 11
- Comments: 16 (1 by maintainers)
I can confirm that during a period in which we terminate some pods and then create new ones (not via replicas, simply scheduling pending pods on a node that now has free resources), we see 3 times the number of real pods. I believe this happens when we launch the new pods. In our case we launch one pod per namespace, but DD reports 3 for about 5 minutes.
I'd also like to add that we consistently see inaccurate measurements for the pods-running metric:
the numbers are off by 100% during scaling periods and can take up to 10 minutes to stabilize. Turning off interpolation in the metric graphs shows a sawtooth measurement.
@ahmed-mez this issue is not resolved by setting `sum by`, as I commented last year. We face this issue too.
`kubernetes.pods.running` shows only a single pod most of the time. Sometimes it changes to a floating-point number (up to 1.4) even when there are definitely several pods running.