istio: Duplicate labels on telemetry metric export
Describe the bug
We installed Istio 1.0.4 on k8s 1.11.4 following the Helm chart instructions, and it has been running for several days. We also hooked it up to Datadog metric collection following their instructions. Metrics came through fine for a couple of days and then suddenly stopped. Looking into why, it appears the telemetry endpoint is returning dozens of errors like the ones below about duplicate labels. The only place I have found such an error so far is at https://github.com/prometheus/node_exporter/blob/f9dd8e9b8c29f6c9da676036d8a8c587326bb710/vendor/github.com/prometheus/client_golang/prometheus/registry.go#L845
$ curl -v http://istio-telemetry.istio-system:42422/metrics
* Trying 10.3.237.129...
* TCP_NODELAY set
* Connected to istio-telemetry.istio-system (10.3.237.129) port 42422 (#0)
> GET /metrics HTTP/1.1
> Host: istio-telemetry.istio-system:42422
> User-Agent: curl/7.59.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Mon, 03 Dec 2018 05:14:42 GMT
< Transfer-Encoding: chunked
<
An error has occurred during metrics gathering:
47 error(s) occurred:
* collected metric "istio_tcp_sent_bytes_total" { label:<name:"connection_security_policy" value:"none" > label:<name:"connection_security_policy" value:"none" > label:<name:"destination_app" value:"cms" > label:<name:"destination_service" value:"unknown" > label:<name:"destination_service_name" value:"unknown" > label:<name:"destination_service_namespace" value:"unknown" > label:<name:"destination_version" value:"81bb5c3" > label:<name:"destination_workload_namespace" value:"int" > label:<name:"destination_principal" value:"unknown" > label:<name:"reporter" value:"destination" > label:<name:"source_app" value:"frontend" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"06195b2" > label:<name:"source_workload" value:"frontend-fix-update-blog-article-layout" > label:<name:"source_workload_namespace" value:"int" > counter:<value:1.088915e+06 > } has two or more labels with the same name: connection_security_policy
* collected metric "istio_tcp_sent_bytes_total" { label:<name:"connection_security_policy" value:"unknown" > label:<name:"destination_app" value:"unknown" > label:<name:"destination_app" value:"unknown" > label:<name:"destination_service" value:"aerospike-3" > label:<name:"destination_service_name" value:"aerospike-3" > label:<name:"destination_service_namespace" value:"int-shared" > label:<name:"destination_version" value:"unknown" > label:<name:"destination_workload" value:"unknown" > label:<name:"destination_workload_namespace" value:"unknown" > label:<name:"reporter" value:"source" > label:<name:"source_app" value:"entities-manager" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"unknown" > label:<name:"source_workload" value:"entities-manager-statefulset-v1-next" > label:<name:"source_workload_namespace" value:"int-shared" > counter:<value:2.362107154e+09 > } has two or more labels with the same name: destination_app
* collected metric "istio_tcp_sent_bytes_total" { label:<name:"connection_security_policy" value:"none" > label:<name:"destination_app" value:"be-api" > label:<name:"destination_app" value:"be-api" > label:<name:"destination_principal" value:"unknown" > label:<name:"destination_service" value:"unknown" > label:<name:"destination_service_name" value:"unknown" > label:<name:"destination_service_namespace" value:"unknown" > label:<name:"destination_workload" value:"be-api-next" > label:<name:"destination_workload_namespace" value:"int" > label:<name:"reporter" value:"destination" > label:<name:"source_app" value:"unknown" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"unknown" > label:<name:"source_workload" value:"unknown" > label:<name:"source_workload_namespace" value:"unknown" > counter:<value:3065 > } has two or more labels with the same name: destination_app
* collected metric "istio_tcp_sent_bytes_total" { label:<name:"connection_security_policy" value:"none" > label:<name:"destination_app" value:"client" > label:<name:"destination_app" value:"client" > label:<name:"destination_principal" value:"unknown" > label:<name:"destination_service" value:"unknown" > label:<name:"destination_service_name" value:"unknown" > label:<name:"destination_service_namespace" value:"unknown" > label:<name:"destination_version" value:"5d746da" > label:<name:"destination_workload" value:"client-master" > label:<name:"reporter" value:"destination" > label:<name:"source_app" value:"frontend" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"6559340" > label:<name:"source_workload" value:"frontend-fix-outrights-loading" > label:<name:"source_workload_namespace" value:"int" > counter:<value:44973 > } has two or more labels with the same name: destination_app
* collected metric "istio_tcp_received_bytes_total" { label:<name:"connection_security_policy" value:"unknown" > label:<name:"destination_app" value:"unknown" > label:<name:"destination_app" value:"unknown" > label:<name:"destination_app" value:"unknown" > label:<name:"destination_principal" value:"unknown" > label:<name:"destination_service" value:"aerospike-3" > label:<name:"destination_service_name" value:"aerospike-3" > label:<name:"destination_workload" value:"unknown" > label:<name:"destination_workload_namespace" value:"unknown" > label:<name:"reporter" value:"source" > label:<name:"source_app" value:"distribution-engine" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"9ec06fb" > label:<name:"source_workload" value:"distribution-engine-next" > label:<name:"source_workload_namespace" value:"int" > counter:<value:1.15981052e+08 > } has two or more labels with the same name: destination_app
* collected metric "istio_tcp_received_bytes_total" { label:<name:"destination_app" value:"unknown" > label:<name:"destination_principal" value:"unknown" > label:<name:"destination_service" value:"aerospike-2" > label:<name:"destination_service_name" value:"aerospike-2" > label:<name:"destination_service_namespace" value:"int" > label:<name:"destination_service_namespace" value:"int" > label:<name:"destination_workload" value:"unknown" > label:<name:"connection_security_policy" value:"unknown" > label:<name:"destination_workload_namespace" value:"unknown" > label:<name:"reporter" value:"source" > label:<name:"source_app" value:"api" > label:<name:"source_principal" value:"unknown" > label:<name:"source_version" value:"8b52afe" > label:<name:"source_workload" value:"api-chore-clusterization" > label:<name:"source_workload_namespace" value:"int" > counter:<value:4425 > } has two or more labels with the same name: destination_service_namespace
[...]
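For triage, the error lines above can be summarized by metric and duplicated label name. The following is only a minimal sketch: it assumes the response body keeps the `label:<name:"..." value:"..." >` text form shown in the output, and the helper names `duplicated_labels` and `summarize` are hypothetical, not part of Istio or client_golang.

```python
import re
from collections import Counter

# One label entry in the dumped metric, e.g. label:<name:"reporter" value:"source" >
# (format assumed from the error output above).
LABEL_RE = re.compile(r'label:<name:"([^"]*)" value:"([^"]*)" >')
# The metric name at the start of each error line.
METRIC_RE = re.compile(r'collected metric "([^"]+)"')


def duplicated_labels(error_line):
    """Return (metric name, sorted label names that occur more than once)."""
    metric = METRIC_RE.search(error_line)
    counts = Counter(name for name, _value in LABEL_RE.findall(error_line))
    dups = sorted(name for name, n in counts.items() if n > 1)
    return (metric.group(1) if metric else None, dups)


def summarize(body):
    """Tally (metric, duplicated label) pairs over every error line in body."""
    tally = Counter()
    for line in body.splitlines():
        if "collected metric" not in line:
            continue  # skip headers such as "47 error(s) occurred:"
        metric, dups = duplicated_labels(line)
        for label in dups:
            tally[(metric, label)] += 1
    return tally
```

Passing the saved body of the `curl` response to `summarize` gives a count per (metric, label) pair, which shows at a glance whether the duplication is confined to a few labels such as `destination_app` or spread across many.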
Expected behavior
HTTP 200 for metrics as per the first few days of collection.
Steps to reproduce the bug
I am not sure how to reproduce this yet, but I thought I would file now in case anyone else experiences the same issue. I’d like to help in whatever way I can to reproduce it.
Version
Istio 1.0.4, k8s 1.11.4
Installation
Official Helm chart
Environment
AWS, Container Linux (kube-aws installer)
About this issue
- State: closed
- Created 6 years ago
- Comments: 21 (10 by maintainers)
Good news: The issue in client_golang has been identified. https://github.com/prometheus/client_golang/pull/513 should fix it. It will be part of v0.9.2, which I’ll release once the fix is merged.
@beorn7 I sent a provisional PR for your consideration and to help in tracking the issue: https://github.com/prometheus/client_golang/pull/511.