istio: Unable to add custom dimension to metrics

Bug description

Hello, I’m following the documentation at https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics to add a dimension to istio_requests_total on the outbound sidecar that indicates whether the request was sampled.

I have added the following block to our installer options:

    telemetry:
      enabled: true
      v2:
        enabled: true
        prometheus:
          configOverride:
            outboundSidecar:
              debug: false
              stat_prefix: istio
              metrics:
              - name: requests_total
                dimensions:
                  sampled: request.headers.x-b3-sampled

Merged with our existing configuration (which removes some tags to reduce cardinality), this gives the complete set of options:

    telemetry:
      enabled: true
      v2:
        enabled: true
        prometheus:
          configOverride:
            outboundSidecar:
              debug: false
              stat_prefix: istio
              metrics:
              - name: requests_total
                dimensions:
                  sampled: request.headers.x-b3-sampled
              - tags_to_remove:
                - destination_canonical_service
                - source_canonical_service
                - destination_principal
                - source_principal
                - connection_security_policy
                - grpc_response_status
                - source_version
                - destination_version
                - request_protocol
                - source_canonical_revision
                - destination_canonical_revision
                - source_cluster
                - destination_cluster
                - destination_app
              - name: request_duration_milliseconds
                tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster
              - name: request_bytes
                tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster
              - name: response_bytes
                tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster

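To see what the merged override is meant to do to each metric, here is a minimal Python sketch. It assumes, as the nameless second entry in the configuration above implies, that an override without a name applies to every metric. DEFAULT_TAGS is a hypothetical, shortened tag list, not Istio's real default set, and the overrides are a trimmed copy of the ones above.

```python
# Hypothetical, shortened default tag list (NOT Istio's full default set).
DEFAULT_TAGS = [
    "reporter", "source_workload", "destination_workload", "response_code",
    "source_principal", "destination_app", "request_protocol",
]

# A trimmed copy of the configOverride "metrics" list shown above.
OVERRIDES = [
    {"name": "requests_total",
     "dimensions": {"sampled": "request.headers.x-b3-sampled"}},
    {"tags_to_remove": ["source_principal", "destination_app", "request_protocol"]},
    {"name": "request_bytes", "tags_to_remove": ["response_code"]},
]


def effective_tags(metric, default_tags, overrides):
    """Apply overrides in order and return the resulting tag names for one metric."""
    tags = list(default_tags)
    for entry in overrides:
        # Assumption: an entry without "name" applies to every metric.
        if entry.get("name", metric) != metric:
            continue
        # New dimensions are appended as extra tags.
        for tag in entry.get("dimensions", {}):
            if tag not in tags:
                tags.append(tag)
        # Listed tags are dropped.
        removed = set(entry.get("tags_to_remove", []))
        tags = [t for t in tags if t not in removed]
    return tags


if __name__ == "__main__":
    for metric in ("requests_total", "request_bytes"):
        print(metric, effective_tags(metric, DEFAULT_TAGS, OVERRIDES))
```

This only models the stats plugin configuration. Whether Envoy then exposes sampled as a Prometheus label is a separate question, and that is where the rest of this thread ends up.
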
This appears to modify the EnvoyFilter as expected:

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: stats-filter-1.10
      namespace: istio-system
      labels:
        istio.io/rev: default
        helm-platform-istio: 1.10.2
    spec:
      configPatches:
      - applyTo: HTTP_FILTER
        match:
          context: SIDECAR_OUTBOUND
          proxy:
            proxyVersion: "^1\\.10.*"
          listener:
            filterChain:
              filter:
                name: envoy.filters.network.http_connection_manager
                subFilter:
                  name: envoy.filters.http.router
        patch:
          operation: INSERT_BEFORE
          value:
            name: istio.stats
            typed_config:
              "@type": type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
              value:
                config:
                  root_id: stats_outbound
                  configuration:
                    "@type": type.googleapis.com/google.protobuf.StringValue
                    value: '{"debug":false,"metrics":[{"dimensions":{"sampled":"request.headers.x-b3-sampled"},"name":"requests_total"},{"tags_to_remove":["destination_canonical_service","source_canonical_service","destination_principal","source_principal","connection_security_policy","grpc_response_status","source_version","destination_version","request_protocol","source_canonical_revision","destination_canonical_revision","source_cluster","destination_cluster","destination_app"]},{"name":"request_duration_milliseconds","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]},{"name":"request_bytes","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]},{"name":"response_bytes","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]}],"stat_prefix":"istio"}

                      '
                  vm_config:
                    vm_id: stats_outbound
                    runtime: envoy.wasm.runtime.null
                    code:
                      local:
                        inline_string: envoy.wasm.stats
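
One way to confirm that the override really reached the EnvoyFilter is to decode the JSON string stored under configuration.value. A minimal Python sketch using only the standard library; CONFIG_VALUE is a trimmed copy of the string above, and summarize is a hypothetical helper that groups the overrides by metric name ("*" marks an entry without a name).

```python
import json

# Trimmed copy of the configuration "value" string from the EnvoyFilter above.
CONFIG_VALUE = (
    '{"debug":false,"metrics":['
    '{"dimensions":{"sampled":"request.headers.x-b3-sampled"},"name":"requests_total"},'
    '{"tags_to_remove":["destination_canonical_service","source_canonical_service"]},'
    '{"name":"request_bytes","tags_to_remove":["response_code","response_flags"]}'
    '],"stat_prefix":"istio"}\n'
)


def summarize(config_value):
    """Group dimensions and tags_to_remove by metric name ("*" = no name given)."""
    cfg = json.loads(config_value)  # trailing whitespace is accepted by json.loads
    summary = {}
    for entry in cfg.get("metrics", []):
        name = entry.get("name", "*")
        item = summary.setdefault(name, {"dimensions": {}, "tags_to_remove": []})
        item["dimensions"].update(entry.get("dimensions", {}))
        item["tags_to_remove"].extend(entry.get("tags_to_remove", []))
    return cfg.get("stat_prefix"), summary


if __name__ == "__main__":
    prefix, summary = summarize(CONFIG_VALUE)
    print("stat_prefix:", prefix)
    for name, item in summary.items():
        print(name, item)
```

Here the decoded configuration matches the intent, so the silent failure reported below is not explained by a malformed JSON string.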

However, when the updated EnvoyFilter is applied, we lose metrics entirely: istio_requests_total stops being reported for all sidecars.

I spoke to @douglas-reid, who suggested checking the istio-proxy logs; however, there is nothing relevant in them:

    ❯ k logs istio-test-app-1-69d7644787-fwbb5 -c istio-proxy
    {"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Starting custom autotrader pilot-agent wrapper..."}
    {"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Pilot-agent args: proxy"}
    {"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Pilot agent started with pid: 9"}
    {"level":"info","time":"2021-07-06T16:30:05.494489Z","scope":"citadelclient","msg":"Citadel client using custom root cert: istiod.istio-system.svc:15012"}
    {"level":"info","time":"2021-07-06T16:30:05.547565Z","scope":"ads","msg":"All caches have been synced up in 58.411564ms, marking server ready"}
    {"level":"info","time":"2021-07-06T16:30:05.560215Z","scope":"sds","msg":"SDS server for workload certificates started, listening on \"./etc/istio/proxy/SDS\""}
    {"level":"info","time":"2021-07-06T16:30:05.560302Z","scope":"sds","msg":"Start SDS grpc server"}
    {"level":"info","time":"2021-07-06T16:30:05.560394Z","scope":"xdsproxy","msg":"Initializing with upstream address \"istiod.istio-system.svc:15012\" and cluster \"Kubernetes\""}
    {"level":"info","time":"2021-07-06T16:30:05.818916Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod.istio-system.svc:15012"}
    {"level":"info","time":"2021-07-06T16:30:05.828389Z","scope":"cache","msg":"generated new workload certificate","latency":280145920,"ttl":86399171622729}
    {"level":"info","time":"2021-07-06T16:30:05.828457Z","scope":"cache","msg":"Root cert has changed, start rotating root cert"}
    {"level":"info","time":"2021-07-06T16:30:05.828479Z","scope":"ads","msg":"XDS: Incremental Pushing:0 ConnectedEndpoints:0 Version:"}
    {"level":"info","time":"2021-07-06T16:30:05.828519Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399171482242}
    {"level":"info","time":"2021-07-06T16:30:05.868540Z","scope":"ads","msg":"ADS: new connection for node:sidecar~10.206.1.87~istio-test-app-1-69d7644787-fwbb5.istio-test-app-1~istio-test-app-1.svc.cluster.local-1"}
    {"level":"info","time":"2021-07-06T16:30:05.868624Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399131379030}
    {"level":"info","time":"2021-07-06T16:30:05.868819Z","scope":"ads","msg":"ADS: new connection for node:sidecar~10.206.1.87~istio-test-app-1-69d7644787-fwbb5.istio-test-app-1~istio-test-app-1.svc.cluster.local-2"}
    {"level":"info","time":"2021-07-06T16:30:05.868913Z","scope":"cache","msg":"returned workload certificate from cache","ttl":86399131090167}
    {"level":"info","time":"2021-07-06T16:30:05.869044Z","scope":"sds","msg":"SDS: PUSH","resource":"ROOTCA"}
    {"level":"info","time":"2021-07-06T16:30:05.869059Z","scope":"sds","msg":"SDS: PUSH","resource":"default"}

So it seems to fail silently.

Affected product area: [x] Extensions and Telemetry, [x] User Experience

Expected behavior

Either the additional dimensions work, or an error is output explaining why they don't.

Steps to reproduce the bug

As above.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

1.10.2

How was Istio installed?

Environment where the bug was observed (cloud vendor, OS, etc)


About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 35 (30 by maintainers)

Most upvoted comments

I just got hit by this issue again and forgot that I’d raised it before 😄

It’s pretty frustrating that adding:

              - name: requests_total
                dimensions:
                  path: request.path

breaks all metrics until the affected workloads are rolling restarted:

[Screenshot, 2021-11-13 at 11:10:10]

In the above example, I don’t want to enable the path dimension on all workloads, only on some. So I would expect to be able to define it in the IstioOperator and have it take effect only on the workloads to which I add:

    sidecar.istio.io/extraStatTags: path

Really, this limitation makes customising Istio metrics unusable for existing systems. It’s unrealistic that, in order to add a custom dimension in a production environment, I have to break metrics for 450+ workloads until they have all been rolling restarted. That’s a shame, as I’d love to experiment with some metrics customisation.
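
The sidecar.istio.io/extraStatTags annotation mentioned above is what declares a custom dimension to the proxy as a stat tag, and it appears to be read when the proxy starts, which fits the need for a rolling restart. A minimal pre-rollout check in Python that lists the custom dimensions a pod has not declared; unregistered_dimensions is a hypothetical helper, and the mesh-wide list stands in for meshConfig.defaultConfig.extraStatTags.

```python
ANNOTATION = "sidecar.istio.io/extraStatTags"


def unregistered_dimensions(dimensions, pod_annotations, mesh_extra_stat_tags=()):
    """Return custom dimension names that neither the pod annotation nor the
    mesh-wide list declares as stat tags (hypothetical helper)."""
    # The annotation value is a comma-separated list of tag names.
    declared = {
        tag.strip()
        for tag in pod_annotations.get(ANNOTATION, "").split(",")
        if tag.strip()
    }
    declared.update(mesh_extra_stat_tags)
    return sorted(name for name in dimensions if name not in declared)


if __name__ == "__main__":
    dims = {"path": "request.path"}
    print(unregistered_dimensions(dims, {ANNOTATION: "path"}))  # prints []
    print(unregistered_dimensions(dims, {}))  # prints ['path']
```

A dimension that shows up in this list would be computed by the stats plugin but, under the assumption above, never turned into a label by the proxy.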

@douglas-reid

  1. Created a new YAML. Please find below the starting section of the YAML:

         apiVersion: networking.istio.io/v1alpha3
         kind: EnvoyFilter
         metadata:
           name: stats-filter-1.14
           namespace: istio-system
           labels:
             istio.io/rev: default
         spec:
           configPatches:
           - applyTo: HTTP_FILTER
             match:
               context: SIDECAR_OUTBOUND
               proxy:
                 proxyVersion: '^1.14.*'

     For simplicity, I have attached the GATEWAY section only:

         - applyTo: HTTP_FILTER
           match:
             context: GATEWAY
             proxy:
               proxyVersion: '^1.14.*'
             listener:
               filterChain:
                 filter:
                   name: "envoy.filters.network.http_connection_manager"
                   subFilter:
                     name: "envoy.filters.http.router"
           patch:
             operation: INSERT_BEFORE
             value:
               name: istio.stats
               typed_config:
                 "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                 type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                 value:
                   config:
                     root_id: stats_outbound
                     configuration:
                       "@type": "type.googleapis.com/google.protobuf.StringValue"
                       value: |
                         {
                           "debug": "false",
                           "stat_prefix": "istio",
                           "disable_host_header_fallback": true,
                           "metrics": [
                             {
                               "dimensions": {
                                 "destination_url_path": "request.url_path"
                               }
                             }
                           ]
                         }
                     vm_config:
                       vm_id: stats_outbound
                       runtime: envoy.wasm.runtime.null
                       code:
                         local:
                           inline_string: envoy.wasm.stats

     In the TCP section:

         - applyTo: NETWORK_FILTER
           match:
             context: GATEWAY
             proxy:
               proxyVersion: '^1.14.*'
             listener:
               filterChain:
                 filter:
                   name: "envoy.filters.network.tcp_proxy"
           patch:
             operation: INSERT_BEFORE
             value:
               name: istio.stats
               typed_config:
                 "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                 type_url: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
                 value:
                   config:
                     root_id: stats_outbound
                     configuration:
                       "@type": "type.googleapis.com/google.protobuf.StringValue"
                       value: |
                         {
                           "debug": "false",
                           "stat_prefix": "istio",
                           "metrics": [
                             {
                               "dimensions": {
                                 "destination_url_path": "request.url_path"
                               }
                             }
                           ]
                         }
                     vm_config:
                       vm_id: tcp_stats_outbound
                       runtime: envoy.wasm.runtime.null
                       code:
                         local:
                           inline_string: "envoy.wasm.stats"
  2. Restarted the ingress gateway pod in the istio-system namespace.
  3. Sent a request through the ingress gateway:

         curl -s -I -HHost:httpbin-example.com "http://$INGRESS_HOST:$INGRESS_PORT/status/200"
  4. Checked the resulting series:

         istio_requests_total{app="httpbin", businessunit="Services", connection_security_policy="mutual_tls", destination_app="httpbin", destination_canonical_revision="v1", destination_canonical_service="httpbin", destination_cluster="Kubernetes", destination_principal="spiffe://cluster.local/ns/manual-injection/sa/httpbin", destination_service="httpbin.manual-injection.svc.cluster.local", destination_service_name="httpbin", destination_service_namespace="manual-injection", destination_version="v1", destination_workload="httpbin", destination_workload_namespace="manual-injection", environment="infra", instance="10.169.37.34:15020", istio_io_rev="default", job="kubernetes-pods", k8scluster="eke", kubernetes_namespace="manual-injection", kubernetes_pod_name="httpbin-665f7656b7-clw4l", pod_template_hash="665f8756b7", prometheus_cluster="prometheus", reporter="destination", request_protocol="http", response_code="200", response_flags="-", security_istio_io_tlsMode="istio", service_istio_io_canonical_name="httpbin", service_istio_io_canonical_revision="v1", sidecar_istio_io_inject="true", source_app="istio-ingressgateway", source_canonical_revision="latest", source_canonical_service="istio-ingressgateway", source_cluster="Kubernetes", source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account", source_version="unknown", source_workload="istio-ingressgateway", source_workload_namespace="istio-system",version="v1"}

     Clearly, there is no destination_url_path label in the output metric.

Can you please point out where I am going wrong here?
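
A small Python sketch for checking a scraped series line for an expected custom label, using only the standard library; labels_of and missing_labels are hypothetical helpers. One thing the check makes easy to spot: the series pasted in step 4 carries reporter="destination" and kubernetes_pod_name points at the httpbin pod, so it appears to come from the httpbin sidecar rather than from the ingress gateway that the GATEWAY patch targets.

```python
import re

# Matches name="value" pairs inside a Prometheus series' braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def labels_of(series_line):
    """Return the labels of one series line as a dict (values are not unescaped)."""
    start, end = series_line.index("{"), series_line.rindex("}")
    return dict(LABEL_RE.findall(series_line[start + 1:end]))


def missing_labels(series_line, expected):
    """List the expected label names that the series does not carry."""
    present = labels_of(series_line)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    line = ('istio_requests_total{reporter="destination", response_code="200", '
            'source_workload="istio-ingressgateway"} 1')
    print(missing_labels(line, ["destination_url_path", "response_code"]))
    print(labels_of(line)["reporter"])
```
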

@douglas-reid Envoy does not allow dynamic extensions to produce tags at all, unlike native extensions, which can declare tags. The regex is a band-aid that converts a tagless metric into a metric with tags, but evaluating the regex costs CPU. If Envoy allowed dynamic tags, Wasm could use them and avoid the regexes.

OK. So that was my original question, I guess: is disallowing dynamic tags purely a technical limitation, a policy one, or both? Having Wasm supply dynamic tags seems useful beyond just Istio.
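
To make the regex band-aid concrete, here is a minimal Python sketch. It assumes, as the metric name observed later in this thread suggests, that the stats plugin encodes each tag into the stat name as name=.=value;.; and that the proxy holds one extraction regex per known tag, in the style of Envoy's stats_tags: the first capture group is stripped from the name and the second becomes the tag value. A tag with no regex simply stays in the name. STAT_NAME and extract_tags are illustrative, not Istio's actual code.

```python
import re

# Assumed encoding: each tag is written into the stat name as "name=.=value;.;".
STAT_NAME = ("reporter=.=source;.;response_code=.=200;.;"
             "sampled=.=unknown;.;istio_requests_total")


def extract_tags(stat_name, known_tags):
    """Strip each known tag from the name and return (remaining_name, tags)."""
    tags = {}
    for tag in known_tags:
        # Group 1: text removed from the name; group 2: the tag value.
        m = re.search(rf"({re.escape(tag)}=\.=(.*?);\.;)", stat_name)
        if m:
            tags[tag] = m.group(2)
            stat_name = stat_name[:m.start(1)] + stat_name[m.end(1):]
    return stat_name, tags


if __name__ == "__main__":
    # "sampled" has no regex, so it stays embedded in the metric name.
    print(extract_tags(STAT_NAME, ["reporter", "response_code"]))
    # Once "sampled" is declared too, only the bare metric name is left.
    print(extract_tags(STAT_NAME, ["reporter", "response_code", "sampled"]))
```

Each known tag adds one regex search over the name, which is the CPU cost referred to above.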

So I decided to look at one of the Envoy proxies directly via a port-forward and observed this in its metrics:

    # TYPE envoy_sampled___unknown___istio_requests_total counter
    envoy_sampled___unknown___istio_requests_total{response_code="200",reporter="source",source_workload="istio-test-app-1",source_workload_namespace="istio-test-app-1",source_app="istio-test-app-1",destination_workload="istio-test-app-2",destination_workload_namespace="istio-test-app-2",destination_service="app.istio-test-app-2.svc.cluster.local",destination_service_name="app",destination_service_namespace="istio-test-app-2",response_flags="-"} 3392
    envoy_sampled___unknown___istio_requests_total{response_code="200",reporter="source",source_workload="istio-test-app-1",source_workload_namespace="istio-test-app-1",source_app="istio-test-app-1",destination_workload="istio-test-app-3",destination_workload_namespace="istio-test-app-3",destination_service="app.istio-test-app-3.svc.cluster.local",destination_service_name="app",destination_service_namespace="istio-test-app-3",response_flags="-"} 1665
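
The odd metric name above can be reproduced from a stat name that still contains an unextracted tag, assumed here to be sampled=.=unknown;.;istio_requests_total. A minimal Python sketch: the sanitisation step (every character outside [a-zA-Z0-9_] becomes _) mirrors what Envoy's Prometheus endpoint does, while the rule for when the envoy_ prefix is added is a simplification chosen to match the names in this thread, not Envoy's exact namespace logic.

```python
import re


def prometheus_metric_name(stat_name, native_prefixes=("istio_",)):
    """Turn an Envoy stat name into a Prometheus metric name (simplified)."""
    # Characters that are not valid in a Prometheus metric name become "_".
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", stat_name)
    # Simplification: Istio metrics keep their own name; anything else is
    # exported under Envoy's default "envoy_" prefix.
    if sanitized.startswith(native_prefixes):
        return sanitized
    return "envoy_" + sanitized


if __name__ == "__main__":
    # The "sampled" tag was never extracted, so it leads the metric name.
    print(prometheus_metric_name("sampled=.=unknown;.;istio_requests_total"))
    # prints envoy_sampled___unknown___istio_requests_total
    print(prometheus_metric_name("istio_requests_total"))
    # prints istio_requests_total
```

Under this reading, the sampled value never became a label, and because the leftover name no longer starts with istio_, the series moved to a differently named metric. That would explain why istio_requests_total appeared to stop reporting entirely.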