istio: Unable to add custom dimension to metrics
Bug description
Hello,
I’m following the documentation at https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics to try to add a dimension to istio_requests_total on the outbound sidecar that indicates whether the request was sampled.
I have added the following block to our installer options:
telemetry:
  enabled: true
  v2:
    enabled: true
    prometheus:
      configOverride:
        outboundSidecar:
          debug: false
          stat_prefix: istio
          metrics:
            - name: requests_total
              dimensions:
                sampled: request.headers.x-b3-sampled
Merged with our existing configuration (which removes some tags to reduce cardinality), this gives the complete set of options:
telemetry:
  enabled: true
  v2:
    enabled: true
    prometheus:
      configOverride:
        outboundSidecar:
          debug: false
          stat_prefix: istio
          metrics:
            - name: requests_total
              dimensions:
                sampled: request.headers.x-b3-sampled
            - tags_to_remove:
                - destination_canonical_service
                - source_canonical_service
                - destination_principal
                - source_principal
                - connection_security_policy
                - grpc_response_status
                - source_version
                - destination_version
                - request_protocol
                - source_canonical_revision
                - destination_canonical_revision
                - source_cluster
                - destination_cluster
                - destination_app
            - name: request_duration_milliseconds
              tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster
            - name: request_bytes
              tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster
            - name: response_bytes
              tags_to_remove:
                - response_code
                - response_flags
                - source_cluster
                - destination_cluster
This appears to modify the EnvoyFilter as expected:
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: stats-filter-1.10
  namespace: istio-system
  labels:
    istio.io/rev: default
    helm-platform-istio: 1.10.2
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_OUTBOUND
        proxy:
          proxyVersion: "^1\\.10.*"
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: istio.stats
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                root_id: stats_outbound
                configuration:
                  "@type": type.googleapis.com/google.protobuf.StringValue
                  value: '{"debug":false,"metrics":[{"dimensions":{"sampled":"request.headers.x-b3-sampled"},"name":"requests_total"},{"tags_to_remove":["destination_canonical_service","source_canonical_service","destination_principal","source_principal","connection_security_policy","grpc_response_status","source_version","destination_version","request_protocol","source_canonical_revision","destination_canonical_revision","source_cluster","destination_cluster","destination_app"]},{"name":"request_duration_milliseconds","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]},{"name":"request_bytes","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]},{"name":"response_bytes","tags_to_remove":["response_code","response_flags","source_cluster","destination_cluster"]}],"stat_prefix":"istio"}
                    '
                vm_config:
                  vm_id: stats_outbound
                  runtime: envoy.wasm.runtime.null
                  code:
                    local:
                      inline_string: envoy.wasm.stats
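To double-check what the stats filter is actually handed, the JSON string embedded in configuration.value can be decoded and summarised per metric override. The following is only a minimal Python sketch: the helper name summarize_stats_config is hypothetical, and SAMPLE is a shortened copy of the value above, not the full string.

```python
import json


def summarize_stats_config(raw):
    """Decode the stats filter's JSON configuration and summarise each metric override."""
    config = json.loads(raw)  # surrounding whitespace/newlines are tolerated
    summary = []
    for override in config.get("metrics", []):
        summary.append({
            # An override without "name" is reported as None rather than guessed at.
            "name": override.get("name"),
            "added_dimensions": sorted(override.get("dimensions", {})),
            "removed_tags": list(override.get("tags_to_remove", [])),
        })
    return summary


# Shortened copy of the rendered value (assumption: same shape as the real one).
SAMPLE = (
    '{"debug":false,"metrics":['
    '{"dimensions":{"sampled":"request.headers.x-b3-sampled"},"name":"requests_total"},'
    '{"tags_to_remove":["source_cluster","destination_cluster"]}'
    '],"stat_prefix":"istio"}\n'
)

for entry in summarize_stats_config(SAMPLE):
    print(entry)
```

Run against the full value, this makes it easy to see which overrides carry a name and which dimensions each one adds or removes.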
However, when the updated EnvoyFilter is applied, we lose metrics entirely (istio_requests_total stops working for all sidecars).
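One way to confirm this symptom is to check a saved scrape of a sidecar's metrics for istio_requests_total series and their labels. The sketch below parses the Prometheus text format with two regular expressions. The helper name series_for is hypothetical, and the SCRAPE text is made-up illustrative input, not output from this cluster.

```python
import re

# Matches 'metric_name{label="value",...} 12' as well as 'metric_name 12'.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(?P<key>[A-Za-z_][A-Za-z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def series_for(metric, exposition):
    """Return one label dict per sample of `metric` found in Prometheus text format."""
    series = []
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group("name") == metric:
            labels = match.group("labels") or ""
            series.append({m.group("key"): m.group("val") for m in LABEL.finditer(labels)})
    return series


# Made-up scrape used only to exercise the parser.
SCRAPE = '''# TYPE istio_requests_total counter
istio_requests_total{reporter="source",response_code="200",sampled="1"} 7
istio_requests_total{reporter="source",response_code="503",sampled="0"} 2
envoy_server_uptime 120
'''

found = series_for("istio_requests_total", SCRAPE)
print(len(found), all("sampled" in labels for labels in found))
```

An empty list for istio_requests_total would correspond to the "metrics stop entirely" case; a non-empty list without the new label would mean the override was not picked up.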
I spoke to @douglas-reid, who suggested checking the istio-proxy logs; however, there is nothing relevant in them:
❯ k logs istio-test-app-1-69d7644787-fwbb5 -c istio-proxy
{"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Starting custom autotrader pilot-agent wrapper..."}
{"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Pilot-agent args: proxy"}
{"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Pilot agent started with pid: 9"}
{"level":"info","time":"2021-07-06T16:30:05.494489Z","scope":"citadelclient","msg":"Citadel client using custom root cert: istiod.istio-system.svc:15012"}
{"level":"info","time":"2021-07-06T16:30:05.547565Z","scope":"ads","msg":"All caches have been synced up in 58.411564ms, marking server ready"}
{"level":"info","time":"2021-07-06T16:30:05.560215Z","scope":"sds","msg":"SDS server for workload certificates started, listening on \"./etc/istio/proxy/SDS\""}
{"level":"info","time":"2021-07-06T16:30:05.560302Z","scope":"sds","msg":"Start SDS grpc server"}
{"level":"info","time":"2021-07-06T16:30:05.560394Z","scope":"xdsproxy","msg":"Initializing with upstream address \"istiod.istio-system.svc:15012\" and cluster \"Kubernetes\""}
{"level":"info","time":"2021-07-06T16:30:05.818916Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod.istio-system.svc:15012"}
{"level":"info","time":"2021-07-06T16:30:05.828389Z","scope":"cache","msg":"generated new workload certificate","latency":280145920,"ttl":86399171622729}
{"level":"info","time":"2021-07-06T16:30:05.828457Z","scope":"cache","msg":"Root cert has changed, start rotating root cert"}
{"level":"info","time":"2021-07-06T16:30:05.828479Z","scope":"ads","msg":"XDS: Incremental Pushing:0 ConnectedEndpoints:0 Version:"}
{"level":"info","time":"2021-07-06T16:30:05.828519Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399171482242}
{"level":"info","time":"2021-07-06T16:30:05.868540Z","scope":"ads","msg":"ADS: new connection for node:sidecar~10.206.1.87~istio-test-app-1-69d7644787-fwbb5.istio-test-app-1~istio-test-app-1.svc.cluster.local-1"}
{"level":"info","time":"2021-07-06T16:30:05.868624Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399131379030}
{"level":"info","time":"2021-07-06T16:30:05.868819Z","scope":"ads","msg":"ADS: new connection for node:sidecar~10.206.1.87~istio-test-app-1-69d7644787-fwbb5.istio-test-app-1~istio-test-app-1.svc.cluster.local-2"}
{"level":"info","time":"2021-07-06T16:30:05.868913Z","scope":"cache","msg":"returned workload certificate from cache","ttl":86399131090167}
{"level":"info","time":"2021-07-06T16:30:05.869044Z","scope":"sds","msg":"SDS: PUSH","resource":"ROOTCA"}
{"level":"info","time":"2021-07-06T16:30:05.869059Z","scope":"sds","msg":"SDS: PUSH","resource":"default"}
So it seems to fail silently.
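To back up the "nothing in the logs" observation without reading every line by eye, the JSON log lines can be filtered by level. This is a rough sketch with a hypothetical helper name, problem_lines. It assumes each log line is a standalone JSON object like those above, so any plain-text lines would be skipped rather than inspected. LOGS holds two lines copied from the excerpt.

```python
import json


def problem_lines(log_text, levels=("warn", "warning", "error", "fatal")):
    """Return (level, message) pairs for JSON log lines at warning level or above."""
    problems = []
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line; this sketch ignores it
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).lower()
        if level in levels:
            # The wrapper lines use "message"; the pilot-agent lines use "msg".
            problems.append((level, record.get("message") or record.get("msg", "")))
    return problems


# Two lines copied from the log excerpt above.
LOGS = '''{"timestamp":"2021-07-06T16:30:05+00:00","level":"info","module":"pilot-agent-agent","message":"Starting custom autotrader pilot-agent wrapper..."}
{"level":"info","time":"2021-07-06T16:30:05.869059Z","scope":"sds","msg":"SDS: PUSH","resource":"default"}
'''

print(problem_lines(LOGS))
```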
[ ] Docs [ ] Installation [ ] Networking [ ] Performance and Scalability [x] Extensions and Telemetry [ ] Security [ ] Test and Release [x] User Experience [ ] Developer Infrastructure [ ] Upgrade
Expected behavior: Either the additional dimensions work, or an error is output explaining why they do not.
Steps to reproduce the bug: As above
Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)
1.10.2
How was Istio installed?
Environment where the bug was observed (cloud vendor, OS, etc)
Additionally, please consider running istioctl bug-report and attaching the generated cluster-state tarball to this issue. Refer to the cluster state archive for more details.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 35 (30 by maintainers)
I just got hit by this issue again and forgot that I’d raised it before 😄
It’s pretty frustrating that adding:
Breaks all metrics:
Until they are rolling restarted.
In the above example, I don’t want to enable the path dimension on all workloads, only some. So the expectation would be that I can define it in the istiooperator and that it only takes effect on workloads to which I add: sidecar.istio.io/extraStatTags: path
Really, this limitation makes customising Istio metrics unusable as a feature for existing systems. It’s unrealistic that, in order to add a custom dimension in a production environment, I’m going to break metrics for 450+ workloads until they have all been rolling restarted. That’s a shame, as I’d love to experiment with some metrics customisation.
@douglas-reid
For simplicity, have attached the GATEWAY section only.
- applyTo: HTTP_FILTER
  match:
    context: GATEWAY
    proxy:
      proxyVersion: '^1.14.*'
    listener:
      filterChain:
        filter:
          name: "envoy.filters.network.http_connection_manager"
          subFilter:
            name: "envoy.filters.http.router"
  patch:
    operation: INSERT_BEFORE
    value:
      name: istio.stats
      typed_config:
        "@type": type.googleapis.com/udpa.type.v1.TypedStruct
        type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
        value:
          config:
            root_id: stats_outbound
            configuration:
              "@type": "type.googleapis.com/google.protobuf.StringValue"
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio",
                  "disable_host_header_fallback": true,
                  "metrics": [
                    {
                      "dimensions": {
                        "destination_url_path": "request.url_path"
                      }
                    }
                  ]
                }
            vm_config:
              vm_id: stats_outbound
              runtime: envoy.wasm.runtime.null
              code:
                local:
                  inline_string: envoy.wasm.stats
In the TCP section:
- applyTo: NETWORK_FILTER
  match:
    context: GATEWAY
    proxy:
      proxyVersion: '^1.14.*'
    listener:
      filterChain:
        filter:
          name: "envoy.filters.network.tcp_proxy"
  patch:
    operation: INSERT_BEFORE
    value:
      name: istio.stats
      typed_config:
        "@type": type.googleapis.com/udpa.type.v1.TypedStruct
        type_url: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        value:
          config:
            root_id: stats_outbound
            configuration:
              "@type": "type.googleapis.com/google.protobuf.StringValue"
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio",
                  "metrics": [
                    {
                      "dimensions": {
                        "destination_url_path": "request.url_path"
                      }
                    }
                  ]
                }
            vm_config:
              vm_id: tcp_stats_outbound
              runtime: envoy.wasm.runtime.null
              code:
                local:
                  inline_string: "envoy.wasm.stats"
Can you please point out where I am going wrong here?
OK. So that was my original question, I guess. Is not allowing dynamic tags purely a technical limitation, a policy one, or both? Having Wasm supply dynamic tags seems useful beyond just Istio.
So I decided to look at one of the envoys directly with a port-forward and observed this in the metrics: