istio: Metrics on workload certificate expiry
Describe the feature request
Since Istio 1.7, workload certificates are no longer stored as Kubernetes secrets but are held in the sidecar proxy container. They can be viewed with the istioctl pc secret <pod> command. They are also exposed over the admin API at localhost:15000/certs; the istioctl command seems to use that endpoint in combination with port-forwarding.
We want to be able to monitor the workload certificates for possible expiry, e.g. by scraping that information from istiod.
Describe alternatives you’ve considered
One option we considered was defining values.sidecarInjectorWebhook.templates: certificate-monitor and possibly setting values.sidecarInjectorWebhook.defaultTemplates: [sidecar, certificate-monitor]. An app in that container would query localhost:15000/certs, extract the certificate expiry, and expose it for Prometheus to scrape on a regular basis. We have a working proof of concept, but the overhead, both on Prometheus and from running a second sidecar for every Istio-enabled workload, is disproportionate, especially with thousands of workloads.
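For illustration, the extraction step of such a monitor could look roughly like the sketch below. It is not the proof of concept itself. It assumes the /certs response is JSON with a top-level certificates list whose entries hold ca_cert and cert_chain arrays, each item carrying an expiration_time timestamp and a serial_number. The metric name workload_cert_seconds_until_expiry and the function names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical metric name, chosen for this sketch only.
METRIC = "workload_cert_seconds_until_expiry"


def parse_expiry(ts):
    """Parse an RFC 3339 timestamp such as '2030-01-02T00:00:00Z'."""
    # Older Python versions reject a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def cert_expiry_metrics(certs_json, now):
    """Turn a /certs response body into Prometheus text-format lines.

    The JSON layout (certificates -> ca_cert / cert_chain ->
    expiration_time) is an assumption about the admin endpoint output.
    """
    lines = []
    for entry in json.loads(certs_json).get("certificates", []):
        for kind in ("ca_cert", "cert_chain"):
            for cert in entry.get(kind, []):
                expires = parse_expiry(cert["expiration_time"])
                remaining = int((expires - now).total_seconds())
                serial = cert.get("serial_number", "")
                lines.append(
                    '%s{kind="%s",serial="%s"} %d'
                    % (METRIC, kind, serial, remaining)
                )
    return lines
```

In a real sidecar the body would come from an HTTP GET against localhost:15000/certs and now would be datetime.now(timezone.utc); both are left out here so the parsing can be checked in isolation.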
Another option would be to open the admin endpoint to a dedicated monitoring application that collects the information and exposes it to Prometheus. This was discarded because of the obvious security concerns.
Affected product area (please put an X in all that apply)
[ ] Docs [ ] Installation [x] Networking [ ] Performance and Scalability [x] Extensions and Telemetry [x] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (12 by maintainers)
This is already in the master branch; nothing needs to be done.
A metric (named citadel_server_root_cert_expiry_timestamp) exists in pilot-discovery right now, but the format is not friendly to read.
This only happens when generating the cert? But we’d want a periodic report of the current workload cert expiry?
This looks more like a metric for the Istio proxy than for istiod? Envoy exposes a metric, envoy_server_days_until_first_cert_expiring, although it will almost always be 0 since the default expiry is 24 hours. I think we can make istio-agent generate this stat.
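To make the granularity point concrete, here is a small arithmetic sketch. It assumes the days-based value is the remaining lifetime truncated to whole days, which is an assumption about how such a gauge is computed rather than a statement about Envoy's implementation. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400


def whole_days_until(expiry, now):
    """Remaining lifetime truncated to whole days (assumed behaviour)."""
    remaining = (expiry - now).total_seconds()
    return max(0, int(remaining // SECONDS_PER_DAY))


def seconds_until(expiry, now):
    """Remaining lifetime in seconds, the resolution a 24h cert needs."""
    return max(0, int((expiry - now).total_seconds()))


# A certificate with the default 24-hour lifetime mentioned above.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = issued + timedelta(hours=24)
one_hour_in = issued + timedelta(hours=1)
```

With these definitions, whole_days_until(expiry, one_hour_in) is 0 even though 23 hours remain, while seconds_until(expiry, one_hour_in) is 82800. A seconds-based gauge or an absolute expiry timestamp would therefore be more useful for short-lived workload certificates.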