istio: Revision Upgrades Break Validation Webhook

Bug description Upgrading from the default revision (istiod) to a new revision (istiod-foo) can leave all Istio config unchangeable once the original revision (istiod) is removed, which blocks both upgrades and downgrades.

[ ] Docs [x] Installation [ ] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure [ ] Upgrade

Expected behavior Validation is moved to the new revision, OR the validation webhook's failurePolicy is set to Ignore.

Steps to reproduce the bug

./bin/istioctl install
./bin/istioctl install --revision foo
# wait for the ValidatingWebhookConfiguration to reach failurePolicy=Fail, usually less than a minute
kubectl delete service -n istio-system istiod
kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
# modifying any Istio resource now fails
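
One way to confirm the state described in the steps above is to list each ValidatingWebhookConfiguration together with its failurePolicy and the service it calls. This is a minimal sketch that assumes only the standard admissionregistration fields; the names WEBHOOK_JSONPATH and show_webhook_targets are hypothetical helpers, not part of istioctl or kubectl.

```shell
# jsonpath template: one line per ValidatingWebhookConfiguration showing
# its name, the failurePolicy of its webhooks, and the service(s) they target.
WEBHOOK_JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].failurePolicy}{"\t"}{.webhooks[*].clientConfig.service.name}{"\n"}{end}'

# Hypothetical helper: print the table for the current cluster context.
show_webhook_targets() {
  kubectl get validatingwebhookconfigurations -o jsonpath="$WEBHOOK_JSONPATH"
}

# Usage (requires cluster access):
#   show_webhook_targets
```

If a configuration shows failurePolicy Fail and targets a service named istiod that no longer exists, that would match the behavior reported here.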

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm) Confirmed in 1.7.4. Suspected in all versions since 1.6

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 15 (11 by maintainers)

Most upvoted comments

The issue is still present in v1.9.1. To get validations to work with revisions, I add an extra service called istiod and point it to the current revision.

apiVersion: v1
kind: Service
metadata:
  name: istiod
  namespace: istio-system
spec:
  ports:
    - name: https-webhook
      port: 443
      protocol: TCP
      targetPort: 15017
  selector:
    app: istiod
    istio.io/rev: 1-9-1

I just update the selector for this service whenever I switch to a new revision.
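
The selector update described above could be scripted roughly as follows. This is a sketch rather than an official procedure: the helper names rev_selector_patch and repoint_istiod are hypothetical, and the revision label 1-10-0 in the usage comment is only an example value.

```shell
# Build a JSON merge patch that points the istiod service selector
# at the given revision label (for example 1-10-0).
rev_selector_patch() {
  printf '{"spec":{"selector":{"app":"istiod","istio.io/rev":"%s"}}}' "$1"
}

# Hypothetical helper: apply the patch to the extra istiod service
# in istio-system. Requires cluster access.
repoint_istiod() {
  kubectl patch service istiod -n istio-system --type merge \
    -p "$(rev_selector_patch "$1")"
}

# Usage:
#   repoint_istiod 1-10-0
```

A merge patch only overwrites the selector keys it names, so the ports defined on the service stay as they are.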

I ran into this same issue running 1.10. I notice the docs still link to this issue and call it out as a bug: https://istio.io/latest/docs/setup/upgrade/canary/ Should this issue be reopened until a fix is available?

The docs also state that we can point the service to the revision. Could that cause problems when we upgrade to a new revision such as 1.11? If there is a difference between 1.10 and 1.11, will all namespaces still on 1.10 have issues?

Note: this call out is removed in 1.11

I understand the TOC voted that this is not a release blocker, which is unfortunate. With this functionality broken, canary upgrades are not possible. Are there any known workarounds, or an order of operations that avoids getting into this state?