istio: Upgrading from Istio 1.9.X to Istio 1.11.X results in a duplicate MutatingWebHook configuration
Bug Description
We recently upgraded 100+ clusters from Istio 1.9.6 to 1.11.4. We had two issues: one is issue #35162, the other is a duplicated MutatingWebhook configuration. When running istioctl analyze -n xxxxx on a namespace, I get this:
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/namespace.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/sidecar-injector.istio.io]. This may cause injection to occur twice.
Upon inspecting that object, I do indeed see both webhooks. It appears namespace.sidecar-injector.istio.io is the new 1.11 variant and sidecar-injector.istio.io is the 1.9 variant. I'm not sure why the old one wasn't removed during the upgrade process.
This doesn’t seem to be causing issues just yet, but obviously the error is concerning. We upgraded Istio via the operator, deploying the new 1.11.4 manifest first with the operator at 0 replicas, then upgraded the operator to 1.11.4 and scaled it to 1.
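One way to reason about the IST0139 report: if two webhooks in the same configuration carry identical label selectors, any pod that matches one also matches the other, so injection can run twice. The minimal sketch below checks only that narrow case on a parsed MutatingWebhookConfiguration. The JSON, including both selectors, is a hypothetical, trimmed-down example rather than the real 1.9 or 1.11 configuration, and the check ignores matchExpressions and rules.

```python
import json

# Hypothetical, trimmed-down MutatingWebhookConfiguration, shaped like the output of
# `kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o json`.
# The selectors are illustrative assumptions, not copied from a real cluster.
CONFIG = json.loads("""
{
  "metadata": {"name": "istio-sidecar-injector"},
  "webhooks": [
    {"name": "sidecar-injector.istio.io",
     "namespaceSelector": {"matchLabels": {"istio-injection": "enabled"}}},
    {"name": "namespace.sidecar-injector.istio.io",
     "namespaceSelector": {"matchLabels": {"istio-injection": "enabled"}}}
  ]
}
""")


def selector_key(webhook):
    """Canonical, hashable form of a webhook's matchLabels selectors."""
    ns = webhook.get("namespaceSelector", {}).get("matchLabels", {})
    obj = webhook.get("objectSelector", {}).get("matchLabels", {})
    return (tuple(sorted(ns.items())), tuple(sorted(obj.items())))


def duplicate_webhooks(config):
    """Group webhook names whose matchLabels selectors are identical."""
    groups = {}
    for wh in config.get("webhooks", []):
        groups.setdefault(selector_key(wh), []).append(wh["name"])
    # Only groups with more than one webhook can double-inject.
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(duplicate_webhooks(CONFIG))
# [['namespace.sidecar-injector.istio.io', 'sidecar-injector.istio.io']]
```

Identical selectors are a sufficient but not necessary condition for overlap, so an empty result here does not prove the webhooks are safe.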
Version
client version: 1.11.4
control plane version: 1.11.4
data plane version: 1.11.4 (67 proxies)
Additional Information
No response
About this issue
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 62 (25 by maintainers)
Commits related to this issue
- Removed the helm istio-operator install. Since the istio-operator helm charts have been removed, this is no longer possible to do. Remove the instructions. Please also see this issue https://github.c... (committed to litong01/istio.io by litong01 2 years ago)
- Upgrade to istio 1.13.2. Fixes issue https://github.com/istio/istio/issues/36114#issuecomment-1096348714, which prevented the operator from successfully reconciling changes in the IstioOperator custom resource (committed to Kuadrant/kuadrant-operator by eguzki 2 years ago)
Experiencing the same issue after upgrading to 1.12. This must be a bug in istio. I don’t understand why this issue was closed.
Hey folks, I spent some time digging through this. There is an issue in HelmReconciler (legacy galley code) in all the 1.12 releases; it goes away with the removal of the legacy galley code in 1.13.
In 1.12, sa is created with the config schema and resources read from metadata.yaml. In this file, webhooks are not marked as clusterScoped: true, and this is the cause of the issue. sa.AddReaderKubeSource() then tries to add config sources based on the metadata file. Eventually, in this call flow, it ends up prefixing istio-system to the webhook name; in other words, in the analyzer's snapshot, the webhook gets stored under the key istio-system/<webhook-name>. Elsewhere, istio-system is not added as the prefix. As a result, the analyzer's snapshot has two entries for the same webhook, <webhook-name> and istio-system/<webhook-name>, and it naturally fails with the "Webhook overlaps with others" log from the analyzer's [Snapshotter.publish](https://github.com/istio/istio/blob/1.12.6/galley/pkg/config/processing/snapshotter/snapshotter.go#L224).
Locally I tried setting clusterScoped: true in metadata.yaml for mutating webhooks, and then it worked. I did something like this:
kubectl apply -f webhook.yaml
webhook.yaml:
Then I restarted the istio-operator pod.
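The keying mismatch described in the analysis above can be modeled in a few lines. This is a hypothetical sketch, not the galley code: snapshot_key and build_snapshot are invented names, and the two input tuples stand in for the two code paths that add the same cluster-scoped webhook configuration to the snapshot.

```python
# Hypothetical model of the snapshot keying mismatch; not Istio's actual code.

def snapshot_key(name, namespace, cluster_scoped):
    """Build a snapshot key; cluster-scoped resources get no namespace prefix."""
    if cluster_scoped or not namespace:
        return name
    return f"{namespace}/{name}"


def build_snapshot(entries):
    """Insert (name, namespace, cluster_scoped) tuples into a dict keyed by snapshot_key."""
    snapshot = {}
    for name, namespace, cluster_scoped in entries:
        snapshot[snapshot_key(name, namespace, cluster_scoped)] = name
    return snapshot


WEBHOOK = "istio-sidecar-injector"

# Buggy: one path treats the webhook configuration as namespaced because the
# schema does not mark it clusterScoped; the other path adds no prefix.
buggy = build_snapshot([(WEBHOOK, "istio-system", False), (WEBHOOK, "", True)])

# Fixed: with clusterScoped: true in the schema, both paths produce the same key.
fixed = build_snapshot([(WEBHOOK, "istio-system", True), (WEBHOOK, "", True)])

print(sorted(buggy))  # ['istio-sidecar-injector', 'istio-system/istio-sidecar-injector']
print(sorted(fixed))  # ['istio-sidecar-injector']
```

Because the two keys differ, the snapshot holds two copies of one webhook configuration, and an overlap check that compares every webhook against every other one will flag each copy against its twin.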
I also tried deleting the 1.11 operator and installing the 1.12 operator. This time the upgrade was successful. However, after the upgrade I restarted the istio-operator pod, and the error came back in the logs. It seems there is a problem in the Istio 1.12 operator code.
Essentially, if a node where the operator is running gets rebooted or replaced, the Istio operator will get stuck in the reconciliation loop.
Is it OK to use the 1.11 operator with 1.12 Istio images to upgrade Istio? I am going to test this flow to see if it helps.
Update: it looks like the Istio 1.11 operator with the Istio 1.12 image fails to start istiod, since the service account for istiod does not have enough permissions: 2022-01-24T00:36:01.087732Z error watch error in cluster Kubernetes: failed to list *v1alpha1.WasmPlugin: wasmplugins.extensions.istio.io is forbidden: User "system:serviceaccount:istio-system:istiod" cannot list resource "wasmplugins" in API group "extensions.istio.io" at the cluster scope
I'm doing this all through GitOps with Flux, so running ad-hoc istioctl operator init commands on 150+ clusters isn't really going to work for me. That's why I took the YAML from the Helm chart and applied it instead.
Not sure if this is related, but I am also facing a similar error using Istio Operator version 1.12.0. While trying to follow the Istio canary upgrade guide (https://istio.io/latest/docs/setup/upgrade/canary/), the operator pod fails to sync resources when there are two deployments active at the same time.
The strange thing is, I checked the spec of both mutating webhook configurations and they don't appear to overlap. During the canary upgrade there are two configurations, each with two webhooks. It seems like all of them are meant to be there (one checks namespace labels and the other checks object labels). It seems like the operator throws an error because it doesn't distinguish between different values in the match conditions (I have 1-12-0 and 1-11-4). Istio maintainers, could this be a bug in the operator's checking logic? Neither of the operators is marked as healthy, and I get this error constantly in the operator log:
My webhook configurations (some stuff removed):
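The suspicion above, that the check ignores label values, can be illustrated with equality-only selectors. Both functions below are hypothetical and are not the operator's actual logic; the selectors are simplified to a single istio.io/rev label using the two revisions mentioned in the comment.

```python
# Hypothetical overlap checks on equality-only (matchLabels-style) selectors.

def overlaps_keys_only(a, b):
    """Naive check: report overlap whenever both selectors use the same label keys."""
    return set(a) == set(b)


def overlaps_with_values(a, b):
    """Value-aware check for equality-only selectors.

    Some object can satisfy both selectors exactly when every label key they
    share requires the same value; keys present in only one selector can be
    set freely on that object.
    """
    return all(a[k] == b[k] for k in set(a) & set(b))


rev_1_11 = {"istio.io/rev": "1-11-4"}
rev_1_12 = {"istio.io/rev": "1-12-0"}

print(overlaps_keys_only(rev_1_11, rev_1_12))    # True: a false positive
print(overlaps_with_values(rev_1_11, rev_1_12))  # False: no object can match both
```

Real selectors also allow matchExpressions (In, NotIn, Exists, DoesNotExist), which a complete check would have to handle.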
Using helm to install istio-operator is no longer supported. Please see this issue. https://github.com/istio/istio/issues/36157
I'm just using raw YAML to do it; here is our latest 1.11.4 YAML, which is the same as what we used in 1.9 with only the Docker image version changed:
We use EKS, all 1.19-1.21. All had the problem.
@lukeplausin Thank you for reporting. I am seeing the same issue when doing an in-place upgrade from 1.11.4 to 1.12.0 using the operator: the IstioOperator never reconciles and does not return a healthy status. I am, however, able to install 1.12 using the 1.11 operator.