istio: Upgrading from Istio 1.9.X to Istio 1.11.X results in a duplicate MutatingWebHook configuration

Bug Description

We recently upgraded 100+ clusters from Istio 1.9.6 to 1.11.4 and hit two issues: one is issue #35162, the other is a duplicated webhook in the MutatingWebhookConfiguration. When running istioctl analyze -n xxxxx on a namespace, I get this:

Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/namespace.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/sidecar-injector.istio.io]. This may cause injection to occur twice.

Upon inspecting that object, I do indeed see both webhook entries. It appears namespace.sidecar-injector.istio.io is the new 1.11 variant and sidecar-injector.istio.io is the 1.9 variant. I am not sure why the old one wasn’t removed during the upgrade process.

This doesn’t seem to be causing issues just yet, but obviously the error is concerning. We upgraded Istio via the operator, deploying the new 1.11.4 manifest first with the operator at 0 replicas, then upgraded the operator to 1.11.4 and scaled it to 1.

Version

client version: 1.11.4
control plane version: 1.11.4
data plane version: 1.11.4 (67 proxies)

Additional Information

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 62 (25 by maintainers)

Most upvoted comments

Experiencing the same issue after upgrading to 1.12. This must be a bug in istio. I don’t understand why this issue was closed.

Hey folks, I spent some time digging through this. There is an issue in HelmReconciler (legacy galley code) in all the 1.12 releases. The issue goes away with the removal of the legacy galley code in 1.13.

  • Reconcile invokes analyzeWebhooks.
  • analyzeWebhooks creates an instance of the source analyzer, sa, with the config schema and resources read from metadata.yaml. In this file, webhooks are not marked as clusterScoped: true, and this is the cause of the issue.
  • sa.AddReaderKubeSource() then tries to add config sources based on the metadata file. Later in this call flow, istio-system ends up being prefixed to the webhook name. In other words, in the analyzer’s snapshot the webhook gets stored under the key istio-system/<webhook-name>.
  • The analyzer also runs a k8s watcher. During the initial cache sync, this watcher inserts the webhook, along with other resources, into the buffer queue, but here istio-system is not added as a prefix. As a result, the analyzer’s snapshot ends up with two entries for the same webhook: <webhook-name> and istio-system/<webhook-name>. Naturally, it then fails with the Webhook overlaps with others log from the analyzer’s [Snapshotter.publish](https://github.com/istio/istio/blob/1.12.6/galley/pkg/config/processing/snapshotter/snapshotter.go#L224):
2022-04-12T08:23:07.254608Z	debug	processing	InmemoryDistributor.Distribute: localAnalysis: [0] istio/mesh/v1alpha1/MeshConfig (@istio/mesh/v1alpha1/MeshConfig/1)
  ......
[10] k8s/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations (@k8s/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/8)
  [0] istio-system/istio-sidecar-injector-stable <-----  SEE THIS
  [1] istio-sidecar-injector-stable                      <---- AND THIS
  [2] onboarding-operator
  

Locally, I tried setting clusterScoped: true in metadata.yaml for mutating webhooks, and with that change it worked.
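To make the double-keying described above easier to see, here is a minimal Python sketch. It is not the galley code: Resource, full_name, and build_snapshot are hypothetical names, and the logic only models the two paths (the reader source that consults the schema metadata, and the watcher) as they are described in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str
    name: str


def full_name(res, default_namespace, cluster_scoped):
    """Build a snapshot key; namespaced kinds get a '<namespace>/' prefix."""
    if cluster_scoped:
        return res.name
    return f"{default_namespace}/{res.name}"


def build_snapshot(res, schema_cluster_scoped):
    """Collect the keys under which one webhook config ends up stored."""
    keys = set()
    # Path 1 (assumed): the reader source trusts the schema metadata, so a
    # kind not marked clusterScoped gets the default namespace prefixed.
    keys.add(full_name(res, "istio-system", schema_cluster_scoped))
    # Path 2 (assumed): the watcher reports the object as the API server
    # returns it; a cluster-scoped object has no namespace, so no prefix.
    keys.add(res.name)
    return sorted(keys)


webhook = Resource("MutatingWebhookConfiguration", "istio-sidecar-injector-stable")

buggy = build_snapshot(webhook, schema_cluster_scoped=False)
fixed = build_snapshot(webhook, schema_cluster_scoped=True)

print("clusterScoped missing:", buggy)
print("clusterScoped: true  :", fixed)
```

With the flag missing, the same object shows up under two keys, which matches the two entries in the debug output above. With the flag set, both paths agree on one key.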

I did something like this: kubectl apply -f webhook.yaml

webhook.yaml:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
  labels:
    app: sidecar-injector
    install.operator.istio.io/owning-resource: xxx
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.12.3
    release: istio
  name: istio-sidecar-injector
webhooks:

Then I restarted the Istio operator pod.

Also tried deleting 1.11 operator and installing 1.12 operator. This time the upgrade was successful. However, after the upgrade, I restarted the istio-operator pod, and the error is back in the logs. It seems there is a problem in istio 1.12 operator code.

Essentially, if a node where the operator is running gets rebooted or replaced, Istio operator will get stuck in the reconciliation loop.

Is it OK to use the 1.11 Operator with 1.12 Istio images to upgrade Istio? I am going to test this flow to see if it helps.

Update: it looks like the Istio 1.11 Operator with the Istio 1.12 image fails to start istiod, since the service account for istiod does not have enough permissions:
2022-01-24T00:36:01.087732Z error watch error in cluster Kubernetes: failed to list *v1alpha1.WasmPlugin: wasmplugins.extensions.istio.io is forbidden: User "system:serviceaccount:istio-system:istiod" cannot list resource "wasmplugins" in API group "extensions.istio.io" at the cluster scope

I’m doing this all through GitOps with Flux, so running ad-hoc istioctl operator init commands on 150+ clusters isn’t really going to work for me. That is why I took the YAML from the Helm chart and applied that instead.

Not sure if this is related but I am also facing a similar error using Istio Operator version 1.12.0. While trying to follow the Istio canary upgrade guide, the Operator pod fails to sync resources when there are two deployments active at the same time.

https://istio.io/latest/docs/setup/upgrade/canary/

The strange thing is, I checked the spec of both mutating webhook configurations and they don’t appear to overlap. During the canary upgrade there are two configurations, each with 2 webhooks. It seems like all are meant to be there (one is checking namespace labels and the other checks object labels). It seems like the operator throws an error because it doesn’t distinguish between different values in the match conditions (I have 1-12-0 and 1-11-4). Istio maintainers - could it be a bug in the Operator checking logic?
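As a sanity check on the claim that the selectors do not overlap, here is a small Python sketch that evaluates the matchExpressions from the two configurations posted in this comment against a few sample namespace and pod label sets. The helper names (expr_matches, selector_matches, revision_webhooks, matching_webhooks) are made up for illustration, and the sample label sets are my own choice. This is not the operator's or the analyzer's overlap check, and a handful of samples is not a proof.

```python
def expr_matches(expr, labels):
    """Evaluate one Kubernetes label selector requirement against labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {op}")


def selector_matches(selector, labels):
    """All requirements in matchExpressions must hold (logical AND)."""
    return all(expr_matches(e, labels) for e in selector["matchExpressions"])


def revision_webhooks(rev):
    """The two webhooks of one revision, transcribed from the posted YAML."""
    not_disabled = {"key": "sidecar.istio.io/inject", "operator": "NotIn", "values": ["false"]}
    no_injection = {"key": "istio-injection", "operator": "DoesNotExist"}
    rev_in = {"key": "istio.io/rev", "operator": "In", "values": [rev]}
    rev_absent = {"key": "istio.io/rev", "operator": "DoesNotExist"}
    return [
        {"name": "rev.namespace.sidecar-injector.istio.io",
         "namespaceSelector": {"matchExpressions": [rev_in, no_injection]},
         "objectSelector": {"matchExpressions": [not_disabled]}},
        {"name": "rev.object.sidecar-injector.istio.io",
         "namespaceSelector": {"matchExpressions": [rev_absent, no_injection]},
         "objectSelector": {"matchExpressions": [not_disabled, rev_in]}},
    ]


def matching_webhooks(ns_labels, pod_labels, revisions=("1-11-4", "1-12-0")):
    """Return (revision, webhook name) pairs whose selectors both match."""
    hits = []
    for rev in revisions:
        for wh in revision_webhooks(rev):
            if (selector_matches(wh["namespaceSelector"], ns_labels)
                    and selector_matches(wh["objectSelector"], pod_labels)):
                hits.append((rev, wh["name"]))
    return hits


# Sample (namespace labels, pod labels) pairs; chosen for illustration only.
CASES = [
    ({"istio.io/rev": "1-11-4"}, {}),
    ({"istio.io/rev": "1-12-0"}, {}),
    ({}, {"istio.io/rev": "1-11-4"}),
    ({}, {"istio.io/rev": "1-12-0"}),
    ({"istio.io/rev": "1-11-4"}, {"istio.io/rev": "1-12-0"}),
    ({}, {}),
]

for ns_labels, pod_labels in CASES:
    print(ns_labels, pod_labels, "->", matching_webhooks(ns_labels, pod_labels))
```

For these samples, at most one webhook matches any given pod, which is consistent with the configurations not overlapping across revisions.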

Neither of the operators is marked as ‘healthy’, and I get this error constantly in the operator log:

2021-11-23T11:34:51.406302Z	info	processing	Runtime.run: Stopping session: id1
2021-11-23T11:34:51.406341Z	info	processing	session[1] "processing" => "terminating"
2021-11-23T11:34:51.406423Z	info	processing	session[1] "terminating" => "inactive"
2021-11-23T11:34:51.406431Z	info	processing	Runtime.run: Exiting...
2021-11-23T11:34:51.406459Z	error	installer	Error during reconcile: creating default tag would conflict:
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector-1-11-4) Webhook overlaps with others: [istio-system/istio-sidecar-injector-1-11-4/rev.namespace.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector-1-11-4) Webhook overlaps with others: [istio-system/istio-sidecar-injector-1-11-4/rev.object.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-system/istio-sidecar-injector-1-11-4 ) Webhook overlaps with others: [istio-sidecar-injector-1-11-4/rev.namespace.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-system/istio-sidecar-injector-1-11-4 ) Webhook overlaps with others: [istio-sidecar-injector-1-11-4/rev.object.sidecar-injector.istio.io]. This may cause injection to occur twice.
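One thing that stands out in the errors above is that each "overlap" names the same configuration object, once with and once without an istio-system/ prefix. The Python sketch below shows one way to check that mechanically. The regular expression and the helper names (strip_ns, self_overlaps) are my own and are based only on the message format visible in this log, so treat them as assumptions rather than a stable format.

```python
import re

# Matches the IST0139 lines as they appear in the log above (assumed format).
PATTERN = re.compile(
    r"Error \[IST0139\] \(MutatingWebhookConfiguration (\S+)\s*\) "
    r"Webhook overlaps with others: \[([^\]]+)\]"
)


def strip_ns(name, namespace="istio-system"):
    """Drop a leading '<namespace>/' prefix if present."""
    prefix = namespace + "/"
    return name[len(prefix):] if name.startswith(prefix) else name


def self_overlaps(log_lines):
    """Return (subject config, other config, same_object) per reported overlap."""
    results = []
    for line in log_lines:
        match = PATTERN.search(line)
        if not match:
            continue
        subject = strip_ns(match.group(1))
        for other in match.group(2).split(","):
            # 'other' looks like [<ns>/]<config-name>/<webhook-name>.
            other_config = strip_ns(other.strip()).rsplit("/", 1)[0]
            results.append((subject, other_config, subject == other_config))
    return results


SAMPLE = [
    "Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector-1-11-4) "
    "Webhook overlaps with others: "
    "[istio-system/istio-sidecar-injector-1-11-4/rev.namespace.sidecar-injector.istio.io]. "
    "This may cause injection to occur twice.",
    "Error [IST0139] (MutatingWebhookConfiguration istio-system/istio-sidecar-injector-1-11-4 ) "
    "Webhook overlaps with others: "
    "[istio-sidecar-injector-1-11-4/rev.namespace.sidecar-injector.istio.io]. "
    "This may cause injection to occur twice.",
]

results = self_overlaps(SAMPLE)
for subject, other_config, same in results:
    print(subject, "vs", other_config, "-> same object:", same)
```

If every reported pair collapses to the same object once the prefix is stripped, the conflict is with the configuration itself rather than with the other revision's webhooks.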

My webhook configurations (some stuff removed):

# istio-sidecar-injector-1-11-4.yaml
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: istio-sidecar-injector-1-11-4
webhooks:
- admissionReviewVersions:
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: rev.namespace.sidecar-injector.istio.io
  namespaceSelector:
    matchExpressions:
    - key: istio.io/rev
      operator: In
      values:
      - 1-11-4
    - key: istio-injection
      operator: DoesNotExist
  objectSelector:
    matchExpressions:
    - key: sidecar.istio.io/inject
      operator: NotIn
      values:
      - "false"
  reinvocationPolicy: Never
- admissionReviewVersions:
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: rev.object.sidecar-injector.istio.io
  namespaceSelector:
    matchExpressions:
    - key: istio.io/rev
      operator: DoesNotExist
    - key: istio-injection
      operator: DoesNotExist
  objectSelector:
    matchExpressions:
    - key: sidecar.istio.io/inject
      operator: NotIn
      values:
      - "false"
    - key: istio.io/rev
      operator: In
      values:
      - 1-11-4
  reinvocationPolicy: Never
# istio-sidecar-injector-1-12-0.yaml
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: istio-sidecar-injector-1-12-0
webhooks:
- admissionReviewVersions:
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: rev.namespace.sidecar-injector.istio.io
  namespaceSelector:
    matchExpressions:
    - key: istio.io/rev
      operator: In
      values:
      - 1-12-0
    - key: istio-injection
      operator: DoesNotExist
  objectSelector:
    matchExpressions:
    - key: sidecar.istio.io/inject
      operator: NotIn
      values:
      - "false"
  reinvocationPolicy: Never
- admissionReviewVersions:
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: rev.object.sidecar-injector.istio.io
  namespaceSelector:
    matchExpressions:
    - key: istio.io/rev
      operator: DoesNotExist
    - key: istio-injection
      operator: DoesNotExist
  objectSelector:
    matchExpressions:
    - key: sidecar.istio.io/inject
      operator: NotIn
      values:
      - "false"
    - key: istio.io/rev
      operator: In
      values:
      - 1-12-0
  reinvocationPolicy: Never

Using Helm to install istio-operator is no longer supported. Please see this issue: https://github.com/istio/istio/issues/36157

I’m just using raw YAML to do it. Here is our latest 1.11.4 YAML, which is the same as what we used in 1.9 with only the Docker image version changed:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:1-privileged
  namespace: flux
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: psp:1-privileged
  apiGroup: rbac.authorization.k8s.io
---
# Source: istio-operator/templates/crd-operator.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: istiooperators.install.istio.io
spec:
  group: install.istio.io
  names:
    kind: IstioOperator
    plural: istiooperators
    singular: istiooperator
    shortNames:
    - iop
  scope: Namespaced
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation
            of an object. Servers should convert recognized schemas to the latest
            internal value, and may reject unrecognized values.
            More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this
            object represents. Servers may infer this from the endpoint the client
            submits requests to. Cannot be updated. In CamelCase.
            More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        spec:
          description: 'Specification of the desired state of the istio control plane resource.
            More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status'
          type: object
        status:
          description: 'Status describes each of istio control plane component status at the current time.
            0 means NONE, 1 means UPDATING, 2 means HEALTHY, 3 means ERROR, 4 means RECONCILING.
            More info: https://github.com/istio/api/blob/master/operator/v1alpha1/istio.operator.v1alpha1.pb.html &
            https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status'
          type: object
  versions:
  - name: v1alpha1
    served: true
    storage: true
---
# Source: istio-operator/templates/service_account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: flux
  name: istio-operator
---
# Source: istio-operator/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: istio-operator
rules:
# istio groups
- apiGroups:
  - authentication.istio.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - config.istio.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - install.istio.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - networking.istio.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - security.istio.io
  resources:
  - '*'
  verbs:
  - '*'
# k8s groups
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions.apiextensions.k8s.io
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - daemonsets
  - deployments
  - deployments/finalizers
  - replicasets
  verbs:
  - '*'
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - get
  - create
  - update
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - '*'
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - roles
  - rolebindings
  verbs:
  - '*'
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - events
  - namespaces
  - pods
  - pods/proxy
  - persistentvolumeclaims
  - secrets
  - services
  - serviceaccounts
  verbs:
  - '*'
---
# Source: istio-operator/templates/clusterrole_binding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: istio-operator
subjects:
- kind: ServiceAccount
  name: istio-operator
  namespace: flux
roleRef:
  kind: ClusterRole
  name: istio-operator
  apiGroup: rbac.authorization.k8s.io
---
# Source: istio-operator/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: flux
  labels:
    name: istio-operator
  name: istio-operator
spec:
  ports:
  - name: http-metrics
    port: 8383
    targetPort: 8383
    protocol: TCP
  selector:
    name: istio-operator
---
# Source: istio-operator/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: flux
  name: istio-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: istio-operator
  template:
    metadata:
      labels:
        name: istio-operator
    spec:
      serviceAccountName: istio-operator
      containers:
        - name: istio-operator
          image:  gcr.io/istio-release/operator:1.11.4
          command:
          - operator
          - server
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsGroup: 1337
            runAsUser: 1337
            runAsNonRoot: true
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 200m
              memory: 256Mi
            requests:
              cpu: 50m
              memory: 50Mi
          env:
            - name: WATCH_NAMESPACE
              value: "istio-system"
            - name: LEADER_ELECTION_NAMESPACE
              value: "flux"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "istio-system"
            - name: WAIT_FOR_RESOURCES_TIMEOUT
              value: "300s"
            - name: REVISION
              value: ""

We use EKS, all on versions 1.19-1.21, and all of the clusters had the problem.

@lukeplausin Thank you for reporting. I am seeing the same issue when doing an in-place upgrade from 1.11.4 to 1.12.0 using the operator: the IstioOperator never reconciles and does not return a healthy status. I am, however, able to install 1.12 using the 1.11 operator.

2021-11-29T15:39:10.324997Z	info	processing	Publishing snapshot for group: localAnalysis
2021-11-29T15:39:10.325824Z	info	processing	Runtime.run: Stopping session: id1
2021-11-29T15:39:10.325851Z	info	processing	session[1] "processing" => "terminating"
2021-11-29T15:39:10.325894Z	info	processing	session[1] "terminating" => "inactive"
2021-11-29T15:39:10.325899Z	info	processing	Runtime.run: Exiting...
2021-11-29T15:39:10.325913Z	error	installer	Error during reconcile: creating default tag would conflict:
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/namespace.sidecar-injector.istio.io]. This may cause injection to occur twice.
Error [IST0139] (MutatingWebhookConfiguration istio-sidecar-injector) Webhook overlaps with others: [istio-sidecar-injector/sidecar-injector.istio.io]. This may cause injection to occur twice.