istio: unable to drain k8s node running istio-policy pod

Deployed Istio 1.1.0 RC with the Helm chart (not helm template)

I would like to drain a specific node because it needs a reboot. The pods carrying this annotation:

Annotations:        scheduler.alpha.kubernetes.io/critical-pod:

prevent that:

error when evicting pod "istio-policy-d48655744-s8ggt" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

(Also note that the annotation scheduler.alpha.kubernetes.io/critical-pod is deprecated and will be removed in k8s 1.14!)
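For reference, the eviction error above shows up during an ordinary drain; the exact flags vary, but the command is along these lines (the node name is a placeholder):

kubectl drain <node-to-reboot> --ignore-daemonsets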

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 23
  • Comments: 37 (18 by maintainers)

Most upvoted comments

The default helm installs are set up with 1 replica and a PDB for several deployments / HPAs. This seems like a case of a bad default config. PDBs are a feature for HA deployments, and HA deployments imply 2+ replicas.

The reason this is bad is that it breaks kubectl drain. That alone is not good, but automatic node-pool upgrades may also fail or take unacceptably long to apply, and node pools that upscale automatically may then fail to downscale as well.

I don’t know whether this is a feature or a bug, but it’s not a great default config for Istio installs. I would either remove the PDBs or increase the default replicas to 2.

Affected deployments (spec.replicas = 1): certmanager, istio-galley
Affected HPAs (spec.minReplicas = 1): istio-egressgateway, istio-ingressgateway, istio-policy, istio-telemetry, istio-pilot
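You can see why eviction is blocked by checking the budgets directly: with one replica and minAvailable: 1, allowed disruptions stays at 0, so the drain retries forever. A minimal sketch of the problematic shape (the name and selector here are illustrative, not the exact chart output):

kubectl get pdb -n istio-system   # ALLOWED DISRUPTIONS shows 0 for the affected components

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: istio-policy
  namespace: istio-system
spec:
  minAvailable: 1        # with spec.replicas = 1 there is never a pod to spare
  selector:
    matchLabels:
      app: policy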

Just scaling the istiod deployment to 2 and then scaling it back to 1 worked for me; the node being drained stays cordoned in the meantime. I’m using Lens for this, but here is the kubectl equivalent:

kubectl cordon <your node to drain>
kubectl scale --replicas=2 deployment/istiod -n istio-system
# Make sure it scaled up; when you scale it back down, the pod on the cordoned node is the one removed
kubectl scale --replicas=1 deployment/istiod -n istio-system
kubectl drain <your node to drain>

P.S. In my case, istiod was on a separate node from the other Istio-related services, and it worked.

This can be avoided with the following spec:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-controlplane
  namespace: istio-system
spec:
  addonComponents:
    kiali:
      enabled: true
    prometheus:
      enabled: false
    tracing:
      enabled: true
  components:
    cni:
      enabled: true
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          minReplicas: 2
      name: istio-ingressgateway
    pilot:
      enabled: true
      k8s:
        hpaSpec:
          minReplicas: 2
  meshConfig:
    enablePrometheusMerge: true
  profile: default
  values:
    cni:
      excludeNamespaces:
      - istio-system
      - kube-system
      logLevel: info
    kiali:
      dashboard:
        grafanaURL: http://monitoring-grafana.monitoring.svc
      prometheusAddr: http://monitoring-prometheus-server.monitoring.svc

which should probably be the default (the HPA values)

Why close this? I don’t see any concrete resolution provided for this

@linsun @howardjohn Requesting your attention on this. I think we need better defaults here. Now that we are on istiod, I’d suggest defaulting the replicaCount of istiod and the gateways to 2. This seems to work rather well, especially with PDBs being enabled by default.

@Arnold1 you need to add one more replica for istio-policy & istio-galley.

If you installed Istio with Helm, you need to add the following lines to values.yaml:

galley:
  enabled: true
  replicaCount: 2

mixer:
  policy:
    enabled: true
    replicaCount: 2

Then apply it:

helm upgrade --install istio istio.io/istio --namespace istio-system -f values.yaml
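Before retrying the drain, it’s worth confirming that the new replicas are up and the budgets now tolerate a disruption (a quick check, assuming the default component names and that the PDBs live in istio-system):

kubectl -n istio-system get deploy istio-galley istio-policy   # READY should show 2/2
kubectl -n istio-system get pdb                                # ALLOWED DISRUPTIONS should now be 1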

You can save yourself a command @Kostanos. Once you’ve cordoned your Node, use the kubectl rollout restart deployment/... command instead of manually scaling, which will essentially do the same thing (scale up to 2 Pods, wait for the 2nd to become Ready, then remove the initial Pod).
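Put together, that flow looks roughly like this (the node name is a placeholder and the namespace is assumed to be istio-system):

kubectl cordon <your node to drain>
kubectl -n istio-system rollout restart deployment/istiod
kubectl -n istio-system rollout status deployment/istiod   # waits until the replacement pod is Ready
kubectl drain <your node to drain> --ignore-daemonsets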

@ostromart you changed the defaults recently to have replica counts = 2, right? Is this resolved by that fix?

Is there any plan to resolve this issue? It’s still a blocker when using a tool like kured; it results in a node being effectively locked out of action almost indefinitely.