istio: unable to drain k8s node running istio-policy pod
Deployed istio 1.1.0 RC with helm chart (not template)
I would like to drain a specific node because it needs a reboot. The pods carry the annotation
Annotations:        scheduler.alpha.kubernetes.io/critical-pod:
which prevents that:
error when evicting pod "istio-policy-d48655744-s8ggt" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
(Also note that the annotation scheduler.alpha.kubernetes.io/critical-pod is deprecated and will be removed in k8s 1.14!)
 
Commits related to this issue
- common: Remove PodDisruptionBudget resources for Istio Istio 1.6.0 generated manifests include some policy/v1 PodDisruptionBudget resources that we need to remove. See: - https://github.com/istio/ist... — committed to apo-ger/manifests by apo-ger 2 years ago
 - common: Add Istio v1.16.0 manifests (#2327) * common: Add Istio v1.16.0 manifests Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com> * Update kustomization file in example to deploy istio-1-16... — committed to kubeflow/manifests by apo-ger 2 years ago
 
The default helm installs are set up with 1 replica and a PDB for several deployments / HPAs. This seems like a case of a bad default config. PDBs are a feature for HA deployments, and HA deployments imply 2+ replicas.
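To illustrate the conflict, the offending combination looks roughly like this (a sketch, not the literal chart output; names, labels, and the API version are illustrative): a PDB demanding one available pod over a workload that only ever has one pod, so every eviction is a violation.

```yaml
# Illustrative sketch of the bad default: a PDB over a single-replica workload.
# With replicas: 1 and minAvailable: 1, allowed disruptions is always 0,
# so kubectl drain can never evict the pod.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: istio-policy
  namespace: istio-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: policy
```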
The reason this is bad is that it breaks kubectl drain. That in itself is not good, but automatic node-pool upgrades may also fail or take unacceptably long to apply. Node pools will automatically upscale but may then fail to downscale as well. I don't know whether this is a feature or a bug, but it's not a great default config for Istio installs. I would either remove the PDBs or increase the default replicas to 2.
Affected deployments (spec.replicas = 1): certmanager, istio-galley
Affected HPAs (spec.minReplicas = 1): istio-egressgateway, istio-ingressgateway, istio-policy, istio-telemetry, istio-pilot
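A quick way to spot which budgets will block a drain is to list the PDBs in the mesh namespace and look for entries with no allowed disruptions (the namespace below is the usual istio-system; adjust if yours differs):

```sh
# Entries showing ALLOWED DISRUPTIONS = 0 will block kubectl drain
kubectl get pdb -n istio-system
```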
Just scaling the istiod deployment to 2 and then scaling it back to 1 worked for me, while the draining node stayed cordoned. I'm using Lens for it, but here is the kubectl equivalent:
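(A sketch of the commands described above, assuming istiod runs in the istio-system namespace; the deployment name may differ in your install.)

```sh
# Temporarily add a second replica so the PDB allows evicting the pod on the cordoned node
kubectl -n istio-system scale deployment/istiod --replicas=2
kubectl -n istio-system rollout status deployment/istiod
# Once the node has been drained, scale back down
kubectl -n istio-system scale deployment/istiod --replicas=1
```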
PS: in my case, istiod was on a separate node from the other Istio-related services, and it worked.
This can be avoided with the following spec:
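(A sketch of the HPA values this refers to; the key names below follow the 1.1-era Helm chart layout and are an assumption, not the original snippet.)

```yaml
# Sketch: raise the autoscaler minimums so the PDBs can always be satisfied.
gateways:
  istio-ingressgateway:
    autoscaleMin: 2
  istio-egressgateway:
    autoscaleMin: 2
mixer:
  policy:
    autoscaleMin: 2
  telemetry:
    autoscaleMin: 2
pilot:
  autoscaleMin: 2
```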
This should probably be the default (the HPA values).
Why close this? I don't see any concrete resolution provided for it.
@linsun @howardjohn Requesting your attention on this. I think we need better defaults here. Since we are in the istiod world, I'd just suggest defaulting the replicaCount of istiod and the gateways to 2. This seems to work rather well, especially with PDBs being enabled by default.

@Arnold1 you need to add one more replica for istio-policy & istio-galley.
If you installed Istio with Helm, you need to add the following lines to values.yaml:
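(A sketch of the kind of values meant here; the exact key names are an assumption based on the 1.1-era chart layout and may differ in your chart version.)

```yaml
# Bump the components whose single replica plus PDB blocks eviction
mixer:
  policy:
    autoscaleMin: 2
galley:
  replicaCount: 2
```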
Then apply it:
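(The release name and chart path below are the usual ones from the Istio release archive; adjust them to your setup.)

```sh
# Re-apply the release with the updated values
helm upgrade istio install/kubernetes/helm/istio \
  --namespace istio-system -f values.yaml
```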
You can save yourself a command @Kostanos. Once you've cordoned your node, use the kubectl rollout restart deployment/... command instead of manually scaling; it will essentially do the same thing (scale up to 2 Pods, wait for the 2nd to become Ready, then remove the initial Pod).

@ostromart you changed the defaults recently to have replica counts = 2, right? Is this resolved by that fix?
Is there any plan to resolve this issue? It's still a blocker when using a tool like kured: it results in a node being effectively locked out of action almost indefinitely.