istio: unable to drain k8s node running istio-policy pod
Deployed Istio 1.1.0 RC with the helm chart (not helm template).
I would like to drain a specific node because it needs a reboot. The Pods' annotation scheduler.alpha.kubernetes.io/critical-pod prevents that:
error when evicting pod "istio-policy-d48655744-s8ggt" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
(Also note that the annotation scheduler.alpha.kubernetes.io/critical-pod is deprecated and will be removed in k8s 1.14!)
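You can confirm which PDB is blocking the eviction; a quick check, assuming Istio is installed in the istio-system namespace:

```sh
# List the PodDisruptionBudgets guarding the Istio components.
# With 1 replica and minAvailable: 1, ALLOWED DISRUPTIONS stays at 0,
# so the eviction API keeps refusing and the drain retries forever.
kubectl get pdb -n istio-system
```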
About this issue
- State: open
- Created 5 years ago
- Reactions: 23
- Comments: 37 (18 by maintainers)
Commits related to this issue
- common: Remove PodDisruptionBudget resources for Istio Istio 1.6.0 generated manifests include some policy/v1 PodDisruptionBudget resources that we need to remove. See: - https://github.com/istio/ist... — committed to apo-ger/manifests by apo-ger 2 years ago
- common: Add Istio v1.16.0 manifests (#2327) * common: Add Istio v1.16.0 manifests Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com> * Update kustomization file in example to deploy istio-1-16... — committed to kubeflow/manifests by apo-ger 2 years ago
- common: Add Istio v1.16.0 manifests (#2327) * common: Add Istio v1.16.0 manifests Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com> * Update kustomization file in example to deploy istio-1-16... — committed to juliusvonkohout/manifests by apo-ger 2 years ago
The default helm installs are set up with 1 replica and a PDB for several deployments / HPAs. This seems like a case of a bad default config. PDBs are a feature for HA deployments, and HA deployments imply 2+ replicas.
The reason this is bad is that it breaks kubectl drain. Not good in itself, but automatic node-pool upgrades may also fail or take unacceptably long to apply. Node pools will automatically upscale but then may fail to downscale as well (a minimal illustration of the conflict follows the list below). I don't know whether this is a feature or a bug, but it's not a great default config for Istio installs. I would either remove the PDBs or increase the default replicas to 2.
Affected Deployments (spec.replicas = 1): certmanager, istio-galley
Affected HPAs (spec.minReplicas = 1): istio-egressgateway, istio-ingressgateway, istio-policy, istio-telemetry, istio-pilot
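A minimal illustration of the conflict, using hypothetical names rather than the actual Istio manifests: a Deployment with replicas: 1 paired with a PDB requiring minAvailable: 1 leaves zero allowed disruptions, so eviction can never succeed.

```yaml
# Hypothetical reproduction of the bad default (not Istio's actual manifests).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 1                # only one Pod ever exists...
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      containers:
      - name: app
        image: nginx
---
apiVersion: policy/v1beta1   # the PDB API version of the k8s 1.13 era
kind: PodDisruptionBudget
metadata:
  name: example
spec:
  minAvailable: 1            # ...and the PDB insists it stays available,
  selector:                  # so `kubectl drain` can never evict it.
    matchLabels: {app: example}
```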
Just scaling the istiod Deployment to 2 and then scaling it back to 1 worked for me, while the draining node stayed cordoned. I'm using Lens for it, but here is the kubectl equivalent:
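Something like the following, assuming istiod runs in the istio-system namespace:

```sh
kubectl -n istio-system scale deployment/istiod --replicas=2
# ...drain the cordoned node, then scale back down:
kubectl -n istio-system scale deployment/istiod --replicas=1
```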
PS: in my case, istiod was on a separate node from the other Istio-related services, and it worked.
This can be avoided with the following spec:
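(A sketch of such a spec, assuming the helm value keys of that release; verify against your chart's values.yaml.)

```yaml
# Assumed Istio helm values: raise every HPA floor to 2 so a PDB
# never pins the last remaining Pod to a node.
pilot:
  autoscaleMin: 2
mixer:
  policy:
    autoscaleMin: 2
  telemetry:
    autoscaleMin: 2
gateways:
  istio-ingressgateway:
    autoscaleMin: 2
  istio-egressgateway:
    autoscaleMin: 2
```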
which should probably be the default (the HPA values)
Why close this? I don't see any concrete resolution provided for this.
@linsun @howardjohn Requesting your attention on this. I think we need better defaults here. Since we are in the istiod world now, I'd just suggest defaulting the replicaCount of istiod and the gateways to 2. This seems to work rather well, especially with PDBs being enabled by default.
@Arnold1 you need to add one more replica for istio-policy & istio-galley.
If you installed Istio with helm, you need to add the following lines to values.yaml:
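(Assumed keys for the Istio 1.1 chart layout; the exact keys may differ between releases, so check `helm inspect values` for your version.)

```yaml
mixer:
  policy:
    autoscaleMin: 2   # istio-policy is scaled by an HPA, so raise its floor
galley:
  replicaCount: 2     # istio-galley is a plain Deployment
```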
Then apply it:
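For example (release name "istio" and the in-repo chart path assumed):

```sh
helm upgrade istio install/kubernetes/helm/istio \
  --namespace istio-system -f values.yaml
```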
You can save yourself a command @Kostanos. Once you've cordoned your Node, use the kubectl rollout restart deployment/... command instead of manually scaling, which will essentially do the same thing (scale up to 2 Pods, wait for the 2nd to become Ready, then remove the initial Pod).
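A sketch, assuming istiod in istio-system is the Deployment pinned by the PDB:

```sh
kubectl cordon <node-name>
kubectl -n istio-system rollout restart deployment/istiod
# Once the replacement Pod is Ready, the drain can proceed:
kubectl drain <node-name> --ignore-daemonsets
```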
@ostromart you changed the defaults recently to have replica counts = 2, right? Is this resolved by that fix?
Is there any plan to resolve this issue? It's still a blocker when using a tool like kured: it results in a node being effectively locked out of action almost indefinitely.