What happened:
- A PVC was created, along with a pod consuming it. The PVC uses the WaitForFirstConsumer volume binding mode.
- The controller running as the persistent-volume-binder service account in the kube-system namespace edited the PVC and attached the following annotations:
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: ip-10-0-92-212.ec2.internal
- The PVC was in the Pending phase; it was not yet bound to a PV.
- The node in question was deleted by cluster autoscaler 1.20 before the PV was provisioned and attached to the node.
- The volume.kubernetes.io/selected-node annotation remained on the PVC.
- The cluster autoscaler (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) v1.20 cannot scale up because the volume.kubernetes.io/selected-node annotation still points to the deleted node. The cluster autoscaler sees this PVC as bound to a pod that is already bound to the node, but the node no longer exists.
- The pod stays in the Pending state forever.
- Deleting this annotation lets the cluster autoscaler do its job: it scales up the cluster, and the pod gets scheduled on a newly provisioned node.
- The end.
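
The stale-annotation condition described above can be sketched as a small check. This is only an illustration, not how the scheduler or the cluster autoscaler actually implements it: the helper names `has_stale_selected_node` and `removal_patch`, the PVC name `data-claim`, and the sample list of live nodes are hypothetical. It treats a PVC as stuck when it is still Pending and its selected-node annotation names a node that no longer exists, and it builds a JSON merge patch in which a `null` value deletes the annotation.

```python
import json

SELECTED_NODE = "volume.kubernetes.io/selected-node"


def has_stale_selected_node(pvc: dict, live_nodes: set) -> bool:
    """Return True if a Pending PVC is pinned to a node that no longer exists.

    `pvc` is a PVC manifest as a plain dict; `live_nodes` holds the names of
    nodes currently present in the cluster (both supplied by the caller).
    """
    annotations = pvc.get("metadata", {}).get("annotations") or {}
    node = annotations.get(SELECTED_NODE)
    phase = pvc.get("status", {}).get("phase")
    return phase == "Pending" and node is not None and node not in live_nodes


def removal_patch() -> str:
    """Build a JSON merge patch; a null value removes the annotation key."""
    return json.dumps({"metadata": {"annotations": {SELECTED_NODE: None}}})


# Hypothetical PVC mirroring the situation in this report.
pvc = {
    "metadata": {
        "name": "data-claim",  # hypothetical name
        "annotations": {
            "volume.beta.kubernetes.io/storage-provisioner": "ebs.csi.aws.com",
            SELECTED_NODE: "ip-10-0-92-212.ec2.internal",
        },
    },
    "status": {"phase": "Pending"},
}

# Hypothetical set of nodes that still exist after the scale-down.
live_nodes = {"ip-10-0-11-7.ec2.internal"}

if has_stale_selected_node(pvc, live_nodes):
    print(removal_patch())
```

A patch like this could be applied with `kubectl patch pvc <name> --type merge -p '<patch>'`; alternatively, `kubectl annotate pvc <name> volume.kubernetes.io/selected-node-` (note the trailing dash) removes the annotation directly.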
What you expected to happen:
- Upon node deletion, the volume.kubernetes.io/selected-node annotation should be cleared.
How to reproduce it (as minimally and precisely as possible):
- This depends on a race condition, but the scenario described above should be reproducible some of the time.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.6-eks-49a6c0", GitCommit:"49a6c0bf091506e7bafcdb1b142351b69363355a", GitTreeState:"clean", BuildDate:"2020-12-23T22:10:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- AWS EKS
It makes sense to remove this annotation if the node in question no longer exists. If the underlying PV can be attached to some other node, as many cloud PVs can, then it should be allowed to do so.
I believe I had this issue: a PVC with the annotation pointing to a dead node.
My workaround: