kubernetes: Node VM failure doesn't automatically recreate a pod with attached PV
What happened: After a Deployment with a PVC is created, the node hosting the pod shuts down. After the 6-minute timeout, a replacement pod is created but cannot come up because the volume is still attached to the terminating pod on the shut-down node. As a workaround, the volume is only detached from the original pod once that pod is force deleted.
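For reference, the manual workaround looks roughly like the following; the pod name is a placeholder, not taken from the original report:

```sh
# List pods with their nodes to find the pod stuck in Terminating on the
# shut-down node and the replacement pod that cannot attach the volume.
kubectl get pods -o wide

# Force delete the terminating pod (name is a placeholder); this lets the
# attach/detach controller detach the volume so the replacement pod can mount it.
kubectl delete pod <original-pod-name> --grace-period=0 --force
```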
What you expected to happen: According to the Node VM Failure scenario at https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/high-availability.html, the recovery mechanism is completely automatic. In reality, recovery requires manual intervention to force delete the stuck pod.
How to reproduce it (as minimally and precisely as possible):
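A minimal reproduction sketch inferred from the description above; the StorageClass name, image, and object names are assumptions, not from the original report:

```sh
# Create a Deployment that mounts a PVC (placeholder names; "thin" is an
# assumed vSphere-backed StorageClass).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: thin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: demo-pvc
EOF

# Power off the node VM hosting the pod from the vSphere client, wait for the
# ~6 minute eviction timeout, and watch the replacement pod stay stuck because
# the volume is still attached to the terminating pod on the shut-down node.
kubectl get pods -o wide -w
```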
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.12.7-gke.19
- Cloud provider or hardware configuration: vSphere Cloud Provider
- OS (e.g. `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
/sig vmware
This is a known issue across all cloud providers at the moment. @yastij is working on a KEP right now that addresses this: https://github.com/kubernetes/enhancements/pull/1116. However, the problem is a bit thorny, as it requires coordination between a number of different components (controller manager, scheduler, and kubelet), and errors can lead to data corruption in certain situations. Will let @yastij comment further if there's anything else to add.