vsphere-csi-driver: FailedMount - FailedPrecondition - Volume does not appear staged
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened:
While deleting and re-applying the Kubernetes StatefulSet configuration, some of the pods that use vSphere CSI based persistent volumes fail to start and are stuck in the ContainerCreating state. This happens randomly to some pods; other pods that also use vSphere CSI based PVs and are scheduled on the same worker node start without any issues.
Currently, we have 2 pods in this failed state. Here is a link to a Gist containing the describe output of the related k8s objects and the vSphere CSI logs: https://gist.github.com/mohideen/3b0cfa29f03d10fb7e1170f5729cb528
What you expected to happen: The pods should start successfully with their corresponding PVs mounted.
How to reproduce it (as minimally and precisely as possible):
- Create a PVC using a vSphere storage class.
- Create a StatefulSet and configure it to use the above PVC.
- Verify that the pod has the volume mounted.
- Delete the Statefulset
- Redeploy the same Statefulset
- Check the status of the pod
- The pod usually starts correctly.
- On some random occasions, the pod fails to start and is stuck in the ContainerCreating state with the message MountVolume.SetUp failed for volume. (Note: this has happened only three times so far in the last month.) A manifest sketch of these steps follows below.
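A minimal sketch of these repro steps, assuming a vSphere CSI storage class named vsphere-sc and a throwaway nginx StatefulSet; all names here are placeholders, not taken from the original report:

```sh
# Hypothetical repro sketch: vsphere-sc, web, and nginx are placeholders.
cat > web-sts.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: vsphere-sc   # assumed vSphere CSI storage class
        resources:
          requests:
            storage: 1Gi
EOF

# 1. Create the StatefulSet; its volumeClaimTemplates provision the PVCs.
kubectl apply -f web-sts.yaml

# 2. Verify the pods come up with their volumes mounted.
kubectl get pods -l app=web

# 3. Delete the StatefulSet (its PVCs are retained by default) and
#    redeploy it, then watch for a pod hanging in ContainerCreating.
kubectl delete statefulset web
kubectl apply -f web-sts.yaml
kubectl get pods -l app=web -w
```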
Anything else we need to know?:
Environment:
- csi-vsphere version: v2.0.0
- vsphere-cloud-controller-manager version: v2.0.0
- Kubernetes version: 1.18.5
- vSphere version: 6.7U3
- OS (e.g. from /etc/os-release): Red Hat Enterprise Linux Server 7.8 (Maipo)
- Kernel (e.g. uname -a): 3.10.0-1127.18.2.el7.x86_64
- Install tools: kubeadm
- Others: 6 Node (3 Controller and 3 Worker) cluster
Hello. We are experiencing this issue in one of our production clusters. One pod is getting stuck in Init status and we are getting the following event:
MountVolume.SetUp failed for volume "pvc-e6043fb3-0490-45ca-985d-e8972a0b3fc3" : rpc error: code = FailedPrecondition desc = Volume ID: "30a7a050-6c14-4b0a-80f0-adf5bd1c0c16" does not appear staged to "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e6043fb3-0490-45ca-985d-e8972a0b3fc3/globalmount"
Killing the pod usually fixes this issue.
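A hedged diagnostic/workaround sketch based on the event above; the PV name and staging path are the ones quoted in the event, while the pod name is a placeholder:

```sh
# Confirm which pod is stuck and why (substitute your own pod name).
kubectl describe pod <stuck-pod> | grep -A2 FailedMount

# On the worker node, check whether the volume's global staging directory
# exists and is actually mounted; the FailedPrecondition suggests the
# NodeStageVolume step never completed (or its state was lost).
ls /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e6043fb3-0490-45ca-985d-e8972a0b3fc3/globalmount
findmnt | grep pvc-e6043fb3-0490-45ca-985d-e8972a0b3fc3

# Workaround reported here: delete the stuck pod so kubelet retries the
# stage/publish sequence (the StatefulSet controller recreates the pod).
kubectl delete pod <stuck-pod>
```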
Environment:
- csi-vsphere version: v2.1.0_vmware.1
- vsphere-csi-node: v2.1.0_vmware.1
- Kubernetes version: v1.21.8
- vSphere version: 7.0.3.00300
- OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS (Focal Fossa)
- Kernel (e.g. uname -a): 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
https://github.com/kubernetes-sigs/vsphere-csi-driver/releases/tag/v3.0.0 is released with the fix for this issue.
Can do, but it needs to fail again first, since we “fixed” the failed pod by cordoning the node and then restarting the job.
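For completeness, the cordon-and-restart workaround mentioned above as a minimal sketch; the node, pod, and job names are placeholders:

```sh
# Mark the node unschedulable so the replacement pod lands elsewhere.
kubectl cordon <node-with-stuck-pod>

# Delete the stuck pod (or delete and re-apply the job that owns it).
kubectl delete pod <stuck-pod>

# Once the pod is running on another node, allow scheduling again.
kubectl uncordon <node-with-stuck-pod>
```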