aws-ebs-csi-driver: MountVolume.SetUp failed for volume when quickly re-attaching to same node
/kind bug
What happened? One of our setups includes a pod with an EBS volume that we mount using the aws-ebs-csi-driver (installed via Helm).
At some point we terminate the pod, and sometimes we bring it back up very quickly, using the same volume.
This fails at a fairly high rate, with the following events on the pod:
Normal Scheduled 3m59s default-scheduler Successfully assigned ci-1408605891-e2e-test-full-raftt-317dae9118d34e6d64f4f960cb789/sandcastle to ip-10-2-8-177.eu-central-1.compute.internal
Warning FailedMount 117s kubelet Unable to attach or mount volumes: unmounted volumes=[raftt-mount], unattached volumes=[docker-socket kube-api-access-56xmb raftt-mount]: timed out waiting for the condition
Warning FailedMount 112s (x9 over 4m) kubelet MountVolume.SetUp failed for volume "raftt-pv-vol-0987ddb5694159fb5" : rpc error: code = Internal desc = Could not mount "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount" at "/var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
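My reading of the error is that the bind-mount source, i.e. the staging ("globalmount") path, no longer exists on the node at the time of the mount. A quick way to confirm that on the node, using the paths from the events above (findmnt prints nothing and exits non-zero when the path is not a mount point):
sh-4.2# ls -ld /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount
sh-4.2# findmnt /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount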
Gathering logs from the ebs-csi-node pod:
$ kubectl -n kube-system logs ebs-csi-node-lsxb2 -c ebs-plugin
...
E1101 16:05:04.722735 1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount" at "/var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
E1101 16:05:36.812865 1 mount_linux.go:179] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
E1101 16:05:36.812926 1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount" at "/var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
E1101 16:06:40.848729 1 mount_linux.go:179] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
E1101 16:06:40.848788 1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount" at "/var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount
Output: mount: /var/lib/kubelet/pods/7ff84ca5-fe5e-4734-a98f-f594d16533ad/volumes/kubernetes.io~csi/raftt-pv-vol-0987ddb5694159fb5/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/raftt-pv-vol-0987ddb5694159fb5/globalmount does not exist.
...
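To see whether the driver actually received a NodeStageVolume call for the new pod (or a NodeUnstageVolume for the old one) around that time, grepping the node plugin and kubelet logs might help. A sketch, assuming the log verbosity is high enough for the gRPC call names to show up:
$ kubectl -n kube-system logs ebs-csi-node-lsxb2 -c ebs-plugin | grep -E 'NodeStageVolume|NodeUnstageVolume'
$ # on the node itself (the kubelet runs as a systemd unit on the stock EKS AMI):
sh-4.2# journalctl -u kubelet | grep raftt-pv-vol-0987ddb5694159fb5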
From the AWS console I see that the volume is attached as /dev/xvdbo, and on the node (in /dev):
sh-4.2# ls -la xvdbo
lrwxrwxrwx 1 root root 8 Nov 1 15:54 xvdbo -> nvme28n1
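To double-check which EBS volume that NVMe device actually corresponds to, the device serial can be inspected; on Nitro instances the EBS volume ID is embedded in the NVMe serial number and also surfaced as a /dev/disk/by-id symlink. A sketch (the nvme command needs the nvme-cli package):
sh-4.2# ls -l /dev/disk/by-id/ | grep nvme28n1
sh-4.2# nvme id-ctrl /dev/nvme28n1 | grep ^sn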
Looking at lsblk:
sh-4.2# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 300G 0 disk
|-nvme0n1p1 259:1 0 300G 0 part /
`-nvme0n1p128 259:2 0 1M 0 part
nvme2n1 259:4 0 5G 0 disk /var/lib/kubelet/pods/e2a8aaef-1831-4caf-8e82-b5b1fc5ed3d0
nvme4n1 259:6 0 5G 0 disk /var/lib/kubelet/pods/8c66a6b3-cee6-44c9-b7d0-61c3c83a2e78
nvme5n1 259:7 0 5G 0 disk /var/lib/kubelet/pods/d5830c4f-4a8a-4472-a22e-213225897150
nvme8n1 259:10 0 5G 0 disk /var/lib/kubelet/pods/7a2b5d4d-35b5-4385-a944-73419bb343b0
nvme10n1 259:12 0 5G 0 disk /var/lib/kubelet/pods/2d449b0b-7f11-4d57-a63f-4865409f52e5
nvme12n1 259:14 0 5G 0 disk /var/lib/kubelet/pods/1a2c830f-5228-4a63-bc70-b33c3047c992
nvme15n1 259:17 0 5G 0 disk /var/lib/kubelet/pods/c2bef48a-ba7f-4b8b-b1ae-953f69fa0818
nvme18n1 259:20 0 5G 0 disk /var/lib/kubelet/pods/39744f99-e543-4679-ba31-edcda1f3e4c2
nvme19n1 259:21 0 5G 0 disk /var/lib/kubelet/pods/211d6610-e0e0-4116-a6dc-3b744b3eb80a
nvme21n1 259:23 0 5G 0 disk /var/lib/kubelet/pods/dd610ee4-279a-4133-bf8e-017dc6301d6c
nvme22n1 259:24 0 5G 0 disk /var/lib/kubelet/pods/bb6c154d-2d28-4271-a315-3b7dd9fa5daf
nvme25n1 259:27 0 5G 0 disk /var/lib/kubelet/pods/34d02041-e9b2-41c0-b494-98bf26b475cb
nvme28n1 259:30 0 5G 0 disk /var/lib/kubelet/pods/d3dab24e-5e47-47e9-8348-0c8040680cd1
So /dev/nvme28n1 is mounted for a different pod! Possibly the previous incarnation of the same pod?
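To check whether that pod UID (d3dab24e-...) really belongs to the previous incarnation of the same pod, it can be mapped back to a namespace/name, and the leftover kubelet directory can be inspected. A sketch; the kubectl query only works while the API server still knows about that pod:
$ kubectl get pods -A -o jsonpath='{range .items[?(@.metadata.uid=="d3dab24e-5e47-47e9-8348-0c8040680cd1")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
$ # on the node: see which PV's mount directory still lingers under the old pod's UID
sh-4.2# ls /var/lib/kubelet/pods/d3dab24e-5e47-47e9-8348-0c8040680cd1/volumes/kubernetes.io~csi/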
What you expected to happen? The volume to be mounted to the new pod.
How to reproduce it (as minimally and precisely as possible)? A cluster with the aws-ebs-csi-driver installed, a pod using a specific volume, and then deleting the pod and immediately re-creating it with the same volume should do it.
I will try to put together a minimal reproduction soon; see the sketch below for what I have in mind.
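A minimal sketch of what I intend to try (names, image, and storage class are placeholders rather than our real manifests; our actual setup uses a specific pre-existing volume, but a PVC with the same delete-and-recreate pattern should exercise the same mount path):
$ cat <<'EOF' > repro.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-sc   # assumes a StorageClass backed by ebs.csi.aws.com
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-pod
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    image: busybox
    command: ["sleep", "36000"]
    volumeMounts:
    - name: repro-vol
      mountPath: /data
  volumes:
  - name: repro-vol
    persistentVolumeClaim:
      claimName: repro-claim
EOF
$ kubectl apply -f repro.yaml
$ # wait until repro-pod is Running, then delete it and immediately re-create it with the same PVC
$ kubectl delete pod repro-pod --wait=true && kubectl apply -f repro.yaml
In our case the new pod lands on the same node, so pinning the repro pod there (e.g. via spec.nodeName or a nodeSelector) probably makes the failure easier to hit.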
Anything else we need to know?:
Environment
- Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
- Driver version: Installed using helm:
aws-ebs-csi-driver kube-system 9 2021-09-12 19:38:38.386405 +0300 IDT deployed aws-ebs-csi-driver-2.1.1 1.2.1
About this issue
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 15 (4 by maintainers)
@vdhanan I encountered it when I updated a StatefulSet, replacing the Pod on the same node (in that case, the EBS volume was already attached to that node beforehand).
Experienced the same with the termination grace period at the default 30s.