csi-digitalocean: Intermittent MountVolume.MountDevice errors on Pod creation

I’ve set up a Rancher 2.0 cluster on DO to test CSI-DO. On my first attempt I followed the README.md and the example app worked fine. However, after deploying my own workloads I started consistently getting Pod creation errors. To rule out the unknowns I wiped all my stuff and went back to the example app, and confirmed that I’m consistently getting this result:

Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        21s (x4 over 22s)  default-scheduler        pod has unbound PersistentVolumeClaims
  Normal   Scheduled               19s                default-scheduler        Successfully assigned my-csi-app to node-1
  Normal   SuccessfulMountVolume   19s                kubelet, node-1          MountVolume.SetUp succeeded for volume "default-token-jw8hg"
  Normal   SuccessfulAttachVolume  15s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4b1897fb-5f66-11e8-8f45-c2743f7bbff2"
  Warning  FailedMount             5s (x4 over 10s)   kubelet, node-1          MountVolume.MountDevice failed for volume "pvc-4b1897fb-5f66-11e8-8f45-c2743f7bbff2" : rpc error: code = Internal desc = formatting disk failed: exit status 1 cmd: 'mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_pvc-4b1897fb-5f66-11e8-8f45-c2743f7bbff2' output: "mke2fs 1.43.7 (16-Oct-2017)\nThe file /dev/disk/by-id/scsi-0DO_Volume_pvc-4b1897fb-5f66-11e8-8f45-c2743f7bbff2 does not exist and no size was specified.\n"
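
Note that the AttachVolume event reports success, yet mkfs.ext4 complains that the device file does not exist, so the /dev/disk/by-id symlink for the volume apparently never showed up on the node. This is roughly how I checked on the node (a sketch, assuming SSH access to the affected node; the volume name is the one from my events and will differ for your PVC):

ls -l /dev/disk/by-id/ | grep DO_Volume
lsblk
dmesg | tail -n 30

If the scsi-0DO_Volume_pvc-... symlink is missing even though the DigitalOcean control panel shows the volume as attached to the droplet, the failure is on the attach/device side rather than in mkfs itself.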

However, after waiting a couple of hours the problem was gone. I wonder whether it was a Block Storage degradation I happened to witness, or something related to CSI-DO. Unfortunately I’ve already wiped that cluster, and after setting up a new one the example app deploys just fine. I will provide any additional info you might need if I witness this problem again.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 33 (14 by maintainers)

Most upvoted comments

I found the issue, which is a race condition that can happen quite often due to how the DigitalOcean API works. This is fixed with https://github.com/digitalocean/csi-digitalocean/pull/61. I tested it and it works fine.

I pushed a new image for testing under the :dev tag. You can update the attacher component with the following command:

kubectl set image statefulset csi-attacher-doplugin digitalocean-csi-plugin=digitalocean/do-csi-plugin:dev -n kube-system && kubectl delete pods/csi-attacher-doplugin-0  -n kube-system

The above command replaces the current image with the :dev version. Deleting the pod makes sure the statefulset creates a new pod with the latest image. If anyone can test with this image, I would appreciate it.
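
To confirm the statefulset actually picked up the new image (a quick check using the same component names as the command above), both of these should print digitalocean/do-csi-plugin:dev once the new pod is running:

kubectl get statefulset csi-attacher-doplugin -n kube-system -o jsonpath='{.spec.template.spec.containers[*].image}'
kubectl get pod csi-attacher-doplugin-0 -n kube-system -o jsonpath='{.spec.containers[*].image}'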

I’m facing the same issue. Restarting the nodes solved it for me. Thanks @zarbis for the tip 👍 😃
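
In case it helps others, a rough outline of the restart that worked for me (assuming kubectl access and that node-1 is the affected node; the drain flag names may differ slightly between kubectl versions):

kubectl drain node-1 --ignore-daemonsets --delete-local-data
# reboot the droplet, e.g. from the DO control panel or with 'sudo reboot' over SSH
kubectl uncordon node-1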

@Azuka can you please try this example: https://github.com/digitalocean/csi-digitalocean/issues/32#issuecomment-414978118. If that does not reproduce the problem, please provide your manifests in a fully deployable form so I can test them myself. This may or may not be the same error. Thanks!
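
By a fully deployable form I mean something self-contained that can be applied directly. A minimal sketch of such a test (the do-block-storage storage class is the default from this repo's README; the resource names and size are placeholders, adjust as needed):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: Pod
metadata:
  name: csi-test-app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: /data
          name: test-volume
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: csi-test-pvc
EOF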