csi-driver: CSI Mount Error on Hetzner

Hi all,

I hope somebody has a hint for me, as I have already invested hours without finding a solution. In the beginning, everything with my KubeOne cluster on Hetzner worked as expected: auto-provisioning of block storage over the CSI interface behaved exactly as it should. For a few days now, errors have been occurring across different namespaces at initial pod startup after provisioning via Helm.

The error message at pod startup:

MountVolume.MountDevice failed for volume "pvc-a0a20263-9462-4c3e-9ebb-be45b92da7f4" : rpc error: code = Internal desc = failed to stage volume: format of disk "/dev/disk/by-id/scsi-0HC_Volume_16943979" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-a0a20263-9462-4c3e-9ebb-be45b92da7f4/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.45.7 (28-Jan-2021) The file /dev/disk/by-id/scsi-0HC_Volume_16943979 does not exist and no size was specified. )
Unable to attach or mount volumes: unmounted volumes=[redis-pvc], unattached volumes=[redis-pvc default-token-g7mjq]: timed out waiting for the condition

Parts of the deployment file:

spec:
  volumes:
    - name: redis-pvc
      persistentVolumeClaim:
        claimName: redis-pvc-chirpstack-redis-0
    - name: default-token-g7mjq
      secret:
        secretName: default-token-g7mjq
        defaultMode: 420


  volumeMounts:
    - name: redis-pvc
      mountPath: /data
    - name: default-token-g7mjq
      readOnly: true
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount

The PVC looks like this:


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc-chirpstack-redis-0
  namespace: chirpstack
  selfLink: >-
    /api/v1/namespaces/chirpstack/persistentvolumeclaims/redis-pvc-chirpstack-redis-0
  uid: 70520ae2-b99f-4d7f-a625-5e97d1748dd9
  resourceVersion: '2367972'
  creationTimestamp: '2022-02-15T17:44:04Z'
  labels:
    app: redis
    release: chirpstack
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: csi.hetzner.cloud
    volume.kubernetes.io/selected-node: oc4-pool1-779fc8f494-whkcs
  finalizers:
    - kubernetes.io/pvc-protection
  managedFields:
    - manager: kube-scheduler
      operation: Update
      apiVersion: v1
      time: '2022-02-15T17:44:04Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:volume.kubernetes.io/selected-node: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2022-02-15T17:44:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:pv.kubernetes.io/bind-completed: {}
            f:pv.kubernetes.io/bound-by-controller: {}
            f:volume.beta.kubernetes.io/storage-provisioner: {}
          f:labels:
            .: {}
            f:app: {}
            f:release: {}
        f:spec:
          f:accessModes: {}
          f:resources:
            f:requests:
              .: {}
              f:storage: {}
          f:storageClassName: {}
          f:volumeMode: {}
          f:volumeName: {}
        f:status:
          f:accessModes: {}
          f:capacity:
            .: {}
            f:storage: {}
          f:phase: {}
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500M
  volumeName: pvc-70520ae2-b99f-4d7f-a625-5e97d1748dd9
  storageClassName: hcloud-volumes
  volumeMode: Filesystem

The PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-70520ae2-b99f-4d7f-a625-5e97d1748dd9
  selfLink: /api/v1/persistentvolumes/pvc-70520ae2-b99f-4d7f-a625-5e97d1748dd9
  uid: cc6f8d63-a75f-4954-88d3-4ad5c68ffbc1
  resourceVersion: '2367987'
  creationTimestamp: '2022-02-15T17:44:07Z'
  annotations:
    pv.kubernetes.io/provisioned-by: csi.hetzner.cloud
  finalizers:
    - kubernetes.io/pv-protection
    - external-attacher/csi-hetzner-cloud
  managedFields:
    - manager: csi-provisioner
      operation: Update
      apiVersion: v1
      time: '2022-02-15T17:44:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:pv.kubernetes.io/provisioned-by: {}
        f:spec:
          f:accessModes: {}
          f:capacity:
            .: {}
            f:storage: {}
          f:claimRef:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:namespace: {}
            f:resourceVersion: {}
            f:uid: {}
          f:csi:
            .: {}
            f:driver: {}
            f:fsType: {}
            f:volumeAttributes:
              .: {}
              f:storage.kubernetes.io/csiProvisionerIdentity: {}
            f:volumeHandle: {}
          f:nodeAffinity:
            .: {}
            f:required:
              .: {}
              f:nodeSelectorTerms: {}
          f:persistentVolumeReclaimPolicy: {}
          f:storageClassName: {}
          f:volumeMode: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2022-02-15T17:44:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:phase: {}
    - manager: csi-attacher
      operation: Update
      apiVersion: v1
      time: '2022-02-15T17:44:08Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            v:"external-attacher/csi-hetzner-cloud": {}
status:
  phase: Bound
spec:
  capacity:
    storage: 10Gi
  csi:
    driver: csi.hetzner.cloud
    volumeHandle: '16943978'
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1644479434542-8081-csi.hetzner.cloud
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace: chirpstack
    name: redis-pvc-chirpstack-redis-0
    uid: 70520ae2-b99f-4d7f-a625-5e97d1748dd9
    apiVersion: v1
    resourceVersion: '2367918'
  persistentVolumeReclaimPolicy: Delete
  storageClassName: hcloud-volumes
  volumeMode: Filesystem
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: csi.hetzner.cloud/location
              operator: In
              values:
                - nbg1

These messages happen across different deployments with different applications/services.

thx a lot Martin

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 6
  • Comments: 26 (3 by maintainers)

Most upvoted comments

Perhaps this will help somebody: kubeone works as a solution, but I could not let go of this problem, because the “vitobotta/hetzner-k3s” project was my first step into the world of Kubernetes. So I investigated the problem a little bit more.

I tried node restarts, but that did not help. Then I manually removed the VolumeAttachment entry of the PVC, and this resolved the situation automatically. I suppose the problem lies in the code that checks whether a VolumeAttachment must be removed/reinitialized. I will see if I can debug the CSI driver a little, but that is probably beyond my knowledge at the moment.

So the manual workaround for me was:

# example: kubectl get volumeattachment | grep pvc-9a627590-1234-5678-905a-fb6089af008f
kubectl get volumeattachment | grep pvc-id

# example: kubectl delete volumeattachment csi-8e250e6be1c12345678aacf437308cf6650f431a90db43b1a4ce559f46da7ad6
kubectl delete volumeattachment csi-id
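
After deleting the VolumeAttachment, the attach controller should recreate it as soon as a pod needs the volume again. As a rough way to verify (a sketch only; the PVC ID below is just the example from above), you can watch for the fresh attachment to appear:

# example PV name from above; watch until a new attachment for it shows up
kubectl get volumeattachment -w | grep pvc-9a627590-1234-5678-905a-fb6089af008f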

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

I spent some more time investigating the problems in this issue. I think this issue contains two different problems:


The original problem, where the volume was supposed to be attached according to the API and NodePublishVolume fails with:

MountVolume.MountDevice failed for volume "$PV_NAME" : rpc error: code = Internal desc = failed to stage volume: format of disk "/dev/disk/by-id/scsi-0HC_Volume_$VOLUME_ID" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/$PV_NAME/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.45.7 (28-Jan-2021) The file /dev/disk/by-id/scsi-0HC_Volume_$VOLUME_ID does not exist and no size was specified. )

Unfortunately I could not find a cause for this problem (yet). It would be great if all of the affected users could upgrade to the latest version of the driver and then send me debug logs (see here for debug logs).


The second problem, where publishing fails because the NodePublishVolume call is missing a path to the device. This fails with the message:

level=error ts=2022-09-29T09:31:10.425207836Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: exit status 1\nmke2fs 1.45.7 (28-Jan-2021)\nThe file does not exist and no size was specified.

As far as I could tell, this is what happened:

  • The cluster was using csi-driver v1.6.0
  • A volume was created and attached to a node
  • The csi-driver running in the cluster was upgraded to v2.0.0+ (or latest at some point after 2022-02-15)
  • The problem occurred

We made a change (#264) in the way we get the linux device path of the volume (which is used in NodePublishVolume). Prior to the linked PR, we retrieved the volume from the API in NodeStageVolume and used that path to mount the volume. In an effort to remove any API calls from the Node part, we changed this mechanism to instead pass the device path from ControllerPublishVolume to NodePublishVolume using PublishContext. This PublishContext is saved on the VolumeAttachment object in kubernetes.

As we only started setting the PublishContext for VolumeAttachments created after v2.0.0 (or latest at some point after 2022-02-15), the fields are missing for older VolumeAttachments.
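
A quick way to check a single attachment (a minimal sketch; the attachment name is a placeholder you need to replace) is to print the device path that was saved from the PublishContext:

# prints the saved device path; an attachment created before v2.0.0 prints nothing here
kubectl get volumeattachment csi-<id> -o jsonpath='{.status.attachmentMetadata.devicePath}'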

You can run the following command to find any lingering VolumeAttachments in your clusters that might still be affected by this:

$ kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PVC:.spec.source.persistentVolumeName,ATTACHER:.spec.attacher,DEVICEPATH:.status.attachmentMetadata.devicePath | grep -E 'ATTACHER|csi.hetzner.cloud' --color=never
NAME                                                                   PVC                                        ATTACHER            DEVICEPATH
csi-052551d52da355ad5159b15ce67803b77154dd33c60d136ec504d59e1c0d1eac   pvc-e42ffe07-3eb5-4b24-865a-15a40c24b82b   csi.hetzner.cloud   /dev/disk/by-id/scsi-0HC_Volume_1234567890
csi-1ee89e51ea7f012153b3d2f117cfa37871286e870a5b2e0c808252968362e3ef   pvc-2cca061d-dc80-44f6-87bc-aab963c220d2   csi.hetzner.cloud   <none>

If you see any lines with <none> in the DEVICEPATH column, you need to recreate the VolumeAttachment; the workaround from @swarnat works for this.

Do you have any updates?

We are also facing this issue. It occurs any time a server has crashed after trying to add a new PVC. We then restart the server in the Hetzner Cloud Console. When the server has started again, the new PVC is listed correctly in the Hetzner Cloud Console and in Kubernetes, but the corresponding volume device does not exist in /dev/disk/by-id/.
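
For reference, this is roughly how we check it on the node (a sketch only, assuming SSH access to the affected server; Hetzner volumes show up as scsi-0HC_Volume_<id> links):

# on the affected node: list the Hetzner volume device links; the missing volume will not show up
ls -l /dev/disk/by-id/ | grep HC_Volume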

The same issue came up for me today after upgrading the csi-driver from 1.6.0 to 2.1.0 according to the upgrade guide.

@apricote’s command to find affected attachments was very helpful. It is important to note, though, that the corresponding pod(s) should be restarted (deleted) immediately afterwards, as PV access is lost directly after deleting the faulty volume attachments, as described by @swarnat.
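
As a small sketch of that step (the namespace and claim name are just the example from this issue, and it assumes jq is available): find the pods that mount the affected claim, then delete them right after removing the VolumeAttachment so they get rescheduled and the volume is re-attached with a fresh PublishContext.

# list pods referencing the affected PVC, then delete them so they are recreated
kubectl get pods -n chirpstack -o json \
  | jq -r '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName == "redis-pvc-chirpstack-redis-0") | .metadata.name'
kubectl delete pod -n chirpstack <pod-name>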

Thanks everyone for investigating this!

Every day an interesting project. Thanks for mentioning kubeone. I had never heard of that tool and will check what we need to adjust to set up vanilla k8s ourselves.