csi-driver: Random volume mount error

This error has been following me for the last year.

tax-mysql-0                                       0/2     Init:0/1   0          3m5s
Warning  FailedAttachVolume  119s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b269c782-6c14-41d8-a043-9b8c890e485a" : rpc error: code = Internal desc = failed to publish volume: Volume assigning step has failed due to an unknown error. (unknown_error)

Pods randomly fail to create the volume mount and stay in the Init state forever. Sometimes the cloud node the pod was running on goes offline, but not always; I need to go to the Hetzner console and turn the node on manually. The workaround is to delete the pod and wait for another try. CSI driver version 1.6.0, Kubernetes 1.19.9.
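
For reference, the workaround boils down to something like this (a rough sketch; the pod and PVC names are taken from the events above, run against whatever namespace the pod lives in):

  # Find pods stuck in Init because of the failed attach
  kubectl get pods -o wide | grep Init

  # Check whether the attach ever completed from the controller's point of view
  kubectl get volumeattachments | grep pvc-b269c782-6c14-41d8-a043-9b8c890e485a

  # Workaround: delete the pod so the StatefulSet recreates it and retries the attach
  kubectl delete pod tax-mysql-0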

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 22 (4 by maintainers)

Most upvoted comments

@apricote thanks for offering to help!

We have similar issues. Sometimes volumes take forever to mount into the pod, which results in startup times of 10+ minutes:

  Normal   Scheduled    11m                    default-scheduler  Successfully assigned pod/pod-54dc234c3-t3t54 to node-name
  Warning  FailedMount  5m20s (x2 over 7m34s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[pod-cache-volume], unattached volumes=[pod-config-volume pod-cache-volume kube-api-access-t441t]: timed out waiting for the condition
  Warning  FailedMount  51s (x3 over 9m49s)    kubelet            Unable to attach or mount volumes: unmounted volumes=[pod-cache-volume], unattached volumes=[pod-cache-volume kube-api-access-t441t pod-config-volume]: timed out waiting for the condition
  Normal   Pulling      5s                     kubelet            Pulling image "image"
  Normal   Pulled       4s                     kubelet            Successfully pulled image "image" in 570.451336ms (570.476615ms including waiting)
  Normal   Created      4s                     kubelet            Created container pod
  Normal   Started      4s                     kubelet            Started container pod

The volume cannot be mounted, but according to the Hetzner CSI controller the disk is attached, opened (LUKS), and ready.
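
To double-check this on the node itself, something like the following can be run there (a sketch; the volume ID is a placeholder taken from the logs further down, and cryptsetup/findmnt need to be available on the host):

  # Does the Hetzner volume show up as a block device on the node?
  lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
  ls -l /dev/disk/by-id/ | grep HC_Volume

  # Is the LUKS mapping open? (device name as it appears in the CSI logs)
  cryptsetup status scsi-0HC_Volume_123456789

  # Is anything actually mounted at the kubelet target path?
  findmnt | grep 'kubernetes.io~csi'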

Sometimes we also get errors stating that the server is locked.

  Warning  FailedAttachVolume      2m59s (x2 over 3m1s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-233f32ff-78cd-c884-234c-f9898a9f8989" : rpc error: code = Unavailable desc = failed to publish volume: server is locked
  Normal   SuccessfulAttachVolume  2m58s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-233f32ff-78cd-c884-234c-f9898a9f8989"
  Warning  FailedMount             40s (x2 over 2m54s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-t441t]: timed out waiting for the condition
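
A quick way to see whether the API really reports the server as locked at that moment is to query it directly (a sketch; HCLOUD_TOKEN and the node name are placeholders, and jq is only used for readability):

  # The Hetzner Cloud API exposes a "locked" flag per server
  curl -s -H "Authorization: Bearer $HCLOUD_TOKEN" \
    "https://api.hetzner.cloud/v1/servers?name=node-name" \
    | jq '.servers[] | {name, locked, status}'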

This is very annoying and we would be very happy to assist in finding the issue!

We already had a look at our debug logs and were not able to find any issues there. These logs are generated close to the time of pod scheduling:

hcloud-csi-driver level=debug component=grpc-server msg="handling request" method=/csi.v1.Node/NodeUnpublishVolume req="volume_id:\"123456789\" target_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" "
hcloud-csi-driver level=info component=linux-mount-service msg="unpublishing volume" target-path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount device-path=/dev/mapper/scsi-0HC_Volume_123456789
hcloud-csi-driver 1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount" is not a mountpoint, deleting
hcloud-csi-driver level=info component=linux-mount-service msg="closing LUKS device" luksDeviceName=scsi-0HC_Volume_123456789
hcloud-csi-driver level=debug component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeUnpublishVolume err=null
hcloud-csi-driver level=debug grpc-server msg="handling request" method=/csi.v1.Node/NodePublishVolume req="volume_id:\"123456789\" publish_context:<key:\"devicePath\" value:\"/dev/disk/by-id/scsi-0HC_Volume_123456789\" > target_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:\"encryption-passphrase\" value:\"XXXXXXXXXXXXXX\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"podName\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"namespace\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"podID\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"serviceAccountName\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"id-csi.hetzner.cloud\" > "
hcloud-csi-driver level=info linux-mount-service msg="opening LUKS device" devicePath=/dev/disk/by-id/scsi-0HC_Volume_123456789 luksDeviceName=scsi-0HC_Volume_123456789
hcloud-csi-driver level=info linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount device-path=/dev/mapper/scsi-0HC_Volume_123456789 fs-type=ext4 block-volume=false readonly=false mount-options= encrypted=true
hcloud-csi-driver level=debug grpc-server msg="finished handling request" method=/csi.v1.Node/NodePublishVolume err=null

Then we see nothing for a long time (that's probably the time during which the pod isn't able to mount the volume). At some point, when the volume has been successfully mounted, we see these logs:

hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetVolumeStats req="volume_id:\"0HC_Volume_123456789\" volume_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" "
hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=linux-mount-service msg="checking path existence" path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount
hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetVolumeStats err=null

We don’t know what changed or what made the volume suddenly mountable.
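
For anyone trying to correlate this, the node-plugin logs above can be pulled roughly like this (a sketch; namespace, pod name, and container name may differ between deployments):

  # Find the CSI node pod running on the affected node
  kubectl -n kube-system get pods -o wide | grep hcloud-csi-node

  # Node-plugin logs with timestamps, filtered to the volume in question
  kubectl -n kube-system logs hcloud-csi-node-j5vzl -c hcloud-csi-driver --timestamps \
    | grep 123456789

  # Pod events for the same time window
  kubectl get events --sort-by=.lastTimestamp | grep -iE 'attach|mount'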

Sometimes the node gets turned off after a failed volume mount; from the logs it looks like a power loss, with no error in syslog at all. So I need to go to the Hetzner web console and turn it on manually. Maybe it is indeed related to virtualization errors somehow, but this volume mount problem happens very often.
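
Instead of the web console, powering the node back on can also be scripted, e.g. with the hcloud CLI (a sketch; the server name is a placeholder and the CLI needs a configured API token):

  # Check power status, then power the affected node back on
  hcloud server list
  hcloud server poweron node-name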