csi-driver: Volume assigning step has failed due to an unknown error

Kubernetes version: 1.13.1
Ubuntu version: 18.04
One master node, two worker nodes (all CX21)
Using the provided example in the README

The volume gets created as I can see it in the Hetzner Cloud dashboard, but it isn’t attached to a server.

Name:               my-csi-app
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               worker2
Start Time:         Mon, 14 Jan 2019 13:39:28 +0100
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"my-csi-app","namespace":"default"},"spec":{"containers":[{"command":[...
Status:             Pending
IP:
Containers:
  my-frontend:
    Container ID:
    Image:         busybox
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      1000000
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from my-csi-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9pq7v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  my-csi-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc
    ReadOnly:   false
  default-token-9pq7v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9pq7v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedScheduling    104s (x4 over 110s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled           104s                 default-scheduler        Successfully assigned default/my-csi-app to worker2
  Warning  FailedAttachVolume  37s (x8 over 101s)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-6b5bd076-17f9-11e9-a20a-9600001463bf" : rpc error: code = Internal desc = failed to publish volume: Volume assigning step has failed due to an unknown error. (unknown_error)
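When many pods are affected, it can help to pull the interesting fields out of events like the one above. The following is a minimal sketch: `parse_attach_error` is a hypothetical helper name, and the regular expression only assumes the message shape seen in this one event (a quoted volume name, a gRPC status code, a description, and an optional error code in parentheses at the end).

```python
import re

# Assumed message shape, taken from the FailedAttachVolume event above:
#   ... volume "<name>" : rpc error: code = <Code> desc = <text> (<error_code>)
_ATTACH_ERROR = re.compile(
    r'volume "(?P<volume>[^"]+)".*?'
    r"rpc error: code = (?P<grpc_code>\w+) desc = (?P<desc>.*?)"
    r"(?: \((?P<error_code>\w+)\))?$"
)


def parse_attach_error(message):
    """Return volume, gRPC code, description and error code, or None if no match."""
    match = _ATTACH_ERROR.search(message.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    event = (
        'AttachVolume.Attach failed for volume '
        '"pvc-6b5bd076-17f9-11e9-a20a-9600001463bf" : rpc error: code = Internal '
        "desc = failed to publish volume: Volume assigning step has failed "
        "due to an unknown error. (unknown_error)"
    )
    print(parse_attach_error(event))
```

The trailing `unknown_error` is the part that distinguishes this failure from other attach errors, so grouping events by that field is probably the most useful first step.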

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 17 (2 by maintainers)

Most upvoted comments

I had a look at your failed attach API call, and it turned out not to be related to automount. It was a different problem, which was fixed when you rebooted your server.

@djboris9 I got the same answer from Hetzner. But sometimes rebooting alone isn’t enough; then the volumes have to be detached and re-attached in the web GUI.

@shyblower I can confirm this. The documentation says that you can attach up to 16 volumes to a server. However, it feels as if you can only perform 16 attach operations per server. The more volumes you have, and the more volumes the hcloud-csi-driver creates and destroys, the more likely you are to run into this problem.

The following helped me:

  1. detaching all volumes
  2. server reboot
  3. re-attaching all volumes

hcloud-csi-driver 1.6.0
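The three steps above can be sketched as a list of `hcloud` CLI invocations. This is only an illustration under assumptions: it assumes the `hcloud` CLI is installed and configured, `recovery_plan` is a hypothetical helper, and the server and volume names are taken from the output earlier in this issue. The function only builds the commands and does not run them.

```python
def recovery_plan(server, volumes):
    """Build the command list for: detach all volumes, reboot, re-attach all."""
    # Step 1: detach every volume from the server.
    plan = [["hcloud", "volume", "detach", volume] for volume in volumes]
    # Step 2: reboot the server.
    plan.append(["hcloud", "server", "reboot", server])
    # Step 3: re-attach every volume to the same server.
    plan += [
        ["hcloud", "volume", "attach", "--server", server, volume]
        for volume in volumes
    ]
    return plan


if __name__ == "__main__":
    # Names are examples only; substitute your own server and volumes.
    for command in recovery_plan(
        "worker2", ["pvc-6b5bd076-17f9-11e9-a20a-9600001463bf"]
    ):
        print(" ".join(command))
```

Printing the plan first makes it easy to review before running anything. The server may need some time to come back up after the reboot, so the re-attach commands probably should not be fired immediately afterwards.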

I am getting this more and more often. It’s especially easy to trigger if you have a StatefulSet with 4+ members and podManagementPolicy = "Parallel".
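For anyone trying to reproduce this, a workload of the kind described might look roughly like the manifest built below. It is a sketch, not the poster's actual configuration: the name, image, mount path and volume size are placeholders, and the storage class name `hcloud-volumes` is an assumption based on the driver's default setup. With `Parallel`, all pods are created at once, so their volumes are attached at about the same time.

```python
import json


def parallel_statefulset(name, replicas, storage_class="hcloud-volumes"):
    """Build a StatefulSet manifest whose pods (and volumes) start in parallel."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            # Start all pods at once instead of one after another.
            "podManagementPolicy": "Parallel",
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "app",
                        "image": "busybox",  # placeholder image
                        "command": ["sleep", "1000000"],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One PersistentVolumeClaim per pod, so one volume attach per pod.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,  # assumed class name
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(parallel_statefulset("csi-repro", 4), indent=2))
```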

Is there any way to find out what’s causing this? Because it easily breaks deployments.

Mount failures also show up in the console:

[image: console screenshot]

Update: Rebooting the servers solves the issue. Support also told me to do that. It’s not a very nice solution, though.