k3s: CSI plugin functionality is broken: "attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden"

Bug

When installing a CSI driver, PVCs and PVs are created but the volumes will not attach to a pod, failing with this error:

E0727 14:03:55.220699    2205 csi_attacher.go:93] kubernetes.io/csi: attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden: User "system:node:master1" cannot create resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: can only get individual resources of this type

To Reproduce

Expected behavior

PVs are able to attach to a pod without issue

Additional context

The issue has been reported for multiple CSI drivers, and everything points to k3s. I can personally confirm that this works normally with a kubespray cluster.

hetznercloud/csi-driver: https://github.com/hetznercloud/csi-driver/issues/46
Longhorn CSI driver: https://forums.rancher.com/t/longhorn-on-k3s-pv-attach-error/14920

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 36 (8 by maintainers)

Most upvoted comments

I don’t know much about authentication and authorization yet… but I was playing around a little and got a volume attached and mounted by doing this:

  1. Because the system:node ClusterRole had only the verb ‘get’ for ‘volumeattachments’, I added ‘create’, ‘delete’, ‘patch’, ‘update’, ‘list’ and ‘watch’ after seeing what’s in the other sections of the role… (a rough sketch of the resulting rule is shown below)

  2. I edited the ClusterRoleBinding for system:node and since there were no subjects I tried adding these:

subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

I have no idea what I’m doing, but with this change it works LOL 😄 the volume is mounted and working correctly. Is this something that should be done in K3s?
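For reference, the edited rule from step 1 might look roughly like this (just a sketch of the relevant excerpt under the assumptions above; the rest of the system:node ClusterRole is omitted):

# excerpt of a modified system:node ClusterRole rule (assumed shape, not the exact edit)
- apiGroups:
  - storage.k8s.io
  resources:
  - volumeattachments
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete

Note that a later comment in this thread suggests creating a narrower, dedicated role and binding instead of widening system:node, which is the safer option.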

FWIW I ran across this issue today - had a test v1.19.7+k3s1 cluster running Longhorn, and for reasons decided to run sudo service k3s restart and sudo service k3s-agent restart on each of my nodes in sequence.

One of the pods kept failing with MountVolume.WaitForAttach failed for volume "pvc-da55e64f-4928-4a78-bb15-2b811712a17d" : volume pvc-da55e64f-4928-4a78-bb15-2b811712a17d has GET error for volume attachment csi-03f2267d55258d7abc4cfa13778de5e53882fa7d96d2d1decb616cb26b9d1472: volumeattachments.storage.k8s.io "csi-03f2267d55258d7abc4cfa13778de5e53882fa7d96d2d1decb616cb26b9d1472" is forbidden: User "system:node:hostname" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node 'hostname' and this object

I modified and pushed a new PVC, adding a “2” to the end of the metadata.name field, then edited the deployment spec.template.spec.volumes.persistentVolumeClaim.claimName to match and pushed that… the pod came up just fine.

Finally, I edited the deployment config’s volume claim name back to the original, and it appears to be attaching again now (logs appear normal, and I can kubectl exec into the container and read the mounted files which appear to be intact).

I have no idea how things got into this state - but I hope this helps other folks who come across this problem.
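For anyone who wants to script the claim-name swap described above, it would look roughly like this (a sketch only; the deployment name, volume index, and PVC names are placeholders for your own):

# point the deployment at the freshly created PVC (placeholder names)
kubectl patch deployment my-app --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/volumes/0/persistentVolumeClaim/claimName", "value": "my-data-2"}
]'

# once the pod is running again, switch back to the original claim
kubectl patch deployment my-app --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/volumes/0/persistentVolumeClaim/claimName", "value": "my-data"}
]'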

This will be fixed in the next release, which uses k8s v1.15. The nim package-level variable is the main culprit here, since it contains host information that differs between the attachdetach controller and the kubelet. In general, package-level state like this is a bad idea; unfortunately there is no good way for us to audit the code for variables like this. We could try forking to help isolate the processes, but that would likely come with an additional memory cost.

This has something to do with us running a combined binary; it looks like some CSI operations are being picked up by the kubelet instead of the attachdetach controller. If you run the server with --disable-agent and run a separate agent, the CSI attach will work correctly. Stack trace of the attacher code:

github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/csi.(*csiAttacher).Attach(0xc006719e30, 0xc00669cc20, 0xc00453019b, 0x5, 0xc008cfeb30, 0x10, 0x10, 0x2f99aa0)
        /Users/erik/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/csi/csi_attacher.go:90 +0x2d6
github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationGenerator).GenerateAttachVolumeFunc.func2(0x8, 0x3557a98, 0xc004458d00, 0xc002a07720)
        /Users/erik/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_generator.go:346 +0x9b
github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run.func1(0xc006f76200, 0xc002a07720, 0x4e, 0x0, 0x0, 0xc0087e18c0, 0xc00659de00, 0xc0087e1940, 0x38, 0x40, ...)
        /Users/erik/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:143 +0x146
created by github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run
        /Users/erik/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:130 +0x2ce
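For anyone who wants to try the split setup mentioned above, it would look roughly like this (a sketch; the server address and token are placeholders):

# run the k3s control plane without the embedded kubelet
k3s server --disable-agent

# on another node, join as a separate agent
k3s agent --server https://<server-ip>:6443 --token <node-token>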

Similarly, we are running with --use-service-account-credentials=true for the controller-manager, and the attach requests appear to be using the attachdetach controller's service account.

I think, as @vitobotta found, an RBAC rule is going to be the easiest fix for the moment. Instead of modifying the system:node role to allow creating volume attachments and binding system:nodes to it, I just created a new role and binding for system:nodes with minimal permissions:

kubectl apply -f - <<EOF

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:nodes:volumeattachments
rules:
- apiGroups:
  - storage.k8s.io
  resources:
  - volumeattachments
  verbs:
  - create
  - watch

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:nodes:volumeattachments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:nodes:volumeattachments
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

EOF
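After applying this, the new permission can be sanity-checked with something like the following (substitute your own node name for master1):

kubectl auth can-i create volumeattachments.storage.k8s.io \
  --as=system:node:master1 --as-group=system:nodes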

Yes! I just tried 0.9.0 RC2 and I didn’t need the hack this time. Thanks! 😃

As a temporary workaround, are there any changes I can make to permissions or something to make it work? Thanks