longhorn: Can't attach existing volumes after network and node issues: GRPC error: rpc error: code = Internal desc = Action [attach] not available on
The issue appeared after nodes were down for some time and there were some transient network connectivity issues, routing problems, etc.
It has happened before, but then I just created new volumes; this time I would like to see if it’s possible to reattach the existing ones.
Environment:
- Longhorn version: 1.1.0
- Kubernetes version: 1.2.x
- Node config
- OS type and version: K3S Latest/Test 1.2.5
- CPU per node: 4vcpu
- Memory per node: 12G
- Disk type: ext4
- Network bandwidth and latency between the nodes: 300-500Mbit
- Underlying Infrastructure (Virtual Machines / Proxmox):
Additional context
I had network problems and now can’t seem to attach existing Longhorn volumes to two pods. I can only create new volumes and attach those.
I looked in the filesystem location and I can see that all the folders seem to be there and intact, but I don’t know what I’m looking at and have no idea what is stopping the attach. I’m guessing it could be some kind of cluster race condition or split brain that it can’t recover from, although it does allow creating new volumes and attaching them.
I am also unable to get to the Longhorn UI to aid troubleshooting, but that’s a different issue.
Any ideas where I can start to troubleshoot or run any force recovery commands?
Many thanks, Jon.
Various errors:
AttachVolume.Attach failed for volume "pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700" : rpc error: code = Internal desc = Action [attach] not available on [&{pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700 volume map[self:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700] map[salvage:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700?action=salvage]}]
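The error above prints the volume resource in Go's default struct formatting, and the second `map[...]` appears to list the actions the Longhorn API currently offers for this volume. Here the only one is `salvage`, which suggests the volume is in a state where it has to be salvaged before it can be attached. The following is a minimal Python sketch for pulling those action names out of such an error string. The function names are hypothetical helpers, and the "first map is links, second map is actions" ordering is an assumption read off this one message, not a documented format.

```python
import re


def parse_go_maps(error_text):
    """Return every Go-formatted map[...] found in the text as a dict, in order."""
    maps = []
    for body in re.findall(r"map\[([^\]]*)\]", error_text):
        entries = {}
        # Go prints map entries as space-separated "key:value" pairs.
        for item in body.split():
            key, _, value = item.partition(":")
            entries[key] = value
        maps.append(entries)
    return maps


def available_actions(error_text):
    """List the action names offered for the volume.

    Assumption: the first map holds links and the second holds actions,
    as in the error message quoted in this issue.
    """
    maps = parse_go_maps(error_text)
    return sorted(maps[1]) if len(maps) > 1 else []


ERROR = (
    "rpc error: code = Internal desc = Action [attach] not available on "
    "[&{pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700 volume "
    "map[self:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700] "
    "map[salvage:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700?action=salvage]}]"
)

if __name__ == "__main__":
    print(available_actions(ERROR))
```

If the list contains `salvage` but not `attach`, the salvage route is probably the place to start rather than retrying the attach.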
csi attacher logs
I0325 09:49:58.929165 1 connection.go:183] GRPC request: {"node_id":"kube2.wizznet.co.uk","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"baseImage":"","fromBackup":"","numberOfReplicas":"2","staleReplicaTimeout":"30","storage.kubernetes.io/csiProvisionerIdentity":"1616427639514-8081-driver.longhorn.io"},"volume_id":"pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700"}
I0325 09:49:58.958803 1 connection.go:185] GRPC response: {}
I0325 09:49:58.959350 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Action [attach] not available on [&{pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700 volume map[self:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700] map[salvage:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700?action=salvage]}]
I0325 09:49:58.959394 1 csi_handler.go:559] Saving attach error to "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:49:59.039710 1 controller.go:158] Ignoring VolumeAttachment "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b" change
I0325 09:49:59.088097 1 csi_handler.go:570] Saved attach error to "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:49:59.088135 1 csi_handler.go:219] Error processing "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b": failed to attach: rpc error: code = Internal desc = Action [attach] not available on [&{pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700 volume map[self:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700] map[salvage:http://longhorn-backend:9500/v1/volumes/pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700?action=salvage]}]
I0325 09:50:01.831360 1 leaderelection.go:283] successfully renewed lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:50:07.088365 1 controller.go:198] Started VA processing "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088396 1 csi_handler.go:209] CSIHandler: processing VA "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088406 1 csi_handler.go:236] Attaching "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088413 1 csi_handler.go:396] Starting attach operation for "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088471 1 csi_handler.go:329] PV finalizer is already set on "pvc-9e6f1da9-f714-430f-a5a4-38e3fa95c700"
I0325 09:50:07.088490 1 csi_handler.go:689] Found NodeID kube2.wizznet.co.uk in CSINode kube2.wizznet.co.uk
I0325 09:50:07.088509 1 csi_handler.go:289] VA finalizer is already set on "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088525 1 csi_handler.go:303] NodeID annotation is already set on "csi-a9b52b3f1de4b146e0078ebfa471b2f26ffaedf4439d25614e3db8985a1a947b"
I0325 09:50:07.088540 1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I0325 09:50:07.088544 1 connection.go:183] GRPC request: {"node_id":"kube2.wizznet.co.uk","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"
I0325 09:50:43.645135 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:50:57.112125 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:50:57.112154 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:51:07.901273 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:51:07.901306 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:51:17.425405 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:51:17.425431 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:51:22.675469 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:51:22.675486 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:51:33.588010 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:51:33.588035 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
I0325 09:51:42.190493 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired
I0325 09:51:42.190518 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io
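The attacher log above mixes two things: the repeated attach failure and a long run of leader-election messages. The later lines, which report that the lock is held by another pod, suggest they come from a csi-attacher replica that is not the current leader. A short Python sketch like the one below can separate the two by grouping lines per source file and collecting the `GRPC error` entries. It assumes the klog-style line layout seen in this log; the sample lines are abbreviated copies of the ones above, and the helper names are hypothetical.

```python
import re
from collections import Counter

# klog-style layout as seen in the log above:
#   I0325 09:49:58.959350 1 connection.go:186] message
KLOG = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_klog(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = KLOG.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count lines per source file and collect (time, message) for GRPC errors."""
    by_file = Counter()
    grpc_errors = []
    for raw in lines:
        record = parse_klog(raw)
        if record is None:
            continue  # skip truncated or non-klog lines
        by_file[record["file"]] += 1
        if record["msg"].startswith("GRPC error:"):
            grpc_errors.append((record["time"], record["msg"]))
    return by_file, grpc_errors


# Abbreviated sample taken from the log in this issue.
SAMPLE = [
    "I0325 09:49:58.958803 1 connection.go:185] GRPC response: {}",
    "I0325 09:49:58.959350 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Action [attach] not available",
    "I0325 09:50:57.112125 1 leaderelection.go:352] lock is held by csi-attacher-5dcdcd5984-kg2mk and has not yet expired",
    "I0325 09:50:57.112154 1 leaderelection.go:247] failed to acquire lease longhorn-system/external-attacher-leader-driver-longhorn-io",
]

if __name__ == "__main__":
    counts, errors = summarize(SAMPLE)
    for name, count in counts.most_common():
        print(name, count)
    for when, message in errors:
        print(when, message)
```

Filtering this way makes it easier to see that the leader-election lines are routine for a standby replica, while the `connection.go` error is the one that matters for the failed attach.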
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (3 by maintainers)
All working again now thanks 😃
I’m pretty sure auto salvage is already turned on yet they didn’t salvage themselves? Cheers, Jon.
@bodleytunes I am currently working on some fixes for auto salvage. Since the issue was solved, I am closing this for now. If you encounter another salvage issue once v1.1.1 is released, please reopen.
OK, got the Longhorn UI working by following this: https://forums.rancher.com/t/longhorn-ui-with-traefik/16742/2