vsphere-csi-driver: CNS volumes disappear and all in-cluster operations fail
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: In two separate Kubernetes clusters we have observed failures when pods get moved and, as a result, their PVs have to be attached to a new node. The CNS volume object seems to just disappear, although the backing FCD is still intact on the vSAN datastore. Nothing has triggered a deletion as far as I can see…
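A rough way to cross-check the two views from vCenter is sketched below (assuming a reasonably recent govc; the datastore name vsanDatastore is a placeholder, and the ID is the volume ID from the attach error further down):

# Check whether the CNS volume object still exists in vCenter:
govc volume.ls eb05ab8b-dcd0-4217-9c34-3ec8bda666a9
# Check whether the backing FCD still exists on the datastore:
govc disk.ls -ds vsanDatastore eb05ab8b-dcd0-4217-9c34-3ec8bda666a9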
Events from the Kubernetes side:
kubectl get event -n workload
LAST SEEN TYPE REASON OBJECT MESSAGE
26s Normal Scheduled pod/redis-754fbf4bd-26lbf Successfully assigned workload/redis-754fbf4bd-26lbf to worker3
11s Warning FailedAttachVolume pod/redis-754fbf4bd-26lbf AttachVolume.Attach failed for volume "pvc-202662bf-3ce7-40a0-96af-d22d58198dce" : rpc error: code = Internal desc = failed to attach disk: "eb05ab8b-dcd0-4217-9c34-3ec8bda666a9" with node: "worker3" err ServerFaultCode: Received SOAP response fault from [<cs p:00007fa4dc0a6290, TCP:localhost:443>]: retrieveVStorageObject
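For reference, the disk ID in the attach error is the CSI volume handle of the PV it mentions; it can be read back with something like:

kubectl get pv pvc-202662bf-3ce7-40a0-96af-d22d58198dce -o jsonpath='{.spec.csi.volumeHandle}'
# expected to print the disk ID from the error above: eb05ab8b-dcd0-4217-9c34-3ec8bda666a9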
Logs from csi-controller:
I0917 09:27:36.946521 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0917 09:27:36.340379 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0917 09:27:35.969819 1 csi_handler.go:226] Error processing "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba": failed to detach: rpc error: code = Internal desc = volumeID "eb05ab8b-dcd0-4217-9c34-3ec8bda666a9" not found in QueryVolume
I0917 09:27:35.969770 1 csi_handler.go:612] Saved detach error to "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba"
I0917 09:27:35.967871 1 controller.go:158] Ignoring VolumeAttachment "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba" change
I0917 09:27:35.952349 1 csi_handler.go:601] Saving detach error to "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba"
{"level":"error","time":"2021-09-17T09:27:35.951799411Z","caller":"vanilla/controller.go:883","msg":"volumeID \"eb05ab8b-dcd0-4217-9c34-3ec8bda666a9\" not found in QueryVolume","TraceId":"e3830e8e-5954-433b-a777-e5623668c7b3","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:883\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:937\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.2.0/lib/go/csi/csi.pb.go:5200\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).controllerUnpublishVolume\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware/serialvolume/serial_volume_locker.go:141\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware/serialvolume/serial_volume_locker.go:88\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware/specvalidator/spec_validator.go:177\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi.(*StoragePlugin).injectContext\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/middleware.go:231\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/utils/utils_middleware.go:106\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.2.0/lib/go/csi/csi.pb.go:5202\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:722"}
{"level":"info","time":"2021-09-17T09:27:35.904885335Z","caller":"vanilla/controller.go:857","msg":"ControllerUnpublishVolume: called with args {VolumeId:eb05ab8b-dcd0-4217-9c34-3ec8bda666a9 NodeId:worker3 Secrets:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"e3830e8e-5954-433b-a777-e5623668c7b3"}
I0917 09:27:35.903737 1 csi_handler.go:715] Found NodeID worker3 in CSINode worker3
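The detach error the attacher logs above ("Saved detach error to ...") is also persisted in the VolumeAttachment status, so it can be inspected directly with something like:

kubectl get volumeattachment csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba -o jsonpath='{.status.detachError.message}'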
And some entries from vsanvcmgmtd:
2021-09-15T22:18:29.157Z verbose vsanvcmgmtd[10186] [vSAN@6876 sub=PyBackedMO opId=sps-Main-13723-334-8a1a] Enter vasa.NotificationManager.getAlarm, Pending: 1 (5269718e-b4b7-0511-b82b-b8f707de46d5)
2021-09-15T22:18:29.157Z verbose vsanvcmgmtd[10186] [vSAN@6876 sub=PyBackedMO opId=sps-Main-13723-334-8a1a] Exit vasa.NotificationManager.getAlarm (0 ms)
2021-09-15T22:18:38.683Z info vsanvcmgmtd[07066] [vSAN@6876 sub=AccessChecker] User CLISTER.LOCAL\user@10.170.100.178 was authenticated with soap session id. 52db629f-f84a-15b2-6401-b41b25af9ec7 (52a55bd9-9ac9-d4bd-d9f0-d6145bb4f7a5)
2021-09-15T22:18:38.704Z verbose vsanvcmgmtd[07065] [vSAN@6876 sub=PyBackedMO opId=07dc8a1b] Enter vim.cns.VolumeManager.queryAll, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.706Z verbose vsanvcmgmtd[07065] [vSAN@6876 sub=PyBackedMO opId=07dc8a1b] Exit vim.cns.VolumeManager.queryAll (1 ms)
2021-09-15T22:18:38.731Z verbose vsanvcmgmtd[07104] [vSAN@6876 sub=PyBackedMO opId=07dc8a1c] Enter vim.cns.VolumeManager.query, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.885Z verbose vsanvcmgmtd[07104] [vSAN@6876 sub=PyBackedMO opId=07dc8a1c] Exit vim.cns.VolumeManager.query (154 ms)
2021-09-15T22:18:38.923Z verbose vsanvcmgmtd[10186] [vSAN@6876 sub=PyBackedMO opId=07dc8a1d] Enter vim.cns.VolumeManager.create, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.928Z info vsanvcmgmtd[10186] [vSAN@6876 sub=VolumeManager opId=07dc8a1d] CNS: create volume task created: task-314869
2021-09-15T22:18:38.928Z info vsanvcmgmtd[10186] [vSAN@6876 sub=VolumeManager opId=07dc8a1d] CNS: Creating volume task scheduled.
2021-09-15T22:18:38.928Z verbose vsanvcmgmtd[10186] [vSAN@6876 sub=PyBackedMO opId=07dc8a1d] Exit vim.cns.VolumeManager.create (4 ms)
2021-09-15T22:18:38.933Z info vsanvcmgmtd[25351] [vSAN@6876 sub=VolumeManager opId=07dc8a1d] CNS: Creating volume task started
2021-09-15T22:18:39.029Z error vsanvcmgmtd[25351] [vSAN@6876 sub=VolumeManager opId=07dc8a1d] CNS: backingDiskId not found: eb05ab8b-dcd0-4217-9c34-3ec8bda666a9, N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
--> )
--> [context]zKq7AVECAAAAAFnREgEVdnNhbnZjbWdtdGQAAFy7KmxpYnZtYWNvcmUuc28AAAw5GwB+uxgBNG38bGlidmltLXR5cGVzLnNvAIG7mg8BgVrlDwECtWEObGlidm1vbWkuc28AAspdEQJXXxEDHuoCbGliUHlDcHBWbW9taS5zbwACHJISAvGKEgTozgNsaWJ2c2xtLXR5cGVzLnNvAAXF3AdfY25zLnNvAAVN1gUFixQGABYkIwCSJiMA5RIrBtRzAGxpYnB0aHJlYWQuc28uMAAHzY4ObGliYy5zby42AA==[/context]
2021-09-15T22:18:39.031Z info vsanvcmgmtd[25351] [vSAN@6876 sub=VolumeManager opId=07dc8a1d] Create volume completed: task-314869
2021-09-15T22:18:39.054Z verbose vsanvcmgmtd[12908] [vSAN@6876 sub=PyBackedMO opId=07dc8a1e] Enter vim.cns.VolumeManager.create, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:39.057Z info vsanvcmgmtd[12908] [vSAN@6876 sub=VolumeManager opId=07dc8a1e] CNS: create volume task created: task-314870
2021-09-15T22:18:39.057Z info vsanvcmgmtd[12908] [vSAN@6876 sub=VolumeManager opId=07dc8a1e] CNS: Creating volume task scheduled.
2021-09-15T22:18:39.057Z verbose vsanvcmgmtd[12908] [vSAN@6876 sub=PyBackedMO opId=07dc8a1e] Exit vim.cns.VolumeManager.create (3 ms)
2021-09-15T22:18:39.062Z info vsanvcmgmtd[43408] [vSAN@6876 sub=VolumeManager opId=07dc8a1e] CNS: Creating volume task started
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [vSAN@6876 sub=PyBackedMO opId=sps-Main-13723-334-8a1f] Enter vasa.NotificationManager.getAlarm, Pending: 1 (5269718e-b4b7-0511-b82b-b8f707de46d5)
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [vSAN@6876 sub=PyBackedMO opId=sps-Main-13723-334-8a1f]
What you expected to happen:
CNS volume objects should not just disappear?
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- csi-vsphere version:
- vsphere-cloud-controller-manager version: v1.20.0
- Kubernetes version: 1.20.10
- vSphere version: 6.7u3 (6.7.0.48000)
- OS (e.g. from /etc/os-release): ubuntu 20.04
- Kernel (e.g. uname -a): 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 26 (8 by maintainers)
@jingxu97 Have you deployed 2 K8s clusters with the same cluster-id? We have seen similar behavior in the past when two K8s clusters share the same cluster-id.
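For reference, the cluster id used by each driver instance is the cluster-id key in the [Global] section of csi-vsphere.conf. A quick way to compare it across clusters (the secret and namespace names below are the usual defaults and may differ per install):

kubectl -n kube-system get secret vsphere-config-secret -o jsonpath='{.data.csi-vsphere\.conf}' | base64 -d | grep cluster-id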