vsphere-csi-driver: failed to get CsiNodeTopology for the node
What happened:
The node-driver-registrar container of the vsphere-csi-node DaemonSet fails with "failed to get CsiNodeTopology for the node":
I0322 12:55:25.883806 1 main.go:166] Version: v2.5.0
I0322 12:55:25.883841 1 main.go:167] Running node-driver-registrar in mode=registration
I0322 12:55:25.884289 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0322 12:55:25.884310 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0322 12:55:25.884693 1 main.go:198] Calling CSI driver to discover driver name
I0322 12:55:25.884717 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0322 12:55:25.884721 1 connection.go:184] GRPC request: {}
I0322 12:55:25.886858 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v2.5.0"}
I0322 12:55:25.886926 1 connection.go:187] GRPC error: <nil>
I0322 12:55:25.886933 1 main.go:208] CSI driver name: "csi.vsphere.vmware.com"
I0322 12:55:25.886971 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0322 12:55:25.887559 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0322 12:55:25.887693 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0322 12:55:27.616658 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0322 12:55:27.617124 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0322 12:55:27.636091 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "talos-10-120-8-82". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0322 12:55:27.636112 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "talos-10-120-8-82". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
What you expected to happen:
The only information I can find on CSINodeTopology with respect to this driver is in the guide "Deploying vSphere Container Storage Plug-in with Topology"; however, I do NOT have the two arguments for the external-provisioner sidecar uncommented as instructed there. Other than that, I can’t even locate the CSINodeTopology cns.vmware.com/v1alpha1 CRD.
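A quick way to confirm what the error message claims (that the API server serves no CSINodeTopology kind in the cns.vmware.com group) is to list the resources registered for that group. This is only a sketch: the helper name csinodetopology_crd_present is hypothetical, and the match on "csinodetopolog" assumes the resource name follows the usual lowercase plural of the kind.

```shell
# Hypothetical helper: succeeds if the API server lists a CSINodeTopology
# resource in the cns.vmware.com API group, fails otherwise.
# Assumption: the resource name contains "csinodetopolog" (lowercase plural of the kind).
csinodetopology_crd_present() {
  kubectl api-resources --api-group=cns.vmware.com -o name \
    | grep -qi 'csinodetopolog'
}
```

Usage: `csinodetopology_crd_present && echo "CRD is served" || echo "CRD is missing"`. If it reports missing, the registrar error above is expected, because the node plugin asks for an object of a kind the cluster does not know about.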
How to reproduce it (as minimally and precisely as possible):
Deploy the vsphere-csi-driver as instructed in "Install vSphere Container Storage Plug-in".
Anything else we need to know?:
Environment:
- csi-vsphere version: v2.5.0
- vsphere-cloud-controller-manager version: v1.22.5
- Kubernetes version: v1.23.4
- vSphere version: v7.0.3
- OS (e.g. from /etc/os-release): talos v1.0.0-beta.1
- Install tools: kubectl
/kind bug
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 36 (1 by maintainers)
I can confirm that I’ve hit this problem as well (new deployment, vSphere 7.0u3, k3s v1.24.4+k3s1). As mentioned here and in https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1948, the default setting of improved-volume-topology: 'true' in vsphere-csi-driver.yaml seems to be the cause, and changing it to false allows the pods to deploy.

If you don’t need the feature, you may set improved-volume-topology: 'false' in ConfigMaps/internal-feature-states.csi.vsphere.vmware.com. Otherwise this can fail for multiple reasons (e.g., as pointed out, because of missing permissions in vCenter). Simply disabling the feature we didn’t want to use fixed the issue for us. It seems to be enabled by default in more recent vSphere CSI releases. I’m not sure why this is needed, since the manifest still has the commented-out args you’d need to enable topology awareness. The new feature gates are not very well documented.

Enabling "Enable Improved Volume Topology" causes this error for us. Removing the selection and redeploying brings it online and stable.

@shalini-b I think I’m making forward progress: my current deployment succeeds, but fails to launch all the containers in the DaemonSet:
Each DaemonSet container vmware-system-csi/vsphere-csi-node-2pdr4:node-driver-registrar presents a similar log description:
The machine’s UUID:
The machine’s provider-id:
Both match, but the CSI node driver seems to swap the bytes around?
If I reverse the first 12 bytes, they match.
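The actual UUID values are not shown in this comment, so the following only illustrates the kind of byte swap being described. It assumes the mismatch is the common SMBIOS one, where the first three UUID fields are stored little-endian and therefore appear byte-reversed; that covers the first 16 hex digits and may not match the "12 bytes" counted above. The helper name smbios_swap_uuid and the sample UUID are made up for illustration.

```shell
# Hypothetical helper: reverse the byte order within each of the first three
# UUID fields (8-4-4 hex digits) and leave the last two fields untouched.
# Assumption: the input is a canonical lowercase 8-4-4-4-12 UUID string.
smbios_swap_uuid() {
  printf '%s\n' "$1" \
    | sed -E 's/^(..)(..)(..)(..)-(..)(..)-(..)(..)-/\4\3\2\1-\6\5-\8\7-/'
}

# Made-up example value, not taken from the thread:
# smbios_swap_uuid 00112233-4455-6677-8899-aabbccddeeff
# prints 33221100-5544-7766-8899-aabbccddeeff
```

If the swapped form of the machine UUID equals the UUID in the node's provider-id, the two values describe the same machine and differ only in field byte order.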
Edit: I think this is https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1629; updating to the v2.5.1 images.
Final edit: cool, I can make PVCs! Way to go.
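For readers who land here because of the original registration error rather than the UUID mismatch: the workaround reported in the earlier comments (setting improved-volume-topology to 'false' in the internal-feature-states.csi.vsphere.vmware.com ConfigMap) could be applied roughly like this. It is a sketch under assumptions: the helper name is hypothetical, and the default namespace vmware-system-csi is inferred from the pod name quoted in this thread, so pass a different one if your install differs.

```shell
# Hypothetical helper: turn off the improved-volume-topology feature state.
# Assumption: the ConfigMap lives in the namespace given as $1,
# defaulting to vmware-system-csi.
disable_improved_volume_topology() {
  namespace="${1:-vmware-system-csi}"
  kubectl patch configmap internal-feature-states.csi.vsphere.vmware.com \
    --namespace "$namespace" \
    --type merge \
    --patch '{"data":{"improved-volume-topology":"false"}}'
}
```

Commenters above redeployed the driver after changing the value, so expect to restart the CSI pods before the change has any effect. This only makes sense if you do not need topology-aware provisioning.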
In my setup, this issue was caused by the vSphere CPI not working correctly and therefore not untainting the nodes. That never allowed the CSI pods to run, and I believe one of them is responsible for creating the CRD.
My CPI issue is documented here: https://github.com/kubernetes/cloud-provider-vsphere/issues/614
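To check whether you are in the same situation, you can look for nodes that still carry the taint a cloud controller manager is expected to remove once it has initialized the node. A minimal sketch, with hypothetical helper names; it assumes the taint in question is the standard node.cloudprovider.kubernetes.io/uninitialized one, which this comment does not state explicitly.

```shell
# Hypothetical helper: print each node name with the keys of its taints.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Hypothetical helper: print only the nodes the cloud provider has not yet
# initialized (assumed taint key, see the note above).
uninitialized_nodes() {
  list_node_taints | grep -F 'node.cloudprovider.kubernetes.io/uninitialized'
}
```

If uninitialized_nodes prints anything, the CPI has not finished with those nodes, and fixing the CPI is probably the first step before debugging the CSI driver itself.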