kubernetes: vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: I’m trying to deploy a Kubernetes v1.9.2 cluster on vSphere with the vSphere-Cloud-Provider. Everything works except persistent volume attachment. With Kubernetes v1.8.7, attachment works fine on the same VMs and the same vSphere environment.
What you expected to happen: The volume attachment succeeds without errors such as “Cannot find node “kbnnode01” in cache. Node not found!!!” or “[datacenter.go:78] Unable to find VM by UUID. VM UUID: f7f53642-5cc2-ced1-37f1-c6b04522a27e” in the kube-controller-manager log.
How to reproduce it (as minimally and precisely as possible): Deploy a Kubernetes cluster via Kubespray on vSphere based on the official Kubernetes docs and Kubespray docs:
- https://kubernetes.io/docs/getting-started-guides/vsphere/
- https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vsphere.md
Then try to deploy the official vSphere-Cloud-Provider test pods (persistent volume) provided here: https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/vsphere
Anything else we need to know?: With Kubernetes v1.8.7 the volume attachment works on the same VMs and vSphere environment. Both the CoreOS and vanilla Hyperkube images from Google are affected. The issue remains after I added the VM UUID to the cloud config.
Based on the VMware docs for the vSphere-Cloud-Provider (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html), the provider should pick up the UUID from /sys/class/dmi/id/product_serial if vm-uuid is unset or empty. But that ID differs from the ID in my logs, so the provider cannot find the VM in vSphere: product_serial from kbnnode01 gives 42365F38-CF20-C79C-80D3-52363D75A0EF, while the UUID in the logs is f7f53642-5cc2-ced1-37f1-c6b04522a27e.
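For comparison, here is a minimal Go sketch (not the provider’s actual code) of how a BIOS UUID can be derived from /sys/class/dmi/id/product_serial, assuming the usual VMware serial format of a “VMware-” prefix followed by space- and dash-separated hex bytes. The result is what I would expect the provider to use when vm-uuid is empty, and it is the value I compared against the log:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

// normalizeSerial turns a VMware product_serial string such as
// "VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef"
// into a canonical UUID like "42365f38-cf20-c79c-80d3-52363d75a0ef".
// This is a comparison sketch, not the provider's implementation.
func normalizeSerial(serial string) (string, error) {
	s := strings.TrimSpace(serial)
	if !strings.HasPrefix(s, "VMware-") {
		return "", fmt.Errorf("unexpected product_serial format: %q", s)
	}
	// Drop the prefix and all separators, leaving 32 hex characters.
	s = strings.NewReplacer(" ", "", "-", "").Replace(strings.TrimPrefix(s, "VMware-"))
	if len(s) != 32 {
		return "", fmt.Errorf("expected 32 hex characters, got %d", len(s))
	}
	return fmt.Sprintf("%s-%s-%s-%s-%s", s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]), nil
}

func main() {
	raw, err := ioutil.ReadFile("/sys/class/dmi/id/product_serial") // needs root
	if err != nil {
		panic(err)
	}
	uuid, err := normalizeSerial(string(raw))
	if err != nil {
		panic(err)
	}
	fmt.Println(uuid) // compare this with the UUID in the kube-controller-manager log
}
```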
Environment:
- Kubernetes version (use kubectl version): v1.9.2
- Cloud provider or hardware configuration: vSphere-Cloud-Provider on ESXi 6.5 and vCenter 6.5 with latest updates
- OS (e.g. from /etc/os-release): Ubuntu 16.04
- Kernel (e.g. uname -a): Linux kbnmaster01 4.4.0-112-generic
- Install tools: Kubespray
- Others: Used the RBAC Role provided in issue #57279 to fix the “Failed to list *v1.Node: nodes is forbidden” error
My cloud config:
[Global]
datacenter = "Falkenstein"
datastore = "datastore1"
insecure-flag = 1
password = "SECRET"
port = 443
server = "vcenter.xnet.local"
user = "kubernetes_svc@vsphere.local"
working-dir = "/Falkenstein/vm/Kubernetes/"
vm-uuid =
[Disk]
scsicontrollertype = pvscsi
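Since vm-uuid is left empty above, I also wanted to check which UUID vCenter actually knows the VM by. Below is a hedged diagnostic sketch using govmomi (the vSphere Go SDK). The server and user come from the cloud config above; the VC_PASSWORD environment variable and the hard-coded UUID from the log are placeholders of mine. It queries SearchIndex.FindByUuid twice, once treating the value as a BIOS UUID and once as an instance UUID:

```go
package main

import (
	"context"
	"fmt"
	"net/url"
	"os"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/object"
)

func main() {
	ctx := context.Background()

	// Connection details taken from the cloud config above; the password is
	// read from an environment variable purely for this sketch.
	u, _ := url.Parse("https://vcenter.xnet.local/sdk")
	u.User = url.UserPassword("kubernetes_svc@vsphere.local", os.Getenv("VC_PASSWORD"))

	// insecure-flag = 1 in the cloud config, so skip certificate verification here too.
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}
	defer c.Logout(ctx)

	si := object.NewSearchIndex(c.Client)
	uuid := "f7f53642-5cc2-ced1-37f1-c6b04522a27e" // UUID from the kube-controller-manager log

	// Ask vCenter for the VM twice: once treating the value as a BIOS UUID,
	// once treating it as an instance UUID. Whichever lookup returns a
	// reference tells us which kind of UUID the logged value actually is.
	for _, instanceUUID := range []bool{false, true} {
		ref, err := si.FindByUuid(ctx, nil, uuid, true, &instanceUUID)
		fmt.Printf("instanceUuid=%v -> ref=%v err=%v\n", instanceUUID, ref, err)
	}
}
```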
Follow-up comment from the issue thread: The issue still appears to be valid with VM compatibility 6.7.0 U3, guest OS Ubuntu 18.04.3, Kubernetes 1.15.3, and Kubespray 2.11.0; unfortunately, the UUID and product_serial are not the same.