kubernetes: vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: I'm trying to deploy a Kubernetes v1.9.2 cluster on vSphere with the vSphere-Cloud-Provider. Everything works except persistent volume attachment. With Kubernetes v1.8.7, attachment works fine on the same VMs and the same vSphere environment.

What you expected to happen: The volume attaches without errors such as "Cannot find node "kbnnode01" in cache. Node not found!!!" or "[datacenter.go:78] Unable to find VM by UUID. VM UUID: f7f53642-5cc2-ced1-37f1-c6b04522a27e" in the kube-controller-manager log.

How to reproduce it (as minimally and precisely as possible): Deploy a Kubernetes cluster via Kubespray on vSphere, based on the official Kubernetes docs and Kubespray docs.

Then try to deploy the official vSphere-Cloud-Provider test pods (persistent volume) provided here: https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/vsphere

Anything else we need to know?: With Kubernetes v1.8.7 the volume attachment works on the same VMs and vSphere environment. Both the CoreOS and vanilla Hyperkube images from Google are affected. The issue remains after I add the VM UUID to the cloud config.

Based on the VMware docs for the vSphere-Cloud-Provider (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html), the provider should pick the UUID from /sys/class/dmi/id/product_serial if vm-uuid is unset or empty. But this ID differs from the ID in my logs, so the provider cannot find the VM in vSphere: product_serial on kbnnode01 is 42365F38-CF20-C79C-80D3-52363D75A0EF, while the logs show f7f53642-5cc2-ced1-37f1-c6b04522a27e.
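For reference, on VMware guests product_serial usually looks like "VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef". A minimal sketch (my own illustration, not the actual Kubernetes code) of normalizing that string into the canonical 8-4-4-4-12 UUID form, so it can be compared against the UUID in the controller logs:

```python
import re

def uuid_from_product_serial(serial: str) -> str:
    """Normalize a VMware product_serial string such as
    'VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef'
    into canonical 8-4-4-4-12 UUID form (lowercase)."""
    # Strip the vendor prefix first ('VMware' itself contains hex
    # characters), then drop everything except hex digits.
    hexdigits = re.sub(r"[^0-9a-fA-F]", "", serial.removeprefix("VMware-"))
    if len(hexdigits) != 32:
        raise ValueError(f"unexpected serial format: {serial!r}")
    parts = (hexdigits[:8], hexdigits[8:12], hexdigits[12:16],
             hexdigits[16:20], hexdigits[20:])
    return "-".join(parts).lower()

print(uuid_from_product_serial(
    "VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef"))
# → 42365f38-cf20-c79c-80d3-52363d75a0ef
```

Note that the normalized value matches my product_serial reading above, not the f7f53642-… UUID from the logs, which is exactly the mismatch being reported.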

Environment:

  • Kubernetes version (use kubectl version): v1.9.2
  • Cloud provider or hardware configuration: vSphere-Cloud-Provider on ESXi 6.5 and vCenter 6.5 with latest updates
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): Linux kbnmaster01 4.4.0-112-generic
  • Install tools: Kubespray
  • Others: Used the RBAC role provided in issue #57279 to fix the "Failed to list *v1.Node: nodes is forbidden" error

My cloud config:

[Global]
datacenter = "Falkenstein"
datastore = "datastore1"
insecure-flag = 1
password = "SECRET"
port = 443
server = "vcenter.xnet.local"
user = "kubernetes_svc@vsphere.local"
working-dir = "/Falkenstein/vm/Kubernetes/"
vm-uuid =

[Disk]
scsicontrollertype = pvscsi
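
For reference, this is the shape of the explicit setting I tried when adding the VM UUID to the cloud config (a sketch; the value shown is kbnnode01's product_serial rewritten in canonical UUID form, which is an assumption about the format the provider expects, and it did not resolve the error):

```ini
[Global]
# ... other settings as above ...
vm-uuid = "42365f38-cf20-c79c-80d3-52363d75a0ef"
```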

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 37 (18 by maintainers)

Most upvoted comments

Looks like the issue is still present with VM compatibility 6.7.0 U3. For guest OS Ubuntu 18.04.3, with Kubernetes 1.15.3 and Kubespray 2.11.0, unfortunately, the UUID and product_serial are not the same.