kubernetes: vSphere Cloud Provider (VCP) stopped working after update to Kubernetes 1.11.0

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug /sig cloud-provider

What happened: I had VCP working perfectly on Kubernetes v1.9.6. The installation was done using kubeadm on our vSphere/vCenter environment.

Yesterday I updated from 1.9.6 -> 1.10.4 -> 1.11.0, but after the update I am not able to get VCP working again. I tried a complete reinstall but it is still not working. It keeps complaining about "No VM found", as shown below.

What you expected to happen: VCP to work as it did on v1.9.6.

How to reproduce it (as minimally and precisely as possible): Not sure; this is a production environment, so I can't make many attempts.

Anything else we need to know?:

Jul  7 05:08:41 localhost journal: E0707 05:08:41.853267       1 datacenter.go:78] Unable to find VM by UUID. VM UUID: 
Jul  7 05:08:41 localhost journal: E0707 05:08:41.853374       1 nodemanager.go:414] Error "No VM found" node info for node "engine01" not found
Jul  7 05:08:41 localhost journal: E0707 05:08:41.853416       1 vsphere_util.go:134] Error while obtaining Kubernetes node nodeVmDetail details. error : No VM found
Jul  7 05:08:41 localhost journal: E0707 05:08:41.853444       1 vsphere.go:1160] Failed to get shared datastore: No VM found
Jul  7 05:08:41 localhost journal: I0707 05:08:41.854301       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-disk", UID:"a7c88c13-813d-11e8-aa8c-0050568166d0", APIVersion:"v1", ResourceVersion:"31538424", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "thin-disk": No VM found
Jul  7 05:08:43 localhost journal: I0707 05:08:43.768259       1 reconciler.go:291] attacherDetacher.AttachVolume started for volume "pvc-a3091746-6a16-11e8-87de-0050568166d0" (UniqueName: "kubernetes.io/vsphere-volume/[3PAR_Datastore06] kubevols/kubernetes-dynamic-pvc-a3091746-6a16-11e8-87de-0050568166d0.vmdk") from node "engine02" 
Jul  7 05:08:43 localhost journal: E0707 05:08:43.785846       1 datacenter.go:78] Unable to find VM by UUID. VM UUID: 
Jul  7 05:08:43 localhost journal: E0707 05:08:43.785913       1 nodemanager.go:282] Error "No VM found" node info for node "engine02" not found
Jul  7 05:08:43 localhost journal: E0707 05:08:43.785938       1 vsphere.go:550] Cannot find node "engine02" in cache. Node not found!!!
Jul  7 05:08:43 localhost journal: E0707 05:08:43.786012       1 attacher.go:80] Error attaching volume "[3PAR_Datastore06] kubevols/kubernetes-dynamic-pvc-a3091746-6a16-11e8-87de-0050568166d0.vmdk" to node "engine02": No VM found

Jul  7 05:31:40 localhost journal: E0707 05:31:40.976785       1 datacenter.go:78] Unable to find VM by UUID. VM UUID: 
Jul  7 05:31:40 localhost journal: E0707 05:31:40.976856       1 nodemanager.go:414] Error "No VM found" node info for node "engine01" not found
Jul  7 05:31:40 localhost journal: E0707 05:31:40.976883       1 vsphere_util.go:134] Error while obtaining Kubernetes node nodeVmDetail details. error : No VM found
Jul  7 05:31:40 localhost journal: E0707 05:31:40.976900       1 vsphere.go:1160] Failed to get shared datastore: No VM found
Jul  7 05:31:40 localhost journal: I0707 05:31:40.977444       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-disk", UID:"a7c88c13-813d-11e8-aa8c-0050568166d0", APIVersion:"v1", ResourceVersion:"31538424", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "thin-disk": No VM found

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: vsphere
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a): Linux engine03 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 55 (8 by maintainers)

Most upvoted comments

@w-leads great news for me 👍, after updating the node info everything worked perfectly. I don't know whether it is correct to set it manually the way I did:

(⎈ |production:kube-system)➜  ~ kubectl get nodes -o json | jq '.items[]|[.metadata.name, .spec.externalID, .spec.providerID]'
[
  "engine01",
  "engine01",
  "vsphere://4201288E-D695-A4BC-14A4-601CCD17D9A2"
]
[
  "engine02",
  "engine02",
  "vsphere://4201FDD5-32DA-E706-A6E7-3C6A82F72E1D"
]
[
  "engine03",
  "engine03",
  "vsphere://4201da8e-d1e5-c356-d114-d9e1af6080ef"
]
[
  "kube-master",
  "4201b165-5678-30b8-28f4-c441fedb7ae9",
  "vsphere://4201b165-5678-30b8-28f4-c441fedb7ae9"
]

One thing I noticed while updating Kubernetes through kubeadm is that on the master it worked perfectly without any issue, while on the nodes the kubelet did not start and I had to manually create the file /var/lib/kubelet/kubeadm-flags.env with KUBELET_KUBEADM_ARGS=--cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni (see the sketch below).
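For reference, this is roughly what that file looks like on disk; kubeadm normally writes the value quoted (a sketch of my setup, the exact flag set depends on your CNI and cgroup driver):

# /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni"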

Thanks for your help

Hello.

After updating the node providerID, everything worked perfectly.

To update this info, I used this command:

kubectl patch node <Node name> -p '{"spec":{"providerID":"vsphere://<vm uuid>"}}'

To show the VM UUID, connect to the VM and execute:

cat /sys/class/dmi/id/product_serial | sed -e 's/^VMware-//' -e 's/-/ /' | awk '{ print toupper($1$2$3$4 "-" $5$6 "-" $7$8 "-" $9$10 "-" $11$12$13$14$15$16) }'
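Putting the two steps together, a rough sketch you can run on each affected node. It assumes you run it as root (product_serial is normally root-readable only) and that the Kubernetes node name matches the output of hostname; if your node has no kubeconfig, run the kubectl part from wherever your admin credentials live:

# read the BIOS serial, turn it into the UUID format, then patch the node
VM_UUID=$(sed -e 's/^VMware-//' -e 's/-/ /' /sys/class/dmi/id/product_serial \
  | awk '{ print toupper($1$2$3$4 "-" $5$6 "-" $7$8 "-" $9$10 "-" $11$12$13$14$15$16) }')
kubectl patch node "$(hostname)" -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}"

On engine01, for example, this yields vsphere://4201288E-D695-A4BC-14A4-601CCD17D9A2, matching the providerID shown in the kubectl output earlier in the thread.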

@divyenpatel,

I specified the configuration before running kubeadm; the config file I passed to kubeadm looked something like this:

apiVersion: kubeadm.k8s.io/v1beta1  # config API version used by kubeadm 1.13
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
networking:
  podSubnet: 10.32.0.0/12
apiServer:
  certSANs:
  - "<loadbalancer>"
  extraArgs:
    cloud-provider: vsphere
    cloud-config: /etc/kubernetes/pki/vsphere.conf
controlPlaneEndpoint: "<loadbalancer>:6443"
controllerManager:
  extraArgs:
    cloud-config: /etc/kubernetes/pki/vsphere.conf
    cloud-provider: vsphere
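That file is passed to kubeadm with the --config flag; on the first control-plane node that amounts to something like this (the file name kubeadm-config.yaml is just a placeholder):

kubeadm init --config kubeadm-config.yaml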

I also specified the following in /etc/sysconfig/kubelet:

KUBELET_EXTRA_ARGS=--cloud-provider=vsphere --cloud-config=/etc/kubernetes/pki/vsphere.conf

I ran kubeadm init on the first master node, and then used kubeadm join with the proper arguments on the remaining two master nodes and the three worker nodes (per the current 1.13 documentation on kubernetes.io). When I checked the provider IDs of all the nodes via kubectl, they all had the proper provider ID matching each machine's UUID.
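For a quick check without jq, plain kubectl jsonpath works too; this just prints each node name next to its providerID:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'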

Besides the workaround of manually patching each node, was there a fix in newer versions? I just tried k8s 1.11.3 and am still facing this issue (with the same vSphere settings, which work perfectly on 1.10.x). I think only VMware experts like @divyenpatel can help.

Let me suggest a small fix to the above script that supports spaces in the FOLDER path:

#!/bin/bash

export GOVC_USERNAME='<user>'
export GOVC_INSECURE=1
export GOVC_PASSWORD='<password>'
export GOVC_URL='<server>'
DATACENTER='Coop'
FOLDER='<path>'
# In my case I'm using a prefix for the VMs, so grep'ing is necessary.
# You can remove it if the folder you are using only contains the machines you need.
VM_PREFIX='<prefix>'
# Split the govc output on newlines only, so folder and VM names with spaces survive.
IFS=$'\n'
for vm in $(./govc ls "/$DATACENTER/vm/$FOLDER" | grep "$VM_PREFIX"); do
  MACHINE_INFO=$(./govc vm.info -json -dc="$DATACENTER" -vm.ipath="/$vm" -e=true)
  # My VMs are created in vSphere with upper-case names, so I lower-case them with awk.
  VM_NAME=$(jq -r '.VirtualMachines[] | .Name' <<< "$MACHINE_INFO" | awk '{print tolower($0)}')
  # UUIDs come back lower-case, so upper-case them here.
  VM_UUID=$(jq -r '.VirtualMachines[] | .Config.Uuid' <<< "$MACHINE_INFO" | awk '{print toupper($0)}')
  echo "Patching $VM_NAME with UUID:$VM_UUID"
  # This is done with --dry-run to avoid possible mistakes; remove the flag when you are confident you got everything right.
  kubectl patch node "$VM_NAME" --dry-run -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}"
done
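A note on running it: I assume the govc binary sits in the working directory (hence the ./govc calls), jq is on the PATH, and your kubectl context points at the affected cluster; the script name below is just a placeholder:

chmod +x patch-provider-ids.sh && ./patch-provider-ids.sh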

@Vislor

In this case the kubelet is already running with the vsphere cloud provider enabled, and node registration happens after that, which sets the provider ID correctly.

But if you already have a Kubernetes cluster deployed with kubeadm without VCP enabled, and you later try to enable VCP, I think the node does not get a provider ID. Then we have to patch the nodes manually, or remove the nodes from the API server and register them back by restarting the kubelet on the nodes (roughly as sketched below).
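A rough sketch of that second option, using engine01 from this thread as the example node (drain first if workloads matter, and make sure the kubelet flags actually include the vsphere cloud provider before restarting):

# on a machine with cluster-admin access
kubectl drain engine01 --ignore-daemonsets
kubectl delete node engine01
# on engine01 itself, after adding --cloud-provider=vsphere and --cloud-config to the kubelet flags
systemctl restart kubelet
# the kubelet re-registers the node, this time with spec.providerID set by the cloud provider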

@dbason 1.12 looks fine to me.