kubernetes: VSphere Cloud Provider dynamic storage provisioning got very slow after upgrading from v1.14 to v1.15.

What happened: We are using the vSphere cloud provider. Dynamic storage provisioning became very slow after upgrading our K8s clusters from v1.14 to v1.15.

What you expected to happen: We expected PVs to be created within seconds, as happens on a K8s v1.14 cluster. Instead, depending on the number of PVCs created by the developers on the cluster, creating a PV took between 2 minutes and 156 minutes after the K8s upgrade.

How to reproduce it (as minimally and precisely as possible): Provision a v1.14 K8s cluster using VMs running in a single datacenter, with no zoning. Use the following vsphere.conf:

[Global]
#TODO: install certs for vcenter https and remove insecure-flag
insecure-flag = 1
secret-name = "vsphere-cloud-provider-secret"
secret-namespace = "kube-system"

[VirtualCenter "<IP_ADDRESS>"]
port = <PORT>
datacenters = <DATACENTER>

[Workspace]
server = "<IP_ADDRESS>"
datacenter = "<DATACENTER>"
default-datastore = "<YOUR_DEFAULT_DS>"
folder = "vm/Kubernetes"

[Disk]
scsicontrollertype = pvscsi

Push the following StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vmfs-policydelete
parameters:
  diskformat: zeroedthick
  fstype: ext4
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate

Create a PVC using the StorageClass above and observe that the PV is created within 3 seconds.

Note the folder = "vm/Kubernetes" setting above. On a v1.14 cluster this could simply be folder = Kubernetes and the datastore would still be found. After upgrading to v1.15, which includes the commits cited below as 1 and 2, we started getting an "Ambiguous datastore name" error whenever we had folder = Kubernetes in our vsphere.conf. One reason is that, unfortunately, our VMware team used the same folder name (also Kubernetes) for the datastores they created for us. Once we realized that the folder names were identical for VMs and datastores, we set the folder value to "vm/Kubernetes" in vsphere.conf and things started working again, but very slowly. Dynamic storage provisioning takes a very long time on our v1.15 cluster, and downgrading from v1.15 to v1.14 is not an option for us.
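For the reproduction step, a minimal PVC manifest might look like the following sketch (the claim name and requested size are illustrative, not from the original report; only the storageClassName matches the StorageClass above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vmfs-test-pvc          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi             # illustrative size
  storageClassName: vmfs-policydelete   # the StorageClass defined above
```

With volumeBindingMode: Immediate, applying this PVC should trigger dynamic provisioning right away, so the time until the PVC becomes Bound reflects the provisioning latency being reported.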

Now upgrade your v1.14 cluster to v1.15 using kubeadm and repeat the exercise above. You will see that PV creation takes around 2 minutes.

Anything else we need to know?: We do not use any zoning. I suspect this issue was introduced by the following commits: 1, 2.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:34:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: vsphere
  • OS (e.g: cat /etc/os-release):
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2303.3.0
VERSION_ID=2303.3.0
BUILD_ID=2019-12-02-2049
PRETTY_NAME="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Kernel (e.g. uname -a):
Linux 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019 x86_64 Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz GenuineIntel GNU/Linux
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 19 (6 by maintainers)

Most upvoted comments

/remove-lifecycle stale