rancher: Air-gapped provisioning of RKE2 cluster fails when vSphere Cloud Provider (CPI/CSI) enabled in cluster configuration
Issue description:
When provisioning an RKE2 Node Driver cluster with the vSphere Cloud Provider (i.e. CPI/CSI) configured in the cluster configuration (versus installed via Apps & Marketplace after cluster provisioning), the CPI and CSI charts do not use the configured system-default-registry, but attempt to pull from DockerHub. In an air-gapped environment without access to DockerHub, the CPI and CSI workloads therefore remain in a CrashLoopBackOff attempting to pull the image, and node provisioning does not complete.
Based on backend investigation, the problem does not lie in the charts themselves: by editing the cluster as YAML, the user can set global.cattle.systemDefaultRegistry, so the charts are able to use a private registry. It looks like the UI (or wherever this is configurable) should set this value when provisioning.
There is a workaround (noted below).
Business impact:
Unable to provision RKE2 cluster with vSphere cloud provider set in cluster configuration if using a private registry in an air-gapped environment.
Troubleshooting steps:
N/A
Repro steps:
- Populate a private registry with the Rancher v2.6.0 images (https://rancher.com/docs/rancher/v2.6/en/installation/other-installation-methods/air-gap/populate-private-registry/)
- Install Rancher v2.6.0 on a single node RKE cluster with --set systemDefaultRegistry per the air-gap installation docs (https://rancher.com/docs/rancher/v2.6/en/installation/other-installation-methods/air-gap/install-rancher/)
- Provision an RKE2 cluster using the vSphere Node Driver and the private registry config, without enabling the vSphere Cloud Provider in the cluster configuration, and observe successful cluster provisioning. After cluster provisioning, deploy the vSphere CSI and CPI charts from Apps & Marketplace and observe the chart workloads successfully deployed using private registry images.
- Provision an RKE2 cluster using the vSphere Node Driver and the private registry config, with the vSphere Cloud Provider enabled in the cluster configuration and its configuration passed in under Addons, and observe failed provisioning. Download the SSH key for the node and, using kubectl on the node, observe the CPI and CSI workloads failing while attempting to pull images from DockerHub rather than the private registry.
Workaround:
Is workaround available and implemented? Yes
What is the workaround:
Install cluster without configuring vSphere Cloud Provider and install vSphere CPI/CSI charts after cluster provisioning
Actual behavior:
vSphere CSI/CPI workloads do not use the configured system-default-registry if configured at cluster provisioning time, preventing successful cluster provisioning in an air-gapped environment
Expected behavior:
vSphere CSI/CPI workloads use the configured system-default-registry if configured at cluster provisioning time
Files, logs, traces:
N/A
Additional notes:
By clicking ‘Edit as YAML’ on the cluster configuration, the user could set global.cattle.systemDefaultRegistry manually for the rancher-vsphere-cpi and rancher-vsphere-csi charts, but the expectation would be that these use the system-default-registry automatically.
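For illustration, the manual override described above might look roughly like the fragment below. This is only a sketch: the placement of the chart values under spec.rkeConfig.chartValues is an assumption about the cluster YAML layout, and registry.example.com:5000 is a placeholder for the private registry, not a value taken from this report.

```yaml
# Sketch of the relevant part of the cluster YAML shown by 'Edit as YAML'.
# Assumption: per-chart values live under spec.rkeConfig.chartValues.
# registry.example.com:5000 is a hypothetical private registry address.
spec:
  rkeConfig:
    chartValues:
      rancher-vsphere-cpi:
        global:
          cattle:
            systemDefaultRegistry: registry.example.com:5000
      rancher-vsphere-csi:
        global:
          cattle:
            systemDefaultRegistry: registry.example.com:5000
```

Any existing vCenter settings under each chart key would stay alongside the added global block.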
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (7 by maintainers)
Yeah, this needs to be fixed in the RKE2 vSphere provider Helm chart as described in https://github.com/rancher/rancher/issues/36467#issuecomment-1034809994