rancher: [vsphere] Network protocol profile ignored if cloud-init specified
- Background - I’m not sure if this is expected behavior, hence “question.”
We’re trying to automate the deployment of Kubernetes clusters in vSphere using network protocol profiles.
In our network we’re doing TLS interception/URL filtering so all traffic is being re-encrypted with our internal CA. This is causing RancherOS (i.e. the node being deployed) to throw certificate validation errors during system image download. If you try to solve this using a cloud-config file it seems to break the Network Protocol profile(s). e.g.: https://forums.rancher.com/t/rancheros-cloud-config-yml-root-ca/8076/4
What kind of request is this (question/bug/enhancement/feature request): question/bug?
Steps to reproduce (least amount of steps as possible):
- Set up the rancher/rancher container on a dedicated VM
- Set up the vSphere requirements according to the documentation (e.g. vsphere user, network protocol profile, etc.)
- Set up a Node Template to use vApp and a Network Protocol Profile
- If any cloud-init file is specified, the RancherOS VM(s) will be deployed but the network will not be set up correctly
- If no cloud-init (only vApp/protocol profile) is specified, the VM network (eth0) is set up correctly, but the cluster deploy will fail due to certificate validation
- Rancher will eventually time out waiting for SSH and will delete and retry creating the cluster.
Result: With Network Protocol Profile configured: Cloud-config:
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB ... E=
write_files:
  - path: /opt/rancher/bin/start.sh
    permissions: "0755"
    owner: root
    content: |+
      #!/bin/sh
      cat << _EOF_ >> /etc/ssl/certs/ca-certificates.crt
      -----BEGIN CERTIFICATE-----
      ...
      <intermediate_ca>
      ...
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      ...
      <ca>
      ...
      -----END CERTIFICATE-----
      _EOF_
If the cloud-config file is specified, the ros configuration looks like:
rancher:
  environment:
    EXTRA_CMDLINE: /init
  services_include:
    open-vm-tools: true
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true
ssh_authorized_keys: []
If no cloud-config is specified the ros config looks like:
hostname: tstkube1
rancher:
  environment:
    EXTRA_CMDLINE: /init
  network:
    dns:
      nameservers:
      - xxx.xxx.101.9
      - xxx.xxx.101.11
      search:
      - <domain>
    interfaces:
      eth0:
        addresses:
        - <pooled_ip>
        gateway: <network_gateway>
        match: eth0
  services_include:
    open-vm-tools: true
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true
...
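To make the difference between the two exports explicit, here is a minimal sketch that flattens each configuration into dotted key paths and lists what disappears once a cloud-config is supplied. The dictionaries are hand-transcribed from the two ros exports above (placeholders kept as-is, nesting assumed from the usual RancherOS config layout), and flatten_keys is a hypothetical helper, not part of any Rancher tooling. The second export is truncated in the report, so only the "present without cloud-config, absent with it" direction is meaningful.

```python
def flatten_keys(node, prefix=""):
    """Return the set of dotted key paths for every leaf in a nested dict."""
    paths = set()
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty mappings; lists and scalars count as leaves.
            paths |= flatten_keys(value, path)
        else:
            paths.add(path)
    return paths


# Transcribed from the export taken when a cloud-config file is specified.
with_cloud_config = {
    "rancher": {
        "environment": {"EXTRA_CMDLINE": "/init"},
        "services_include": {"open-vm-tools": True},
        "state": {
            "autoformat": ["/dev/sda", "/dev/vda"],
            "dev": "LABEL=RANCHER_STATE",
            "wait": True,
        },
    },
    "ssh_authorized_keys": [],
}

# Transcribed from the export taken when no cloud-config is specified.
without_cloud_config = {
    "hostname": "tstkube1",
    "rancher": {
        "environment": {"EXTRA_CMDLINE": "/init"},
        "network": {
            "dns": {
                "nameservers": ["xxx.xxx.101.9", "xxx.xxx.101.11"],
                "search": ["<domain>"],
            },
            "interfaces": {
                "eth0": {
                    "addresses": ["<pooled_ip>"],
                    "gateway": "<network_gateway>",
                    "match": "eth0",
                }
            },
        },
        "services_include": {"open-vm-tools": True},
        "state": {
            "autoformat": ["/dev/sda", "/dev/vda"],
            "dev": "LABEL=RANCHER_STATE",
            "wait": True,
        },
    },
}

# Keys that exist only when the network protocol profile is applied.
missing = sorted(flatten_keys(without_cloud_config) - flatten_keys(with_cloud_config))
for path in missing:
    print(path)
```

With these inputs, everything that goes missing is either the hostname or sits under rancher.network, which matches the symptom described: the values from the network protocol profile never reach the node's configuration.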
Other details that may be helpful: The rancher/rancher container already has the internal CAs added, so the issue is only on the cluster deploy side.
Using a ‘blank’ cloud-config produces the same effect, which makes me think this is expected behavior. However, I haven’t seen anything in the Rancher, RKE, or RancherOS documentation that specifies this (maybe I missed it).
I’ve also attempted sending a full ‘vsphereCloudProvider’ config, but that also failed. I’m not sure if this was an issue with the config itself or perhaps the names of the objects (there are spaces in our VSphere object names). For example:
cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    virtual_center:
      <vcenter_node>:
        user: vsphere.local\<test_user>
        password: <password>
        datacenters: CO Datacenter
    workspace:
      server: <vcenter_node>
      folder: rancher-kubernetes
      default-datastore: DATASTORE 01
      datacenter: CO Datacenter
      resourcepool-path: /CO Datacenter/host/Cluster 01/Resources/SERVICE GROUP
    disk:
      scsicontrollertype: pvscsi
    network:
      public-network: DvPG Guest <vlan#> xxx.xxx.230.160%2f27
Specifying the network manually also seems to fail, e.g.:
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAA ... E=
rancher:
  docker:
    selinux_enabled: true
    registry_mirror: "https://<server>:5000"
  system_docker:
    selinux_enabled: true
    registry_mirror: "https://<server>:5000"
  network:
    interfaces:
      eth0:
        address: xxx.xxx.230.182/27
        gateway: xxx.xxx.230.161
        mtu: 1500
        dhcp: false
    dns:
      override: true
      nameservers:
        - xxx.xxx.101.9
        - xxx.xxx.101.11
      search:
        - <domain>
write_files:
  - path: /opt/rancher/bin/start.sh
    owner: root
    permissions: "0755"
    content: |+
      #!/bin/sh
      cat << _EOF_ >> /etc/ssl/certs/ca-certificates.crt
      -----BEGIN CERTIFICATE-----
      ...
      <intermediate>
      ...
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      ...
      <ca>
      ...
      -----END CERTIFICATE-----
      _EOF_
Environment information
- Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI): rancher/rancher:v2.2.0-rc7. I’ve also tried rc4 and rc6.
- Installation option (single install/HA): single install
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Infrastructure Provider - VCenter/VSphere 6.5
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM - 2vcpu, 4GB RAM
- Kubernetes version (use kubectl version): n/a (cluster doesn’t finish deploying)
- Docker version (use docker version):
Client:
 Version: 18.09.3
 API version: 1.39
 Go version: go1.10.8
 Git commit: 774a1f4
 Built: Thu Feb 28 06:33:21 2019
 OS/Arch: linux/amd64
 Experimental: false
Server: Docker Engine - Community
 Engine:
  Version: 18.09.3
  API version: 1.39 (minimum version 1.12)
  Go version: go1.10.8
  Git commit: 774a1f4
  Built: Thu Feb 28 06:02:24 2019
  OS/Arch: linux/amd64
  Experimental: false
Is the expected behavior to only use cloud-init or only network protocol profiles? Is it not possible to use both?
About this issue
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 18
I’ve hit that exact issue, too. And yes, it also breaks when I include my cloud-config as a base64-encoded string via the guestinfo settings. Then my network profile never gets applied.
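For anyone trying to reproduce the guestinfo variant mentioned in this comment, the following is a minimal sketch of producing the base64 string. The property names guestinfo.cloud-init.config.data and guestinfo.cloud-init.data.encoding are assumptions about what RancherOS reads on VMware and should be checked against the RancherOS VMware documentation. The cloud-config text is a placeholder, and encode_cloud_config is a hypothetical helper. This only shows how the value is built; it does not work around the network profile being ignored.

```python
import base64

# Placeholder cloud-config; substitute the real file contents.
cloud_config = """#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB ... E=
"""


def encode_cloud_config(text):
    """Base64-encode cloud-config text into a single-line ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_cloud_config(cloud_config)

# Assumed guestinfo property names (verify against the RancherOS docs).
guestinfo = {
    "guestinfo.cloud-init.config.data": encoded,
    "guestinfo.cloud-init.data.encoding": "base64",
}

# Sanity check: decoding must give back the original text unchanged.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == cloud_config)
```

Checking the round trip before pasting the value into the VM's advanced settings rules out an encoding mistake as the cause when the node comes up without the expected configuration.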