rancher: [vsphere] Network protocol profile ignored if cloud-init specified

Background: I’m not sure whether this is expected behavior, hence “question.”

We’re trying to automate the deployment of Kubernetes clusters on vSphere using network protocol profiles.

Our network performs TLS interception/URL filtering, so all traffic is re-encrypted with certificates issued by our internal CA. This causes RancherOS (i.e., the node being deployed) to throw certificate validation errors while downloading system images. Trying to solve this with a cloud-config file, as described in https://forums.rancher.com/t/rancheros-cloud-config-yml-root-ca/8076/4, appears to break the network protocol profile(s).

What kind of request is this (question/bug/enhancement/feature request): question/bug?

Steps to reproduce (least amount of steps as possible):

  • Set up the rancher/rancher container on a dedicated VM.
  • Set up the vSphere requirements according to the documentation (e.g. vSphere user, network protocol profile).
  • Set up a Node Template that uses vApp properties and the network protocol profile.
  1. If any cloud-init file is specified, the RancherOS VM(s) are deployed, but the network is not set up correctly.
  2. If no cloud-init file is specified (only the vApp/protocol profile), the VM network (eth0) is set up correctly, but the cluster deployment fails due to certificate validation errors.
  • Rancher eventually times out waiting for SSH, then deletes the node and retries creating the cluster.

Result with the network protocol profile configured and the following cloud-config:

#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB ... E=

write_files:
  - path: /opt/rancher/bin/start.sh
    permissions: "0755"
    owner: root
    content: |+
      #!/bin/sh
      cat << _EOF_ >> /etc/ssl/certs/ca-certificates.crt
      -----BEGIN CERTIFICATE-----
      ...
      <intermediate_ca>
      ...
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      ...
      <ca>
      ...
      -----END CERTIFICATE-----
      _EOF_
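One side effect of this script: if start.sh runs on every boot and the bundle file persists between boots, the unconditional >> append adds another copy of the certificates each time. A minimal POSIX shell sketch of an append-once variant follows. It assumes the certificates are first written to a separate PEM file; the function name, marker line, and file paths are hypothetical. It relies on common PEM parsers (OpenSSL, Go's encoding/pem) skipping text outside BEGIN/END blocks, so the marker line should not disturb certificate loading.

```shell
#!/bin/sh
# Sketch: append internal CA certificates to a trust bundle only once.
# Assumption: the PEM certificates live in their own file (hypothetical path)
# rather than in an inline heredoc.
append_ca_once() {
  bundle="$1"            # e.g. /etc/ssl/certs/ca-certificates.crt
  pem="$2"               # file holding the intermediate and root CA certificates
  marker="# internal-ca" # hypothetical marker; PEM parsers ignore text outside BEGIN/END

  # If a previous run already added the marker, do nothing.
  if grep -qxF "$marker" "$bundle" 2>/dev/null; then
    return 0
  fi
  # Otherwise write the marker followed by the certificates.
  { printf '%s\n' "$marker"; cat "$pem"; } >> "$bundle"
}

# Possible use in start.sh (paths are hypothetical):
#   append_ca_once /etc/ssl/certs/ca-certificates.crt /opt/rancher/internal-ca.pem
```

Calling the function twice leaves exactly one copy of the certificates in the bundle.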

If the cloud-config file is specified, the ros config output looks like this (note that there is no network section):

rancher:
  environment:
    EXTRA_CMDLINE: /init
  services_include:
    open-vm-tools: true
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true
ssh_authorized_keys: []

If no cloud-config is specified, the ros config output looks like this:

hostname: tstkube1
rancher:
  environment:
    EXTRA_CMDLINE: /init
  network:
    dns:
      nameservers:
      - xxx.xxx.101.9
      - xxx.xxx.101.11
      search:
      - <domain>
    interfaces:
      eth0:
        addresses:
        - <pooled_ip>
        gateway: <network_gateway>
        match: eth0
  services_include:
    open-vm-tools: true
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true
...

Other details that may be helpful: the rancher/rancher container already has the internal CAs added, so the issue is only on the cluster deployment side.

Using a ‘blank’ cloud-config produces the same effect, which makes me think this may be expected behavior. However, I haven’t seen this documented anywhere in the Rancher, RKE, or RancherOS documentation (maybe I missed it).

I’ve also tried sending a full ‘vsphereCloudProvider’ config, but that failed too. I’m not sure whether the problem was the config itself or the object names (some of our vSphere object names contain spaces). For example:

cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    virtual_center:
      <vcenter_node>:
        user: vsphere.local\<test_user>
        password: <password>
        datacenters: CO Datacenter
    workspace:
      server: <vcenter_node>
      folder: rancher-kubernetes
      default-datastore: DATASTORE 01
      datacenter: CO Datacenter
      resourcepool-path: /CO Datacenter/host/Cluster 01/Resources/SERVICE GROUP
    disk:
      scsicontrollertype: pvscsi
    network:
      public-network: DvPG Guest <vlan#> xxx.xxx.230.160%2f27
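The %2f in the public-network value is vSphere’s escaped form of a “/” in the port group name. According to the vSphere API documentation for ManagedEntity.name, “/” is escaped as %2f, “\” as %5c, and “%” as %25; spaces are not escaped. A minimal POSIX shell sketch of that escaping rule, for comparing names against inventory paths, is shown below. The function name is hypothetical, and whether the cloud provider expects the escaped or the raw name is not established by this issue.

```shell
#!/bin/sh
# Sketch: escape a vSphere object name as it appears in inventory paths.
# '%' is escaped first so the '%' characters introduced by the later
# substitutions are not escaped a second time. Spaces are left unchanged.
vsphere_escape_name() {
  printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's|/|%2f|g' -e 's/\\/%5c/g'
}

# Example (placeholder octets replaced with 10.0 for illustration):
#   vsphere_escape_name 'DvPG Guest 100 10.0.230.160/27'
#   prints: DvPG Guest 100 10.0.230.160%2f27
```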

Specifying the network manually in the cloud-config also seems to fail, e.g.:

#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAA ... E=

rancher:
  docker:
    selinux_enabled: true
    registry_mirror: "https://<server>:5000"
  system_docker:
    selinux_enabled: true
    registry_mirror: "https://<server>:5000"
  network:
    interfaces:
      eth0:
        address: xxx.xxx.230.182/27
        gateway: xxx.xxx.230.161
        mtu: 1500
        dhcp: false
    dns:
      override: true
      nameservers:
        - xxx.xxx.101.9
        - xxx.xxx.101.11
      search:
        - <domain>
write_files:
  - path: /opt/rancher/bin/start.sh
    owner: root
    permissions: "0755"
    content: |+
      #!/bin/sh
      cat << _EOF_ >> /etc/ssl/certs/ca-certificates.crt
      -----BEGIN CERTIFICATE-----
      ...
      <intermediate>
      ...
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      ...
      <ca>
      ...
      -----END CERTIFICATE-----
      _EOF_
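The static values in this config can be sanity-checked by hand: a /27 spans 32 addresses, so xxx.xxx.230.160/27 covers .160 through .191, and both the address (.182) and the gateway (.161) fall inside it. A minimal POSIX shell sketch of that check follows. The function names are hypothetical, and because the first two octets are masked in this issue, the example substitutes 10.0 for them.

```shell
#!/bin/sh
# Sketch: check that an IPv4 address and its gateway share a subnet,
# using only POSIX shell arithmetic (no ipcalc or similar tool assumed).
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

same_subnet() {
  # $1 = address, $2 = gateway, $3 = prefix length (e.g. 27)
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$1")
  g=$(ip_to_int "$2")
  [ $(( a & mask )) -eq $(( g & mask )) ]
}

# Example: same_subnet 10.0.230.182 10.0.230.161 27 succeeds,
# while same_subnet 10.0.230.200 10.0.230.161 27 fails (.200 is in .192/27).
```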

Environment information

  • Rancher version (rancher/rancher/rancher/server image tag or shown bottom left in the UI): rancher/rancher:v2.2.0-rc7 (I’ve also tried rc4 and rc6)
  • Installation option (single install/HA): single install

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Infrastructure Provider - vCenter/vSphere 6.5

  • Machine type (cloud/VM/metal) and specifications (CPU/memory): VM - 2 vCPU, 4 GB RAM

  • Kubernetes version (use kubectl version): n/a (cluster doesn’t finish deploying)

  • Docker version (use docker version):

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:33:21 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 06:02:24 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Is the intended behavior that only cloud-init or only a network protocol profile can be used? Is it not possible to use both?

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 18

Most upvoted comments

I’ve hit exactly the same issue. It also breaks when I pass my cloud-config as a base64-encoded string via the guestinfo settings: my network profile never gets applied.
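For anyone reproducing the guestinfo variant, a minimal shell sketch of producing the single-line base64 value is below. The guestinfo key names in the comment reflect my understanding of the RancherOS VMware datasource and should be treated as assumptions to verify for your RancherOS version. govc (the govmomi CLI) is just one possible way to set them, and the VM and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: base64-encode a cloud-config file as one line (no wrapping),
# since a guestinfo extra-config value must be a single string.
encode_cloud_config() {
  base64 < "$1" | tr -d '\n'
}

# Possible use (key names are assumptions; VM and file names hypothetical):
#   govc vm.change -vm tstkube1 \
#     -e "guestinfo.cloud-init.config.data=$(encode_cloud_config cloud-config.yml)" \
#     -e "guestinfo.cloud-init.data.encoding=base64"
```

Decoding the output with base64 -d should reproduce the original file exactly, which is a quick way to rule out encoding problems before blaming the datasource.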