kops: Flatcar doesn't boot on OpenStack

/kind bug

1. What kops version are you running? The command kops version will display this information. Client version: 1.26.3

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:11Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"darwin/arm64"}

Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.9", GitCommit:"a1a87a0a2bcd605820920c6b0e618a8ab7d117d4", GitTreeState:"clean", BuildDate:"2023-04-12T12:08:36Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using? OpenStack

4. What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster \
          --cloud openstack \
          --name flat-test.k8s.local \
          --state s3://kops-poc \
          --zones az1 \
          --master-zones az1 \
          --network-cidr 10.10.0.0/16 \
          --image "Flatcar Container Linux 3510.2.0" \
          --master-count=3 \
          --node-count=3 \
          --node-size SCS-16V:32:100 \
          --master-size SCS-8V:8:100 \
          --etcd-storage-type __DEFAULT__ \
          --api-loadbalancer-type public \
          --topology private \
          --ssh-public-key /tmp/id_rsa.pub \
          --networking calico \
          --os-ext-net ext01 \
          --os-octavia=true \
          --os-octavia-provider="amphora"

kops update cluster --name flat-test.k8s.local --yes --admin
kops validate cluster --wait 15m --name flat-test.k8s.local

-> Timeout

5. What happened after the commands executed? Validation of the cluster never succeeds because the instances fail to finish booting under systemd. The instance consoles show that Flatcar's ignition-fetch.service fails to start:

error at line 1 col 2: invalid character 'C' looking for beginning of value
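
The leading 'C' is the first byte of the Content-Type header that kOps places at the top of its user data (see section 8 below); ignition-fetch parses the payload as JSON and stops at that first non-JSON character. A quick way to confirm what an instance actually received (a sketch, assuming the OpenStack metadata service is reachable, e.g. from a rescue shell; with a config drive the same payload sits at /openstack/latest/user_data on the volume labelled config-2):

# Fetch the raw user data exactly as the boot services see it and print the
# first bytes; on an affected node this starts with the MIME header instead
# of the opening '{' of an Ignition JSON config.
curl -s http://169.254.169.254/openstack/latest/user_data | head -c 64
# Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"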

6. What did you expect to happen? Flatcar boots up normally.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 1
  name: flat-test.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    openstack:
      blockStorage:
        bs-version: v3
        ignore-volume-az: false
      loadbalancer:
        floatingNetwork: ext01
        floatingNetworkID: ce897d51-94d9-4d00-bff6-bf7589a65993
        method: ROUND_ROBIN
        provider: amphora
        useOctavia: true
      monitor:
        delay: 1m
        maxRetries: 3
        timeout: 30s
      router:
        externalNetwork: ext01
  cloudProvider: openstack
  configBase: s3://kops-poc/flat-test.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-az1-1
      name: etcd-1
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-2
      name: etcd-2
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-3
      name: etcd-3
      volumeType: __DEFAULT__
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-az1-1
      name: etcd-1
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-2
      name: etcd-2
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-3
      name: etcd-3
      volumeType: __DEFAULT__
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.25.9
  masterPublicName: api.flat-test.k8s.local
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nodePortAccess:
  - 10.10.0.0/16
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.10.32.0/19
    name: az1
    type: Private
    zone: az1
  - cidr: 10.10.0.0/22
    name: utility-az1
    type: Private
    zone: az1
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-1
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-1
  role: Master
  subnets:
  - az1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-2
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-2
  role: Master
  subnets:
  - az1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-3
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-3
  role: Master
  subnets:
  - az1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: nodes-az1
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-16V:32:100
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-az1
  packages:
  - nfs-common
  role: Node
  subnets:
  - az1

8. Anything else we need to know? I compared the user data generated by kOps with that generated by other tools (Gardener), and they use completely different formats (a quick way to tell the formats apart is sketched after the two payloads below). kOps:

Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Disposition: attachment; filename="nodeup.sh"
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.26.3/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.26.3/nodeup-linux-amd64
NODEUP_HASH_AMD64=973ba5b414c8c702a1c372d4c37f274f44315b28c52fb81ecfd19b68c98461de
NODEUP_URL_ARM64=https://artifacts.k8s.io/binaries/kops/1.26.3/linux/arm64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.26.3/nodeup-linux-arm64
NODEUP_HASH_ARM64=cf36d2300445fc53052348e29f57749444e8d03b36fa4596208275e6c300b720

export OS_APPLICATION_CREDENTIAL_ID='REDACTED'
export OS_APPLICATION_CREDENTIAL_SECRET='REDACTED'
export OS_AUTH_URL='https://intern1.api.pco.get-cloud.io:5000'
export OS_DOMAIN_ID=''
export OS_DOMAIN_NAME=''
export OS_PROJECT_DOMAIN_ID=''
export OS_PROJECT_DOMAIN_NAME=''
export OS_PROJECT_ID=''
export OS_PROJECT_NAME=''
export OS_REGION_NAME='intern1'
export OS_TENANT_ID=''
export OS_TENANT_NAME=''
export S3_ACCESS_KEY_ID=REDACTED
export S3_ENDPOINT=https://de-2.s3.psmanaged.com
export S3_REGION=
export S3_SECRET_ACCESS_KEY=REDACTED




sysctl -w net.core.rmem_max=16777216 || true
sysctl -w net.core.wmem_max=16777216 || true
sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216' || true
sysctl -w net.ipv4.tcp_wmem='4096 87380 16777216' || true


function ensure-install-dir() {
  INSTALL_DIR="/opt/kops"
  # On ContainerOS, we install under /var/lib/toolbox; /opt is ro and noexec
  if [[ -d /var/lib/toolbox ]]; then
    INSTALL_DIR="/var/lib/toolbox/kops"
  fi
  mkdir -p ${INSTALL_DIR}/bin
  mkdir -p ${INSTALL_DIR}/conf
  cd ${INSTALL_DIR}
}

# Retry a download until we get it. args: name, sha, urls
download-or-bust() {
  local -r file="$1"
  local -r hash="$2"
  local -r urls=( $(split-commas "$3") )

  if [[ -f "${file}" ]]; then
    if ! validate-hash "${file}" "${hash}"; then
      rm -f "${file}"
    else
      return 0
    fi
  fi

  while true; do
    for url in "${urls[@]}"; do
      commands=(
        "curl -f --compressed -Lo "${file}" --connect-timeout 20 --retry 6 --retry-delay 10"
        "wget --compression=auto -O "${file}" --connect-timeout=20 --tries=6 --wait=10"
        "curl -f -Lo "${file}" --connect-timeout 20 --retry 6 --retry-delay 10"
        "wget -O "${file}" --connect-timeout=20 --tries=6 --wait=10"
      )
      for cmd in "${commands[@]}"; do
        echo "Attempting download with: ${cmd} {url}"
        if ! (${cmd} "${url}"); then
          echo "== Download failed with ${cmd} =="
          continue
        fi
        if ! validate-hash "${file}" "${hash}"; then
          echo "== Hash validation of ${url} failed. Retrying. =="
          rm -f "${file}"
        else
          echo "== Downloaded ${url} (SHA256 = ${hash}) =="
          return 0
        fi
      done
    done

    echo "All downloads failed; sleeping before retrying"
    sleep 60
  done
}

validate-hash() {
  local -r file="$1"
  local -r expected="$2"
  local actual

  actual=$(sha256sum ${file} | awk '{ print $1 }') || true
  if [[ "${actual}" != "${expected}" ]]; then
    echo "== ${file} corrupted, hash ${actual} doesn't match expected ${expected} =="
    return 1
  fi
}

function split-commas() {
  echo $1 | tr "," "\n"
}

function download-release() {
  case "$(uname -m)" in
  x86_64*|i?86_64*|amd64*)
    NODEUP_URL="${NODEUP_URL_AMD64}"
    NODEUP_HASH="${NODEUP_HASH_AMD64}"
    ;;
  aarch64*|arm64*)
    NODEUP_URL="${NODEUP_URL_ARM64}"
    NODEUP_HASH="${NODEUP_HASH_ARM64}"
    ;;
  *)
    echo "Unsupported host arch: $(uname -m)" >&2
    exit 1
    ;;
  esac

  cd ${INSTALL_DIR}/bin
  download-or-bust nodeup "${NODEUP_HASH}" "${NODEUP_URL}"

  chmod +x nodeup

  echo "Running nodeup"
  # We can't run in the foreground because of https://github.com/docker/docker/issues/23793
  ( cd ${INSTALL_DIR}/bin; ./nodeup --install-systemd-unit --conf=${INSTALL_DIR}/conf/kube_env.yaml --v=8  )
}

####################################################################################

/bin/systemd-machine-id-setup || echo "failed to set up ensure machine-id configured"

echo "== nodeup node config starting =="
ensure-install-dir

cat > conf/cluster_spec.yaml << '__EOF_CLUSTER_SPEC'
cloudConfig:
  manageStorageClasses: true
containerRuntime: containerd
containerd:
  logLevel: info
  runc:
    version: 1.1.4
  version: 1.6.18
docker:
  skipInstall: true
encryptionConfig: null
etcdClusters:
  events:
    cpuRequest: 100m
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    version: 3.5.7
  main:
    cpuRequest: 200m
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    version: 3.5.7
kubeAPIServer:
  allowPrivileged: true
  anonymousAuth: false
  apiAudiences:
  - kubernetes.svc.default
  apiServerCount: 3
  authorizationMode: Node,RBAC
  bindAddress: 0.0.0.0
  cloudProvider: external
  enableAdmissionPlugins:
  - NamespaceLifecycle
  - LimitRanger
  - ServiceAccount
  - DefaultStorageClass
  - DefaultTolerationSeconds
  - MutatingAdmissionWebhook
  - ValidatingAdmissionWebhook
  - NodeRestriction
  - ResourceQuota
  etcdServers:
  - https://127.0.0.1:4001
  etcdServersOverrides:
  - /events#https://127.0.0.1:4002
  image: registry.k8s.io/kube-apiserver:v1.25.9@sha256:c8518e64657ff2b04501099d4d8d9dd402237df86a12f7cc09bf72c080fd9608
  kubeletPreferredAddressTypes:
  - InternalIP
  - Hostname
  - ExternalIP
  logLevel: 2
  requestheaderAllowedNames:
  - aggregator
  requestheaderExtraHeaderPrefixes:
  - X-Remote-Extra-
  requestheaderGroupHeaders:
  - X-Remote-Group
  requestheaderUsernameHeaders:
  - X-Remote-User
  securePort: 443
  serviceAccountIssuer: https://api.internal.flat-test.k8s.local
  serviceAccountJWKSURI: https://api.internal.flat-test.k8s.local/openid/v1/jwks
  serviceClusterIPRange: 100.64.0.0/13
  storageBackend: etcd3
kubeControllerManager:
  allocateNodeCIDRs: true
  attachDetachReconcileSyncPeriod: 1m0s
  cloudProvider: external
  clusterCIDR: 100.96.0.0/11
  clusterName: flat-test.k8s.local
  configureCloudRoutes: false
  image: registry.k8s.io/kube-controller-manager:v1.25.9@sha256:23a76a71f2b39189680def6edc30787e40a2fe66e29a7272a56b426d9b116229
  leaderElection:
    leaderElect: true
  logLevel: 2
  useServiceAccountCredentials: true
kubeProxy:
  clusterCIDR: 100.96.0.0/11
  cpuRequest: 100m
  image: registry.k8s.io/kube-proxy:v1.25.9@sha256:42fe09174a5eb6b8bace3036fe253ed7f06be31d9106211dcc4a09f9fa99c79a
  logLevel: 2
kubeScheduler:
  image: registry.k8s.io/kube-scheduler:v1.25.9@sha256:19712fa46b8277aafd416b75a3a3d90e133f44b8a4dae08e425279085dc29f7e
  leaderElection:
    leaderElect: true
  logLevel: 2
kubelet:
  anonymousAuth: false
  cgroupDriver: systemd
  cgroupRoot: /
  cloudProvider: external
  clusterDNS: 100.64.0.10
  clusterDomain: cluster.local
  enableDebuggingHandlers: true
  evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
  kubeconfigPath: /var/lib/kubelet/kubeconfig
  logLevel: 2
  podInfraContainerImage: registry.k8s.io/pause:3.6@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db
  podManifestPath: /etc/kubernetes/manifests
  protectKernelDefaults: true
  registerSchedulable: true
  shutdownGracePeriod: 30s
  shutdownGracePeriodCriticalPods: 10s
  volumePluginDirectory: /var/lib/kubelet/volumeplugins/
masterKubelet:
  anonymousAuth: false
  cgroupDriver: systemd
  cgroupRoot: /
  cloudProvider: external
  clusterDNS: 100.64.0.10
  clusterDomain: cluster.local
  enableDebuggingHandlers: true
  evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
  kubeconfigPath: /var/lib/kubelet/kubeconfig
  logLevel: 2
  podInfraContainerImage: registry.k8s.io/pause:3.6@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db
  podManifestPath: /etc/kubernetes/manifests
  protectKernelDefaults: true
  registerSchedulable: true
  shutdownGracePeriod: 30s
  shutdownGracePeriodCriticalPods: 10s
  volumePluginDirectory: /var/lib/kubelet/volumeplugins/

__EOF_CLUSTER_SPEC

cat > conf/kube_env.yaml << '__EOF_KUBE_ENV'
CloudProvider: openstack
ConfigBase: s3://kops-poc/flat-test.k8s.local
InstanceGroupName: control-plane-az1-1
InstanceGroupRole: ControlPlane
NodeupConfigHash: 4Zb8f/LBOyZeX/RqQIDBgk8UmkTUd+ANhlam8okLPgU=

__EOF_KUBE_ENV

download-release
echo "== nodeup node config done =="

Gardener:

#cloud-config

coreos:
  update:
    reboot_strategy: "off"
  units:
  - name: update-engine.service
    mask: true
    command: stop
  - name: locksmithd.service
    mask: true
    command: stop
  - name: cloud-config-downloader.service
    enable: true
    content: |-
      [Unit]
      Description=Downloads the actual cloud config from the Shoot API server and executes it
      After=docker.service docker.socket
      Wants=docker.socket
      [Service]
      Restart=always
      RestartSec=30
      RuntimeMaxSec=1200
      EnvironmentFile=/etc/environment
      ExecStart=/var/lib/cloud-config-downloader/download-cloud-config.sh
      [Install]
      WantedBy=multi-user.target
    command: start
  - name: run-command.service
    enable: true
    content: |
      [Unit]
      Description=Oneshot unit used to run a script on node start-up.
      Before=containerd.service kubelet.service
      [Service]
      Type=oneshot
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/run-command.sh
      [Install]
      WantedBy=containerd.service kubelet.service
    command: start
  - name: enable-cgroupsv2.service
    enable: true
    content: |
      [Unit]
      Description=Oneshot unit used to patch the kubelet config for cgroupsv2.
      Before=containerd.service kubelet.service
      [Service]
      Type=oneshot
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/configure-cgroupsv2.sh
      [Install]
      WantedBy=containerd.service kubelet.service
    command: start
write_files:
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/server
  permissions: "644"
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/ca.crt
  permissions: "644"
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/download-cloud-config.sh
  permissions: "744"
- content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/bootstrap-token
  permissions: "644"
- content: |
    [Service]
    SyslogIdentifier=containerd
    ExecStart=
    ExecStart=/bin/bash -c 'PATH="/run/torcx/unpack/docker/bin:$PATH" /run/torcx/unpack/docker/bin/containerd --config /etc/containerd/config.toml'
  path: /etc/systemd/system/containerd.service.d/11-exec_config.conf
  permissions: "0644"
- content: |
    #!/bin/bash

    CONTAINERD_CONFIG=/etc/containerd/config.toml

    ALTERNATE_LOGROTATE_PATH="/usr/bin/logrotate"

    # initialize default containerd config if does not exist
    if [ ! -s "$CONTAINERD_CONFIG" ]; then
        mkdir -p /etc/containerd/
        /run/torcx/unpack/docker/bin/containerd config default > "$CONTAINERD_CONFIG"
        chmod 0644 "$CONTAINERD_CONFIG"
    fi

    # if cgroups v2 are used, patch containerd configuration to use systemd cgroup driver
    if [[ -e /sys/fs/cgroup/cgroup.controllers ]]; then
        sed -i "s/SystemdCgroup *= *false/SystemdCgroup = true/" "$CONTAINERD_CONFIG"
    fi

    # provide kubelet with access to the containerd binaries in /run/torcx/unpack/docker/bin
    if [ ! -s /etc/systemd/system/kubelet.service.d/environment.conf ]; then
        mkdir -p /etc/systemd/system/kubelet.service.d/
        cat <<EOF | tee /etc/systemd/system/kubelet.service.d/environment.conf
    [Service]
    Environment="PATH=/run/torcx/unpack/docker/bin:$PATH"
    EOF
        chmod 0644 /etc/systemd/system/kubelet.service.d/environment.conf
        systemctl daemon-reload
    fi

    # some flatcar versions have logrotate at /usr/bin instead of /usr/sbin
    if [ -f "$ALTERNATE_LOGROTATE_PATH" ]; then
        sed -i "s;/usr/sbin/logrotate;$ALTERNATE_LOGROTATE_PATH;" /etc/systemd/system/containerd-logrotate.service
        systemctl daemon-reload
    fi
  path: /opt/bin/run-command.sh
  permissions: "0755"
- content: |
    #!/bin/bash

    KUBELET_CONFIG=/var/lib/kubelet/config/kubelet

    if [[ -e /sys/fs/cgroup/cgroup.controllers ]]; then
            echo "CGroups V2 are used!"
            echo "=> Patch kubelet to use systemd as cgroup driver"
            sed -i "s/cgroupDriver: cgroupfs/cgroupDriver: systemd/" "$KUBELET_CONFIG"
    else
            echo "No CGroups V2 used by system"
    fi
  path: /opt/bin/configure-cgroupsv2.sh
  permissions: "0755"

Most upvoted comments

I tested the newest Flatcar Alpha image and kOps bootstrapped the cluster successfully. 👍

Excellent. Thanks a lot @gabriel-samfira!

I think we can have both 2 & 3.

The short term solution would be to have MIME multipart support in coreos-cloudinit, but long term we will need to add ignition support to kops, as that is the idiomatic (and in some cases, the only) way to configure distros that use ignition.

I will open a separate issue for adding ignition support in kops.

The immediate issue reported here should be fixed (sans the additionalUserData option) once a stable release of flatcar is cut with the above mentioned fix. @Wieneo could you test out the image I linked to and confirm it works for you?