kops: Flatcar doesn't boot on OpenStack
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.26.3
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:11Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.9", GitCommit:"a1a87a0a2bcd605820920c6b0e618a8ab7d117d4", GitTreeState:"clean", BuildDate:"2023-04-12T12:08:36Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using? OpenStack
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster \
  --cloud openstack \
  --name flat-test.k8s.local \
  --state s3://kops-poc \
  --zones az1 \
  --master-zones az1 \
  --network-cidr 10.10.0.0/16 \
  --image "Flatcar Container Linux 3510.2.0" \
  --master-count=3 \
  --node-count=3 \
  --node-size SCS-16V:32:100 \
  --master-size SCS-8V:8:100 \
  --etcd-storage-type __DEFAULT__ \
  --api-loadbalancer-type public \
  --topology private \
  --ssh-public-key /tmp/id_rsa.pub \
  --networking calico \
  --os-ext-net ext01 \
  --os-octavia=true \
  --os-octavia-provider="amphora"
kops update cluster --name flat-test.k8s.local --yes --admin
kops validate cluster --wait 15m --name flat-test.k8s.local
-> Timeout
5. What happened after the commands executed? Validation of the cluster never succeeds because systemd boot-up of the instances fails. A look at the instance consoles reveals that Flatcar's ignition-fetch.service fails to start:
error at line 1 col 2: invalid character 'C' looking for beginning of value
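For illustration (my own sketch, not from the logs): Ignition parses user data as JSON, so the very first byte of the kOps-generated payload already fails the parse. Assuming the user data has been saved to user-data.txt:
# The kOps payload begins with a MIME header, not JSON:
head -n 1 user-data.txt
# Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"

# Any JSON parser trips over the same leading byte that Ignition reports:
python3 -m json.tool user-data.txt
# Expecting value: line 1 column 1 (char 0)   <- the 'C' of "Content-Type"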
6. What did you expect to happen? Flatcar boots up normally.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 1
  name: flat-test.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    openstack:
      blockStorage:
        bs-version: v3
        ignore-volume-az: false
      loadbalancer:
        floatingNetwork: ext01
        floatingNetworkID: ce897d51-94d9-4d00-bff6-bf7589a65993
        method: ROUND_ROBIN
        provider: amphora
        useOctavia: true
      monitor:
        delay: 1m
        maxRetries: 3
        timeout: 30s
      router:
        externalNetwork: ext01
  cloudProvider: openstack
  configBase: s3://kops-poc/flat-test.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-az1-1
      name: etcd-1
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-2
      name: etcd-2
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-3
      name: etcd-3
      volumeType: __DEFAULT__
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-az1-1
      name: etcd-1
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-2
      name: etcd-2
      volumeType: __DEFAULT__
    - instanceGroup: control-plane-az1-3
      name: etcd-3
      volumeType: __DEFAULT__
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.25.9
  masterPublicName: api.flat-test.k8s.local
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nodePortAccess:
  - 10.10.0.0/16
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.10.32.0/19
    name: az1
    type: Private
    zone: az1
  - cidr: 10.10.0.0/22
    name: utility-az1
    type: Private
    zone: az1
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-1
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-1
  role: Master
  subnets:
  - az1
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-2
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-2
  role: Master
  subnets:
  - az1
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: control-plane-az1-3
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-8V:8:100
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: control-plane-az1-3
  role: Master
  subnets:
  - az1
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-05-09T07:07:32Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: flat-test.k8s.local
  name: nodes-az1
spec:
  image: Flatcar Container Linux 3510.2.0
  machineType: SCS-16V:32:100
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-az1
  packages:
  - nfs-common
  role: Node
  subnets:
  - az1
8. Anything else we need to know? I compared the user data generated by kOps with the user data generated by other tools (Gardener), and they use completely different formats (a quick way to check which format an instance received is sketched after the two excerpts below). kOps:
Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0
--MIMEBOUNDARY
Content-Disposition: attachment; filename="nodeup.sh"
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.26.3/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.26.3/nodeup-linux-amd64
NODEUP_HASH_AMD64=973ba5b414c8c702a1c372d4c37f274f44315b28c52fb81ecfd19b68c98461de
NODEUP_URL_ARM64=https://artifacts.k8s.io/binaries/kops/1.26.3/linux/arm64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.26.3/nodeup-linux-arm64
NODEUP_HASH_ARM64=cf36d2300445fc53052348e29f57749444e8d03b36fa4596208275e6c300b720
export OS_APPLICATION_CREDENTIAL_ID='REDACTED'
export OS_APPLICATION_CREDENTIAL_SECRET='REDACTED'
export OS_AUTH_URL='https://intern1.api.pco.get-cloud.io:5000'
export OS_DOMAIN_ID=''
export OS_DOMAIN_NAME=''
export OS_PROJECT_DOMAIN_ID=''
export OS_PROJECT_DOMAIN_NAME=''
export OS_PROJECT_ID=''
export OS_PROJECT_NAME=''
export OS_REGION_NAME='intern1'
export OS_TENANT_ID=''
export OS_TENANT_NAME=''
export S3_ACCESS_KEY_ID=REDACTED
export S3_ENDPOINT=https://de-2.s3.psmanaged.com
export S3_REGION=
export S3_SECRET_ACCESS_KEY=REDACTED
sysctl -w net.core.rmem_max=16777216 || true
sysctl -w net.core.wmem_max=16777216 || true
sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216' || true
sysctl -w net.ipv4.tcp_wmem='4096 87380 16777216' || true
function ensure-install-dir() {
  INSTALL_DIR="/opt/kops"
  # On ContainerOS, we install under /var/lib/toolbox; /opt is ro and noexec
  if [[ -d /var/lib/toolbox ]]; then
    INSTALL_DIR="/var/lib/toolbox/kops"
  fi
  mkdir -p ${INSTALL_DIR}/bin
  mkdir -p ${INSTALL_DIR}/conf
  cd ${INSTALL_DIR}
}

# Retry a download until we get it. args: name, sha, urls
download-or-bust() {
  local -r file="$1"
  local -r hash="$2"
  local -r urls=( $(split-commas "$3") )

  if [[ -f "${file}" ]]; then
    if ! validate-hash "${file}" "${hash}"; then
      rm -f "${file}"
    else
      return 0
    fi
  fi

  while true; do
    for url in "${urls[@]}"; do
      commands=(
        "curl -f --compressed -Lo "${file}" --connect-timeout 20 --retry 6 --retry-delay 10"
        "wget --compression=auto -O "${file}" --connect-timeout=20 --tries=6 --wait=10"
        "curl -f -Lo "${file}" --connect-timeout 20 --retry 6 --retry-delay 10"
        "wget -O "${file}" --connect-timeout=20 --tries=6 --wait=10"
      )
      for cmd in "${commands[@]}"; do
        echo "Attempting download with: ${cmd} {url}"
        if ! (${cmd} "${url}"); then
          echo "== Download failed with ${cmd} =="
          continue
        fi
        if ! validate-hash "${file}" "${hash}"; then
          echo "== Hash validation of ${url} failed. Retrying. =="
          rm -f "${file}"
        else
          echo "== Downloaded ${url} (SHA256 = ${hash}) =="
          return 0
        fi
      done
    done

    echo "All downloads failed; sleeping before retrying"
    sleep 60
  done
}

validate-hash() {
  local -r file="$1"
  local -r expected="$2"
  local actual

  actual=$(sha256sum ${file} | awk '{ print $1 }') || true
  if [[ "${actual}" != "${expected}" ]]; then
    echo "== ${file} corrupted, hash ${actual} doesn't match expected ${expected} =="
    return 1
  fi
}

function split-commas() {
  echo $1 | tr "," "\n"
}

function download-release() {
  case "$(uname -m)" in
  x86_64*|i?86_64*|amd64*)
    NODEUP_URL="${NODEUP_URL_AMD64}"
    NODEUP_HASH="${NODEUP_HASH_AMD64}"
    ;;
  aarch64*|arm64*)
    NODEUP_URL="${NODEUP_URL_ARM64}"
    NODEUP_HASH="${NODEUP_HASH_ARM64}"
    ;;
  *)
    echo "Unsupported host arch: $(uname -m)" >&2
    exit 1
    ;;
  esac

  cd ${INSTALL_DIR}/bin
  download-or-bust nodeup "${NODEUP_HASH}" "${NODEUP_URL}"
  chmod +x nodeup

  echo "Running nodeup"
  # We can't run in the foreground because of https://github.com/docker/docker/issues/23793
  ( cd ${INSTALL_DIR}/bin; ./nodeup --install-systemd-unit --conf=${INSTALL_DIR}/conf/kube_env.yaml --v=8 )
}
####################################################################################
/bin/systemd-machine-id-setup || echo "failed to set up ensure machine-id configured"
echo "== nodeup node config starting =="
ensure-install-dir
cat > conf/cluster_spec.yaml << '__EOF_CLUSTER_SPEC'
cloudConfig:
  manageStorageClasses: true
containerRuntime: containerd
containerd:
  logLevel: info
  runc:
    version: 1.1.4
  version: 1.6.18
docker:
  skipInstall: true
encryptionConfig: null
etcdClusters:
  events:
    cpuRequest: 100m
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    version: 3.5.7
  main:
    cpuRequest: 200m
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    version: 3.5.7
kubeAPIServer:
  allowPrivileged: true
  anonymousAuth: false
  apiAudiences:
  - kubernetes.svc.default
  apiServerCount: 3
  authorizationMode: Node,RBAC
  bindAddress: 0.0.0.0
  cloudProvider: external
  enableAdmissionPlugins:
  - NamespaceLifecycle
  - LimitRanger
  - ServiceAccount
  - DefaultStorageClass
  - DefaultTolerationSeconds
  - MutatingAdmissionWebhook
  - ValidatingAdmissionWebhook
  - NodeRestriction
  - ResourceQuota
  etcdServers:
  - https://127.0.0.1:4001
  etcdServersOverrides:
  - /events#https://127.0.0.1:4002
  image: registry.k8s.io/kube-apiserver:v1.25.9@sha256:c8518e64657ff2b04501099d4d8d9dd402237df86a12f7cc09bf72c080fd9608
  kubeletPreferredAddressTypes:
  - InternalIP
  - Hostname
  - ExternalIP
  logLevel: 2
  requestheaderAllowedNames:
  - aggregator
  requestheaderExtraHeaderPrefixes:
  - X-Remote-Extra-
  requestheaderGroupHeaders:
  - X-Remote-Group
  requestheaderUsernameHeaders:
  - X-Remote-User
  securePort: 443
  serviceAccountIssuer: https://api.internal.flat-test.k8s.local
  serviceAccountJWKSURI: https://api.internal.flat-test.k8s.local/openid/v1/jwks
  serviceClusterIPRange: 100.64.0.0/13
  storageBackend: etcd3
kubeControllerManager:
  allocateNodeCIDRs: true
  attachDetachReconcileSyncPeriod: 1m0s
  cloudProvider: external
  clusterCIDR: 100.96.0.0/11
  clusterName: flat-test.k8s.local
  configureCloudRoutes: false
  image: registry.k8s.io/kube-controller-manager:v1.25.9@sha256:23a76a71f2b39189680def6edc30787e40a2fe66e29a7272a56b426d9b116229
  leaderElection:
    leaderElect: true
  logLevel: 2
  useServiceAccountCredentials: true
kubeProxy:
  clusterCIDR: 100.96.0.0/11
  cpuRequest: 100m
  image: registry.k8s.io/kube-proxy:v1.25.9@sha256:42fe09174a5eb6b8bace3036fe253ed7f06be31d9106211dcc4a09f9fa99c79a
  logLevel: 2
kubeScheduler:
  image: registry.k8s.io/kube-scheduler:v1.25.9@sha256:19712fa46b8277aafd416b75a3a3d90e133f44b8a4dae08e425279085dc29f7e
  leaderElection:
    leaderElect: true
  logLevel: 2
kubelet:
  anonymousAuth: false
  cgroupDriver: systemd
  cgroupRoot: /
  cloudProvider: external
  clusterDNS: 100.64.0.10
  clusterDomain: cluster.local
  enableDebuggingHandlers: true
  evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
  kubeconfigPath: /var/lib/kubelet/kubeconfig
  logLevel: 2
  podInfraContainerImage: registry.k8s.io/pause:3.6@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db
  podManifestPath: /etc/kubernetes/manifests
  protectKernelDefaults: true
  registerSchedulable: true
  shutdownGracePeriod: 30s
  shutdownGracePeriodCriticalPods: 10s
  volumePluginDirectory: /var/lib/kubelet/volumeplugins/
masterKubelet:
  anonymousAuth: false
  cgroupDriver: systemd
  cgroupRoot: /
  cloudProvider: external
  clusterDNS: 100.64.0.10
  clusterDomain: cluster.local
  enableDebuggingHandlers: true
  evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
  kubeconfigPath: /var/lib/kubelet/kubeconfig
  logLevel: 2
  podInfraContainerImage: registry.k8s.io/pause:3.6@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db
  podManifestPath: /etc/kubernetes/manifests
  protectKernelDefaults: true
  registerSchedulable: true
  shutdownGracePeriod: 30s
  shutdownGracePeriodCriticalPods: 10s
  volumePluginDirectory: /var/lib/kubelet/volumeplugins/
__EOF_CLUSTER_SPEC
cat > conf/kube_env.yaml << '__EOF_KUBE_ENV'
CloudProvider: openstack
ConfigBase: s3://kops-poc/flat-test.k8s.local
InstanceGroupName: control-plane-az1-1
InstanceGroupRole: ControlPlane
NodeupConfigHash: 4Zb8f/LBOyZeX/RqQIDBgk8UmkTUd+ANhlam8okLPgU=
__EOF_KUBE_ENV
download-release
echo "== nodeup node config done =="
Gardener:
#cloud-config
coreos:
  update:
    reboot_strategy: "off"
  units:
  - name: update-engine.service
    mask: true
    command: stop
  - name: locksmithd.service
    mask: true
    command: stop
  - name: cloud-config-downloader.service
    enable: true
    content: |-
      [Unit]
      Description=Downloads the actual cloud config from the Shoot API server and executes it
      After=docker.service docker.socket
      Wants=docker.socket
      [Service]
      Restart=always
      RestartSec=30
      RuntimeMaxSec=1200
      EnvironmentFile=/etc/environment
      ExecStart=/var/lib/cloud-config-downloader/download-cloud-config.sh
      [Install]
      WantedBy=multi-user.target
    command: start
  - name: run-command.service
    enable: true
    content: |
      [Unit]
      Description=Oneshot unit used to run a script on node start-up.
      Before=containerd.service kubelet.service
      [Service]
      Type=oneshot
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/run-command.sh
      [Install]
      WantedBy=containerd.service kubelet.service
    command: start
  - name: enable-cgroupsv2.service
    enable: true
    content: |
      [Unit]
      Description=Oneshot unit used to patch the kubelet config for cgroupsv2.
      Before=containerd.service kubelet.service
      [Service]
      Type=oneshot
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/configure-cgroupsv2.sh
      [Install]
      WantedBy=containerd.service kubelet.service
    command: start
write_files:
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/server
  permissions: "644"
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/ca.crt
  permissions: "644"
- encoding: b64
  content: REDACTED
  path: /var/lib/cloud-config-downloader/download-cloud-config.sh
  permissions: "744"
- content: REDACTED
  path: /var/lib/cloud-config-downloader/credentials/bootstrap-token
  permissions: "644"
- content: |
    [Service]
    SyslogIdentifier=containerd
    ExecStart=
    ExecStart=/bin/bash -c 'PATH="/run/torcx/unpack/docker/bin:$PATH" /run/torcx/unpack/docker/bin/containerd --config /etc/containerd/config.toml'
  path: /etc/systemd/system/containerd.service.d/11-exec_config.conf
  permissions: "0644"
- content: |
    #!/bin/bash
    CONTAINERD_CONFIG=/etc/containerd/config.toml
    ALTERNATE_LOGROTATE_PATH="/usr/bin/logrotate"

    # initialize default containerd config if does not exist
    if [ ! -s "$CONTAINERD_CONFIG" ]; then
      mkdir -p /etc/containerd/
      /run/torcx/unpack/docker/bin/containerd config default > "$CONTAINERD_CONFIG"
      chmod 0644 "$CONTAINERD_CONFIG"
    fi

    # if cgroups v2 are used, patch containerd configuration to use systemd cgroup driver
    if [[ -e /sys/fs/cgroup/cgroup.controllers ]]; then
      sed -i "s/SystemdCgroup *= *false/SystemdCgroup = true/" "$CONTAINERD_CONFIG"
    fi

    # provide kubelet with access to the containerd binaries in /run/torcx/unpack/docker/bin
    if [ ! -s /etc/systemd/system/kubelet.service.d/environment.conf ]; then
      mkdir -p /etc/systemd/system/kubelet.service.d/
      cat <<EOF | tee /etc/systemd/system/kubelet.service.d/environment.conf
    [Service]
    Environment="PATH=/run/torcx/unpack/docker/bin:$PATH"
    EOF
      chmod 0644 /etc/systemd/system/kubelet.service.d/environment.conf
      systemctl daemon-reload
    fi

    # some flatcar versions have logrotate at /usr/bin instead of /usr/sbin
    if [ -f "$ALTERNATE_LOGROTATE_PATH" ]; then
      sed -i "s;/usr/sbin/logrotate;$ALTERNATE_LOGROTATE_PATH;" /etc/systemd/system/containerd-logrotate.service
      systemctl daemon-reload
    fi
  path: /opt/bin/run-command.sh
  permissions: "0755"
- content: |
    #!/bin/bash
    KUBELET_CONFIG=/var/lib/kubelet/config/kubelet

    if [[ -e /sys/fs/cgroup/cgroup.controllers ]]; then
      echo "CGroups V2 are used!"
      echo "=> Patch kubelet to use systemd as cgroup driver"
      sed -i "s/cgroupDriver: cgroupfs/cgroupDriver: systemd/" "$KUBELET_CONFIG"
    else
      echo "No CGroups V2 used by system"
    fi
  path: /opt/bin/configure-cgroupsv2.sh
  permissions: "0755"
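As a quick check (my own sketch, not from either tool's docs), the format an instance actually received can be read back from the standard OpenStack metadata service:
# Run on (or via the console of) an affected instance:
curl -s http://169.254.169.254/openstack/latest/user_data | head -n 1
# kOps:     Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
# Gardener: #cloud-config
# Flatcar hands a #cloud-config payload to coreos-cloudinit, but at the time
# of this report it did not recognize the MIME multipart form, which is why
# ignition-fetch chokes on the leading 'C'.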
I tested the newest Flatcar Alpha Image and kOps bootstrapped the cluster successfully. 👍
Excellent. Thanks a lot @gabriel-samfira!
I think we can have both 2 & 3.
The short-term solution would be to have MIME multipart support in coreos-cloudinit, but long term we will need to add ignition support to kops, as that is the idiomatic (and in some cases, the only) way to configure distros that use ignition. I will open a separate issue for adding ignition support in kops. The immediate issue reported here should be fixed (sans the additionalUserData option) once a stable release of Flatcar is cut with the above-mentioned fix. @Wieneo could you test out the image I linked to and confirm it works for you?
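For anyone retesting, a rough sketch (image, network, and file names below are placeholders, not from this issue):
# Boot a single server from the candidate Flatcar image with kOps-style
# MIME multipart user data:
openstack server create \
  --image "flatcar-alpha-candidate" \
  --flavor "SCS-8V:8:100" \
  --network "flat-test-net" \
  --user-data nodeup-mime.txt \
  flatcar-userdata-test

# If the fix works, the fetch stage no longer fails with
# "invalid character 'C' looking for beginning of value"; check the journal
# (the Ignition units run in the initramfs, so this relies on the initramfs
# journal being persisted):
journalctl --no-pager | grep -i ignition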