kubeone: Upgrading to Kubernetes 1.23.3 with the Cilium CNI leads to problems
What happened: After upgrading to 1.23.3 on Hetzner, I see the hcloud-csi-node pods caught in a CrashLoopBackOff on all control plane nodes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned kube-system/hcloud-csi-node-45r2p to prod-001-control-plane-3
Normal Pulled 15m kubelet Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0" already present on machine
Normal Created 15m kubelet Created container csi-node-driver-registrar
Normal Started 15m kubelet Started container csi-node-driver-registrar
Normal Pulled 15m kubelet Successfully pulled image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0" in 1.03741359s
Normal Pulling 15m kubelet Pulling image "k8s.gcr.io/sig-storage/livenessprobe:v2.4.0"
Normal Pulled 15m kubelet Successfully pulled image "k8s.gcr.io/sig-storage/livenessprobe:v2.4.0" in 412.469459ms
Normal Created 15m kubelet Created container liveness-probe
Normal Started 15m kubelet Started container liveness-probe
Warning Unhealthy 14m kubelet Liveness probe failed: Get "http://10.244.2.54:9808/healthz": dial tcp 10.244.2.54:9808: i/o timeout (Client.Timeout exceeded while awaiting headers)
Normal Pulling 14m (x2 over 15m) kubelet Pulling image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0"
Normal Killing 14m kubelet Container hcloud-csi-driver failed liveness probe, will be restarted
Normal Created 14m (x2 over 15m) kubelet Created container hcloud-csi-driver
Normal Started 14m (x2 over 15m) kubelet Started container hcloud-csi-driver
Normal Pulled 14m kubelet Successfully pulled image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0" in 1.039115037s
Warning Unhealthy 10m (x30 over 14m) kubelet Liveness probe failed: Get "http://10.244.2.54:9808/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 8s (x55 over 13m) kubelet Back-off restarting failed container
…and the machine-controller pod failing:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned kube-system/machine-controller-77d6cfb8f7-kmxxf to prod-001-control-plane-3
Normal Killing 19m kubelet Container machine-controller failed liveness probe, will be restarted
Normal Created 19m (x2 over 19m) kubelet Created container machine-controller
Normal Started 19m (x2 over 19m) kubelet Started container machine-controller
Warning Unhealthy 19m (x14 over 19m) kubelet Readiness probe failed: Get "http://10.244.2.162:8085/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 18m (x4 over 19m) kubelet Liveness probe failed: Get "http://10.244.2.162:8085/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Pulled 14m (x7 over 19m) kubelet Container image "docker.io/kubermatic/machine-controller:v1.43.0" already present on machine
Warning BackOff 4m34s (x51 over 17m) kubelet Back-off restarting failed container
…as well as CoreDNS on the control plane:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23m default-scheduler Successfully assigned kube-system/coredns-79c48b58b-nzvjn to prod-001-control-plane-3
Normal Pulled 23m kubelet Container image "k8s.gcr.io/coredns/coredns:v1.8.6" already present on machine
Normal Created 23m kubelet Created container coredns
Normal Started 23m kubelet Started container coredns
Warning Unhealthy 21m (x4 over 22m) kubelet Liveness probe failed: Get "http://10.244.2.87:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 13m (x83 over 23m) kubelet Readiness probe failed: Get "http://10.244.2.87:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning DNSConfigForming 3m19s (x64 over 23m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.2
…and the cluster-autoscaler:
F0221 15:27:39.795533 1 main.go:420] Failed to get nodes from apiserver: Get "https://10.96.0.1:443/api/v1/nodes": dial tcp 10.96.0.1:443: i/o timeout
All of the failing probes and the cluster-autoscaler's apiserver call target pod or Service IPs (10.244.x.x, 10.96.0.1) and time out, so pod networking on the upgraded control plane nodes appears to be broken. I suppose I am overlooking something?
What is the expected behavior: The upgrade completes and all kube-system workloads (hcloud-csi-node, machine-controller, CoreDNS, cluster-autoscaler) remain healthy.
How to reproduce the issue:
- Deploy a 1.22.7 cluster to Hetzner.
- Upgrade it to 1.23.3 (a manifest sketch follows after the environment details below).
Information about the environment:
KubeOne version (kubeone version): 1.4.0
Operating system: Ubuntu
Provider you’re deploying cluster on: Hetzner
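The cluster manifest itself is not included in the report. As a point of reference, here is a minimal sketch of what a matching manifest might look like, assuming the kubeone.k8c.io/v1beta2 API that KubeOne 1.4.x uses by default; the field values are illustrative assumptions, not taken from the original report:

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
versions:
  # Deployed at 1.22.7; bumping this to "1.23.3" and re-running
  # `kubeone apply --manifest kubeone.yaml` performs the upgrade that triggers the issue.
  kubernetes: "1.22.7"
cloudProvider:
  hetzner: {}
  # Hetzner requires the external cloud-controller-manager / CSI deployment.
  external: true
clusterNetwork:
  cni:
    # Cilium selected instead of the default Canal CNI.
    cilium: {}
```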
A reply in the issue thread points to a related report:
@mdll please see https://github.com/kubermatic/kubeone/issues/2021