kubeone: Upgrading a cluster using the Cilium CNI to Kubernetes 1.23.3 leads to problems

What happened: After upgrading to 1.23.3 on Hetzner, I see hcloud-csi-node pods caught in a CrashLoopBackOff on all control plane nodes:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15m                 default-scheduler  Successfully assigned kube-system/hcloud-csi-node-45r2p to prod-001-control-plane-3
  Normal   Pulled     15m                 kubelet            Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0" already present on machine
  Normal   Created    15m                 kubelet            Created container csi-node-driver-registrar
  Normal   Started    15m                 kubelet            Started container csi-node-driver-registrar
  Normal   Pulled     15m                 kubelet            Successfully pulled image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0" in 1.03741359s
  Normal   Pulling    15m                 kubelet            Pulling image "k8s.gcr.io/sig-storage/livenessprobe:v2.4.0"
  Normal   Pulled     15m                 kubelet            Successfully pulled image "k8s.gcr.io/sig-storage/livenessprobe:v2.4.0" in 412.469459ms
  Normal   Created    15m                 kubelet            Created container liveness-probe
  Normal   Started    15m                 kubelet            Started container liveness-probe
  Warning  Unhealthy  14m                 kubelet            Liveness probe failed: Get "http://10.244.2.54:9808/healthz": dial tcp 10.244.2.54:9808: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Normal   Pulling    14m (x2 over 15m)   kubelet            Pulling image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0"
  Normal   Killing    14m                 kubelet            Container hcloud-csi-driver failed liveness probe, will be restarted
  Normal   Created    14m (x2 over 15m)   kubelet            Created container hcloud-csi-driver
  Normal   Started    14m (x2 over 15m)   kubelet            Started container hcloud-csi-driver
  Normal   Pulled     14m                 kubelet            Successfully pulled image "docker.io/hetznercloud/hcloud-csi-driver:1.6.0" in 1.039115037s
  Warning  Unhealthy  10m (x30 over 14m)  kubelet            Liveness probe failed: Get "http://10.244.2.54:9808/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    8s (x55 over 13m)   kubelet            Back-off restarting failed container
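
Every failing probe above is kubelet-to-pod-IP traffic timing out, which points at the pod network rather than the workloads themselves. A quick way to check whether the Cilium agents came back healthy after the upgrade (a sketch; k8s-app=cilium and ds/cilium are Cilium's usual label and DaemonSet name):

  # List the Cilium agents and the nodes they run on
  kubectl -n kube-system get pods -l k8s-app=cilium -o wide
  # Ask one agent for its own health summary
  kubectl -n kube-system exec ds/cilium -- cilium status --brief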

…and machine-controller failing:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  19m                   default-scheduler  Successfully assigned kube-system/machine-controller-77d6cfb8f7-kmxxf to prod-001-control-plane-3
  Normal   Killing    19m                   kubelet            Container machine-controller failed liveness probe, will be restarted
  Normal   Created    19m (x2 over 19m)     kubelet            Created container machine-controller
  Normal   Started    19m (x2 over 19m)     kubelet            Started container machine-controller
  Warning  Unhealthy  19m (x14 over 19m)    kubelet            Readiness probe failed: Get "http://10.244.2.162:8085/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  18m (x4 over 19m)     kubelet            Liveness probe failed: Get "http://10.244.2.162:8085/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled     14m (x7 over 19m)     kubelet            Container image "docker.io/kubermatic/machine-controller:v1.43.0" already present on machine
  Warning  BackOff    4m34s (x51 over 17m)  kubelet            Back-off restarting failed container
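
The machine-controller shows the same pattern: kubelet cannot reach the pod IP on :8085, so the container itself may well be fine. Its logs from before the last restart should tell (a sketch, assuming the stock deployment name):

  kubectl -n kube-system logs deploy/machine-controller --previous --tail=50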

…as well as CoreDNS on the control plane:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Normal   Scheduled         23m                   default-scheduler  Successfully assigned kube-system/coredns-79c48b58b-nzvjn to prod-001-control-plane-3
  Normal   Pulled            23m                   kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.6" already present on machine
  Normal   Created           23m                   kubelet            Created container coredns
  Normal   Started           23m                   kubelet            Started container coredns
  Warning  Unhealthy         21m (x4 over 22m)     kubelet            Liveness probe failed: Get "http://10.244.2.87:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         13m (x83 over 23m)    kubelet            Readiness probe failed: Get "http://10.244.2.87:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  DNSConfigForming  3m19s (x64 over 23m)  kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.2
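
The DNSConfigForming warning is a separate nuisance: kubelet forwards at most three nameservers from the node's resolv.conf and drops the rest. On Ubuntu with systemd-resolved, kubelet usually reads the uplink file rather than the stub /etc/resolv.conf, so that is the one to inspect (a sketch):

  # The stub file most tools see
  cat /etc/resolv.conf
  # The uplink file kubelet's --resolv-conf typically points at on Ubuntu
  cat /run/systemd/resolve/resolv.conf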

…and cluster-autoscaler:

 F0221 15:27:39.795533       1 main.go:420] Failed to get nodes from apiserver: Get "https://10.96.0.1:443/api/v1/nodes": dial tcp 10.96.0.1:443: i/o timeout 
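
10.96.0.1 is the ClusterIP of the default kubernetes Service, so Service routing is broken as well, which again points at the CNI (Cilium also handles Service traffic when its kube-proxy replacement is enabled). A throwaway pod can confirm whether pods can reach the API server at all (a sketch; netcheck is an arbitrary name, curlimages/curl just a convenient image, and /version is normally readable without credentials):

  # Run a one-off pod that tries the API server's ClusterIP directly
  kubectl run netcheck --rm -it --restart=Never --image=curlimages/curl \
    --command -- curl -sk -m 5 https://10.96.0.1:443/version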

Am I overlooking something?

What is the expected behavior: All kube-system workloads (hcloud-csi-node, machine-controller, CoreDNS, cluster-autoscaler) come back healthy after the upgrade.

How to reproduce the issue:

  1. Deploy a Kubernetes 1.22.7 cluster to Hetzner with Cilium as the CNI.
  2. Upgrade the cluster to 1.23.3 (standard flow sketched below).
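
For reference, step 2 is the standard KubeOne upgrade flow: bump versions.kubernetes in the manifest and re-apply. The file names below are from my setup, not anything KubeOne mandates:

  # kubeone.yaml: versions.kubernetes changed from "1.22.7" to "1.23.3"
  kubeone apply --manifest kubeone.yaml --tfjson tf.json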

Information about the environment:

  • KubeOne version (kubeone version): 1.4.0
  • Operating system: Ubuntu
  • Provider you’re deploying cluster on: Hetzner

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (9 by maintainers)
