kubernetes: Kubernetes API Server cannot be started after improper reboot
What happened?
Hello! I have a simple Kubernetes cluster (version 1.23.1) with one master node and two worker nodes. Everything worked fine until I accidentally rebooted my host computer while all the virtual machines were running. After the VMs started again, kubelet was running as before, but commands such as
kubectl get nodes
kubectl get pods
return “The connection to the server myhost:6443 was refused - did you specify the right host or port?”. Then I tried
systemctl status kubelet
journalctl -xeu kubelet
and didn’t see anything unusual. The log keeps repeating “Error getting node” err=“node "master01" not found” and periodically shows “Error syncing pod, skipping” err=“failed to "StartContainer" for "etcd" with CrashLoopBackOff”. If I remember correctly, such messages sometimes appeared even when the cluster was working.
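Since the kubelet log is noisy, it can help to reduce it to the names of the containers that are crash-looping. The following is a minimal sketch, not an official tool: the function name crashloop_containers is hypothetical, and it assumes the log lines contain the phrase for "NAME" with CrashLoopBackOff (with or without backslash-escaped quotes), as in the message quoted above.

```shell
#!/bin/sh
# crashloop_containers (hypothetical helper): read kubelet log lines on stdin
# and print the unique names of containers reported with CrashLoopBackOff.
# Assumes the message shape: ... for "NAME" with CrashLoopBackOff ...
# The quotes around NAME may be backslash-escaped, as they are in journal output.
crashloop_containers() {
    grep 'CrashLoopBackOff' |
        sed -n 's/.*for \\\{0,1\}"\([^"\\]*\)\\\{0,1\}" with CrashLoopBackOff.*/\1/p' |
        sort -u
}
```

One possible way to use it is journalctl -u kubelet --no-pager | crashloop_containers. If etcd shows up in the output, the etcd container's own logs are probably the next place to look, since the API server cannot serve requests without etcd.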
Then I decided to check netstat:
netstat -tupan | grep LISTEN
and saw that Recv-Q for the :::6443 entry keeps increasing, and that the entry disappears once Recv-Q reaches 25-30:
tcp6 17 0 :::6443 :::* LISTEN -
Whenever I run
systemctl restart kubelet
the :::6443 entry appears again, Recv-Q increases, and the entry is removed from the netstat output once more.
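To watch this behaviour without scanning the whole netstat listing, the Recv-Q column for the API server port can be pulled out with a small filter. This is a sketch under assumptions: the function name listen_recvq is hypothetical, and it expects the Linux netstat column order shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). As far as I understand the Linux netstat man page, Recv-Q on a listening socket reflects the backlog of pending connections, so a value that keeps growing suggests the process holding the port is not accepting them.

```shell
#!/bin/sh
# listen_recvq PORT (hypothetical helper): read netstat-style lines on stdin
# and print the Recv-Q value of each LISTEN socket bound to PORT.
# Prints nothing when no listener for PORT is present.
listen_recvq() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # Local address looks like ":::6443" or "0.0.0.0:6443";
            # the port is the last colon-separated piece.
            n = split($4, parts, ":")
            if (parts[n] == port) print $2   # column 2 is Recv-Q
        }'
}
```

For example, netstat -tan | listen_recvq 6443 run every few seconds would print the current value, and empty output would mean the listener has gone away.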
/sig api-machinery /sig kind/bug
What did you expect to happen?
After the nodes are rebooted, everything should work as before.
How can we reproduce it (as minimally and precisely as possible)?
In my case the VMs run on VMware Player 16, with one master node and two worker nodes. The problem appeared after the host machine was powered off without first shutting down or suspending the VMs (a power outage in my case; when the host is shut down properly, it may preserve the state of the VMs and restore them successfully). When you turn the VMs back on, you’ll see that kubelet starts and runs, but the API server is down (netstat doesn’t list it among the listening services).
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (10 by maintainers)
Thanks, I’ve already reinitialized the cluster. I’ll try to reproduce the issue and let you know the results.