kubernetes: Kubeadm init stuck on [init] This might take a minute or longer if the control plane images have to be pulled.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/sig bug
What happened: Kubeadm init hangs at:
```
[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
	[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.56.60]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
…
```
What you expected to happen: The cluster initializes successfully (Kubernetes master initialized).
How to reproduce it (as minimally and precisely as possible):
1. Install Docker.
2. Install kubeadm, kubelet, and kubectl.
3. Run `kubeadm init`.
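For reference, a minimal sketch of those steps on CentOS 7 (assuming the upstream Kubernetes yum repository has already been added; the package and service names are the standard ones, not quoted from the report):

```bash
# Install and start Docker, then the Kubernetes tools, then initialize.
yum install -y docker
systemctl enable docker && systemctl start docker
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
kubeadm init
```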
Anything else we need to know?: Since yesterday, initializing a cluster automatically uses Kubernetes version v1.9.4. I tried forcing kubeadm to use --kubernetes-version=v1.9.3, but I still hit the same issue. Last week everything was fine when I reset my Kubernetes cluster and reinitialized it; I found the issue yesterday when I wanted to reset and reinitialize again and it got stuck.
I tried yum update to update all my software, but the issue persists. I was using Kubernetes v1.9.3, and after updating today I'm on Kubernetes v1.9.4.
Environment:
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:29:47Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release):
  NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. `uname -a`):
  Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
This issue often comes back with each new Kubernetes release. I still don't know what causes it, but since I opened this ticket kubeadm init has gotten stuck on each version release, and after 3 or 4 days it works again, as if an angel fixed the problem.
I followed the advice from this link. First, make sure you have switched off swap:

```
sudo swapoff -a
```

Then add the following line, if it does not already exist, to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

```
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
```

Restart the Docker and kubelet services:

```
systemctl restart docker && systemctl restart kubelet.service
```

Now run `kubeadm init`.
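Put together, a minimal sketch of that sequence (the separate drop-in file name `90-fail-swap-on.conf` is my own choice; appending the line to `10-kubeadm.conf` is equivalent, since systemd merges drop-ins):

```bash
sudo swapoff -a
# Provide the kubelet flag via its own drop-in instead of editing 10-kubeadm.conf:
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/90-fail-swap-on.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker && sudo systemctl restart kubelet
sudo kubeadm init
```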
Today I was trying to set up a Kubernetes cluster on a Raspberry Pi and encountered the same issue. I think the problem is that the apiserver keeps failing to finish its own startup within two minutes, and as a result the kubelet keeps killing it; after 4-5 minutes kubeadm times out as well.
To work around this I used the following strategy. As soon as kubeadm enters the init stage (i.e. "[init] This might take a minute or longer if the control plane images have to be pulled" is printed), I immediately update the kube-apiserver manifest file by running:
```
sed -i 's/failureThreshold: 8/failureThreshold: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml
```
Then I killed the current kube-apiserver container (docker kill).
After that it took about 3 minutes for the apiserver to actually start up, and kubeadm managed to continue its work.
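The race against kubeadm can also be scripted; here is a sketch of the same workaround (the `k8s_kube-apiserver` name pattern is an assumption about how the kubelet names its Docker containers):

```bash
# Wait for kubeadm to write the manifest, raise the liveness threshold,
# then kill the running apiserver so the kubelet restarts it with the new probe.
MANIFEST=/etc/kubernetes/manifests/kube-apiserver.yaml
until [ -f "$MANIFEST" ]; do sleep 1; done
sed -i 's/failureThreshold: 8/failureThreshold: 20/g' "$MANIFEST"
docker ps --format '{{.ID}} {{.Names}}' | awk '/k8s_kube-apiserver/ {print $1}' | xargs -r docker kill
```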
I’m getting the same issue, running on a Raspberry Pi 3 Model B+
Here are the logs:
It hangs here and it is frustrating!
@jmreicha yeah, that's what I found with 1.10.2 as well. The switch of etcd from HTTP to HTTPS seems to be the main source of this issue. I tried starting an etcd instance using Docker with the same options as the manifest and it all ran fine. I was also able to `docker exec` into the container and run the health check command without issue. Unfortunately, doing the same `docker exec` into the kubelet-managed container is more hit and miss: sometimes it would work, usually just after the container had started, and sometimes it would error out with a gRPC timeout. Usually when the gRPC timeouts were happening, `lsof` would show a large number of connections between etcd and the apiserver, though the logs wouldn't suggest they were actually talking to each other. After a short period, I think the failing etcd kubelet health checks cause the kubelet to shut down the etcd instance; the etcd logs appear to suggest the instance is being instructed to shut down rather than, say, crashing. I've never been able to make enough sense of the apiserver logs to work out what's actually going on with it.

I know the crypto on ARM64 is a bit slow (lacking ASM implementations); I believe the Go team is working on that at the moment, but that probably won't land until Go 1.11 at least, and it looks like etcd is still on Go 1.8.5. I've been wondering if the reduced speed of the crypto is therefore exceeding some hardwired TLS timeouts in etcd and k8s.

I'm not quite sure what is killing etcd, though. I've got the exited etcd container log below; I just need to work out the source of its "exit (0)" status.
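Repeating the manual health check looks roughly like this (a sketch; the certificate paths are assumptions based on the healthcheck-client naming discussed later in this thread, so adjust them to whatever your etcd.yaml liveness probe actually uses):

```bash
# Find the kubelet-managed etcd container and run the v3 health check by hand.
ETCD=$(docker ps --format '{{.ID}} {{.Names}}' | awk '/k8s_etcd/ {print $1; exit}')
docker exec "$ETCD" sh -c 'ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health'
```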
1.9.5 has the same issue. Is it possible to investigate? What steps should we take in the future when an update comes out?
Out of curiosity, I just hooked up a Pi 3B+ I forgot I had and tried installing the master on it. Interestingly, the master came up using k8s version 1.10.2 and Docker 18.04, but k8s 1.10.3 still seems to be broken on the RPi.
Then I was able to join the remaining Pine64s to the cluster as workers. This isn't an ideal setup, but it at least gets me a 1.10 cluster for now. I still don't know what's different between the Pine64 and RPi packages/hardware, or why it decided to work on the RPi, but I thought it might be helpful for others.
I dug a little bit deeper into this but am still stuck. It looks like the `manifests/etcd.yaml` and `manifests/kube-apiserver.yaml` configs were changed between 1.9 and 1.10.

I ran a `--dry-run` using both the 1.9 and 1.10 versions of kubeadm, and it seems the etcd health check was changed and certificate auth was turned on. At this point I'm thinking that this change is what is causing the issue. For example:

1.9 etcd.yaml

1.10 etcd.yaml

The kube-apiserver has also been updated to use this HTTPS etcd, instead of the HTTP version used in 1.9.
I can get `kubeadm init` to finish bootstrapping by creating a config file and overriding all of the etcd URLs with HTTP endpoints, but the etcd and apiserver containers still crash-loop. Unfortunately I'm not sure how to fix this, but I would love to get it figured out.
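A sketch of that kind of override (the field names are from the v1.10-era `v1alpha1` MasterConfiguration API and should be treated as assumptions to verify against your kubeadm version):

```bash
# Write a kubeadm config that pins etcd back to plain-HTTP endpoints,
# then pass it to init instead of using command-line flags.
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  endpoints:
  - http://127.0.0.1:2379
EOF
kubeadm init --config kubeadm-config.yaml
```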
Downgrading `docker-ce` to a `17.x.x` version solves the problem:

```
sudo aptitude install docker-ce=17.12.1~ce-0~raspbian
```
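If the downgrade helps, pinning the package keeps a routine upgrade from pulling 18.x back in (apt-mark is standard Debian/Raspbian tooling):

```bash
sudo apt-mark hold docker-ce    # prevent apt from upgrading docker-ce again
apt-mark showhold               # verify the hold is in place
```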
For me the problem was that etcd tried to bind to the IP address looked up through `localhost.somedomain`. Commenting out the `search` line in `/etc/resolv.conf` worked:
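A sketch of that edit, assuming a single `search` line (`getent` just verifies that localhost resolves to loopback again):

```bash
# Disable the search domain, then confirm localhost resolves to 127.0.0.1/::1.
sudo sed -i 's/^search /#search /' /etc/resolv.conf
getent hosts localhost
```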
I am also having the same issues for a couple of weeks now 😦 I've been trying many versions (Raspbian/Docker/K8s) and have had no luck on a Raspberry Pi 2, with all the obvious things checked 10 times.
Below are my logs for the apiserver and etcd within Docker (other things crash since port 6443 is no longer listening after the api-server stops).
Just to confirm I have:
If anyone has any insight (into either further troubleshooting steps or the issue) it would be much appreciated…
Docker info:
Kubernetes version… (I have tried various versions, incl. 1.10.2, 1.10.1, 1.10.0, 1.9.x, …)
Example of the exact output of kubeadm init (leading up to the hang + after it eventually times out)
Effective install process (/boot/cmdline.txt is already updated in my disk images, as is adding the ssh authorized_key, etc.)…
I've experienced the same, but only after rebooting the box. It was all OK when I created the CentOS instance in OpenStack and installed k8s on it. But when I tried to install k8s after rebooting the instance, or rebooted after the k8s install, k8s did not work anymore / the install hung as described above. The apiserver was trying to come up, and then timed out and stopped. It turned out the problem was SELinux, as described here: https://github.com/kubernetes/kubeadm/issues/417. It all works fine after I set `SELINUX=permissive` in `/etc/selinux/config`.
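For reference, switching to permissive both immediately and across reboots (standard CentOS tooling):

```bash
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
getenforce    # should print "Permissive"
```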
I’ve turned up the logging verbosity on kubelet, seeing these probe messages now
It looks like this might be related to the `healthcheck-client.crt` not having a SAN field for IP `127.0.0.1`, but I'm not sure. I need to create a certificate similar to the existing one and try it; I will check that in the morning.

```
kubeadm init --apiserver-advertise-address 192.168.33.11 --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.13.1
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
```
I get this error when I run kubeadm init on the master or any other machine. It worked fine previously.
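That error is the 1.13 NumCPU preflight check rather than the hang discussed above; on a single-CPU test machine it can be skipped explicitly:

```bash
kubeadm init --apiserver-advertise-address 192.168.33.11 \
  --pod-network-cidr=192.168.0.0/16 \
  --ignore-preflight-errors=NumCPU    # skip only the CPU-count check
```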
OS Ubuntu 16.04 and 18.04, kubeadm version 1.10.5. Adding no_proxy to /etc/environment solved my problem! no_proxy="localhost,127.0.0.1,…"
@FFFEGO - Thanks Vlad, I tried a fresh install from that guide; still the same issue.
Maybe my internet connection is too slow (~8 Mbps)?
I saw a few under-voltage messages in journalctl, but after switching to a better power supply and repeating, I still have the same issue (the under-voltage messages are gone, though).
What changes did you make to switch the cgroup driver? I've tried the following Docker override, and passing `--cgroup-driver=systemd` (with and without `--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice`) to kubelet, and I still ended up with the stalled `kubeadm init`.
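For comparison, the usual way to switch Docker itself to the systemd driver is through `daemon.json` rather than a unit override (a sketch; the kubelet must then run with the matching `--cgroup-driver=systemd`):

```bash
# Point Docker at the systemd cgroup driver, then verify before touching kubelet.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
docker info 2>/dev/null | grep -i 'cgroup driver'    # expect: systemd
sudo systemctl daemon-reload && sudo systemctl restart kubelet
```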
On a side note, does anyone know how we can make the kubeadm logs more verbose, to understand what is actually hosed and why? Many thanks.
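A few standard places to look while it hangs (the `--v` glog verbosity flag is a reasonable assumption for kubeadm but worth confirming on your version):

```bash
kubeadm init --v=5                    # raise kubeadm's own log verbosity
journalctl -u kubelet -f              # follow the kubelet while init stalls
docker ps -a --filter name=k8s_       # list the control-plane containers
docker logs <container-id>            # substitute an ID from the line above
```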
Same for v1.10.0 and v1.11.0 too.