kubernetes: kubeadm 1.8.0 init fails with "/var/lib/kubelet is not empty"
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
On a fresh Ubuntu 16.04.3 system booted from the official cloud image, kubeadm init fails because /var/lib/kubelet exists.
root@kubemaster:~# kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
/var/lib/kubelet is not empty
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
What you expected to happen: kubeadm successfully initializes the cluster
How to reproduce it (as minimally and precisely as possible):
- Boot a new VM from the latest Ubuntu Cloud image
apt-get install -y apt-transport-https docker.io
- Follow the kubeadm installation instructions
kubeadm init
Anything else we need to know?:
Contents of /var/lib/kubelet:
/var/lib/kubelet
/var/lib/kubelet/pki
/var/lib/kubelet/pki/kubelet.crt
/var/lib/kubelet/pki/kubelet.key
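The condition that trips the pre-flight check can be approximated from the shell. This is only a sketch: the helper name dir_is_empty is hypothetical, treating a missing directory as acceptable is an assumption, and it is not meant to mirror how kubeadm implements the check internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm): succeeds (exit 0) when the
# directory is missing or contains no entries, fails otherwise.
dir_is_empty() {
    dir="$1"
    # Assumption: a directory that does not exist yet counts as "empty".
    [ -d "$dir" ] || return 0
    # ls -A lists every entry except "." and "..", including dotfiles.
    [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage: show what the packaged kubelet has already written, if anything.
#   if ! dir_is_empty /var/lib/kubelet; then find /var/lib/kubelet; fi
```

On the machine from this report the helper would fail, because the pki subdirectory is already present.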
Environment:
- Kubernetes version (use kubectl version):
root@kubemaster:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@kubemaster:~# apt search kube
Sorting... Done
Full Text Search... Done
kubeadm/kubernetes-xenial,now 1.8.0-01 amd64 [installed]
  Kubernetes Cluster Bootstrapping Tool
kubectl/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
  Kubernetes Command Line Tool
kubelet/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
  Kubernetes Node Agent
kubernetes-cni/kubernetes-xenial,now 0.5.1-00 amd64 [installed,automatic]
  Kubernetes CNI
- Cloud provider or hardware configuration: Hyper-V generation 1 virtual machine
- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
- Kernel (e.g. uname -a): Linux kubemaster 4.4.0-96-generic #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: none
- Others: none
About this issue
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 43 (21 by maintainers)
Commits related to this issue
- Merge pull request #53436 from liggitt/kubeadm-init. Automatic merge from submit-queue. (committed to kubernetes/kubernetes by deleted user 7 years ago)
- Merge pull request #53317 from liggitt/fix-kubelet-cert-dir. Automatic merge from submit-queue (batch tested with PRs 53317, 52186). (committed to kubernetes/kubernetes by deleted user 7 years ago)
Cause
This is related to the location where the kubelet persists its certificates while running in the background, waiting for config:
- kubeadm init causes files to be generated into the folder kubeadm expects to be empty
- since kubeadm expects there to be a running kubelet prior to kubeadm init being called, it shouldn’t expect the kubelet’s --root-dir folder to be empty
Workaround
If you are scripting bootstrapping a known clean machine, there are a few possible workarounds until https://github.com/kubernetes/kubernetes/pull/53317 is released in 1.8.1 (any one of the following works around this issue):
- pass --skip-preflight-checks=true to kubeadm init or kubeadm join
- remove /var/lib/kubelet/pki prior to running the init or join command
- run kubeadm reset prior to running init or join
Resolution
addressed as part of https://github.com/kubernetes/kubernetes/pull/53317
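For scripted bootstrapping, the workaround that removes /var/lib/kubelet/pki might look roughly like the sketch below. It assumes a systemd host and the default path from this report; the function names and the DRY_RUN variable are illustrative and not part of kubeadm. Stopping the kubelet first follows the advice given later in this thread, so that it cannot recreate the directory before kubeadm runs.

```shell
#!/bin/sh
# Sketch of the "remove /var/lib/kubelet/pki" workaround for kubeadm 1.8.0.
# DRY_RUN and both function names are illustrative, not part of kubeadm.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

clean_kubelet_pki() {
    pki_dir="${1:-/var/lib/kubelet/pki}"
    # Stop the kubelet so it cannot recreate the directory in the meantime.
    run systemctl stop kubelet
    run rm -rf "$pki_dir"
}

# Usage on a known clean machine (as root):
#   clean_kubelet_pki && kubeadm init
# Preview without changing anything:
#   DRY_RUN=1 clean_kubelet_pki
```

Unlike skipping the pre-flight checks, this leaves every other check active.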
@liggitt @jpetazzo
If I wipe out the /var/lib/kubelet/pki directory and do not restart kubelet, the kubeadm init process hangs with
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
Unfortunately, an error has occurred: timed out waiting for the condition
This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
  - gcr.io/google_containers/kube-apiserver-amd64:v1.8.0
  - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.0
  - gcr.io/google_containers/kube-scheduler-amd64:v1.8.0
You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
When I check the status of kubelet, it’s restarting and never starts because
So, I am guessing my only option is to skip preflight checks, let me try that out…
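The health check that kubeadm reports in the log above can also be repeated by hand while debugging. The sketch below is an assumption-laden illustration: the retry and kubelet_healthy helpers are hypothetical names, the attempt count and delay are arbitrary, and the URL is simply the one quoted in the log.

```shell
#!/bin/sh
# Hypothetical helpers for polling the kubelet health endpoint by hand.

# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds or the
# attempts are used up; returns 0 on the first success, 1 otherwise.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Succeeds when the endpoint from the log answers within two seconds.
kubelet_healthy() {
    curl -sSL --max-time 2 http://localhost:10255/healthz >/dev/null 2>&1
}

# Usage: poll five times, three seconds apart, then fall back to the logs.
#   retry 5 3 kubelet_healthy || journalctl -xeu kubelet
```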
Correct. The kubeadm instructions start the kubelet and let it run in a crash loop in the background, waiting for config. In that state, the kubelet is free to write to its directory containing state (/var/lib/kubelet), so kubeadm should not require that directory to be empty in order to run kubeadm init
@wjrogers I did kubeadm reset && kubeadm init and it worked for me!
this is fixed in v1.8.1
Hi All,
Need help here
I’m currently installing Kubernetes version v1.8.0 using the command “kubeadm reset && kubeadm init” and it hangs. The systemctl status kubelet log shows:
Oct 04 20:15:02 cluster-1 kubelet[18203]: W1004 20:15:02.263263 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.263758 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.326988 18203 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.serv…t.service"
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.327015 18203 summary.go:92] Failed to get system container stats for "/system.slice/docker.servi…r.service"
Oct 04 20:15:07 cluster-1 kubelet[18203]: W1004 20:15:07.265211 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:07 cluster-1 kubelet[18203]: E1004 20:15:07.265869 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:12 cluster-1 kubelet[18203]: W1004 20:15:12.267033 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.267222 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.332802 18203 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.serv…t.service"
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.332830 18203 summary.go:92] Failed to get system container stats for "/system.slice/docker.servi…r.service"
Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
After that I tried to install Weave, and it failed with this error:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1004 18:53:33.930337 16144 factory_object_mapping.go:423] Failed to download OpenAPI (Get http://localhost:8080/swagger-2.0.0.pb-v1: dial tcp [::1]:8080: getsockopt: connection refused), falling back to swagger
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Can anyone please help me with this?
(Reposting my earlier tip since it worked like a charm for my automated deployment scripts.)
For the time being, before running kubeadm init or kubeadm join, I’ll:
- check whether /etc/kubernetes/kubelet.conf or /etc/kubernetes/admin.conf exists
- if not, stop kubelet (systemctl stop kubelet), and wipe out /var/lib/kubelet/pki (rm -rf /var/lib/kubelet/pki)
This might or might not be appropriate for your use-cases, but for mine it works great (I’m automatically deploying hundreds of k8s clusters for training purposes). Stopping kubelet is necessary to avoid race conditions where it would recreate the pki directory before you run kubeadm.
There are plenty of other pre-flight checks that are valuable and you don’t want to skip; it’s just the check for an empty /var/lib/kubelet that is incorrect.
If you are scripting bootstrapping a known clean machine, there are a couple of possible workarounds until https://github.com/kubernetes/kubernetes/pull/53317 is released in 1.8.1:
- run kubeadm init / kubeadm join skipping preflight checks
- remove /var/lib/kubelet/pki, then run kubeadm init / kubeadm join
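The guard from the reposted tip above (only wipe the pki directory on machines that do not yet have a kubeconfig) could be sketched as follows. The helper name node_is_uninitialised is hypothetical, and reading the tip as "skip machines where either file already exists" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when neither kubeconfig file exists under the
# given directory, i.e. the node does not look initialised or joined yet.
node_is_uninitialised() {
    conf_dir="${1:-/etc/kubernetes}"
    [ ! -e "$conf_dir/kubelet.conf" ] && [ ! -e "$conf_dir/admin.conf" ]
}

# Usage (as root), before kubeadm init or kubeadm join:
#   if node_is_uninitialised; then
#       systemctl stop kubelet
#       rm -rf /var/lib/kubelet/pki
#   fi
```

Keeping the guard separate from the destructive commands makes it safe to rerun the same deployment script on a node that has already joined a cluster.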
Everything was done following https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ but now it is getting even better:
The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused
You should remove 1.8 and send it back to the drawing board. It is useless even to test this 1.8.