kubernetes: kubeadm 1.8.0 init fails with "/var/lib/kubelet is not empty"
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
On a fresh Ubuntu 16.04.3 system booted from the official cloud image, kubeadm init fails because /var/lib/kubelet exists.
root@kubemaster:~# kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
        /var/lib/kubelet is not empty
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
What you expected to happen: kubeadm successfully initializes the cluster
How to reproduce it (as minimally and precisely as possible):
- Boot a new VM from the latest Ubuntu Cloud image
- apt-get install -y apt-transport-https docker.io
- Follow the kubeadm installation instructions
- kubeadm init
Anything else we need to know?:
Contents of /var/lib/kubelet:
/var/lib/kubelet
/var/lib/kubelet/pki
/var/lib/kubelet/pki/kubelet.crt
/var/lib/kubelet/pki/kubelet.key
Environment:
- Kubernetes version (use `kubectl version`):
  root@kubemaster:~# kubectl version
  Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  The connection to the server localhost:8080 was refused - did you specify the right host or port?
  root@kubemaster:~# apt search kube
  Sorting... Done
  Full Text Search... Done
  kubeadm/kubernetes-xenial,now 1.8.0-01 amd64 [installed]
    Kubernetes Cluster Bootstrapping Tool
  kubectl/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
    Kubernetes Command Line Tool
  kubelet/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
    Kubernetes Node Agent
  kubernetes-cni/kubernetes-xenial,now 0.5.1-00 amd64 [installed,automatic]
    Kubernetes CNI
- Cloud provider or hardware configuration: Hyper-V generation 1 virtual machine
- OS (e.g. from /etc/os-release):
  NAME="Ubuntu"
  VERSION="16.04.3 LTS (Xenial Xerus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 16.04.3 LTS"
  VERSION_ID="16.04"
  HOME_URL="http://www.ubuntu.com/"
  SUPPORT_URL="http://help.ubuntu.com/"
  BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
  VERSION_CODENAME=xenial
  UBUNTU_CODENAME=xenial
- Kernel (e.g. `uname -a`): Linux kubemaster 4.4.0-96-generic #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: none
- Others: none
 
About this issue
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 43 (21 by maintainers)
 
Commits related to this issue
- Merge pull request #53436 from liggitt/kubeadm-init: Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions https://github... (committed to kubernetes/kubernetes by deleted user 7 years ago)
- Merge pull request #53317 from liggitt/fix-kubelet-cert-dir: Automatic merge from submit-queue (batch tested with PRs 53317, 52186). If you want to cherry-pick this change to another branch, please fo... (committed to kubernetes/kubernetes by deleted user 7 years ago)
 
Cause
This is related to the location where the kubelet persists its certificates while running in the background, waiting for config:
- the kubelet, running in the background before `kubeadm init` is called, generates files into the folder that `kubeadm` expects to be empty
- since `kubeadm` expects there to be a running kubelet prior to `kubeadm init` being called, it shouldn't expect the kubelet's `--root-dir` folder to be empty

Workaround
If you are scripting bootstrapping of a known clean machine, there are a few possible workarounds until https://github.com/kubernetes/kubernetes/pull/53317 is released in 1.8.1 (any of the following works around this issue):
- run `kubeadm init`/`kubeadm join` with `--skip-preflight-checks=true`
- remove `/var/lib/kubelet/pki` prior to running the init or join command
- run `kubeadm reset` prior to running init or join

Resolution
Addressed as part of https://github.com/kubernetes/kubernetes/pull/53317
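Before applying the workaround of removing `/var/lib/kubelet/pki`, a bootstrap script can confirm that the directory holds nothing but that `pki/` folder, which is the situation shown in the directory listing in the report. The bash sketch below does that check; `only_pki_present`, `remove_kubelet_pki_if_only_entry`, and the `KUBELET_ROOT` override are hypothetical names introduced for illustration, not part of kubeadm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of kubeadm): only touch /var/lib/kubelet when
# its sole entry is the pki/ directory the waiting kubelet created for its
# self-signed serving certificate (kubelet.crt / kubelet.key).
set -euo pipefail

# Directory to inspect; the override variable is an assumption, handy for testing.
KUBELET_ROOT="${KUBELET_ROOT:-/var/lib/kubelet}"

# Returns 0 when DIR exists and its only entry (dotfiles included) is "pki".
only_pki_present() {
  local dir="$1"
  [ -d "$dir" ] || return 1
  local entries
  entries="$(ls -A "$dir")"
  [ "$entries" = "pki" ]
}

# Removes DIR/pki only when it is the sole entry; otherwise leaves DIR alone.
remove_kubelet_pki_if_only_entry() {
  local dir="${1:-$KUBELET_ROOT}"
  if only_pki_present "$dir"; then
    rm -rf -- "$dir/pki"
    echo "removed $dir/pki"
  else
    echo "refusing: $dir contains more than pki/; leaving it untouched" >&2
    return 1
  fi
}
```

For example, call `remove_kubelet_pki_if_only_entry /var/lib/kubelet` right before `kubeadm init`. Refusing when other entries exist keeps the script from deleting state on a machine that is not actually clean; as a later comment in this thread notes, the kubelet may also recreate `pki/` if it is still running.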
@liggitt @jpetazzo
If I wipe out the /var/lib/kubelet/pki directory and do not restart the kubelet, the kubeadm init process hangs with:
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
  timed out waiting for the condition

This error is likely caused by that:
  - The kubelet is not running
  - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
  - There is no internet connection; so the kubelet can't pull the following control plane images:
    - gcr.io/google_containers/kube-apiserver-amd64:v1.8.0
    - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.0
    - gcr.io/google_containers/kube-scheduler-amd64:v1.8.0

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
  - 'systemctl status kubelet'
  - 'journalctl -xeu kubelet'
When I check the status of the kubelet, it's restarting and never starts because
So, I am guessing my only option is to skip preflight checks; let me try that out…
Correct. The `kubeadm` instructions start the kubelet and let it run in a crash loop in the background, waiting for config. In that state, the kubelet is free to write to its directory containing state (`/var/lib/kubelet`), so `kubeadm` should not require that directory to be empty in order to run `kubeadm init`.

@wjrogers I did `kubeadm reset && kubeadm init` and it worked for me!

This is fixed in v1.8.1.
Hi All,
Need help here
Currently I'm installing Kubernetes v1.8.0 using the command `kubeadm reset && kubeadm init`, and it hangs. The `systemctl status kubelet` log shows:
Oct 04 20:15:02 cluster-1 kubelet[18203]: W1004 20:15:02.263263 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.263758 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.326988 18203 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.serv…t.service"
Oct 04 20:15:02 cluster-1 kubelet[18203]: E1004 20:15:02.327015 18203 summary.go:92] Failed to get system container stats for "/system.slice/docker.servi…r.service"
Oct 04 20:15:07 cluster-1 kubelet[18203]: W1004 20:15:07.265211 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:07 cluster-1 kubelet[18203]: E1004 20:15:07.265869 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:12 cluster-1 kubelet[18203]: W1004 20:15:12.267033 18203 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.267222 18203 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:Net…nitialized
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.332802 18203 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.serv…t.service"
Oct 04 20:15:12 cluster-1 kubelet[18203]: E1004 20:15:12.332830 18203 summary.go:92] Failed to get system container stats for "/system.slice/docker.servi…r.service"
Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
After that I tried to install Weave, and it failed with this error:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1004 18:53:33.930337 16144 factory_object_mapping.go:423] Failed to download OpenAPI (Get http://localhost:8080/swagger-2.0.0.pb-v1: dial tcp [::1]:8080: getsockopt: connection refused), falling back to swagger
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Can anyone please help me with this?
(Reposting my earlier tip since it worked like a charm for my automated deployment scripts.)
For the time being, before running `kubeadm init` or `kubeadm join`, I'll:
- check whether /etc/kubernetes/kubelet.conf or /etc/kubernetes/admin.conf exists
- stop the kubelet (`systemctl stop kubelet`), and wipe out /var/lib/kubelet/pki (`rm -rf /var/lib/kubelet/pki`)

This might or might not be appropriate for your use cases, but for mine it works great (I'm automatically deploying hundreds of k8s clusters for training purposes). Stopping the kubelet is necessary to avoid race conditions where it would recreate the pki directory before you run `kubeadm`.

There are plenty of other pre-flight checks that are valuable and that you don't want to skip; it's just the check for an empty /var/lib/kubelet that is incorrect.

If you are scripting bootstrapping of a known clean machine, there are a couple of possible workarounds until https://github.com/kubernetes/kubernetes/pull/53317 is released in 1.8.1:
- run `kubeadm init`/`kubeadm join` skipping preflight checks
- remove /var/lib/kubelet/pki, then run `kubeadm init`/`kubeadm join`

Everything was done following https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, but now it is getting even better:
The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused
You should remove 1.8 and put it at the "back to the drawing board" stage. It is useless even to test this 1.8.
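The automated approach described in the earlier comment (check for an existing kubeconfig, stop the kubelet, then wipe `/var/lib/kubelet/pki` before running kubeadm) could look roughly like the bash sketch below. It assumes a systemd host, and it assumes that an existing `kubelet.conf` or `admin.conf` means the node was already bootstrapped and should be skipped; the comment does not say what it does in that case. The function names and the `KUBE_CONF_DIR` / `KUBELET_ROOT` overrides are hypothetical. The comment also does not say whether the kubelet is started again afterwards, so the sketch leaves that to `kubeadm`; check that behavior on your version.

```shell
#!/usr/bin/env bash
# Hypothetical pre-init/pre-join guard for kubeadm 1.8.0 on a systemd host.
set -euo pipefail

KUBE_CONF_DIR="${KUBE_CONF_DIR:-/etc/kubernetes}"   # assumed override, for testing
KUBELET_ROOT="${KUBELET_ROOT:-/var/lib/kubelet}"    # assumed override, for testing

# Returns 0 if kubeadm appears to have run on this node already
# (assumption: either kubeconfig file is evidence of that).
already_bootstrapped() {
  [ -e "$KUBE_CONF_DIR/kubelet.conf" ] || [ -e "$KUBE_CONF_DIR/admin.conf" ]
}

# Stop the kubelet first so it cannot recreate pki/ between the rm and kubeadm,
# then remove the certificates it generated while waiting for config.
prepare_kubelet_dir() {
  systemctl stop kubelet
  rm -rf -- "$KUBELET_ROOT/pki"
}

# Usage: bootstrap_node init [flags...]   or   bootstrap_node join [flags...]
bootstrap_node() {
  if already_bootstrapped; then
    echo "node already bootstrapped; skipping kubeadm ${1:-}" >&2
    return 0
  fi
  prepare_kubelet_dir
  kubeadm "$@"
}
```

Running `bootstrap_node init` twice on the same machine should then be harmless: the second call sees the kubeconfig written by the first and does nothing.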