kubernetes: kubeadm (v1.9.0-alpha.0.723..) hangs while trying to communicate with kubelet

Is this a BUG REPORT or FEATURE REQUEST?:


/kind bug

This is to report a hang observed during 'kubeadm init'.

What happened:

I suspect kubeadm is not able to communicate with the kubelet during 'kubeadm init'.

bash-4.2# kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.6
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.03.1-ce. Max validated version: 1.12
[preflight] WARNING: Connection to "https://10.196.28.252:6443" uses proxy "https://www-proxy.us.oracle.com:80". If that is not intended, adjust your proxy settings
[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.
[preflight] WARNING: socat not found in system path
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [slc12mss kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.196.28.252]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.

Unfortunately, an error has occurred: timed out waiting for the condition

This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
  - gcr.io/google_containers/kube-apiserver-amd64:v1.7.6
  - gcr.io/google_containers/kube-controller-manager-amd64:v1.7.6
  - gcr.io/google_containers/kube-scheduler-amd64:v1.7.6

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

couldn't initialize a Kubernetes cluster
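Since the reported causes point at either the kubelet or image pulls, the listed control plane images can be pulled by hand to rule out a connectivity/proxy problem, and the API server container can be checked directly. This is a diagnostic sketch, assuming Docker is the container runtime as the preflight output suggests; the address is the one reported by kubeadm above:

```bash
# Pull the control plane images manually to rule out a connectivity/proxy problem
docker pull gcr.io/google_containers/kube-apiserver-amd64:v1.7.6
docker pull gcr.io/google_containers/kube-controller-manager-amd64:v1.7.6
docker pull gcr.io/google_containers/kube-scheduler-amd64:v1.7.6

# Check whether the API server static pod ever started and is answering
docker ps | grep kube-apiserver
curl -k https://10.196.28.252:6443/healthz
```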

I verified that kubelet is up and running:

bash-4.2# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2017-09-19 02:13:10 PDT; 1min 49s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 81037 (kubelet)
   Memory: 23.9M
   CGroup: /system.slice/kubelet.service
           └─81037 /opt/bin/kubelet --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cgroup-driver=cgroupfs --fail-swap-on=false --experimental-fail-swap-on=false --port=102…

Sep 19 02:13:20 slc12mss kubelet[81037]: I0919 02:13:20.965731 81037 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach …
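The kubelet above is running with --cgroup-driver=cgroupfs; a mismatch between the kubelet's cgroup driver and Docker's is a common reason the control plane static pods never come up. A quick check, purely diagnostic and assuming Docker:

```bash
# Docker's cgroup driver must match the one kubelet was started with (cgroupfs above)
docker info 2>/dev/null | grep -i 'cgroup driver'
```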

What you expected to happen: kubeadm is expected to communicate with the kubelet, start the control plane containers to form a Kubernetes master, and issue a token that can be used by worker nodes to join the created cluster.
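For reference, the token printed at the end of a successful 'kubeadm init' would normally be used on a worker node roughly like this (`<token>` is a placeholder; the address is the API server address from the output above):

```bash
# Run on a worker node after a successful init
kubeadm join --token <token> 10.196.28.252:6443
```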

How to reproduce it (as minimally and precisely as possible):

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes ; make quick-release

Created the files /etc/systemd/system/kubelet.service and /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, since systemd uses them to manage the kubelet (a sketch of the drop-in is shown below).
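As a sketch only: the drop-in can be reconstructed from the flags visible in the systemctl status output above (the command line is truncated there, so the real file likely carries additional flags; the kubelet binary path /opt/bin/kubelet is the one shown):

```bash
# Recreate the drop-in with the kubelet flags visible in 'systemctl status kubelet' above;
# the real file may contain more flags than the truncated output shows.
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
ExecStart=
ExecStart=/opt/bin/kubelet --network-plugin=cni --cni-conf-dir=/etc/cni/net.d \
    --cni-bin-dir=/opt/cni/bin --cgroup-driver=cgroupfs --fail-swap-on=false \
    --experimental-fail-swap-on=false
EOF
systemctl daemon-reload && systemctl restart kubelet
```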

$ /usr/local/packages/aime/install/run_as_root "systemctl enable kubelet && systemctl start kubelet"
$ /usr/local/packages/aime/install/run_as_root "kubeadm init"

Anything else we need to know?: A similar issue was seen with Kubernetes 1.6.0 - https://github.com/kubernetes/kubernetes/issues/43815

Environment:

  • Kubernetes version (use kubectl version):
    bash-4.2# kubectl version
    Client Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.0-alpha.0.723+8ca1d9f19ba2c7", GitCommit:"8ca1d9f19ba2c75dfbca058ecfeb11d4b39d9e9f", GitTreeState:"clean", BuildDate:"2017-09-18T07:31:58Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: LINUX.X64 OEL7.2 VM box

  • OS (e.g. from /etc/os-release): NAME="Oracle Linux Server" VERSION="7.2" ID="ol" VERSION_ID="7.2"

  • Kernel (e.g. uname -a): bash-4.2# uname -a Linux slc12mss 4.1.12-61.1.16.el7uek.x86_64 #2 SMP Fri Oct 21 14:23:20 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

@msbl3004 potentially you're missing proxy settings for Docker, so Docker is not able to pull images and thus cannot start the pods. However, it's not good to handle another case within a closed issue.
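A sketch of how such proxy settings can be supplied to the Docker daemon through a systemd drop-in, using the proxy host from the preflight warning above (the values are illustrative and should be adjusted, including NO_PROXY, for the local network):

```bash
# Make the proxy kubeadm detected in its preflight check available to the Docker daemon
mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' > /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=https://www-proxy.us.oracle.com:80"
Environment="HTTPS_PROXY=https://www-proxy.us.oracle.com:80"
Environment="NO_PROXY=localhost,127.0.0.1,10.196.28.252"
EOF
systemctl daemon-reload
systemctl restart docker
```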