kubernetes: Master doesn't come up in k8s HA environment after restart

Is this a BUG REPORT or FEATURE REQUEST?:

In an HA environment, when we restart any master, it does not come back into the Ready state.

/kind bug

What happened: I set up a Kubernetes cluster with 3 masters and restarted one master. It does not come back into the Ready state.

Whenever I restart a master in the multi-master setup, the kube-apiserver container does not come up; it exits each time.

docker ps -a:

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS                      PORTS   NAMES
497942533ee4        e774f647e259                 "kube-apiserver --…"     22 seconds ago      Exited (1) 22 seconds ago           k8s_kube-apiserver_kube-apiserver-master-0_kube-system_d3aaa8056353bc9b507cbedc05d5f67c_19
960e46922482        0dcb3dea0db1                 "kube-scheduler --…"     31 minutes ago      Up 31 minutes                       k8s_kube-scheduler_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_1
2d63d4ee4232        f3fcd0775c4e                 "kube-controller-m…"     31 minutes ago      Up 31 minutes                       k8s_kube-controller-manager_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_1
8a18d04572dd        k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago      Up 31 minutes                       k8s_POD_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_1
040eb45a248b        k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago      Up 31 minutes                       k8s_POD_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_1
5632a03da568        k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago      Up 31 minutes                       k8s_POD_kube-apiserver-master-0_kube-system_d3aaa8056353bc9b507cbedc05d5f67c_1
6ac21acfaf48        0dcb3dea0db1                 "kube-scheduler --…"     57 minutes ago      Exited (2) 36 minutes ago           k8s_kube-scheduler_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_0
41d738c549d0        f3fcd0775c4e                 "kube-controller-m…"     57 minutes ago      Exited (2) 36 minutes ago           k8s_kube-controller-manager_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_0
57efd2fd6b61        k8s.gcr.io/pause-amd64:3.1   "/pause"                 57 minutes ago      Exited (0) 36 minutes ago           k8s_POD_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_0
3137cea21806        k8s.gcr.io/pause-amd64:3.1   "/pause"                 57 minutes ago      Exited (0) 36 minutes ago           k8s_POD_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_0

The container logs show:

[root@master-0 centos]# docker logs 497942533ee4
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.
I0520 11:03:58.909248       1 server.go:135] Version: v1.10.2
I0520 11:03:58.909959       1 server.go:724] external host was not specified, using 10.0.1.234
unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied
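The "permission denied" on apiserver.crt is the key line: the file exists and is root-owned, so the denial usually comes from something like SELinux blocking the containerized process rather than ordinary file permissions. A quick way to check (a sketch, assuming root access on the master and that auditd is running):

getenforce                                                # "Enforcing" means SELinux may be the culprit
ls -lZ /etc/kubernetes/pki/apiserver.crt                  # show the file's SELinux label
ausearch -m avc -ts recent | grep apiserver.crt           # recent AVC denials mentioning the cert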

Kubelet logs:

May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697085 1391 kuberuntime_manager.go:513] Container {Name:kube-apiserver Image:k8s.gcr.io/kube-apiserver-amd64:v1.10.2 Command:[kube-apiserver --endpoint-reconciler-type=lease --advertise-address=10.0.1.234 --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --enable-bootstrap-token-auth=true --requestheader-group-headers=X-Remote-Group --requestheader-allowed-names=front-proxy-client --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --insecure-port=0 --allow-privileged=true --client-ca-file=/etc/kubernetes/pki/ca.crt --service-cluster-ip-range=10.96.0.0/12 --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --requestheader-username-headers=X-Remote-User --requestheader-extra-headers-prefix=X-Remote-Extra- --authorization-mode=Node,RBAC --etcd-servers=http://10.0.1.234:2379,http://10.0.1.61:2379,http://10.0.1.125:2379 --etcd-cafile=/etc/kubernetes/pki/etcd/ca.pem --etcd-certfile=/etc/kubernetes/pki/etcd/client.pem --etcd-keyfile=/etc/kubernetes/pki/etcd/client-key.pem] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>} {Name:ca-certs-etc-pki ReadOnly:true MountPath:/etc/pki SubPath: MountPropagation:<nil>} {Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:10.0.1.234,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it. 
May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697215    1391 kuberuntime_manager.go:757] checking backoff for container "kube-apiserver" in pod "kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"
May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697325    1391 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)
May 20 10:53:33 master-0 kubelet[1391]: E0520 10:53:33.697351    1391 pod_workers.go:186] Error syncing pod d3aaa8056353bc9b507cbedc05d5f67c ("kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"
May 20 10:53:33 master-0 kubelet[1391]: W0520 10:53:33.801782    1391 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 20 10:53:33 master-0 kubelet[1391]: E0520 10:53:33.801960    1391 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 20 10:53:34 master-0 kubelet[1391]: E0520 10:53:34.095612    1391 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.234:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-0&limit=500&resourceVersion=0: dial tcp 10.0.1.234:6443: getsockopt: connection refused

[root@master-0 centos]# kubectl get nodes
The connection to the server 10.0.1.234:6443 was refused - did you specify the right host or port?
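The refusal is just a symptom: with kube-apiserver crash-looping, nothing is listening on the secure port. A quick confirmation (assuming ss from iproute, standard on CentOS 7):

ss -tlnp | grep 6443    # no output while kube-apiserver is down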

I am using Kubernetes version 1.10.2.

What you expected to happen: I expected the master to come back into the Ready state.

How to reproduce it (as minimally and precisely as possible): Set up a 3-master cluster following the K8S HA document and restart any master: https://kubernetes.io/docs/setup/independent/high-availability/

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.10.2
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): centos 7
  • Kernel (e.g. uname -a): Linux master-0 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others:

Most upvoted comments

@Petrox it looks like a possible problem with the SELinux config. Check it:

λ cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

If yours says SELINUX=enforcing, you’ll need to change it for cgroups to play nicely. Execute the following:

sed -i -e 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config
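Note that the config file change only takes effect at the next boot; to switch the running system to permissive immediately, the standard SELinux commands are:

setenforce 0
getenforce    # should now print: Permissive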

Just for the record:

CentOS 7.5 + kubeadm (clean install, following the kubeadm tutorials) has this problem today.

# rpm -qa |grep kubeadm
kubeadm-1.11.2-0.x86_64
# rpm -qa |grep docker
docker-1.13.1-74.git6e3bb8e.el7.centos.x86_64
docker-client-1.13.1-74.git6e3bb8e.el7.centos.x86_64
docker-common-1.13.1-74.git6e3bb8e.el7.centos.x86_64

The solution is to NOT START kubelet before kubeadm init, but you MUST have it enabled:

sudo systemctl stop kubelet
sudo systemctl enable kubelet
sudo kubeadm reset
sudo kubeadm init -v 5 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address 1.2.3.4
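Once init completes, you can verify the control plane actually came up using the admin kubeconfig that kubeadm writes (standard kubeadm verification steps):

sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -n kube-system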

sidenote: kubelet and docker use different cgroup drivers by default, and it does work fine with this workaround…

…until you try to reboot, when the same error starts again.
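You can confirm the driver mismatch directly; this is a sketch assuming Red Hat's docker build, which typically defaults to the systemd cgroup driver while kubelet defaults to cgroupfs:

docker info 2>/dev/null | grep -i 'cgroup driver'            # typically: Cgroup Driver: systemd
ps aux | grep '[k]ubelet' | grep -o 'cgroup-driver=[a-z]*'   # empty output means kubelet is on its default (cgroupfs)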

NOTE: I also had to edit the kubelet systemd drop-in to add the --cgroup-driver=systemd parameter to the KUBELET_KUBECONFIG_ARGS environment variable:

Before editing:

cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
# [truncated]

After editing:

λ cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=systemd"
# [truncated]

You can use the following command to do that:

sed -i -e 's/.conf"/.conf --cgroup-driver=systemd"/' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
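After editing the drop-in, reload systemd and restart kubelet so the new flag takes effect:

sudo systemctl daemon-reload
sudo systemctl restart kubelet
ps aux | grep '[k]ubelet' | grep -o 'cgroup-driver=[a-z]*'   # should now print: cgroup-driver=systemd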