kubernetes: Master doesn't come up in k8s HA environment after restart
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

In an HA environment, when we restart any master it doesn't go into the Ready state.
What happened: I have set up a Kubernetes cluster with 3 masters and restarted one master. It doesn't come back into the Ready state.
I have been facing this issue whenever I restart a master in a multi-master setup: the kube-apiserver container doesn't come up and exits each time.
Output of `docker ps -a`:
```
CONTAINER ID   IMAGE                        COMMAND                  CREATED          STATUS                      PORTS   NAMES
497942533ee4   e774f647e259                 "kube-apiserver --…"     22 seconds ago   Exited (1) 22 seconds ago           k8s_kube-apiserver_kube-apiserver-master-0_kube-system_d3aaa8056353bc9b507cbedc05d5f67c_19
960e46922482   0dcb3dea0db1                 "kube-scheduler --…"     31 minutes ago   Up 31 minutes                       k8s_kube-scheduler_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_1
2d63d4ee4232   f3fcd0775c4e                 "kube-controller-m…"     31 minutes ago   Up 31 minutes                       k8s_kube-controller-manager_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_1
8a18d04572dd   k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago   Up 31 minutes                       k8s_POD_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_1
040eb45a248b   k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago   Up 31 minutes                       k8s_POD_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_1
5632a03da568   k8s.gcr.io/pause-amd64:3.1   "/pause"                 31 minutes ago   Up 31 minutes                       k8s_POD_kube-apiserver-master-0_kube-system_d3aaa8056353bc9b507cbedc05d5f67c_1
6ac21acfaf48   0dcb3dea0db1                 "kube-scheduler --…"     57 minutes ago   Exited (2) 36 minutes ago           k8s_kube-scheduler_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_0
41d738c549d0   f3fcd0775c4e                 "kube-controller-m…"     57 minutes ago   Exited (2) 36 minutes ago           k8s_kube-controller-manager_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_0
57efd2fd6b61   k8s.gcr.io/pause-amd64:3.1   "/pause"                 57 minutes ago   Exited (0) 36 minutes ago           k8s_POD_kube-scheduler-master-0_kube-system_aa8d5cab3ea096315de0c2003230d4f9_0
3137cea21806   k8s.gcr.io/pause-amd64:3.1   "/pause"                 57 minutes ago   Exited (0) 36 minutes ago           k8s_POD_kube-controller-manager-master-0_kube-system_10a7eee8641d4566988a1b7fd115d16a_0
[root@master-0 centos]#
```
Logs of the container show:
```
[root@master-0 centos]# docker logs 497942533ee4
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.
I0520 11:03:58.909248       1 server.go:135] Version: v1.10.2
I0520 11:03:58.909959       1 server.go:724] external host was not specified, using 10.0.1.234
unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied
```
Kubelet logs:

```
May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697085 1391 kuberuntime_manager.go:513] Container {Name:kube-apiserver Image:k8s.gcr.io/kube-apiserver-amd64:v1.10.2 Command:[kube-apiserver --endpoint-reconciler-type=lease --advertise-address=10.0.1.234 --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --enable-bootstrap-token-auth=true --requestheader-group-headers=X-Remote-Group --requestheader-allowed-names=front-proxy-client --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --insecure-port=0 --allow-privileged=true --client-ca-file=/etc/kubernetes/pki/ca.crt --service-cluster-ip-range=10.96.0.0/12 --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --requestheader-username-headers=X-Remote-User --requestheader-extra-headers-prefix=X-Remote-Extra- --authorization-mode=Node,RBAC --etcd-servers=http://10.0.1.234:2379,http://10.0.1.61:2379,http://10.0.1.125:2379 --etcd-cafile=/etc/kubernetes/pki/etcd/ca.pem --etcd-certfile=/etc/kubernetes/pki/etcd/client.pem --etcd-keyfile=/etc/kubernetes/pki/etcd/client-key.pem] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>} {Name:ca-certs-etc-pki ReadOnly:true MountPath:/etc/pki SubPath: MountPropagation:<nil>} {Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:10.0.1.234,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it. 
May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697215    1391 kuberuntime_manager.go:757] checking backoff for container "kube-apiserver" in pod "kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"
May 20 10:53:33 master-0 kubelet[1391]: I0520 10:53:33.697325    1391 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)
May 20 10:53:33 master-0 kubelet[1391]: E0520 10:53:33.697351    1391 pod_workers.go:186] Error syncing pod d3aaa8056353bc9b507cbedc05d5f67c ("kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master-0_kube-system(d3aaa8056353bc9b507cbedc05d5f67c)"
May 20 10:53:33 master-0 kubelet[1391]: W0520 10:53:33.801782    1391 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 20 10:53:33 master-0 kubelet[1391]: E0520 10:53:33.801960    1391 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 20 10:53:34 master-0 kubelet[1391]: E0520 10:53:34.095612    1391 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.234:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-0&limit=500&resourceVersion=0: dial tcp 10.0.1.234:6443: getsockopt: connection refused
```
```
[root@master-0 centos]# kubectl get nodes
The connection to the server 10.0.1.234:6443 was refused - did you specify the right host or port?
```
I am using Kubernetes version 1.10.2.
What you expected to happen: I expected the master to come back into the Ready state.

How to reproduce it (as minimally and precisely as possible): Set up a 3-master cluster following the K8s HA document and restart any master: https://kubernetes.io/docs/setup/independent/high-availability/
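For example, the restart step could look like this (a minimal sketch, not commands from the original report):

```
# On any one of the three masters:
sudo reboot

# Once the node is back up, check its status from another master:
kubectl get nodes
```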
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.10.2
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): CentOS 7
- Kernel (e.g. `uname -a`): Linux master-0 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
- Others:
About this issue

- State: closed
- Created 6 years ago
- Comments: 19 (5 by maintainers)
@Petrox it looks like it’s possibly a problem with SELinux config. Check it here:
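The snippet referenced here isn't preserved in this extract; checking usually means looking at the current mode and the persisted setting, e.g.:

```
# Show the current SELinux mode and the value persisted in the config file
sestatus
grep ^SELINUX= /etc/selinux/config
```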
If yours says `SELINUX=enforcing`, you'll need to change it for cgroups to play nicely. Execute the following:
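The original command isn't preserved in this extract either; a sketch of the usual change (switching SELinux to permissive immediately and across reboots) would be:

```
# Assumed workaround: put SELinux into permissive mode
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```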
Just for the record: CentOS 7.5 + kubeadm (clean install and following the kubeadm tutorials) has this problem today.
The solution is to NOT START kubelet before kubeadm init, but you MUST have it enabled:
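The exact commands aren't shown in the extract; a sketch of the sequence on a fresh node (kubeadm flags omitted, substitute whatever your HA setup uses):

```
# Enable kubelet so it can start on boot, but do NOT start it by hand
sudo systemctl enable kubelet

# Let kubeadm bring kubelet up itself during initialization
sudo kubeadm init   # plus your HA/config flags
```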
Side note: kubelet and docker use different cgroup drivers by default, and it does work fine with this workaround… until you try to reboot, when the same error starts again.
NOTE: I also had to edit the kubelet systemd drop-in config to add the `--cgroup-driver=systemd` parameter to the `KUBELET_KUBECONFIG_ARGS` environment variable.

Before editing:
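The original snippet isn't preserved in this extract; on a kubeadm 1.10 RPM install the relevant line in `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf` typically looks like this:

```
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (typical kubeadm 1.10 default)
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
```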
After editing:
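Again a sketch rather than the original snippet, with the flag appended:

```
# Same line with the cgroup driver flag added at the end
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=systemd"
```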
You can use the following command to do that:
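The actual command isn't preserved here; a sed one-liner along these lines (file path assumed from a standard kubeadm RPM install) should do it:

```
# Append --cgroup-driver=systemd to KUBELET_KUBECONFIG_ARGS, then reload and restart kubelet
sudo sed -i 's|\(KUBELET_KUBECONFIG_ARGS=[^"]*\)|\1 --cgroup-driver=systemd|' \
    /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```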