kubernetes: DaemonSet PODs randomly fail to start on management nodes after reboot
BUG REPORT
Kubernetes version: v1.4.3+coreos.0
Environment:
- Cloud provider or hardware configuration: Bare metal CoreOS installation. 3 management nodes + 2 worker nodes.
- OS (e.g. from /etc/os-release): CoreOS stable (1185.3.0)
- Kernel: 4.7.3-coreos-r2
What happened: DaemonSet pods randomly fail to start on management nodes after nodes reboot.
> kubectl get pods -o wide -a -l app=ds-gluster-slow-01
NAME READY STATUS RESTARTS AGE IP NODE
gluster-slow-01-fgpk3 1/1 Running 0 19h 10.54.147.5 10.54.147.5
gluster-slow-01-payvs 0/1 MatchNodeSelector 0 10h <none> 10.54.147.6
gluster-slow-01-zdr0t 1/1 Running 0 19h 10.54.147.4 10.54.147.4
10.54.147.4, 10.54.147.5, 10.54.147.6 are the management nodes.
The only way to start the pod again is to delete it (see the recovery commands below); the DaemonSet controller then recreates it. The MatchNodeSelector failure happens roughly once every 2-3 reboots; on the remaining reboots the pods start properly.
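For reference, recovery is just deleting the stuck pod; the DaemonSet controller recreates it on the same node. The pod name below is taken from the listing above and will differ per run:
> kubectl delete pod gluster-slow-01-payvs
> kubectl get pods -o wide -a -l app=ds-gluster-slow-01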
How to reproduce it: Create a DaemonSet and assign it to the management nodes based on user-defined labels (a node-labeling sketch follows the node list below). Here is our configuration:
Hosts labels:
NAME STATUS AGE LABELS
10.54.147.10 Ready 52d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.54.147.10,type=infra
10.54.147.11 Ready 52d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.54.147.11,type=infra
10.54.147.4 Ready 53d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.4,type=management
10.54.147.5 Ready 53d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.5,type=management
10.54.147.6 Ready,SchedulingDisabled 47d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.6,type=management
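The gluster and type labels shown above can be applied with kubectl; this is only a sketch of how the management nodes end up labeled, the exact commands we originally ran are not recorded here:
> kubectl label node 10.54.147.4 type=management gluster=slow-01
> kubectl label node 10.54.147.5 type=management gluster=slow-01
> kubectl label node 10.54.147.6 type=management gluster=slow-01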
DaemonSet definition:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: gluster-slow-01
  name: gluster-slow-01
  namespace: default
spec:
  selector:
    matchLabels:
      app: ds-gluster-slow-01
  template:
    metadata:
      labels:
        app: ds-gluster-slow-01
    spec:
      containers:
      - image: internal-registry/mic/infra/glusterfs:1
        name: gluster-slow-01
        livenessProbe:
          tcpSocket:
            port: 24007
          initialDelaySeconds: 30
          timeoutSeconds: 1
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /mnt/localdisk
          name: gluster-localdisk
        - mountPath: /var/lib/glusterd
          name: gluster-varlib
        - mountPath: /dev
          name: gluster-dev
        - mountPath: /sys/fs/cgroup
          name: gluster-cgroup
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        gluster: slow-01
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 1
      volumes:
      - hostPath:
          path: /mnt/localdisk
        name: gluster-localdisk
      - hostPath:
          path: /dev
        name: gluster-dev
      - hostPath:
          path: /sys/fs/cgroup
        name: gluster-cgroup
      - hostPath:
          path: /var/lib/glusterd
        name: gluster-varlib
Anything else we need to know: The kube-scheduler manifest for the management nodes is defined as follows:
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
The top log lines from the kube-scheduler pod are:
E1109 06:51:42.487688 1 leaderelection.go:252] error retrieving endpoint: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662848 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:394: Failed to list *api.Node: Get http://127.0.0.1:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662918 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:404: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662958 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:391: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%21%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663004 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:388: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663124 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:414: Failed to list *extensions.ReplicaSet: Get http://127.0.0.1:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663170 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:409: Failed to list *api.ReplicationController: Get http://127.0.0.1:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663212 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:399: Failed to list *api.PersistentVolumeClaim: Get http://127.0.0.1:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663257 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:398: Failed to list *api.PersistentVolume: Get http://127.0.0.1:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
I1109 06:51:45.940628 1 leaderelection.go:295] lock is held by sid-kb-004 and has not yet expired
I1109 06:51:48.664098 1 leaderelection.go:295] lock is held by sid-kb-004 and has not yet expired
I suspect a race condition during initialization of the management pods: the kube-scheduler is started before the kube-apiserver pod and may not handle the initial inability to connect to the Kubernetes API service well. I tried to work around it by adding an init-containers section:
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
  annotations:
    pod.beta.kubernetes.io/init-containers: '[
      {
        "image": "quay.io/coreos/hyperkube:v1.4.3_coreos.0",
        "name": "wait-for-master",
        "command": [ "/bin/bash", "-c", "while ! timeout 1 bash -c \"/kubectl --server=http://127.0.0.1:8080/ cluster-info\"; do sleep 1; done" ]
      }
    ]'
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
but it looks like init-containers only run during pod creation; when the pod is restarted they are not executed again (an alternative sketch follows).
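An alternative I am considering, but have not deployed, is to move the wait loop into the kube-scheduler container itself so that it runs on every container start. This is only a sketch; it assumes the hyperkube image ships /bin/bash and /kubectl, as the init container above already does:
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /bin/bash
    - -c
    # Block until the local apiserver answers, then exec the scheduler so it
    # replaces the shell as the container's main process.
    - |
      until timeout 1 /kubectl --server=http://127.0.0.1:8080/ cluster-info; do
        sleep 1
      done
      exec /hyperkube scheduler --master=http://127.0.0.1:8080 --leader-elect=true
Note that the existing livenessProbe on port 10251 would still restart the container if the apiserver takes longer than initialDelaySeconds to come up, so the probe delay may need to be raised accordingly.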
My bug is similar to https://github.com/kubernetes/kubernetes/issues/31123, but this report provides more data.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 3
- Comments: 40 (23 by maintainers)
@timstclair v1.5.1, with the Calico network. It occurs whenever I restart the VM; the Calico etcd DaemonSet fails. I will dig deeper to make sure.
The other strange thing is that I cannot reproduce this bug with manual reboots. I tried 5-6 times and everything was OK; I did not change anything in the configuration at all.