kubernetes: DaemonSet PODs randomly fail to start on management nodes after reboot
BUG REPORT
Kubernetes version: v1.4.3+coreos.0
Environment:
- Cloud provider or hardware configuration: Bare metal CoreOS installation. 3 management nodes + 2 worker nodes.
- OS (e.g. from /etc/os-release): CoreOS stable (1185.3.0)
- Kernel: 4.7.3-coreos-r2
What happened: DaemonSet pods randomly fail to start on management nodes after nodes reboot.
> kubectl get pods -o wide -a -l app=ds-gluster-slow-01
NAME READY STATUS RESTARTS AGE IP NODE
gluster-slow-01-fgpk3 1/1 Running 0 19h 10.54.147.5 10.54.147.5
gluster-slow-01-payvs 0/1 MatchNodeSelector 0 10h <none> 10.54.147.6
gluster-slow-01-zdr0t 1/1 Running 0 19h 10.54.147.4 10.54.147.4
10.54.147.4, 10.54.147.5, 10.54.147.6 are the management nodes.
The only way to start the pod again is to delete it (see the recovery commands below); the DaemonSet controller then recreates it. The MatchNodeSelector failure happens roughly once every 2-3 reboots; on the remaining reboots the pods start properly.
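For reference, recovery is just deleting the stuck pod; the DaemonSet controller recreates it on the same node. The pod name below is taken from the listing above and will differ per run:
> kubectl delete pod gluster-slow-01-payvs
> kubectl get pods -o wide -a -l app=ds-gluster-slow-01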
How to reproduce it: Create a DaemonSet and assign it to the management nodes based on user-defined labels (a node-labeling sketch follows the node list below). Here is our configuration:
Hosts labels:
NAME STATUS AGE LABELS
10.54.147.10 Ready 52d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.54.147.10,type=infra
10.54.147.11 Ready 52d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.54.147.11,type=infra
10.54.147.4 Ready 53d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.4,type=management
10.54.147.5 Ready 53d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.5,type=management
10.54.147.6 Ready,SchedulingDisabled 47d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gluster=slow-01,kubernetes.io/hostname=10.54.147.6,type=management
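The gluster and type labels shown above can be applied with kubectl; this is only a sketch of how the management nodes end up labeled, the exact commands we originally ran are not recorded here:
> kubectl label node 10.54.147.4 type=management gluster=slow-01
> kubectl label node 10.54.147.5 type=management gluster=slow-01
> kubectl label node 10.54.147.6 type=management gluster=slow-01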
DaemonSet definition:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: gluster-slow-01
  name: gluster-slow-01
  namespace: default
spec:
  selector:
    matchLabels:
      app: ds-gluster-slow-01
  template:
    metadata:
      labels:
        app: ds-gluster-slow-01
    spec:
      containers:
      - image: internal-registry/mic/infra/glusterfs:1
        name: gluster-slow-01
        livenessProbe:
          tcpSocket:
            port: 24007
          initialDelaySeconds: 30
          timeoutSeconds: 1
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /mnt/localdisk
          name: gluster-localdisk
        - mountPath: /var/lib/glusterd
          name: gluster-varlib
        - mountPath: /dev
          name: gluster-dev
        - mountPath: /sys/fs/cgroup
          name: gluster-cgroup
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        gluster: slow-01
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 1
      volumes:
      - hostPath:
          path: /mnt/localdisk
        name: gluster-localdisk
      - hostPath:
          path: /dev
        name: gluster-dev
      - hostPath:
          path: /sys/fs/cgroup
        name: gluster-cgroup
      - hostPath:
          path: /var/lib/glusterd
        name: gluster-varlib
Anything else we need to know: The kube-scheduler manifest for the management nodes is defined as follows:
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
The top log lines from the kube-scheduler pod are:
E1109 06:51:42.487688 1 leaderelection.go:252] error retrieving endpoint: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662848 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:394: Failed to list *api.Node: Get http://127.0.0.1:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662918 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:404: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.662958 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:391: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%21%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663004 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:388: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663124 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:414: Failed to list *extensions.ReplicaSet: Get http://127.0.0.1:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663170 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:409: Failed to list *api.ReplicationController: Get http://127.0.0.1:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663212 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:399: Failed to list *api.PersistentVolumeClaim: Get http://127.0.0.1:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1109 06:51:42.663257 1 reflector.go:214] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:398: Failed to list *api.PersistentVolume: Get http://127.0.0.1:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
I1109 06:51:45.940628 1 leaderelection.go:295] lock is held by sid-kb-004 and has not yet expired
I1109 06:51:48.664098 1 leaderelection.go:295] lock is held by sid-kb-004 and has not yet expired
I suspect a race condition during initialization of the management pods: the kube-scheduler is started before the kube-apiserver pod and may not handle the initial inability to connect to the Kubernetes API service well. I tried to work around it by adding an init-containers section:
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
  annotations:
    pod.beta.kubernetes.io/init-containers: '[
      {
        "image": "quay.io/coreos/hyperkube:v1.4.3_coreos.0",
        "name": "wait-for-master",
        "command": [ "/bin/bash", "-c", "while ! timeout 1 bash -c \"/kubectl --server=http://127.0.0.1:8080/ cluster-info\"; do sleep 1; done" ]
      }
    ]'
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
but it looks like init-containers only run during pod creation; when the pod is restarted they are not executed again (an alternative sketch follows).
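An alternative I am considering, but have not deployed, is to move the wait loop into the kube-scheduler container itself so that it runs on every container start. This is only a sketch; it assumes the hyperkube image ships /bin/bash and /kubectl, as the init container above already does:
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /bin/bash
    - -c
    # Block until the local apiserver answers, then exec the scheduler so it
    # replaces the shell as the container's main process.
    - |
      until timeout 1 /kubectl --server=http://127.0.0.1:8080/ cluster-info; do
        sleep 1
      done
      exec /hyperkube scheduler --master=http://127.0.0.1:8080 --leader-elect=true
Note that the existing livenessProbe on port 10251 would still restart the container if the apiserver takes longer than initialDelaySeconds to come up, so the probe delay may need to be raised accordingly.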
My bug is similar to https://github.com/kubernetes/kubernetes/issues/31123, but this report provides more data.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 3
- Comments: 40 (23 by maintainers)
@timstclair v1.5.1, with the Calico network. It occurs whenever I restart the VM; the Calico etcd DaemonSet fails. I will dig deeper to make sure.
The other strange thing is that I cannot reproduce this bug with manual reboots. I tried 5-6 times and everything was OK; I did not change anything in the configuration at all.