origin: Pods are not started when defined with DaemonSet - MatchNodeSelector failed

The DaemonSet is applied, but its pods never run.

Version
$ oc version
oc v3.9.0+0e3d24c-14
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ecIP.compute-1.amazonaws.com:8443
openshift v3.9.0+0e3d24c-14
kubernetes v1.9.1+a0ce1bc657

Nodes:

oc get nodes -owide
NAME                            STATUS    ROLES     AGE       VERSION             EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
ip-172-31-38-104.ec2.internal   Ready     <none>    11m       v1.9.1+a0ce1bc657   <none>        CentOS Linux 7 (Core)   3.10.0-862.2.3.el7.x86_64   docker://1.13.1
ip-172-31-44-49.ec2.internal    Ready     master    12m       v1.9.1+a0ce1bc657   <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64       docker://1.13.1

Labels are applied:

oc get nodes --show-labels
NAME                            STATUS    ROLES     AGE       VERSION             LABELS
ip-172-31-38-104.ec2.internal   Ready     <none>    15m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=ip-172-31-38-104.ec2.internal,region=infra,type=infra
ip-172-31-44-49.ec2.internal    Ready     master    16m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=ip-172-31-44-49.ec2.internal,node-role.kubernetes.io/master=true
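For reference, if the DaemonSet were meant to target only the labelled infra node, a nodeSelector in the pod template would express that. A minimal sketch, assuming the region=infra label shown above:

```yaml
# DaemonSet pod-template fragment (sketch): restrict scheduling to nodes
# labelled region=infra, as on ip-172-31-38-104 above.
spec:
  template:
    spec:
      nodeSelector:
        region: infra
```

This is unrelated to the failure below (the DaemonSet's own Node-Selector is `<none>`), but it shows where a deliberate selector would go.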
Steps To Reproduce
  1. Deploy the DaemonSet.
  2. Pods are recreated in rapid succession but never actually run.
Current Result
oc describe ds/agent
Name:           agent
Selector:       app=agent
Node-Selector:  <none>
Labels:         app=agent
Annotations:    <none>
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 2 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=agent
  Service Account:  admin
  Containers:
   agent:
    Image:  docker-registry.default.svc:5000/agent/agent
    Port:   <none>
    Limits:
      cpu:     1500m
      memory:  512Mi
    Requests:
      cpu:      500m
      memory:   256Mi
    Liveness:   exec [echo noop] delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  exec [echo noop] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      AGENT_PORT:           42655
      ZONE:                 cluster
      AGENT_ENDPOINT:       test-test.com
      AGENT_ENDPOINT_PORT:  443
      AGENT_KEY:            <set to the key 'key' in secret 'agent-secret'>  Optional: false
    Mounts:
      /dev from dev (rw)
      /etc/machine-id from machine-id (rw)
      /root/configuration.yaml from configuration (rw)
      /sys from sys (rw)
      /var/log from log (rw)
      /var/run/docker.sock from run (rw)
   agent-leader-elector:
    Image:  docker-registry.default.svc:5000/agent/leader-elector:0.5
    Port:   <none>
    Args:
      --election=agent
      --http=0.0.0.0:42655
    Requests:
      cpu:        100m
      memory:     64Mi
    Liveness:     http-get http://:42655/ delay=30s timeout=10s period=10s #success=1 #failure=5
    Readiness:    http-get http://:42655/ delay=30s timeout=10s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:       <none>
  Volumes:
   dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
   run:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
   log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:
   machine-id:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/machine-id
    HostPathType:
   configuration:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:     configuration
    Optional:  false
Events:
  Type     Reason            Age   From                  Message
  ----     ------            ----  ----                  -------
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-m6lwr
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-vchgg
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-vchgg on node ip-172-31-44-49.ec2.internal, will try to kill it
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-m6lwr on node ip-172-31-38-104.ec2.internal, will try to kill it
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-m6lwr
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-vchgg
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-4788q
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-cq8jc
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-4788q on node ip-172-31-44-49.ec2.internal, will try to kill it
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-cq8jc
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-cq8jc on node ip-172-31-38-104.ec2.internal, will try to kill it
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-xbstb
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-4788q
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-xbstb on node ip-172-31-44-49.ec2.internal, will try to kill it
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-vd7sw
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-xbstb
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-vd7sw on node ip-172-31-38-104.ec2.internal, will try to kill it
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-4v4wd
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-vd7sw
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-4v4wd on node ip-172-31-44-49.ec2.internal, will try to kill it
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-qxcqw
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-4v4wd
  Normal   SuccessfulCreate  3m    daemonset-controller  Created pod: agent-q286h
  Warning  FailedDaemonPod   3m    daemonset-controller  Found failed daemon pod agent/agent-qxcqw on node ip-172-31-38-104.ec2.internal, will try to kill it
  Normal   SuccessfulDelete  3m    daemonset-controller  Deleted pod: agent-qxcqw
Expected Result

To run pods normally.

If I create a plain pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
    - image: busybox
      command:
        - sleep
        - "3600"
      imagePullPolicy: Always
      name: busybox
  restartPolicy: Always

It works without issues, but pods created through the DaemonSet never start. Deploying e.g. Jenkins from the Catalog also fails.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

For me this issue got resolved with help from @sabre1041. I had to set the following annotation:

$  oc annotate namespace pipeline openshift.io/node-selector=""

where pipeline is the namespace I was trying to start DaemonSet in.
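The same fix can be expressed declaratively on the namespace object; the empty value overrides any cluster-wide default so DaemonSet pods are admitted on every node. A sketch, assuming the pipeline namespace from above:

```yaml
# Namespace with an explicitly empty project node selector (sketch).
# The empty string overrides the cluster-wide defaultNodeSelector.
apiVersion: v1
kind: Namespace
metadata:
  name: pipeline
  annotations:
    openshift.io/node-selector: ""
```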

Here’s an entry in master-config.yaml:

projectConfig:
  defaultNodeSelector: node-role.kubernetes.io/compute=true

Does this entry prevent pods from starting on nodes that don’t have the compute role?
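The fix above suggests it does: projects without their own openshift.io/node-selector annotation inherit defaultNodeSelector, and neither node in this cluster carries node-role.kubernetes.io/compute=true, which would explain the MatchNodeSelector failures in the title. If the cluster-wide default is unwanted entirely, it can be cleared in master-config.yaml (a sketch, not the only option; the per-namespace annotation above is the more targeted fix):

```yaml
# master-config.yaml fragment (sketch): an empty default disables
# the cluster-wide project node selector for all new projects.
projectConfig:
  defaultNodeSelector: ""
```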

@dusansusic If you’re having trouble catching one, just do:

oc get pods -o yaml

which will show the YAML of all pods (and hopefully catch one of your disappearing pods on one of your runs).