origin: Pods are not started when defined with DaemonSet - MatchNodeSelector failed
The DaemonSet is applied, but its pods never actually run.
Version
$ oc version
oc v3.9.0+0e3d24c-14
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ecIP.compute-1.amazonaws.com:8443
openshift v3.9.0+0e3d24c-14
kubernetes v1.9.1+a0ce1bc657
Nodes:
oc get nodes -owide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-172-31-38-104.ec2.internal Ready <none> 11m v1.9.1+a0ce1bc657 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
ip-172-31-44-49.ec2.internal Ready master 12m v1.9.1+a0ce1bc657 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://1.13.1
Labels are applied:
oc get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ip-172-31-38-104.ec2.internal Ready <none> 15m v1.9.1+a0ce1bc657 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=ip-172-31-38-104.ec2.internal,region=infra,type=infra
ip-172-31-44-49.ec2.internal Ready master 16m v1.9.1+a0ce1bc657 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=ip-172-31-44-49.ec2.internal,node-role.kubernetes.io/master=true
Steps To Reproduce
- Deploy the DaemonSet (a minimal sketch of its shape follows this list).
- Pods are recreated extremely quickly but never actually run.
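For reference, a minimal manifest of the shape being deployed; it is reconstructed from the oc describe output below, so everything beyond the names, labels, and image shown there is an assumption:

apiVersion: apps/v1   # extensions/v1beta1 on some 3.9 clusters; reconstructed, not the original manifest
kind: DaemonSet
metadata:
  name: agent
  labels:
    app: agent
spec:
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      serviceAccountName: admin
      containers:
      - name: agent
        image: docker-registry.default.svc:5000/agent/agent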
Current Result
oc describe ds/agent
Name: agent
Selector: app=agent
Node-Selector: <none>
Labels: app=agent
Annotations: <none>
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 2 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=agent
Service Account: admin
Containers:
agent:
Image: docker-registry.default.svc:5000/agent/agent
Port: <none>
Limits:
cpu: 1500m
memory: 512Mi
Requests:
cpu: 500m
memory: 256Mi
Liveness: exec [echo noop] delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: exec [echo noop] delay=60s timeout=5s period=10s #success=1 #failure=5
Environment:
AGENT_PORT: 42655
ZONE: cluster
AGENT_ENDPOINT: test-test.com
AGENT_ENDPOINT_PORT: 443
AGENT_KEY: <set to the key 'key' in secret 'agent-secret'> Optional: false
Mounts:
/dev from dev (rw)
/etc/machine-id from machine-id (rw)
/root/configuration.yaml from configuration (rw)
/sys from sys (rw)
/var/log from log (rw)
/var/run/docker.sock from run (rw)
agent-leader-elector:
Image: docker-registry.default.svc:5000/agent/leader-elector:0.5
Port: <none>
Args:
--election=agent
--http=0.0.0.0:42655
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:42655/ delay=30s timeout=10s period=10s #success=1 #failure=5
Readiness: http-get http://:42655/ delay=30s timeout=10s period=10s #success=1 #failure=5
Environment: <none>
Mounts: <none>
Volumes:
dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
run:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
log:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
machine-id:
Type: HostPath (bare host directory volume)
Path: /etc/machine-id
HostPathType:
configuration:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: configuration
Optional: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-m6lwr
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-vchgg
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-vchgg on node ip-172-31-44-49.ec2.internal, will try to kill it
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-m6lwr on node ip-172-31-38-104.ec2.internal, will try to kill it
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-m6lwr
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-vchgg
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-4788q
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-cq8jc
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-4788q on node ip-172-31-44-49.ec2.internal, will try to kill it
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-cq8jc
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-cq8jc on node ip-172-31-38-104.ec2.internal, will try to kill it
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-xbstb
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-4788q
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-xbstb on node ip-172-31-44-49.ec2.internal, will try to kill it
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-vd7sw
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-xbstb
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-vd7sw on node ip-172-31-38-104.ec2.internal, will try to kill it
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-4v4wd
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-vd7sw
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-4v4wd on node ip-172-31-44-49.ec2.internal, will try to kill it
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-qxcqw
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-4v4wd
Normal SuccessfulCreate 3m daemonset-controller Created pod: agent-q286h
Warning FailedDaemonPod 3m daemonset-controller Found failed daemon pod agent/agent-qxcqw on node ip-172-31-38-104.ec2.internal, will try to kill it
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: agent-qxcqw
Expected Result
Pods should run normally.
If I create a plain pod instead:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: Always
    name: busybox
  restartPolicy: Always
it runs without issues; only pods created through the DaemonSet never come up. Deploying e.g. Jenkins from the Catalog also fails.
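Given the MatchNodeSelector error in the title, the usual place to look is the node selector the project imposes on its pods. These are standard oc commands, not from the original report, and the namespace name agent is assumed from the events above:

# Inspect one of the short-lived pods for the scheduling failure reason
oc describe pod <pod-name>

# Check whether the project carries a default node selector annotation
oc get namespace agent -o yaml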
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (6 by maintainers)
For me this issue was resolved with help from @sabre1041. I had to set the following annotation on the namespace:
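The annotation itself was lost in formatting; given the context it is presumably the standard per-project node-selector override, along the lines of:

# Clear the project's node selector so DaemonSet pods can land on any node
# (reconstructed; the exact original annotation was not preserved)
oc annotate namespace pipeline openshift.io/node-selector="" --overwrite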
where pipeline is the namespace I was trying to start the DaemonSet in.

Here's the entry in master-config.yaml:
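This snippet did not survive formatting either; judging by the question that follows, it was presumably the cluster-wide default node selector, something like:

projectConfig:
  # Reconstructed: every project without its own openshift.io/node-selector
  # annotation inherits this selector for all of its pods
  defaultNodeSelector: node-role.kubernetes.io/compute=true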
Does this entry prevent starting of pods on nodes that don't have the role compute?

@dusansusic If you're having trouble catching one, just do:
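The command was also lost in formatting; presumably something like:

oc get pods -o yaml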
which will show the YAML output of all pods (and hopefully will include one of your disappearing pods on one of your runs).