kubernetes: DaemonSet doesn't run on all nodes
Using v1.2.0-beta.1. I deployed a DaemonSet with no node selector, but it’s not running on all of the nodes.
The only two nodes it is running on are the ones marked SchedulingDisabled.
$ kubectl get nodes
NAME            STATUS                     AGE
100.64.32.234   Ready                      8d
100.64.32.71    Ready,SchedulingDisabled   5m
100.64.33.77    Ready,SchedulingDisabled   19m
100.64.33.82    Ready                      2d
$ kubectl describe daemonset kube-proxy
Name:                                kube-proxy
Image(s):                            calpicow/hyperkube:v1.2.0-beta.1-custom
Selector:                            name in (kube-proxy)
Node-Selector:                       <none>
Labels:                              name=kube-proxy
Desired Number of Nodes Scheduled:   2
Current Number of Nodes Scheduled:   2
Number of Nodes Misscheduled:        0
Pods Status:                         2 Running / 0 Waiting / 0 Succeeded / 0 Failed
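When the desired count is lower than the number of schedulable nodes, it is worth checking the conditions reported by the skipped nodes and any events recorded for the DaemonSet. A rough set of checks, assuming the same namespace the commands above were run against and using the node names from the listing:
$ kubectl describe node 100.64.32.234          # check Conditions (e.g. OutOfDisk) and Allocatable on a skipped node
$ kubectl get events | grep kube-proxy         # look for placement failures (later releases emit an event for this; see the commits below)
$ kubectl get pods -o wide | grep kube-proxy   # confirm which nodes actually run the pod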
About this issue
- State: closed
- Created 8 years ago
- Comments: 44 (24 by maintainers)
Commits related to this issue
- Merge pull request #23463 from mikedanese/ds-event Automatic merge from submit-queue add an event for when a daemonset can't place a pod due to insufficient resource or port conflict https://g... — committed to kubernetes/kubernetes by k8s-github-robot 8 years ago
- Merge pull request #23013 from gnufied/fix-dangling-volumes UPSTREAM: 78595: Add dangling volumes as uncertain Origin-commit: 4ed130610c4f2ef849b54f6c53f5689fd0175ad1 — committed to openshift/kubernetes by k8s-publishing-bot 5 years ago
Just ran into the same issue with K8s 1.6.4. I had a node that was OOD (out of disk) and repaired it manually; when it came back healthy the DaemonSet was not scheduled there, and the DS controller did not even try.
Fixed it using @ankon’s comment above (https://github.com/kubernetes/kubernetes/issues/23013#issuecomment-296596687)
This issue is really bad when the DaemonSet in question is, for example, Calico, which is needed for pod networking.
Seeing this with OpenShift 3.7 / K8s 1.7
EDIT: the root cause for me was a taint on some of the nodes.
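For the taint case, a quick check is to look at the taints on the affected nodes and, if they are intentional, add a matching toleration to the DaemonSet's pod template. A minimal sketch, with a placeholder taint key (use whatever key the node actually reports):
$ kubectl describe node <node-name> | grep -i taints
spec:
  template:
    spec:
      tolerations:
      - key: "example-taint-key"   # placeholder key, not from the original report
        operator: "Exists"
        effect: "NoSchedule"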
For people ending up here with a 1.5 cluster and dreading replacing nodes: it might help to just recreate the DaemonSet itself, using something like the commands sketched below.
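The exact command is not preserved in this copy of the thread; one way to do it, borrowing the kube2iam DaemonSet from the comment below as the example name, is roughly:
$ kubectl get daemonset kube2iam -o yaml > kube2iam-ds.yaml
# edit kube2iam-ds.yaml to strip status, resourceVersion, uid and creationTimestamp
$ kubectl delete daemonset kube2iam --cascade=false   # --cascade=false keeps the existing pods in place
$ kubectl create -f kube2iam-ds.yaml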
This worked for me to bring back a missing kube2iam pod on a node. Unfortunately I don’t have the logs any more to see why it got lost in the first place.
I’m seeing a similar issue.
I had a disk-full issue on a bunch of nodes (unrelated). Some nodes had their DaemonSet pods removed; others didn't.
The issue is that once I've fixed the disk, I can't get the nodes to reschedule the DaemonSet pods, short of deleting the node and then restarting the kubelet, which isn't much fun.
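Two things worth ruling out before resorting to that, assuming a systemd-managed kubelet and a placeholder node name:
$ kubectl describe node <node-name> | grep -A8 -i conditions   # OutOfDisk (or DiskPressure on newer releases) should be back to False
$ sudo systemctl restart kubelet                               # run on the affected node to force it to re-report its status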
I’m seeing this same behavior in my 1.2 cluster. I have 4 nodes in the cluster, all of which have sufficient space available, but the DS is reporting “desired” and “current” counts of 2. What’s worse is that things were working properly a few days ago when I rolled this out, but sometime in the last few days two of the nodes lost their DS pods and they haven’t come back.
Just hit this problem with v1.14.1. Deployed some identical servers (apart from hostname/IP, obviously) from the same configuration management, but one was not getting DaemonSets scheduled on it.
Comment https://github.com/kubernetes/kubernetes/issues/23013#issuecomment-206503020 resolved the issue for us. It's still strange that it happened at all, and that it only happened to one of them.
We ran into this problem with v1.6.13.
The instructions in this comment make the DaemonSet pods start on all nodes, but even after a delete and recreate, I think the DaemonSet is still left in a bad state:
even though all the pods have started, the DaemonSet thinks there are 0 desired replicas.
@mikedanese It is still an issue for me:
I have 4 nodes, 1 master and 3 slaves.
I deploy the following DaemonSet:
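The original manifest is not preserved in this copy of the thread; a minimal DaemonSet of the shape being described, with a hypothetical name and using the extensions/v1beta1 API of that era, would look something like:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: example-agent            # hypothetical name, not the original manifest
spec:
  template:
    metadata:
      labels:
        name: example-agent
    spec:
      containers:
      - name: example-agent
        image: busybox
        command: ["sleep", "3600"]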
It deploys, but DESIRED is set to 3 rather than 4:
Weirdest of all, it deploys on the master node and just two of the slaves…
1.2.2 seems to be stable for me - my DaemonSets are still running pods on each node after a few days.
https://github.com/kubernetes/kubernetes/issues/23934 😃
Ok, more troubleshooting with Kelsey on Slack: deleting the problem nodes by hand, then restarting the kubelet on those nodes, seemed to fix the issue. The DS scheduled onto the remaining node once the kubelet had registered itself again. Guessing a bad cache somewhere.
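For reference, the by-hand version of that workaround, assuming a systemd-managed kubelet and a placeholder node name, is roughly:
$ kubectl delete node <node-name>   # removes the stale Node API object
# then, on the node itself:
$ sudo systemctl restart kubelet    # the kubelet re-registers the node; the DS controller should then place the pod
$ kubectl get nodes                 # confirm the node has come back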