kubernetes: daemonset wrongly reports unavailable pods
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: kubectl rollout status on a DaemonSet sometimes hangs forever. The DaemonSet's status reports unavailable pods even when all of its pods are running and ready.
$ kubectl -n monitoring rollout status ds <redacted>
Waiting for rollout to finish: 1 of 2 updated pods are available...
Here’s the status section of the daemonset:
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 1
  numberMisscheduled: 0
  numberReady: 2
  numberUnavailable: 1
  observedGeneration: 1
  updatedNumberScheduled: 2
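As a side note, the inconsistency is visible directly in the raw counters above: numberReady is 2 while numberAvailable is 1 for the same observedGeneration. To keep an eye on just those two fields while the rollout hangs, something like the following should work (the name is the redacted DaemonSet from this report):
$ kubectl -n monitoring get ds <redacted> \
    -o jsonpath='ready={.status.numberReady} available={.status.numberAvailable}{"\n"}'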
Here are the status sections of the two pods:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:13:05Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:16:14Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:14:24Z
    status: "True"
    type: PodScheduled

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:13:04Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:16:02Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-09-25T21:14:28Z
    status: "True"
    type: PodScheduled
$ date
Mon Sep 25 22:18:39 UTC 2017
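With minReadySeconds: 30 on this DaemonSet (see the spec below), a pod should count as available once its Ready condition has been True for 30 seconds. Both Ready conditions above transitioned around 21:16, roughly an hour before the date output, so both pods should long since count as available. One way to compare the Ready transition times against the clock (the label selector is a placeholder, since the real labels are redacted):
$ kubectl -n monitoring get pods -l <redacted-label-selector> \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}{end}'
$ date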
What you expected to happen: kubectl rollout status should exit successfully once the rollout is complete, and the DaemonSet should report all pods as available.
How to reproduce it (as minimally and precisely as possible): I can't reproduce it reliably, but it happened with this simple DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  annotations: <redacted>
  name: <redacted>
  namespace: <redacted>
spec:
  minReadySeconds: 30
  template:
    metadata:
      annotations: <redacted>
      labels: <redacted>
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: NotIn
                values:
                - <redacted>
      containers:
      - args: <redacted>
        image: <redacted>
        imagePullPolicy: Always
        name: <redacted>
        ports: <redacted>
        resources: {}
        volumeMounts: <redacted>
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      volumes:
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels
            path: labels
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
        name: podinfo
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 3
    type: RollingUpdate
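For what it's worth, the only non-default rollout settings in this spec are minReadySeconds: 30 and maxUnavailable: 3. When trying to reproduce, it may help to give rollout status a deadline instead of letting it hang forever; on newer kubectl versions (this flag may not exist on 1.7, and the namespace and name here are placeholders) that would look roughly like:
$ kubectl -n <namespace> rollout status ds <daemonset-name> --timeout=5m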
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T08:56:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release): COS
- Kernel (e.g. uname -a):
- Install tools: GKE
- Others:
Observed in one of our clusters as well, where the reported number of available/ready pods seems to be wrong.
This happened to 2 other DaemonSets running in this cluster as well. We “fixed” one of them by changing the DaemonSet pod spec which triggered a rolling update. After the update the counts were correct. We’ve not been able to find any misbehaving node or other signs that indicate something is wrong with the cluster.
The DaemonSet has been in this state for 3-4 hours now.
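For clusters whose kubectl predates rollout restart, the workaround described above (changing the pod spec to trigger a rolling update) can be done by bumping a template annotation; the annotation key and value here, as well as the namespace and name, are made up purely for illustration:
$ kubectl -n <namespace> patch ds <daemonset-name> \
    -p '{"spec":{"template":{"metadata":{"annotations":{"force-rollout":"1"}}}}}'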
Some information:
Please tell me if I can provide any more information.
I have the exact same issue here, on K8s 1.15.3. I can fix every DaemonSet with this problem by running kubectl rollout restart ds <DaemonSetName>. It then gets re-rolled and all is fine.

@bbgobie I know, right!? It would be nice if it told us which pod it thinks is "not ready" (even though they all report as Ready). We just ended up restarting all the pods, as per my message a few posts up. I have not seen any problems (with that) on our 1.16.3 cluster.
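For completeness, the restart-based workaround mentioned above looks roughly like this on kubectl 1.15 or newer (namespace and name are placeholders):
$ kubectl -n <namespace> rollout restart ds <daemonset-name>
$ kubectl -n <namespace> rollout status ds <daemonset-name>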
/reopen
Third time's the charm?