kubernetes: Regression: mirror pods get deleted and recreated repeatedly due to spec mismatch
While debugging #8637, I noticed that some of the services pods (the monitoring and logging pods) get stuck in Pending, sometimes forever:
$ cluster/kubectl.sh get pods --namespace="default"
fluentd-elasticsearch-kubernetes-minion-9q50 kubernetes-minion-9q50/ <none> Pending About an hour
fluentd-elasticsearch gcr.io/google_containers/fluentd-elasticsearch:1.5
...
$ cluster/kubectl.sh get pods -o yaml fluentd-elasticsearch-kubernetes-minion-9q50
apiVersion: v1beta3
kind: Pod
metadata:
  annotations:
    kubernetes.io/config.mirror: mirror
    kubernetes.io/config.source: file
  creationTimestamp: 2015-05-21T21:01:21Z
  name: fluentd-elasticsearch-kubernetes-minion-9q50
  namespace: default
  resourceVersion: "33083"
  selfLink: /api/v1beta3/namespaces/default/pods/fluentd-elasticsearch-kubernetes-minion-9q50
  uid: 884b09be-fffc-11e4-92b6-42010af084a4
spec:
  containers:
  - capabilities: {}
    env:
    - name: FLUENTD_ARGS
      value: -qq
    image: gcr.io/google_containers/fluentd-elasticsearch:1.5
    imagePullPolicy: IfNotPresent
    name: fluentd-elasticsearch
    resources:
      limits:
        cpu: 100m
    securityContext:
      capabilities: {}
      privileged: false
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /varlog
      name: varlog
    - mountPath: /var/lib/docker/containers
      name: containers
  dnsPolicy: ClusterFirst
  host: kubernetes-minion-9q50
  restartPolicy: Always
  serviceAccount: ""
  volumes:
  - hostPath:
      path: /var/log
    name: varlog
  - hostPath:
      path: /var/lib/docker/containers
    name: containers
status:
  phase: Pending
I logged into the node and found the container running happily there:
# docker ps -a | grep fluentd-elasticsearch-kubernetes-minion-9q50
5219fed9570b gcr.io/google_containers/fluentd-elasticsearch:1.5 "\"/bin/sh -c '/usr/ 2 hours ago Up 2 hours k8s_fluentd-elasticsearch.c99175f6_fluentd-elasticsearch-kubernetes-minion-9q50_default_a8d3815def5de29fd315adf5d9fc5acc_3feacf95
68e799daa598 gcr.io/google_containers/pause:0.8.0 "/pause" 2 hours ago Up 2 hours k8s_POD.e4cc795_fluentd-elasticsearch-kubernetes-minion-9q50_default_a8d3815def5de29fd315adf5d9fc5acc_e6a32500
a029f8710c93 gcr.io/google_containers/pause:0.8.0 "/pause" 2 hours ago k8s_POD.e4cc795_fluentd-elasticsearch-kubernetes-minion-9q50_default_a8d3815def5de29fd315adf5d9fc5acc_82b1abf7
7b7c2c97aa50 gcr.io/google_containers/fluentd-elasticsearch:1.5 "\"/bin/sh -c '/usr/ 2 hours ago Exited (143) 2 hours ago k8s_fluentd-elasticsearch.c99175f6_fluentd-elasticsearch-kubernetes-minion-9q50_default_a8d3815def5de29fd315adf5d9fc5acc_f47f189b
fe98a27d7687 gcr.io/google_containers/pause:0.8.0 "/pause" 2 hours ago Exited (0) 2 hours ago k8s_POD.e4cc795_fluentd-elasticsearch-kubernetes-minion-9q50_default_a8d3815def5de29fd315adf5d9fc5acc_d4c77781
It looks like the status is never reported by the kubelet. I checked the kubelet log and found:
E0521 21:01:18.661917 2486 kubelet.go:1074] Deleting mirror pod "fluentd-elasticsearch-kubernetes-minion-9q50_default" because it is outdated
W0521 21:01:19.175759 2486 status_manager.go:60] Failed to updated pod status: error updating status for pod "fluentd-elasticsearch-kubernetes-minion-9q50": pods "fluentd-elasticsearch-kubernetes-minion-9q50" not found
Why can't we delete the mirror pod and create a new one? Because of this regression, once we run into this state, the rest of the test is never triggered, and the run eventually times out and fails.
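For context, here is a minimal Go sketch of the kind of comparison that can go wrong. This is not the kubelet's actual code; the PodSpec type, field set, and values are illustrative only. The point is that apiserver defaulting fills in fields the on-disk manifest never set (compare the YAML dump above with the static pod file), so a naive deep-equality check between the file spec and the mirror pod's spec always reports a mismatch, which would line up with the "Deleting mirror pod ... because it is outdated" log line.

```go
// Sketch only: why comparing a defaulted apiserver spec against the
// on-disk static pod spec can make a mirror pod look permanently "outdated".
package main

import (
	"fmt"
	"reflect"
)

// PodSpec is a stripped-down stand-in for the real API type.
type PodSpec struct {
	Image                  string
	DNSPolicy              string
	TerminationMessagePath string
}

func main() {
	// Spec as parsed from the static pod manifest on the node's disk.
	fromFile := PodSpec{
		Image: "gcr.io/google_containers/fluentd-elasticsearch:1.5",
	}

	// The same spec read back from the apiserver, after defaulting has
	// filled in fields the manifest never set.
	fromAPIServer := PodSpec{
		Image:                  "gcr.io/google_containers/fluentd-elasticsearch:1.5",
		DNSPolicy:              "ClusterFirst",
		TerminationMessagePath: "/dev/termination-log",
	}

	// A naive deep-equality check reports a mismatch even though nothing
	// meaningful changed, so the mirror pod would be flagged "outdated",
	// deleted, recreated, defaulted again, and the loop repeats.
	fmt.Println("outdated:", !reflect.DeepEqual(fromFile, fromAPIServer)) // outdated: true
}
```

If that is roughly what happens, the kubelet keeps deleting and recreating the mirror pod, and the status manager's update intermittently targets a pod that has just been deleted, which would explain both log lines above.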
About this issue
- State: closed
- Created 9 years ago
- Comments: 19 (16 by maintainers)
Commits related to this issue
- Apply fix for #8642 to GracefulDeletion — committed to smarterclayton/kubernetes by smarterclayton 9 years ago
Although this issue was closed, it still appears in my environment: