kubernetes: Pod ready condition is not always updated after readiness gates condition changes
What happened: I have a ReplicationController with 2 replicas. The pods define a custom readinessGates condition. Sometimes, after the readiness gate condition is changed to True, the pod Ready condition is not updated.
What you expected to happen: the pod condition of type Ready to be updated as soon as the status of a defined readiness gate changes.
How to reproduce it (as minimally and precisely as possible): it happens roughly once in a hundred deployments.
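For context, here is a minimal sketch of how an external controller typically flips such a readiness-gate condition. This is not my actual controller code; it assumes a recent client-go and a strategic-merge patch against the status subresource. Only the gate type "szwadron/kuro" and the pod name/namespace are taken from the dumps and logs below.

// Minimal sketch, not the reporter's actual controller. It patches only the
// custom readiness-gate condition on the pod's status subresource; pod name
// and namespace come from the logs below, everything else is an assumption.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Strategic merge on status.conditions merges by "type", so only the
	// "szwadron/kuro" condition is touched.
	patch := []byte(fmt.Sprintf(
		`{"status":{"conditions":[{"type":"szwadron/kuro","status":"True","reason":"TestSucceeded","message":"All tests passed","lastTransitionTime":%q}]}}`,
		metav1.Now().UTC().Format("2006-01-02T15:04:05Z")))

	_, err = cs.CoreV1().Pods("default").Patch(
		context.TODO(),
		"overseer-communication-6057090-fmbn9",
		types.StrategicMergePatchType,
		patch,
		metav1.PatchOptions{},
		"status", // the condition lives on the status subresource
	)
	if err != nil {
		panic(err)
	}
}

After such a patch, the kubelet is expected to observe the change and recompute the pod-level Ready condition, which is exactly the step that sometimes does not happen here.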
Anything else we need to know?:
This is how the pod status looks just after the readiness gate condition is patched:
status: class V1PodStatus {
conditions: [class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:54.000+01:00
message: null
reason: null
status: True
type: Initialized
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:54.000+01:00
message: the status of pod readiness gate "szwadron/kuro" is not "True", but False
reason: ReadinessGatesNotReady
status: False
type: Ready
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:57.000+01:00
message: null
reason: null
status: True
type: ContainersReady
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:54.000+01:00
message: null
reason: null
status: True
type: PodScheduled
}, class V1PodCondition {
lastProbeTime: 2019-12-20T15:04:00.000+01:00
lastTransitionTime: 2019-12-20T15:04:00.000+01:00
message: All tests passed
reason: TestSucceeded
status: True
type: szwadron/kuro
}]
containerStatuses: [class V1ContainerStatus {
containerID: docker://e933d59a4984e551285e564ea66cb26d11e102f4d3adbb779bfd7e85f2b41e04
image: mycompanyhub/szwadron/overseer-webserver:overseer-communication_0c9f59ab-6e78-409a-94db-fa747a4721e2
imageID: docker-pullable://mycompanyhub/szwadron/overseer-webserver@sha256:03039e0c721baa399b48f0c83d6501ea9dc080892baa8debe219d5919aa29d12
lastState: class V1ContainerState {
running: null
terminated: null
waiting: null
}
name: app
ready: true
restartCount: 0
state: class V1ContainerState {
running: class V1ContainerStateRunning {
startedAt: 2019-12-20T15:03:55.000+01:00
}
terminated: null
waiting: null
}
}]
hostIP: 10.3.49.137
initContainerStatuses: null
message: null
nominatedNodeName: null
phase: Running
podIP: 10.3.238.1
qosClass: Burstable
reason: null
startTime: 2019-12-20T15:03:54.000+01:00
}
The overall pod Ready condition is False, and it stays like that until something else touches the ReplicationController or the Pod. In my case that was scaling the ReplicationController to 0 replicas (because a failure was detected), which happened at 15:06:56. Right after that, the pod Ready condition changed to True. Check lastTransitionTime in the following status:
status: class V1PodStatus {
conditions: [class V1PodCondition {
lastProbeTime: 2019-12-20T15:04:00.000+01:00
lastTransitionTime: 2019-12-20T15:04:00.000+01:00
message: All tests passed
reason: TestSucceeded
status: True
type: szwadron/kuro
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:54.000+01:00
message: null
reason: null
status: True
type: Initialized
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:06:56.000+01:00
message: null
reason: null
status: True
type: Ready
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:57.000+01:00
message: null
reason: null
status: True
type: ContainersReady
}, class V1PodCondition {
lastProbeTime: null
lastTransitionTime: 2019-12-20T15:03:54.000+01:00
message: null
reason: null
status: True
type: PodScheduled
}]
containerStatuses: [class V1ContainerStatus {
containerID: docker://e933d59a4984e551285e564ea66cb26d11e102f4d3adbb779bfd7e85f2b41e04
image: mycompanyhub/szwadron/overseer-webserver:overseer-communication_0c9f59ab-6e78-409a-94db-fa747a4721e2
imageID: docker-pullable://mycompanyhub/szwadron/overseer-webserver@sha256:03039e0c721baa399b48f0c83d6501ea9dc080892baa8debe219d5919aa29d12
lastState: class V1ContainerState {
running: null
terminated: null
waiting: null
}
name: app
ready: true
restartCount: 0
state: class V1ContainerState {
running: class V1ContainerStateRunning {
startedAt: 2019-12-20T15:03:55.000+01:00
}
terminated: null
waiting: null
}
}]
hostIP: 10.3.49.137
initContainerStatuses: null
message: null
nominatedNodeName: null
phase: Running
podIP: 10.3.238.1
qosClass: Burstable
reason: null
startTime: 2019-12-20T15:03:54.000+01:00
}
Kubelet logs for the given pod:
gru 20 15:03:54 worker-11-2.dc-1.mycompany kubelet[180176]: I1220 15:03:54.341468 180176 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-ffrgs" (UniqueName: "kubernetes.io/secret/aba9a54e-bb6c-49e5-b877-0adb84dda76d-default-token-ffrgs") pod "overseer-communication-6057090-fmbn9" (UID: "aba9a54e-bb6c-49e5-b877-0adb84dda76d")
gru 20 15:03:58 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:03:58.345453 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:03:59 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:03:59.387138 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:04:00 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:04:00.401381 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:04:01 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:04:01.413149 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:05:07 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:05:07.002422 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:06:35 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:06:35.002408 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:06:56 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:06:56.846982 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:07:08 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:07:08.778363 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
gru 20 15:07:08 worker-11-2.dc-1.mycompany kubelet[180176]: I1220 15:07:08.816210 180176 reconciler.go:177] operationExecutor.UnmountVolume started for volume "default-token-ffrgs" (UniqueName: "kubernetes.io/secret/aba9a54e-bb6c-49e5-b877-0adb84dda76d-default-token-ffrgs") pod "aba9a54e-bb6c-49e5-b877-0adb84dda76d" (UID: "aba9a54e-bb6c-49e5-b877-0adb84dda76d")
gru 20 15:07:08 worker-11-2.dc-1.mycompany kubelet[180176]: I1220 15:07:08.841743 180176 operation_generator.go:860] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/aba9a54e-bb6c-49e5-b877-0adb84dda76d-default-token-ffrgs" (OuterVolumeSpecName: "default-token-ffrgs") pod "aba9a54e-bb6c-49e5-b877-0adb84dda76d" (UID: "aba9a54e-bb6c-49e5-b877-0adb84dda76d"). InnerVolumeSpecName "default-token-ffrgs". PluginName "kubernetes.io/secret", VolumeGidValue ""
gru 20 15:07:08 worker-11-2.dc-1.mycompany kubelet[180176]: I1220 15:07:08.917367 180176 reconciler.go:297] Volume detached for volume "default-token-ffrgs" (UniqueName: "kubernetes.io/secret/aba9a54e-bb6c-49e5-b877-0adb84dda76d-default-token-ffrgs") on node "worker-11-2.dc-1.mycompany" DevicePath ""
gru 20 15:07:09 worker-11-2.dc-1.mycompany kubelet[180176]: E1220 15:07:09.792575 180176 status_manager.go:180] Kubelet is trying to update pod condition "szwadron/kuro" for pod "overseer-communication-6057090-fmbn9_default(aba9a54e-bb6c-49e5-b877-0adb84dda76d)". But it is not owned by kubelet.
There are no errors in the apiserver or controller-manager logs.
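For reference, the pod-level Ready condition is supposed to be derived from ContainersReady plus every condition listed under spec.readinessGates. The following is my own simplified restatement of that rule, not the kubelet's actual code in the status manager referenced in the log lines above:

// Simplified sketch of the rule the kubelet's status manager is expected to
// apply; it is not the actual kubelet implementation. Ready should be True
// only when ContainersReady is True and every readiness gate listed in
// spec.readinessGates has a matching condition with status True.
package podready

import corev1 "k8s.io/api/core/v1"

func getCondition(conditions []corev1.PodCondition, t corev1.PodConditionType) *corev1.PodCondition {
	for i := range conditions {
		if conditions[i].Type == t {
			return &conditions[i]
		}
	}
	return nil
}

// ShouldBeReady reports whether the pod-level Ready condition ought to be True.
func ShouldBeReady(pod *corev1.Pod) bool {
	containersReady := getCondition(pod.Status.Conditions, corev1.ContainersReady)
	if containersReady == nil || containersReady.Status != corev1.ConditionTrue {
		return false
	}
	for _, gate := range pod.Spec.ReadinessGates {
		c := getCondition(pod.Status.Conditions, gate.ConditionType)
		if c == nil || c.Status != corev1.ConditionTrue {
			return false
		}
	}
	return true
}

By this rule, Ready should have flipped to True around 15:04:00 in the first dump, when both ContainersReady and "szwadron/kuro" were already True, instead of at 15:06:56 when the ReplicationController was scaled down.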
Environment:
- Kubernetes version (use kubectl version): 1.15.7
- Cloud provider or hardware configuration: custom install, apiserver in HA with 3 replicas
- OS (e.g. cat /etc/os-release): CentOS Linux release 7.6.1810 (Core)
- Kernel (e.g. uname -a): 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 19 (5 by maintainers)
I have the same problem. My Kubernetes version is 1.15.3.