kubernetes: Pod does not go into unready state while terminating

What happened:

A Pod that needs some time to drain remains in the READY state while terminating, until just before it is fully terminated.

What you expected to happen:

Once a Pod is deleted, it should go into the unready state within a short time, as the documentation says:

If you want to be able to drain requests when the Pod is deleted, you do not necessarily need a readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the containers in the Pod to stop.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-readiness-probe (Emphasis is mine.)

How to reproduce it (as minimally and precisely as possible):

Create a Pod resource

$ cat test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - args:
    - sleep
    - "1000000"
    image: busybox
    name: test

$ kubectl apply -f ./test.yaml

The sleep command is a good example here because it does not handle SIGTERM and keeps running until it is killed.

Delete the Pod

$ kubectl delete pod test

The Pod stays READY while it waits for its container to stop, contrary to the documentation:

$ kubectl get pods
NAME   READY   STATUS        RESTARTS   AGE
test   1/1     Terminating   0          44s

The Pod conditions Ready and ContainersReady also remain True.
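For anyone who wants to check for this state programmatically rather than by eye, here is a minimal sketch in Go. It assumes the Pod has been dumped as JSON (for example with `kubectl get pod test -o json`) and only looks at `metadata.deletionTimestamp` and the `Ready` entry in `status.conditions`. The `podSnapshot` type and the `terminatingButReady` function are hypothetical names made up for this illustration; they are not part of any Kubernetes library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSnapshot mirrors only the Pod fields needed here. It is a hypothetical
// helper type for this sketch, not the real k8s.io/api/core/v1.Pod.
type podSnapshot struct {
	Metadata struct {
		DeletionTimestamp *string `json:"deletionTimestamp"`
	} `json:"metadata"`
	Status struct {
		Conditions []struct {
			Type   string `json:"type"`
			Status string `json:"status"`
		} `json:"conditions"`
	} `json:"status"`
}

// terminatingButReady reports whether the Pod JSON has a deletionTimestamp
// set while its Ready condition is still "True".
func terminatingButReady(podJSON []byte) (bool, error) {
	var p podSnapshot
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return false, err
	}
	if p.Metadata.DeletionTimestamp == nil {
		return false, nil // deletion has not been requested
	}
	for _, c := range p.Status.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True", nil
		}
	}
	return false, nil // no Ready condition present
}

func main() {
	// Trimmed sample shaped like `kubectl get pod test -o json`;
	// the values are illustrative only.
	sample := []byte(`{
	  "metadata": {"deletionTimestamp": "2021-06-09T13:29:56Z"},
	  "status": {"conditions": [
	    {"type": "Ready", "status": "True"},
	    {"type": "ContainersReady", "status": "True"}
	  ]}
	}`)
	stuck, err := terminatingButReady(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("terminating but still Ready:", stuck)
}
```

Feeding it successive dumps taken during the grace period would show whether the Ready condition ever flips before the container exits.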

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.6-gke.1000", GitCommit:"3ae0998c5052f420a17cb96bacf860ec5d6822a3", GitTreeState:"clean", BuildDate:"2021-04-29T09:17:16Z", GoVersion:"go1.15.10b5", Compiler:"gc", Platform:"linux/amd64"}

I also confirmed with v1.19.9-gke.1400.

Cloud provider or hardware configuration: GKE
OS (e.g: cat /etc/os-release)

NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
KERNEL_COMMIT_ID=c19d150c6bd658510ec786390aec80ad476c7578
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
VERSION=89
VERSION_ID=89
BUILD_ID=16108.403.15

Kernel (e.g. uname -a):

Linux cs-291166234657-default-boost-vqzbq 5.4.104+ #1 SMP Fri Apr 30 09:52:02 PDT 2021 x86_64 GNU/Linux

Install tools: GKE
Network plugin and version (if this is a network-related bug):
Others:

About this issue

State: closed
Created 3 years ago
Comments: 25 (12 by maintainers)

Most upvoted comments

~Posted a fix for this https://github.com/kubernetes/kubernetes/pull/110191~

rphillips on Jun 1, 2022

https://github.com/kubernetes/kubernetes/blob/80056f73a614b21c7d2165d65f3b74a2fbf2264e/pkg/controller/endpoint/endpoints_controller.go#L447-L450

https://github.com/kubernetes/kubernetes/blob/80056f73a614b21c7d2165d65f3b74a2fbf2264e/pkg/controller/endpointslice/utils.go#L44-L54

https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/pkg/controller/util/endpoint/controller_utils.go#L126-L135

As the code above shows, a terminating Pod is treated as an unready endpoint. I guess that’s what the documentation means. If that’s true, a PR to clarify the documentation may be needed.

RyanAoh on Oct 9, 2021

I found the same using the test pod described above:

meta:
    creationTimestamp: "2021-06-09T13:29:12Z"
    deletionGracePeriodSeconds: 30
    deletionTimestamp: "2021-06-09T13:29:56Z"
...
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2021-06-09T13:29:12Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2021-06-09T13:29:57Z"
      message: 'containers with unready status: [test]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2021-06-09T13:29:57Z"
      message: 'containers with unready status: [test]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2021-06-09T13:29:12Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://c2a4539d8f31bac8f96659f82111502060e9cdf28045928220ba2bf6f9400dfd
      image: docker.io/library/busybox:latest
      imageID: docker.io/library/busybox@sha256:930490f97e5b921535c153e0e7110d251134cc4b72bbb8133c6a5065cc68580d
      lastState: {}
      name: test
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://c2a4539d8f31bac8f96659f82111502060e9cdf28045928220ba2bf6f9400dfd
          exitCode: 137
          finishedAt: "2021-06-09T13:29:56Z"
          reason: Error
          startedAt: "2021-06-09T13:29:16Z"

I dumped the YAML repeatedly via the CLI for about 10-20 seconds. The output shows that the pod's status.conditions never reports ‘Ready: False’ until the container has terminated.

I don’t think the documentation is accurate. That phrasing was added relatively recently via: https://github.com/kubernetes/website/pull/22603

michaelgugino on Jun 9, 2021