kubernetes: DeletionTimestamp set after grace period is respected does not include grace period time

Dear Kubernetes-Team:

I ran an experiment with an application that does not exit after Kubelet sends an inital SIGTERM signal. The specified grace-period is respected and the pod is deleted after the grace period finishes.

The deletionTimestamp on the inital delete event seems to include the grace-period, but the deletionTimestamp that is finally updated by the APIServer does not account for the grace-period that was respected. This DeletionTimestamp is also seen when the DeleteFunc is triggered on separate podInformers we have running on our clusters.

I have attached some test results detailing this below; This behavior seems like a bug. The final DeletionTimestamp that is updated by the APIServer should indeed include any grace-period that was respected. (Ideally it should be when the application exits, and not just at the time the user puts in the delete request)

My experiment is as follows:

Create a busybox pod with a termination grace period (for example 5 mins)
Call delete pod
APIServer gets a delete event initiated by the user (k8s-admin) with a deletion timestamp by adding the 5 mins grace period
After the grace period ends api server gets another delete event from kubelet (system:node:fargate-ip-19)
This delete event from kubelet has grace period of 0 and the DeletionTimestamp set to when user deleted the pod (instead of that time + the respected grace period)

The podspec busybox.yaml that is applied:

apiVersion: v1
kind: Pod
metadata:
   name: busybox
   namespace: default
spec:
   containers:
   - image: busybox
     command:
       - sleep
       - "3600"
     imagePullPolicy: IfNotPresent
     name: busybox
   restartPolicy: Always
   terminationGracePeriodSeconds: 300

I executed: kubectl get pods -oyaml -w after performing a kubectl delete -f busybox.yaml which gives us:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 300
  deletionTimestamp: "2020-01-09T20:56:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941315"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:50:00Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:50:00Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://3d4a21fd691b182a54d9a4bc07ee4a94af7f35c5a7a54e26c127fa9673b57f48
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    lastState: {}
    name: busybox
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-01-09T20:49:59Z"
  hostIP: 10.0.45.109
  phase: Running
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 300
  deletionTimestamp: "2020-01-09T20:56:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941926"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-01-09T20:51:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941941"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-01-09T20:51:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941942"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"

We see above that the final updated DeletionTimestamp is set to: 2020-01-09T20:51:25Z which is the time I put my initial delete request in, and which does not account for the grace-period, even though the grace period was respected.

We see the initial DeletionTimestamp set to: 2020-01-09T20:56:25Z with a 300 second terminationGracePeriod set.

Here is a log line from on of our pod Informers with the timestamp showing when the DeleteFunc was triggered for this pod, which shows that the grace period of 5 minutes was respected: I0109 20:56:34.083187 1 eventhandlers.go:147] Received delete event for pod: default/busybox

About this issue

Original URL
State: open
Created 4 years ago
Comments: 29 (17 by maintainers)

Most upvoted comments

Just want to share my two cents.

There’re two timestamps that matter for a pod deletion request.

ScheduledDeletionTimestamp This is when user requests the pod to be deleted by.
ActualDeletionTimestamp This is when kube-apiserver successfully deleted the object from storage.

ScheduledDeletionTimestamp can only move backwards (with a shorten grace period), not forwards. ActualDeletionTimestamp is independent from ScheduledDeletionTimestamp, and can occur either before or after (say kubelet is down for a while or pod is stuck in terminating state).

Today, we seem to mix them up under a single field deletionTimestamp, which adds complexity to the implementation. The current approach also prevents user from differentiating between these two timestamps since there’s only one field in the API.

jqmichael on Jan 11, 2020