kubernetes: DeletionTimestamp set after grace period is respected does not include grace period time

Dear Kubernetes-Team:

I ran an experiment with an application that does not exit after Kubelet sends an inital SIGTERM signal. The specified grace-period is respected and the pod is deleted after the grace period finishes.

The deletionTimestamp on the inital delete event seems to include the grace-period, but the deletionTimestamp that is finally updated by the APIServer does not account for the grace-period that was respected. This DeletionTimestamp is also seen when the DeleteFunc is triggered on separate podInformers we have running on our clusters.

I have attached some test results detailing this below; This behavior seems like a bug. The final DeletionTimestamp that is updated by the APIServer should indeed include any grace-period that was respected. (Ideally it should be when the application exits, and not just at the time the user puts in the delete request)

My experiment is as follows:

  1. Create a busybox pod with a termination grace period (for example 5 mins)
  2. Call delete pod
  3. APIServer gets a delete event initiated by the user (k8s-admin) with a deletion timestamp by adding the 5 mins grace period
  4. After the grace period ends api server gets another delete event from kubelet (system:node:fargate-ip-19)
  5. This delete event from kubelet has grace period of 0 and the DeletionTimestamp set to when user deleted the pod (instead of that time + the respected grace period)

The podspec busybox.yaml that is applied:

apiVersion: v1
kind: Pod
metadata:
   name: busybox
   namespace: default
spec:
   containers:
   - image: busybox
     command:
       - sleep
       - "3600"
     imagePullPolicy: IfNotPresent
     name: busybox
   restartPolicy: Always
   terminationGracePeriodSeconds: 300

I executed: kubectl get pods -oyaml -w after performing a kubectl delete -f busybox.yaml which gives us:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 300
  deletionTimestamp: "2020-01-09T20:56:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941315"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:50:00Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:50:00Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://3d4a21fd691b182a54d9a4bc07ee4a94af7f35c5a7a54e26c127fa9673b57f48
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    lastState: {}
    name: busybox
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-01-09T20:49:59Z"
  hostIP: 10.0.45.109
  phase: Running
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 300
  deletionTimestamp: "2020-01-09T20:56:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941926"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-01-09T20:51:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941941"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-09T20:49:15Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-01-09T20:51:25Z"
  labels:
    eks.amazonaws.com/fargate-profile: test-fp-pritrame
  name: busybox
  namespace: default
  resourceVersion: "8941942"
  selfLink: /api/v1/namespaces/default/pods/busybox
  uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jpf24
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: fargate-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-jpf24
    secret:
      defaultMode: 420
      secretName: default-token-jpf24
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:56:26Z"
    message: 'containers with unready status: [busybox]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-09T20:49:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox
    imageID: ""
    lastState: {}
    name: busybox
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.45.109
  phase: Pending
  podIP: 10.0.45.109
  qosClass: BestEffort
  startTime: "2020-01-09T20:49:55Z"

We see above that the final updated DeletionTimestamp is set to: 2020-01-09T20:51:25Z which is the time I put my initial delete request in, and which does not account for the grace-period, even though the grace period was respected.

We see the initial DeletionTimestamp set to: 2020-01-09T20:56:25Z with a 300 second terminationGracePeriod set.

Here is a log line from on of our pod Informers with the timestamp showing when the DeleteFunc was triggered for this pod, which shows that the grace period of 5 minutes was respected: I0109 20:56:34.083187 1 eventhandlers.go:147] Received delete event for pod: default/busybox

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Comments: 29 (17 by maintainers)

Most upvoted comments

Just want to share my two cents.

There’re two timestamps that matter for a pod deletion request.

  1. ScheduledDeletionTimestamp This is when user requests the pod to be deleted by.

  2. ActualDeletionTimestamp This is when kube-apiserver successfully deleted the object from storage.

ScheduledDeletionTimestamp can only move backwards (with a shorten grace period), not forwards. ActualDeletionTimestamp is independent from ScheduledDeletionTimestamp, and can occur either before or after (say kubelet is down for a while or pod is stuck in terminating state).

Today, we seem to mix them up under a single field deletionTimestamp, which adds complexity to the implementation. The current approach also prevents user from differentiating between these two timestamps since there’s only one field in the API.