kubernetes: DeletionTimestamp set after grace period is respected does not include grace period time
Dear Kubernetes-Team:
I ran an experiment with an application that does not exit after Kubelet sends an inital SIGTERM signal. The specified grace-period is respected and the pod is deleted after the grace period finishes.
The deletionTimestamp on the inital delete event seems to include the grace-period, but the deletionTimestamp that is finally updated by the APIServer does not account for the grace-period that was respected. This DeletionTimestamp is also seen when the DeleteFunc is triggered on separate podInformers we have running on our clusters.
I have attached some test results detailing this below; This behavior seems like a bug. The final DeletionTimestamp that is updated by the APIServer should indeed include any grace-period that was respected. (Ideally it should be when the application exits, and not just at the time the user puts in the delete request)
My experiment is as follows:
- Create a busybox pod with a termination grace period (for example 5 mins)
- Call delete pod
- APIServer gets a delete event initiated by the user (k8s-admin) with a deletion timestamp by adding the 5 mins grace period
- After the grace period ends api server gets another delete event from kubelet (system:node:fargate-ip-19)
- This delete event from kubelet has grace period of 0 and the DeletionTimestamp set to when user deleted the pod (instead of that time + the respected grace period)
The podspec busybox.yaml
that is applied:
apiVersion: v1
kind: Pod
metadata:
name: busybox
namespace: default
spec:
containers:
- image: busybox
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
name: busybox
restartPolicy: Always
terminationGracePeriodSeconds: 300
I executed:
kubectl get pods -oyaml -w
after performing a kubectl delete -f busybox.yaml
which gives us:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2020-01-09T20:49:15Z"
deletionGracePeriodSeconds: 300
deletionTimestamp: "2020-01-09T20:56:25Z"
labels:
eks.amazonaws.com/fargate-profile: test-fp-pritrame
name: busybox
namespace: default
resourceVersion: "8941315"
selfLink: /api/v1/namespaces/default/pods/busybox
uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
containers:
- command:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-jpf24
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: fargate-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-jpf24
secret:
defaultMode: 420
secretName: default-token-jpf24
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:50:00Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:50:00Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://3d4a21fd691b182a54d9a4bc07ee4a94af7f35c5a7a54e26c127fa9673b57f48
image: docker.io/library/busybox:latest
imageID: docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
lastState: {}
name: busybox
ready: true
restartCount: 0
state:
running:
startedAt: "2020-01-09T20:49:59Z"
hostIP: 10.0.45.109
phase: Running
podIP: 10.0.45.109
qosClass: BestEffort
startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2020-01-09T20:49:15Z"
deletionGracePeriodSeconds: 300
deletionTimestamp: "2020-01-09T20:56:25Z"
labels:
eks.amazonaws.com/fargate-profile: test-fp-pritrame
name: busybox
namespace: default
resourceVersion: "8941926"
selfLink: /api/v1/namespaces/default/pods/busybox
uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
containers:
- command:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-jpf24
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: fargate-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-jpf24
secret:
defaultMode: 420
secretName: default-token-jpf24
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: busybox
imageID: ""
lastState: {}
name: busybox
ready: false
restartCount: 0
state:
waiting:
reason: ContainerCreating
hostIP: 10.0.45.109
phase: Pending
podIP: 10.0.45.109
qosClass: BestEffort
startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2020-01-09T20:49:15Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2020-01-09T20:51:25Z"
labels:
eks.amazonaws.com/fargate-profile: test-fp-pritrame
name: busybox
namespace: default
resourceVersion: "8941941"
selfLink: /api/v1/namespaces/default/pods/busybox
uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
containers:
- command:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-jpf24
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: fargate-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-jpf24
secret:
defaultMode: 420
secretName: default-token-jpf24
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: busybox
imageID: ""
lastState: {}
name: busybox
ready: false
restartCount: 0
state:
waiting:
reason: ContainerCreating
hostIP: 10.0.45.109
phase: Pending
podIP: 10.0.45.109
qosClass: BestEffort
startTime: "2020-01-09T20:49:55Z"
---
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2020-01-09T20:49:15Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2020-01-09T20:51:25Z"
labels:
eks.amazonaws.com/fargate-profile: test-fp-pritrame
name: busybox
namespace: default
resourceVersion: "8941942"
selfLink: /api/v1/namespaces/default/pods/busybox
uid: 7f91ad24-3321-11ea-80df-0622abc9f058
spec:
containers:
- command:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-jpf24
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: fargate-ip-10-0-45-109.us-west-2.compute.internal
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: fargate-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-jpf24
secret:
defaultMode: 420
secretName: default-token-jpf24
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:56:26Z"
message: 'containers with unready status: [busybox]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-01-09T20:49:55Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: busybox
imageID: ""
lastState: {}
name: busybox
ready: false
restartCount: 0
state:
waiting:
reason: ContainerCreating
hostIP: 10.0.45.109
phase: Pending
podIP: 10.0.45.109
qosClass: BestEffort
startTime: "2020-01-09T20:49:55Z"
We see above that the final updated DeletionTimestamp is set to: 2020-01-09T20:51:25Z
which is the time I put my initial delete request in, and which does not account for the grace-period, even though the grace period was respected.
We see the initial DeletionTimestamp set to: 2020-01-09T20:56:25Z
with a 300 second terminationGracePeriod
set.
Here is a log line from on of our pod Informers with the timestamp showing when the DeleteFunc was triggered for this pod, which shows that the grace period of 5 minutes was respected:
I0109 20:56:34.083187 1 eventhandlers.go:147] Received delete event for pod: default/busybox
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 29 (17 by maintainers)
Just want to share my two cents.
There’re two timestamps that matter for a pod deletion request.
ScheduledDeletionTimestamp This is when user requests the pod to be deleted by.
ActualDeletionTimestamp This is when kube-apiserver successfully deleted the object from storage.
ScheduledDeletionTimestamp can only move backwards (with a shorten grace period), not forwards. ActualDeletionTimestamp is independent from ScheduledDeletionTimestamp, and can occur either before or after (say kubelet is down for a while or pod is stuck in terminating state).
Today, we seem to mix them up under a single field
deletionTimestamp
, which adds complexity to the implementation. The current approach also prevents user from differentiating between these two timestamps since there’s only one field in the API.