kubernetes: Some pods stuck in Terminating state and can only be removed if deleting with propagationPolicy=Background

/kind bug

What happened: One (worker) node went OOM and pods were evicted. Before the node was “saved”, the kernel OOM-killed at least some of the running containers.

The cluster came back to a safe state, but a few pods got stuck in the Terminating state. The bug is that there is no way to delete those pods via kubectl. I was able to remove the deployment and replica set, but the pods still sat there in “Terminating”.
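For triage, pods in this condition can be picked out of the JSON printed by `kubectl get pods --all-namespaces -o json`: a pod displayed as Terminating has `metadata.deletionTimestamp` set. The Python sketch below is not part of the original report. The function name `stuck_terminating` and the 60-second slack are hypothetical choices, and it assumes timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form.

```python
from datetime import datetime, timezone


def stuck_terminating(pod_list, now, slack_seconds=60):
    """Return (namespace, name) for pods whose deletionTimestamp lies
    more than slack_seconds in the past.

    pod_list is the parsed JSON of a Kubernetes PodList.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if ts is None:
            continue  # deletion was never requested for this pod
        deadline = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        if (now - deadline).total_seconds() > slack_seconds:
            stuck.append((meta.get("namespace"), meta.get("name")))
    return stuck


# Illustrative input only; these pod names are made up.
example = {
    "items": [
        {"metadata": {"name": "web-1", "namespace": "default",
                      "deletionTimestamp": "2018-06-20T10:00:00Z"}},
        {"metadata": {"name": "web-2", "namespace": "default"}},
    ]
}
print(stuck_terminating(example, datetime(2018, 6, 20, 12, 0, tzinfo=timezone.utc)))
```

The slack avoids flagging pods that are still inside a normal shutdown window.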

Only after I sent a DELETE request via the API using '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' were the Pods deleted.
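As a rough illustration of that request, the sketch below builds (but does not send) a DELETE against the core v1 pod endpoint with the same DeleteOptions body. The function name `build_pod_delete` and the server address, namespace and pod name are placeholders, not values from the affected cluster. Authentication and TLS setup are left out.

```python
import json
import urllib.request


def build_pod_delete(api_server, namespace, name, propagation_policy="Background"):
    """Build a DELETE request for one pod, carrying a DeleteOptions body."""
    options = {
        "kind": "DeleteOptions",
        "apiVersion": "v1",
        "propagationPolicy": propagation_policy,
    }
    url = f"{api_server}/api/v1/namespaces/{namespace}/pods/{name}"
    return urllib.request.Request(
        url,
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


# Placeholder values for illustration; nothing is sent over the network here.
req = build_pod_delete("https://127.0.0.1:6443", "my-namespace", "my-stuck-pod")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would additionally need credentials, for example a bearer token in an `Authorization` header, and an SSL context that trusts the cluster CA.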

Also, the kubelet on the node where the Pod / container used to run was actively sending DELETE requests to the API, with no luck. There are also

I strongly believe there is some “inconsistency” in the data that

What you expected to happen:

The pods to be deleted.

How to reproduce it (as minimally and precisely as possible):

I shall attach a Pod object that I successfully deleted with propagationPolicy=Background. I also have more Pods in the Terminating state that I can get info from if need be.

Anything else we need to know?:

The cluster was initially set up with Kubernetes 1.8.4, updated through several 1.8.x releases, and then recently upgraded to 1.9.7.

Environment:

  • Kubernetes version (use kubectl version): 1.9.7
  • Cloud provider or hardware configuration: AWS m5.2xlarge instances, Container Linux
  • OS (e.g. from /etc/os-release): Container Linux by CoreOS stable (1745.7.0)
  • Kernel (e.g. uname -a): Linux ip-10-8-10-253.eu-central-1.compute.internal 4.14.48-coreos-r2 #1 SMP Thu Jun 14 08:23:03 UTC 2018 x86_64 Intel® Xeon® Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 35 (16 by maintainers)

Most upvoted comments

We have this occurring occasionally and are able to delete the stuck pods with:

kubectl delete pod iamastuckpod-655c7947c9-pgzj2 -n namespace --force --grace-period=0
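When several pods are stuck, the same command can be generated per pod. The sketch below only assembles the argument list and prints it for review. The helper name `force_delete_argv` is hypothetical, and running the result (for example via `subprocess.run`) is left to the operator.

```python
import shlex


def force_delete_argv(namespace, pod_name):
    """Return the kubectl argument list for a forced, zero-grace pod deletion."""
    return [
        "kubectl", "delete", "pod", pod_name,
        "-n", namespace,
        "--force", "--grace-period=0",
    ]


# Print the command for review instead of executing it.
argv = force_delete_argv("namespace", "iamastuckpod-655c7947c9-pgzj2")
print(shlex.join(argv))
```

Note that, as far as I understand it, a forced deletion removes the pod object from the API without waiting for the kubelet to confirm that the containers have stopped, so it is worth checking the node afterwards.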

@frittentheke we are on v1.9.8, which does not include #62673. What we found this morning is that the bug appears only when the pod includes a sidecar from Istio 1.0.2 or 1.0.3. Without it, pods are removed without any problems.