kubernetes: Some pods stuck in Terminating state can only be removed by deleting with propagationPolicy=Background
/kind bug
What happened: One (worker) node ran out of memory (OOM) and pods were evicted. Before the node recovered, the kernel OOM killer had already killed at least some of the running containers.
The cluster returned to a safe state, but a few pods got stuck in the Terminating state. The bug is that there is no way to delete those pods via kubectl. I was able to remove the Deployment and ReplicaSet, but the pods remained stuck in "Terminating".
Only after I sent a DELETE request via the API with the body
{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}
were the Pods deleted.
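A minimal Python sketch of that request follows, assuming the API server is reached through kubectl proxy on its default address (127.0.0.1:8001), which takes care of authentication. The helper names build_pod_delete_request and delete_pod, and the pod name used below, are hypothetical; only the URL path and the DeleteOptions body correspond to the Kubernetes API.

```python
import json
import urllib.request

# Assumption: `kubectl proxy` is running locally on its default port.
API_BASE = "http://127.0.0.1:8001"

# Valid values for DeleteOptions.propagationPolicy.
POLICIES = ("Orphan", "Background", "Foreground")


def build_pod_delete_request(namespace: str, name: str,
                             policy: str = "Background") -> urllib.request.Request:
    """Hypothetical helper: build a DELETE request for one Pod."""
    if policy not in POLICIES:
        raise ValueError(f"unknown propagationPolicy: {policy}")
    # Same body as in the report above.
    body = {
        "kind": "DeleteOptions",
        "apiVersion": "v1",
        "propagationPolicy": policy,
    }
    return urllib.request.Request(
        f"{API_BASE}/api/v1/namespaces/{namespace}/pods/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


def delete_pod(namespace: str, name: str, policy: str = "Background") -> int:
    """Hypothetical helper: send the request and return the HTTP status code."""
    req = build_pod_delete_request(namespace, name, policy)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With kubectl proxy running, delete_pod("default", "my-stuck-pod") would send the request. Building the request separately from sending it makes the payload easy to inspect before touching a cluster.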
Also, the kubelet on the node where the Pod / container used to run was actively sending DELETE requests to the API, without success. There are also
I strongly believe there is some “inconsistency” in the data that
What you expected to happen:
The pods to be deleted.
How to reproduce it (as minimally and precisely as possible):
I will attach a Pod object that I successfully deleted with propagationPolicy=Background. I also have more Pods stuck in Terminating and can provide information about them if needed.
Anything else we need to know?:
The cluster was initially set up with Kubernetes 1.8.4, updated through several 1.8.x patch releases, and then recently upgraded to 1.9.7.
Environment:
- Kubernetes version (use kubectl version): 1.9.7
- Cloud provider or hardware configuration: AWS m5.2xlarge instances, Container Linux
- OS (e.g. from /etc/os-release): Container Linux by CoreOS stable (1745.7.0)
- Kernel (e.g. uname -a): Linux ip-10-8-10-253.eu-central-1.compute.internal 4.14.48-coreos-r2 #1 SMP Thu Jun 14 08:23:03 UTC 2018 x86_64 Intel® Xeon® Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux
About this issue
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 35 (16 by maintainers)
We hit this occasionally and are able to delete the stuck pods with:
kubectl delete pod iamastuckpod-655c7947c9-pgzj2 -n namespace --force --grace-period=0
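For reference, the kubectl command above should correspond roughly to a DELETE request whose DeleteOptions set gracePeriodSeconds to 0. The sketch below assumes access through kubectl proxy on its default address (127.0.0.1:8001), and the helper name build_force_delete_request is hypothetical. Force deletion removes the Pod object from the API without waiting for the kubelet to confirm that the containers have stopped, so processes may briefly keep running on the node.

```python
import json
import urllib.request

# Assumption: `kubectl proxy` is running locally on its default port.
API_BASE = "http://127.0.0.1:8001"


def build_force_delete_request(namespace: str, name: str) -> urllib.request.Request:
    """Hypothetical helper: DELETE a Pod with gracePeriodSeconds=0.

    Intended to mirror `kubectl delete pod ... --force --grace-period=0`.
    """
    body = {"kind": "DeleteOptions", "apiVersion": "v1", "gracePeriodSeconds": 0}
    return urllib.request.Request(
        f"{API_BASE}/api/v1/namespaces/{namespace}/pods/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )
```

Passing the result to urllib.request.urlopen sends it. Building the request without sending it keeps the example side-effect free.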
@frittentheke we are on v1.9.8, which does not include #62673. This morning we found that the bug appears only when the pod includes an Istio 1.0.2 or 1.0.3 sidecar. Without the sidecar, the pods are removed without any problems.