kubernetes: Log and expose cgroup OOM events to the associated Pod resource
What happened:
Previously, the cadvisor library had a regex parsing error which resulted in it falling back to returning the single character `/` when parsing system OOM messages on kernels >= 5.0. This was fixed in google/cadvisor#2813.
~~Since cadvisor was bumped in #99875, system OOM event messages are no longer being emitted because of the following line, where it checks that `event.VictimContainerName == "/"`: https://github.com/kubernetes/kubernetes/blob/e557f61784a90adf8dfe4a0bca875043e895cc8b/pkg/kubelet/oom/oom_watcher_linux.go#L75~~
Since cadvisor was bumped in #99875, we can now retrieve the ID of the OOM'd pod, create a log entry, and emit an event for the associated Pod resource.
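For illustration, here is a minimal sketch (not the kubelet's actual code or the change in the linked PR) of how the victim cgroup path now reported in cadvisor's `oomparser.OomInstance` could be mapped back to a pod UID so the event can be attached to the Pod instead of the Node. The helpers `podUIDFromCgroupPath` and `lookupPodRef` are hypothetical, and the regex assumes the usual `kubepods/.../pod<uid>/...` cgroup layout.

```go
// Sketch only: maps an OOM victim's cgroup path (e.g.
// /kubepods/burstable/pod<uid>/<container-id>) back to a pod UID and emits the
// SystemOOM event against the Pod, falling back to the Node as today.
package oomsketch

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/google/cadvisor/utils/oomparser"
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// Matches the "pod<uid>" segment of a kubepods cgroup path. The systemd cgroup
// driver encodes the UID with underscores instead of dashes, so both are accepted.
var podUIDRe = regexp.MustCompile(`pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})`)

// podUIDFromCgroupPath extracts the pod UID from a victim cgroup path, or ""
// if the path does not belong to a pod (e.g. a system daemon was the victim).
func podUIDFromCgroupPath(cgroupPath string) string {
	m := podUIDRe.FindStringSubmatch(cgroupPath)
	if m == nil {
		return ""
	}
	return strings.ReplaceAll(m[1], "_", "-")
}

// recordOOM emits a SystemOOM warning. lookupPodRef is a hypothetical
// UID -> ObjectReference lookup (e.g. backed by the kubelet pod manager); if the
// victim cgroup resolves to a pod, the event is attached to it, otherwise the
// event goes to the Node reference as it does today.
func recordOOM(recorder record.EventRecorder, nodeRef *v1.ObjectReference,
	lookupPodRef func(uid string) (*v1.ObjectReference, bool), event *oomparser.OomInstance) {

	msg := "System OOM encountered"
	if event.ProcessName != "" && event.Pid != 0 {
		msg = fmt.Sprintf("%s, victim process: %s, pid: %d", msg, event.ProcessName, event.Pid)
	}
	if uid := podUIDFromCgroupPath(event.VictimContainerName); uid != "" {
		if podRef, ok := lookupPodRef(uid); ok {
			recorder.Event(podRef, v1.EventTypeWarning, "SystemOOM", msg)
			return
		}
	}
	recorder.Event(nodeRef, v1.EventTypeWarning, "SystemOOM", msg)
}
```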
What you expected to happen:
System OOM event messages to be emitted to the Node resource.
How to reproduce it (as minimally and precisely as possible):
Running a 1.21.x cluster, create a pod that OOMs, for example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-2
  namespace: default
spec:
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
```
Observe that no SystemOOM event is recorded for the node where that pod is running.
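One way to check is the sketch below, a minimal client-go example that assumes a reachable kubeconfig at the default location (`kubectl get events -A --field-selector reason=SystemOOM` is the equivalent one-liner):

```go
// Lists SystemOOM events cluster-wide; on an affected 1.21 node this comes back
// empty even after the pod above has been OOM-killed.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List events in all namespaces whose reason is SystemOOM.
	events, err := clientset.CoreV1().Events("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "reason=SystemOOM",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("SystemOOM events: %d\n", len(events.Items))
	for _, e := range events.Items {
		fmt.Printf("%s/%s -> %s\n", e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Message)
	}
}
```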
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.21.0-beta.1.382+1a983bb958ba66
- Cloud provider or hardware configuration:
- OS (e.g.: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`): Linux ip-172-31-48-224 5.4.0-1038-aws #40-Ubuntu SMP Fri Feb 5 23:50:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 20 (4 by maintainers)
Probably still needed/valid. Please remove stale/rotten label.
After this is implemented I'll be able to associate an OOM with the pod where it happened, right? If so, that would be fantastic! With the current OOM description, I have no idea how to correlate a process ID with the culprit pod. Normally I'd just look at memory usage or restart count, but brief memory spikes might not be logged, and due to https://github.com/kubernetes/kubernetes/issues/50632 the pod might not be restarted either.
Other people have had the same difficulty: https://stackoverflow.com/questions/58749290/process-inside-pod-is-oomkilled-even-though-pod-limits-not-reached
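As an aside, a common manual workaround for this correlation today is to pull the victim's cgroup path out of the kernel's oom-kill log line (the >= 5.0 message format mentioned above includes a `task_memcg=` field), since that path embeds the pod UID. Below is an illustrative sketch only, with a hand-made example line and a regex that assumes the `kubepods/.../pod<uid>/...` layout:

```go
// Extracts the pod UID named in a kernel oom-kill line taken from
// `journalctl -k` or dmesg output.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	taskMemcgRe = regexp.MustCompile(`task_memcg=([^,\s]+)`)
	podUIDRe    = regexp.MustCompile(`pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})`)
)

// podUIDFromOOMLine returns the pod UID referenced by a kernel oom-kill line,
// or "" if the line does not point at a pod cgroup.
func podUIDFromOOMLine(line string) string {
	memcg := taskMemcgRe.FindStringSubmatch(line)
	if memcg == nil {
		return ""
	}
	uid := podUIDRe.FindStringSubmatch(memcg[1])
	if uid == nil {
		return ""
	}
	// The systemd cgroup driver encodes the UID with underscores; normalise it so
	// it can be matched against `kubectl get pods -o jsonpath='{.metadata.uid}'`.
	return strings.ReplaceAll(uid[1], "_", "-")
}

func main() {
	// Abridged, hand-made example of an oom-kill line as it appears in the kernel log.
	line := `oom-kill:constraint=CONSTRAINT_MEMCG,oom_memcg=/kubepods/burstable/podd4f5e6a7-1b2c-4d3e-8f90-123456789abc,` +
		`task_memcg=/kubepods/burstable/podd4f5e6a7-1b2c-4d3e-8f90-123456789abc/abcdef,task=stress,pid=12345,uid=0`
	fmt.Println(podUIDFromOOMLine(line)) // d4f5e6a7-1b2c-4d3e-8f90-123456789abc
}
```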
The issue around silent pod-process OOM kills is a mess and keeps coming back to bite us over and over again: a crucial pod process that is not PID 1 dies and services go offline perfectly silently. This happens with Selenium, Spark Operator, any JFrog chart, Couchbase, and so on, just to name a few.
Thanks for the clarification. In that case I think it makes sense to update this issue and the associated PR to be a feature request instead of a bug, which would cover some of the criteria in #69676: specifically, logging in the kubelet and emitting the event to the Pod resource.