kubernetes: Log something about OOMKilled containers

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:

Container gets killed because it tries to use more memory than allowed.

What you expected to happen:

Have an OOMKilled event tied to the Pod and a log message about the kill

/sig node

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 106
  • Comments: 72 (25 by maintainers)

Most upvoted comments

This has been discussed in #sig-instrumentation on Slack and was brought up on the sig-node call yesterday to determine a path forward.

There are two requests:

  1. Have an OOMKilled event tied to the Pod (as noted by @sylr)
  2. Have a count of termination reasons per Pod in the Kubelet (or cAdvisor?), exposed to Prometheus as a monotonically increasing counter

To summarize what’s currently available in kube-state-metrics:

  • kube_pod_container_status_terminated_reason: a (binary) gauge with a value of 1 for the current termination reason and 0 for all other reasons. As soon as the Pod restarts, all reasons go to 0.

  • kube_pod_container_status_last_terminated_reason: the same as above, but for the prior termination reason, so it remains available after the Pod restarts.

  • kube_pod_container_status_restarts_total: a count of the restarts, with no detail on the reason.
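
For reference, the gauges above are typically consumed with queries along these lines (a sketch; the reason label value OOMKilled follows kube-state-metrics conventions):

# 1 while the container's current termination reason is OOMKilled
kube_pod_container_status_terminated_reason{reason="OOMKilled"} == 1

# 1 if the container's last termination reason was OOMKilled
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1

# Restarts in the last hour, with no reason attached
increase(kube_pod_container_status_restarts_total[1h])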

The issues are:

  1. There is no way to get a count of the reasons over time (for alerting and debugging).
  2. Some termination reasons are never recorded by Prometheus, because the reason can change before the next Prometheus scrape.

For example, given a Pod that is sometimes OOMKilled and sometimes crashing, it is desirable to be able to view the historical termination reasons over time.

As a note: this was discussed, and it appears the design of kube-state-metrics prevents aggregating the reason gauge into counters; it is preferred that this aggregation happen at the source.
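
To make the second request concrete, a source-side counter could look roughly like the following. The metric name here is hypothetical and does not exist today; it is only meant to illustrate the idea:

# Hypothetical kubelet/cAdvisor counter, incremented on every container
# termination and labelled with the termination reason
# (metric name invented purely for illustration):
#   hypothetical_container_terminations_total{namespace, pod, container, reason}
#
# With such a counter, OOM kills over time could be counted and alerted on directly:
sum by (namespace, pod, container) (
  increase(hypothetical_container_terminations_total{reason="OOMKilled"}[1h])
) > 0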

Implementing both of the above requests will significantly improve the ability of cluster users and monitoring vendors to debug why Pods are failing.

Can @kubernetes/sig-node-feature-requests provide some guidance on the next steps here?

CC: @dchen1107

This query combines container restart and termination reason:

sum by (pod, container, reason) (kube_pod_container_status_last_terminated_reason{})
* on (pod,container) group_left
sum by (pod, container) (changes(kube_pod_container_status_restarts_total{}[1m]))

Our team came up with a custom controller to implement the idea of having an OOMKilled event tied to the Pod. Please find it here: https://github.com/xing/kubernetes-oom-event-generator

From the README: The Controller listens to the Kubernetes API for “Container Started” events and searches for those claiming they were OOMKilled previously. For matching ones an Event is generated as Warning with the reason PreviousContainerWasOOMKilled.

We would be very happy to get feedback on it.
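
For reference, the Event produced by such a controller would look roughly like this (a sketch; the names, namespace, and message wording are illustrative, only the type and reason come from the README above):

apiVersion: v1
kind: Event
metadata:
  name: my-pod.previous-oom          # illustrative name
  namespace: default                 # illustrative namespace
type: Warning
reason: PreviousContainerWasOOMKilled
message: Container "app" in Pod "my-pod" was OOMKilled before its last start  # illustrative wording
involvedObject:
  apiVersion: v1
  kind: Pod
  name: my-pod                       # illustrative Pod name
  namespace: default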

Indeed, it seems to work 😃

@brancz do you know why this happens? Also tried it in 1.3.1.

    - alert: OOMKilled
      expr: sum_over_time(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[5m]) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        description: Pod {{$labels.pod}} in {{$labels.namespace}} got OOMKilled

Now that #87856 is closed, what is the best way to alert on OOMKilled containers?

@lukeschlather #100487 should cover the logging and the OOM event being created for the associated pod that you are asking for.

/remove-lifecycle stale

/remove-lifecycle stale

This query combines container restart and termination reason:

sum by (pod, container, reason) (kube_pod_container_status_last_terminated_reason{})
* on (pod,container) group_left
sum by (pod, container) (changes(kube_pod_container_status_restarts_total{}[1m]))

Thanks, this seems to work fine for my use case:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: oom-rules
  namespace: kube-prometheus-stack
spec:
  groups:
  - name: OOMKilled
    rules:
    - alert: OOMKilled
      expr: 'sum by (pod, container, reason, namespace) (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}) * on (pod,container) group_left
        sum by (pod, container) (changes(kube_pod_container_status_restarts_total{}[1m])) > 0'
      labels:
        severity: warning
      annotations:
        summary: "Container ({{ $labels.container }}) OOMKilled ({{ $labels.namespace }}/{{ $labels.pod }})"

This fires an alert on container OOM events and resolves it again shortly afterwards.
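
If you want the alert to stay active for a while after an OOM kill rather than resolving almost immediately, one option (a sketch along the same lines, not verified here) is to widen the changes() window, for example to 30 minutes:

# Same idea as the rule above, but the result stays non-zero for up to
# 30 minutes after the restart instead of roughly one scrape interval:
sum by (namespace, pod, container, reason) (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"})
* on (namespace, pod, container) group_left
sum by (namespace, pod, container) (changes(kube_pod_container_status_restarts_total{}[30m])) > 0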

/remove-lifecycle stale

Is there a good way of probing for OOMKilled? My use case is that I want to detect OOMs and trigger actions based on them. Thanks!

/remove-lifecycle rotten

I still think this should be more properly addressed.

@bjhaid fwiw you can use mtail against dmesg to produce metrics about oomkill messages.

The problem here is that a pod can disappear with no record of why. A metric is useful in that it lets you know something is wrong, but it doesn’t actually tell you what is wrong. K8s shouldn’t be killing pods without leaving a record, in an obvious place, of which pod it killed and why.

/remove-lifecycle stale

There’s an in-progress PR about this now. https://github.com/kubernetes/kubernetes/pull/87856

@anderson4u2 I am a bit confused by your last comment. You wrote:

just tried kube_pod_container_status_last_terminated_reason in version 1.4.0

But in your example you use kube_pod_container_status_terminated_reason, not kube_pod_container_status_last_terminated_reason.

So as far as I see, the new (very useful) metric kube_pod_container_status_last_terminated_reason is still unreleased.

/remove-lifecycle stale

Is this still relevant after https://github.com/kubernetes/kubernetes/pull/108004? It seems to me that it covers the gaps kube-state-metrics has with OOMKilled events.
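
For anyone looking for a metric-based answer today: if your setup exposes cAdvisor’s container_oom_events_total counter (for example via the kubelet’s /metrics/cadvisor endpoint), OOM kills can be counted over time. A sketch, assuming that metric is available in your cAdvisor version:

# Containers with at least one OOM kill in the last 15 minutes
sum by (namespace, pod, container) (increase(container_oom_events_total[15m])) > 0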

Are the memory requests and limits just cgroups under the hood?

@lukeschlather for the record, the kernel kills pods, not k8s. That’s the whole problem with this issue 😦

Please google “oom kill kernel”.
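
For context on the cgroups question above: the memory values in a Pod spec are translated into cgroup settings on the node, and it is the kernel’s OOM killer acting on the cgroup limit that does the killing. A minimal illustrative spec (names and image are made up):

apiVersion: v1
kind: Pod
metadata:
  name: memory-limited-demo                  # illustrative name
spec:
  containers:
  - name: app                                # illustrative container name
    image: registry.example.com/app:latest   # illustrative image
    resources:
      requests:
        memory: "128Mi"                      # used by the scheduler for placement
      limits:
        memory: "256Mi"                      # enforced as the container's cgroup memory limit;
                                             # exceeding it triggers the kernel OOM killer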

/remove-lifecycle stale

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

What is the component that actually OOM-kills a container for going over its memory limit? Can that component simply log something? Where would that log go in GKE? The kubernetes apiserver logs? The node logs?

It seems like a lot of the issues related to this one get bogged down in how to deal with pathological cases (stuff getting killed by the kernel rather than simply being killed for going over its limit). Also, I want an event, but if it’s going to be another 2 years before someone can figure out how to properly generate an event, I would settle for logging anything anywhere at all.

What’s the equivalent to looking in dmesg if you’re using a hosted solution like GKE (my actual question) or EKS/AKS?

To the best of my knowledge there is so far no built-in way for GKE.

We are using https://github.com/xing/kubernetes-oom-event-generator in combination with alerting on a metric. Just be aware: this only works if the main process is killed and the Pod gets evicted. If a subprocess (like a gunicorn worker) is killed, you need to rely on the logging of your running application. See e.g. https://github.com/benoitc/gunicorn/pull/2475