k3s: Not killing pods by default, system OOM takes over

We have a customer cluster that is overloaded. However, if we describe one of the nodes with issues, it seems that at no point did Kubernetes’ out-of-resource handling kick in.

For example, the events are:

Events:
  Type     Reason                   Age                  From                        Message
  ----     ------                   ----                 ----                        -------
  Normal   NodeNotReady             25m (x3 over 46m)    kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeNotReady
  Normal   NodeHasNoDiskPressure    19m (x7 over 5d1h)   kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     19m (x7 over 5d1h)   kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasSufficientPID
  Normal   NodeReady                19m (x6 over 5d1h)   kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeReady
  Normal   NodeHasSufficientMemory  19m (x7 over 5d1h)   kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasSufficientMemory
  Warning  SystemOOM                12m (x4 over 25m)    kubelet, kube-node-9f4e     System OOM encountered
  Warning  ContainerGCFailed        12m (x4 over 46m)    kubelet, kube-node-9f4e     rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   NodeAllocatableEnforced  9m23s                kubelet, kube-node-9f4e     Updated Node Allocatable limit across pods
  Normal   Starting                 9m23s                kubelet, kube-node-9f4e     Starting kubelet.
  Normal   Starting                 9m23s                kube-proxy, kube-node-9f4e  Starting kube-proxy.

The instance felt stuck (NotReady node status, couldn’t SSH to it), so we rebooted the VM and the events continued with:

  Warning  Rebooted                 9m23s                kubelet, kube-node-9f4e     Node kube-node-9f4e has been rebooted, boot id: 362cf7ed-b89b-44d6-accd-6a840bc56bdc
  Warning  InvalidDiskCapacity      9m23s                kubelet, kube-node-9f4e     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientPID     66s (x4 over 9m23s)  kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasSufficientPID
  Normal   NodeHasNoDiskPressure    66s (x4 over 9m23s)  kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasNoDiskPressure
  Normal   NodeReady                66s (x2 over 9m23s)  kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeReady
  Normal   NodeHasSufficientMemory  66s (x4 over 9m23s)  kubelet, kube-node-9f4e     Node kube-node-9f4e status is now: NodeHasSufficientMemory
  Warning  SystemOOM                15s (x3 over 67s)    kubelet, kube-node-9f4e     System OOM encountered

So, is this correct from a k3s point of view, that the events say NodeHasSufficientMemory and then the next thing is System OOM encountered? It feels like at some point there should have been a memory pressure event and a pod evicted, before it got to the point that the system’s OOM killer took over.

Obviously, we’ve told the customer to set resource limits on their pods/containers, and they will, but they quite rightly point out that the cluster should handle this gracefully rather than nodes just (effectively) dying completely.
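One mitigation we are experimenting with in the meantime is raising the kubelet’s eviction thresholds and reserving memory for the system, so that the kubelet starts evicting pods well before the kernel OOM killer gets involved. This is only a sketch: the threshold values below are illustrative rather than recommendations, and the config-file approach assumes a default k3s install (the same settings can be passed as --kubelet-arg flags instead).

# /etc/rancher/k3s/config.yaml on the affected agent (values are illustrative only)
kubelet-arg:
  # Start hard eviction once free memory drops below 500Mi (the kubelet default is 100Mi)
  - "eviction-hard=memory.available<500Mi,nodefs.available<10%"
  # Hold some memory back for the OS and for k3s itself so the node stays reachable
  - "system-reserved=memory=512Mi"
  - "kube-reserved=memory=256Mi"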

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I have a similar situation:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo1
spec:
  containers:
    - name: mem
      image: polinux/stress
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"
      command: ["stress"]
      args: ["--vm", "2", "--vm-bytes", "250M", "--vm-hang", "0"]  # two workers allocating 250M each, far above the 100Mi limit

It should be terminated, but it’s not.

This no longer appears to be an issue as of v1.20.15+. Spinning up a “guaranteed” pod:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:  # limits == requests for cpu and memory, so the pod gets the Guaranteed QoS class
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

This pod appears under /sys/fs/cgroup/memory/kubepods/pod26f3fe02-9448-4434-8d53-fac22702a5d0/, which is the correct location. There is no “guaranteed” directory; pods with that QoS class sit directly in the /kubepods/ directory.

Please make sure that swap is disabled on the system; in particular, the OOM killer appears to take swap into consideration when it should not.
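If it helps, you can also have the kubelet enforce this rather than relying on the host being configured correctly: the upstream kubelet refuses to start while swap is enabled, and I believe k3s passes --fail-swap-on=false by default so it can run on hosts with swap. A rough sketch of flipping that back via the k3s config file (path and syntax assume a default install):

# /etc/rancher/k3s/config.yaml - sketch only
kubelet-arg:
  # Refuse to start the kubelet while swap is enabled
  # (assumption: k3s defaults fail-swap-on to false)
  - "fail-swap-on=true"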