kubernetes: Container Termination Discrepancy for Exit Code 137

The documentation here shows a pod that uses too much memory and is promptly killed. In the docs, the termination reason is shown as “OOM” and the exit code as 137. When I go through similar steps myself, the termination reason is just “Error”, though I do still get exit code 137. Is there a reason this was changed? “OOM” makes it very clear what happened, while “Error” can send people on a wild goose chase trying to figure out what happened to their pod - hence my filing this issue.

For reference, the script run in my docker image just eats memory until the container gets killed.
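The script itself isn’t reproduced in this issue; as a rough illustration only (not the literal /scripts/mem-fill contents), a memory eater like this can be as simple as:

# Illustrative sketch: keep allocating memory until the kernel's OOM killer
# terminates the process with SIGKILL, which surfaces as exit code 137.
chunks = []
while True:
    chunks.append(bytearray(10 * 1024 * 1024))  # hold on to another ~10 MB each pass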

$kubectl version
Client Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.3", GitCommit:"6a81b50c7e97bbe0ade075de55ab4fa34f049dc2", GitTreeState:"clean"}
$ kubectl get pod -o json  memtest
{
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
        "name": "memtest",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/pods/memtest",
        "uid": "c480d1ba-bec2-11e5-ad45-062d2421a4bd",
        "resourceVersion": "21949421",
        "creationTimestamp": "2016-01-19T15:39:03Z"
    },
    "spec": {
        "containers": [
            {
                "name": "memtest",
                "image": "nyxcharon/docker-stress:latest",
                "args": [
                    "python",
                    "/scripts/mem-fill"
                ],
                "resources": {
                    "limits": {
                        "memory": "10M"
                    },
                    "requests": {
                        "memory": "10M"
                    }
                },
                "terminationMessagePath": "/dev/termination-log",
                "imagePullPolicy": "Always"
            }
        ],
        "restartPolicy": "Never",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "nodeName": "<IP removed>"
    },
    "status": {
        "phase": "Failed",
        "conditions": [
            {
                "type": "Ready",
                "status": "False",
                "lastProbeTime": null,
                "lastTransitionTime": null
            }
        ],
        "hostIP": "<IP Removed>",
        "startTime": "2016-01-19T15:39:03Z",
        "containerStatuses": [
            {
                "name": "memtest",
                "state": {
                    "terminated": {
                        "exitCode": 137,
                        "reason": "Error",
                        "startedAt": "2016-01-19T15:39:15Z",
                        "finishedAt": "2016-01-19T15:39:16Z",
                        "containerID": "docker://3dd77f77dfd6e715c8792c625f388e0b31cbd36ccdb4a11dafbb6d381bf83943"
                    }
                },
                "lastState": {},
                "ready": false,
                "restartCount": 0,
                "image": "nyxcharon/docker-stress:latest",
                "imageID": "docker://bacbb71b34e92ed2074621d86b10ec15a856f5918537c4d75b6f16925b5b93e7",
                "containerID": "docker://3dd77f77dfd6e715c8792c625f388e0b31cbd36ccdb4a11dafbb6d381bf83943"
            }
        ]
    }
}
$ kubectl describe pod memtest
Name:               memtest
Namespace:          default
Image(s):           nyxcharon/docker-stress:latest
Node:               <IP removed>
Start Time:         Tue, 19 Jan 2016 10:39:03 -0500
Labels:             <none>
Status:             Failed
Reason:
Message:
IP:
Replication Controllers:    <none>
Containers:
  memtest:
    Container ID:   docker://3dd77f77dfd6e715c8792c625f388e0b31cbd36ccdb4a11dafbb6d381bf83943
    Image:      nyxcharon/docker-stress:latest
    Image ID:       docker://bacbb71b34e92ed2074621d86b10ec15a856f5918537c4d75b6f16925b5b93e7
    QoS Tier:
      memory:   Guaranteed
    Limits:
      memory:   10M
    Requests:
      memory:       10M
    State:      Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 19 Jan 2016 10:39:15 -0500
      Finished:     Tue, 19 Jan 2016 10:39:16 -0500
    Ready:      False
    Restart Count:  0
    Environment Variables:
Conditions:
  Type      Status
  Ready     False
No volumes.
Events:
  FirstSeen LastSeen    Count   From                    SubobjectPath               Reason  Message
  ─────────   ────────    ───── ────                    ─────────────             ──────  ───────
  2m        2m      1   {kubelet <IP removed>}  implicitly required container POD   Pulled  Container image "gcr.io/google_containers/pause:0.8.0" already present on machine
  2m        2m      1   {scheduler }                                    Scheduled   Successfully assigned memtest to <IP removed>
  2m        2m      1   {kubelet <IP removed>}  implicitly required container POD   Created Created with docker id 65a446677edd
  2m        2m      1   {kubelet <IP removed>}  spec.containers{memtest}        Pulling Pulling image "nyxcharon/docker-stress:latest"
  2m        2m      1   {kubelet <IP removed>}  implicitly required container POD   Started Started with docker id 65a446677edd
  2m        2m      1   {kubelet <IP removed>}  spec.containers{memtest}        Pulled  Successfully pulled image "nyxcharon/docker-stress:latest"
  2m        2m      1   {kubelet <IP removed>}  spec.containers{memtest}        Created Created with docker id 3dd77f77dfd6
  2m        2m      1   {kubelet <IP removed>}  spec.containers{memtest}        Started Started with docker id 3dd77f77dfd6
  2m        2m      1   {kubelet <IP removed>}  implicitly required container POD   Killing Killing with docker id 65a446677edd

Here is the pod definition I’m using:

kind: Pod
apiVersion: v1
metadata:
  name: memtest
spec:
  containers:
  - name: memtest
    image: nyxcharon/docker-stress:latest
    args:
    - python
    - /scripts/mem-fill
    resources:
      limits:
        memory: 10M
  restartPolicy: Never
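For anyone trying to reproduce this, the terminated state can also be read programmatically instead of parsing kubectl output. Below is a minimal sketch using the official kubernetes Python client; the client library and kubeconfig setup are assumptions on my part and not part of the report above.

# Sketch: print the terminated reason and exit code for the memtest pod.
# Assumes `pip install kubernetes` and a working kubeconfig.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="memtest", namespace="default")
for cs in pod.status.container_statuses or []:
    term = cs.state.terminated
    if term is not None:
        # With the behaviour described above this prints reason "Error"
        # and exit code 137, even though the container was OOM killed.
        print(cs.name, term.reason, term.exit_code)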

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 19
  • Comments: 31 (8 by maintainers)

Most upvoted comments

Hey guys, sorry to bump this up after 2 years, but I still have the same problem here. I am getting a 137 exit code while there is no OOM issue, and yes, it’s because the liveness probe is not succeeding. I am not even sure that 137 should be reserved for OOM. So my suggestion would be to have a specific exitCode for when the liveness probe fails while the pod is running.

I suspect it’s because of the liveness probe failing, but I’m not sure why it should exit with code 137, which is for OOM.

The 137 exit code is returned by the Docker engine whenever the container experiences an OOM kill (see the sketch after the quoted message below).

On Sat, Mar 4, 2017 at 1:38 AM, Romeo Mihalcea notifications@github.com wrote:

I have the same Exit 137 status showing up all over the place. One of the things I discovered was a phpmyadmin service (which was limited to 512Mi of RAM) that was eating 2 GB of RAM. How it managed to do that I don’t know, but it was causing other probes on the same VM to crash:

"resources": {
    "requests": {
        "cpu": "50m",
        "memory": "128Mi"
    },
    "limits": {
        "cpu": "200m",
        "memory": "512Mi"
    }
}

I have frequent crashes of the same sort on different VMs though, so I can’t put my finger on the phpmyadmin service exactly (which I have now completely removed to test things out).

All the pods that crash have liveness and readiness probes in place, btw.

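The same information can be seen by talking to the Docker engine directly, which records both the 137 exit code and an OOMKilled flag. A rough sketch using the docker Python SDK; the image, command and limits here are illustrative, not taken from any of the reports in this thread.

# Sketch: trigger an OOM kill directly in Docker and inspect what it reports.
# Assumes `pip install docker` and a local Docker daemon.
import docker

client = docker.from_env()
container = client.containers.run(
    "python:3-alpine",  # illustrative image, not the one from this issue
    ["python", "-c", "b = []\nwhile True: b.append(bytearray(10 * 1024 * 1024))"],
    mem_limit="64m",
    memswap_limit="64m",  # no extra swap, so the memory limit triggers an OOM kill
    detach=True,
)
container.wait()        # block until the container exits
container.reload()
state = container.attrs["State"]
print(state["ExitCode"])   # 137, i.e. 128 + SIGKILL
print(state["OOMKilled"])  # True: the engine does record that it was an OOM kill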

I am also seeing one weird behaviour where a container is using only 40% of its memory request and is getting restarted with exit code 137. There are no logs for OOM or liveness check failures (the limit is set pretty high), but the container is getting killed repeatedly with exit code 137. The application itself is not throwing any errors either, so I am guessing something in k8s is killing the pod.

Does anyone have a clue about what might be going wrong?

(…) I am not even sure that 137 should be reserved for OOM. So my suggestion would be to have a specific exitCode for when the liveness probe fails while the pod is running.

Subtract 128 from the exit code and you get the signal itself (137 - 128 = 9, which is SIGKILL). See https://tldp.org/LDP/abs/html/exitcodes.html and https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html
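For example, a quick check of that arithmetic in Python (on a POSIX system):

import signal

exit_code = 137
print(exit_code - 128)                       # 9
print(signal.Signals(exit_code - 128).name)  # SIGKILL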

I’m seeing the same issue on k8s v1.21

Last State:     Terminated
Reason:       Error
Exit Code:    137

Even though the exit code shows 137, I don’t see any signs of OOM. The pod got restarted due to problems with the readiness probe.

Warning  Unhealthy  15m (x6 over 16m)     kubelet  Readiness probe failed: Connecting to ...
wget: server returned error: HTTP/1.1 503 Service Unavailable

I set a higher memory limit for the pod and doubled the initial number of replicas; the problem is gone and there is no restarting now.

More than 5 years and still no proper solution.

I was having the same problem. At first it looked like a resource problem in the pod, but it turned out the liveness probe was failing because Istio rules were blocking the health checks.