kubernetes: Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

What happened: I am getting a "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded" error when using an execAction liveness probe. Also, the container is not restarted when this fails.

livenessProbe:
  exec:
    command:
      - curl
      - -XGET
      - http://10.8.0.1:9101/
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 2

What you expected to happen: The liveness probe is marked as failed once failureThreshold is reached, and the pod is restarted.

How to reproduce it (as minimally and precisely as possible):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    name: zenalytix
  name: zenalytix
  namespace: zenalytix
spec:
  serviceName: zenalytix
  replicas: 1
  selector:
    matchLabels:
      name: zenalytix
  template:
    metadata:
      labels:
        name: zenalytix
    spec:
      containers:
        - name: zenalytix
          image: gcr.io/archiver/test:0.0.1
          imagePullPolicy: IfNotPresent
          workingDir: /root/zenalytix
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
                - NET_ADMIN
          command:
            - /bin/bash
            - -c
            - |
              mkdir -p /dev/net && mknod /dev/net/tun c 10 200 && chmod 600 /dev/net/tun && openvpn --config Docker/OpenVPN/mqtt-driver.ovpn & disown &&
              gunicorn zenalytix.wsgi -b 0.0.0.0:9104 --workers 2 -k gthread --threads 16 --timeout 300 --log-level info --limit-request-line 8190 --access-logfile -
          ports:
            - containerPort: 9104
          envFrom:
            - configMapRef:
                name: zenalytix-configmap
          volumeMounts:
            - mountPath: /root/zenalytix
              name: zenalytix-data
          livenessProbe:
            exec:
              command:
                - curl
                - -XGET
                - http://10.8.0.1:9101/
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 2
      volumes:
        - name: zenalytix-data
          persistentVolumeClaim:
            claimName: zenalytix-pvc

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration: Managed Azure Kubernetes Service (AKS) - 1.14.6
OS (e.g: cat /etc/os-release):

NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Kernel (e.g. uname -a):

Linux zenalytix-0 4.15.0-1052-azure #57-Ubuntu SMP Tue Jul 23 19:07:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Install tools:
Network plugin and version (if this is a network-related bug):
Others:

About this issue

State: closed
Created 5 years ago
Reactions: 20
Comments: 51 (20 by maintainers)

Commits related to this issue

Allow setting timeout on status command We use cilium status as a liveness check and sometimes (due to other issues), the daemon can fail to respond to the health endpoint within a reasonable amount ... — committed to ashrayjain/cilium by ashrayjain 5 years ago
Allow setting timeout on status command We use cilium status as a liveness check and sometimes (due to other issues), the daemon can fail to respond to the health endpoint within a reasonable amount ... — committed to cilium/cilium by ashrayjain 5 years ago

Most upvoted comments

I am having the same issue: the timeout is not taken into account, and the pod does not get restarted when the exec command never ends. This seems like quite a serious issue to me. I believe this should be re-opened.

+11

sgandon on Sep 10, 2020

I also ran into this problem. After reading the source code, I found the difference between the exec probe and the HTTP probe:

1. The exec probe treats a command error or a command timeout (an exception during exec) as a probe error and ignores the result, so the pod status never changes in this situation.

2. The HTTP and TCP probes treat every error as a failed probe, so the pod status changes when an error occurs.

Workaround for exec commands:

Add a timeout inside the command itself. The command's own timeout must be less than timeoutSeconds.
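
A minimal sketch of this workaround, assuming GNU coreutils `timeout` is available in the container image (older BusyBox builds use a different syntax) and using a hypothetical health-check script path, /opt/app/healthcheck.sh, that does not come from this thread. The inner limit of 3 seconds is kept below timeoutSeconds so that the command itself exits non-zero before the runtime deadline is hit:

```yaml
livenessProbe:
  exec:
    command:
      # Assumption: GNU coreutils `timeout`; it exits with status 124 when the limit is reached.
      - timeout
      - "3"
      # Hypothetical script path, used only for illustration.
      - /opt/app/healthcheck.sh
  periodSeconds: 10
  # Must be larger than the 3-second limit passed to `timeout` above.
  timeoutSeconds: 5
  failureThreshold: 2
```

With this layout, a hung check should surface as an ordinary non-zero exit (a failed probe) rather than as a DeadlineExceeded probe error.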

gaopeiliang on Nov 4, 2019

My liveness probe runs a script that calls a REST API, which checks overall health. When the REST API does not respond or hangs for any reason, we get this error: "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded". I added the --max-time 10 option to curl, and now whenever the API times out I treat it as a failure, and the probe works fine.
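
For a curl-based check like the one described, the probe might look like the following sketch. The URL is a hypothetical placeholder, and the timeoutSeconds value is an assumption chosen to be larger than the 10-second --max-time. The --fail flag makes curl exit non-zero on HTTP error responses (400 and above).

```yaml
livenessProbe:
  exec:
    command:
      - curl
      - --fail
      - --silent
      # curl gives up after 10 seconds and exits with status 28.
      - --max-time
      - "10"
      # Hypothetical endpoint, used only for illustration.
      - http://localhost:8080/health
  periodSeconds: 30
  # Assumed value; kept above --max-time so curl returns before the probe deadline.
  timeoutSeconds: 15
  failureThreshold: 2
```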

Gajanana on Nov 26, 2021

@deepaksood619, any chance you could re-open this? A few of us still seem to be having issues.

AntonOfTheWoods on Nov 16, 2020

We have the same situation with ANY exec (command) liveness probe, but not with an HTTP liveness probe (!!!).

E.g., with

        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - curl http://localhost:8080/services/timeout
          failureThreshold: 5
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5

in logs:

...
Oct 11 12:15:52 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:52.750175916Z" level=error msg="ExecSync for \"e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0\" failed" error="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded"
Oct 11 12:15:52 lpm-rtb-055 kubelet[24706]: E1011 12:15:52.750354   24706 remote_runtime.go:351] ExecSync e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0 '/bin/sh -c curl http://localhost:8080/services/timeout' from runtime service failed: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded
Oct 11 12:15:52 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:52.750621068Z" level=info msg="ExecSync for \"e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0\" with command [/bin/sh -c curl http://localhost:8080/services/timeout] and timeout 5 (s)"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.598590028Z" level=info msg="ExecSync for \"c307c74027f93dd8ee1a333dac58b3010381d47900bdbf147784d0b81073a548\" with command [cilium status --brief] and timeout 5 (s)"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.642762294Z" level=info msg="Finish piping \"stdout\" of container exec \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\""
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.642830099Z" level=info msg="Finish piping \"stderr\" of container exec \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\""
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.643353886Z" level=info msg="Exec process \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\" exits with exit code 0 and error <nil>"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.651292451Z" level=info msg="ExecSync for \"c307c74027f93dd8ee1a333dac58b3010381d47900bdbf147784d0b81073a548\" returns with exit code 0"
Oct 11 12:15:57 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:57.765087326Z" level=info msg="Timeout received while waiting for exec process kill \"4960584d1ab7137d9b8ec1cda7027d5eb9631c316df550febd14d8274fceaf7a\" code 137 and error <nil>"
...

or with:

        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - cat /tmp/live  && /opt/java/openjdk/bin/jcmd 1 VM.version
          failureThreshold: 5
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3

in log:

Oct 10 11:02:17 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:17.197069769Z" level=error msg="ExecSync for \"9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1\" failed" error="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 3s exceeded: context deadline exceeded"
Oct 10 11:02:17 lpm-rtb-011 kubelet[19143]: E1010 11:02:17.197213   19143 remote_runtime.go:351] ExecSync 9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1 '/bin/sh -c cat /tmp/live && /opt/java/openjdk/bin/jcmd 1 VM.version' from runtime service failed: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 3s exceeded: context deadline exceeded
Oct 10 11:02:17 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:17.197478679Z" level=info msg="ExecSync for \"9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1\" with command [/bin/sh -c cat /tmp/live && /opt/java/openjdk/bin/jcmd 1 VM.version] and timeout 3 (s)"
Oct 10 11:02:18 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:18.222720588Z" level=info msg="Timeout received while waiting for exec process kill \"c260445a9ac301fe2cbd8484690496c66a16543e084b31fea31c6481e05ba1b7\" code 137 and error <nil>"
Oct 10 11:02:20 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:20.230159197Z" level=info msg="Timeout received while waiting for exec process kill \"94db1a1c60c7c5ad0aef46949b42c219991522efcd388efe0107959ae195d71b\" code 137 and error <nil>"

but with:

        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /liveness
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5

everything works fine.

ealebed on Oct 11, 2019

@deepaksood619 please reopen the issue

antonakv on Nov 26, 2020

@jrivers96 - We just had this issue on EKS 1.17 with docker version 19.03.13 (from amazon linux 2 repo). Reverting back to docker 19.03.6 fixed it for us. It has something to do with the unix sockets being leaked when kubelet talks to dockerd to do exec probes.

PeteE on Nov 18, 2020

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

uname -a
Linux lpm-rtb-055 5.3.1-050301-generic #201909210632 SMP Sat Sep 21 06:34:27 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ealebed on Oct 11, 2019

/sig cluster-lifecycle

deepaksood619 on Sep 22, 2019