kubernetes: Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

What happened: I am getting the error "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded" when using an execAction liveness probe. Also, the container is not restarted when the probe fails.

livenessProbe:
  exec:
    command:
      - curl
      - -XGET
      - http://10.8.0.1:9101/
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 2

What you expected to happen: The liveness probe is reported as failed after failureThreshold consecutive failures, and the container is restarted.

How to reproduce it (as minimally and precisely as possible):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    name: zenalytix
  name: zenalytix
  namespace: zenalytix
spec:
  serviceName: zenalytix
  replicas: 1
  selector:
    matchLabels:
      name: zenalytix
  template:
    metadata:
      labels:
        name: zenalytix
    spec:
      containers:
        - name: zenalytix
          image: gcr.io/archiver/test:0.0.1
          imagePullPolicy: IfNotPresent
          workingDir: /root/zenalytix
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
                - NET_ADMIN
          command:
            - /bin/bash
            - -c
            - |
              mkdir -p /dev/net && mknod /dev/net/tun c 10 200 && chmod 600 /dev/net/tun && openvpn --config Docker/OpenVPN/mqtt-driver.ovpn & disown &&
              gunicorn zenalytix.wsgi -b 0.0.0.0:9104 --workers 2 -k gthread --threads 16 --timeout 300 --log-level info --limit-request-line 8190 --access-logfile -
          ports:
            - containerPort: 9104
          envFrom:
            - configMapRef:
                name: zenalytix-configmap
          volumeMounts:
            - mountPath: /root/zenalytix
              name: zenalytix-data
          livenessProbe:
            exec:
              command:
                - curl
                - -XGET
                - http://10.8.0.1:9101/
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 2
      volumes:
        - name: zenalytix-data
          persistentVolumeClaim:
            claimName: zenalytix-pvc

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Managed Azure Kubernetes Service (AKS) - 1.14.6
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
Linux zenalytix-0 4.15.0-1052-azure #57-Ubuntu SMP Tue Jul 23 19:07:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 20
  • Comments: 51 (20 by maintainers)

Most upvoted comments

I am having the same issue: the timeout is not taken into account, and the pod is not restarted when the exec command never ends. This seems like quite a serious issue to me. I believe this should be re-opened.

I also ran into this problem. After reading the source code, I found the difference between the exec probe and the HTTP probe:

1. The exec probe treats a command error or a command timeout (an exception during exec) as a probe error and ignores the result, so the Pod status never changes in this situation.

2. The HTTP and TCP probes treat every error as a failed probe, so the Pod status changes when an error occurs.

Workaround for exec commands:

  1. Add a timeout inside the command itself; it must be shorter than timeoutSeconds.
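
For an arbitrary exec command, one way to apply this workaround is to wrap the command in the `timeout` utility. This is only a sketch: it assumes GNU coreutils `timeout` is present in the container image (BusyBox builds may use a different syntax), it reuses the jcmd command from another comment in this thread, and the 2-second limit is an illustrative value chosen to stay below timeoutSeconds.

```yaml
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Assumption: GNU coreutils `timeout` exists in the image.
      # It kills the command after 2s and exits with status 124, so the
      # command returns a non-zero exit code before the kubelet deadline.
      - timeout 2 /opt/java/openjdk/bin/jcmd 1 VM.version
  initialDelaySeconds: 300
  periodSeconds: 10
  timeoutSeconds: 3      # kept above the 2s in-command limit
  successThreshold: 1
  failureThreshold: 5
```

With this shape the command itself ends with a non-zero exit code instead of hanging, which should be counted as a normal probe failure rather than a probe error.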

My liveness probe ran a script that calls a REST API which checks overall health. When the REST API does not respond or hangs for any reason, we get this error: "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded". I added the --max-time 10 option to curl, and now whenever the API times out the script treats it as a failure and the probe works fine.
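
Applied to the probe from the original report, the curl variant of this workaround might look like the sketch below. The URL and probe timings are taken from the report; `--max-time 1` is an illustrative value picked to stay below `timeoutSeconds: 2`, and `--silent` and `--fail` are additions not mentioned by the commenter (with `--fail`, HTTP error responses also produce a non-zero exit code).

```yaml
livenessProbe:
  exec:
    command:
      - curl
      - --silent
      - --fail          # assumption: also fail on HTTP status >= 400
      - --max-time
      - "1"             # curl gives up after 1s, before the 2s probe timeout
      - http://10.8.0.1:9101/
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 2
```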

@deepaksood619, any chance you could re-open this? A few of us still seem to be having issues.

We see the same behavior with ANY kind of exec liveness command, but not with an HTTP liveness probe (!!!).

E.g., with

        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - curl http://localhost:8080/services/timeout
          failureThreshold: 5
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5

in logs:

...
Oct 11 12:15:52 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:52.750175916Z" level=error msg="ExecSync for \"e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0\" failed" error="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded"
Oct 11 12:15:52 lpm-rtb-055 kubelet[24706]: E1011 12:15:52.750354   24706 remote_runtime.go:351] ExecSync e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0 '/bin/sh -c curl http://localhost:8080/services/timeout' from runtime service failed: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded
Oct 11 12:15:52 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:52.750621068Z" level=info msg="ExecSync for \"e8c5a50d8910a6fa64367be2b7a1e972086cb304436edcf4bbfc923ba61c5ec0\" with command [/bin/sh -c curl http://localhost:8080/services/timeout] and timeout 5 (s)"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.598590028Z" level=info msg="ExecSync for \"c307c74027f93dd8ee1a333dac58b3010381d47900bdbf147784d0b81073a548\" with command [cilium status --brief] and timeout 5 (s)"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.642762294Z" level=info msg="Finish piping \"stdout\" of container exec \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\""
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.642830099Z" level=info msg="Finish piping \"stderr\" of container exec \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\""
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.643353886Z" level=info msg="Exec process \"52099a96bd6c5ee436d55430081f0c70926ab7a46ffbf3f1861f8780c64bc92a\" exits with exit code 0 and error <nil>"
Oct 11 12:15:56 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:56.651292451Z" level=info msg="ExecSync for \"c307c74027f93dd8ee1a333dac58b3010381d47900bdbf147784d0b81073a548\" returns with exit code 0"
Oct 11 12:15:57 lpm-rtb-055 containerd[2943]: time="2019-10-11T12:15:57.765087326Z" level=info msg="Timeout received while waiting for exec process kill \"4960584d1ab7137d9b8ec1cda7027d5eb9631c316df550febd14d8274fceaf7a\" code 137 and error <nil>"
...

or with:

        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - cat /tmp/live  && /opt/java/openjdk/bin/jcmd 1 VM.version
          failureThreshold: 5
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3

in log:

Oct 10 11:02:17 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:17.197069769Z" level=error msg="ExecSync for \"9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1\" failed" error="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 3s exceeded: context deadline exceeded"
Oct 10 11:02:17 lpm-rtb-011 kubelet[19143]: E1010 11:02:17.197213   19143 remote_runtime.go:351] ExecSync 9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1 '/bin/sh -c cat /tmp/live && /opt/java/openjdk/bin/jcmd 1 VM.version' from runtime service failed: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 3s exceeded: context deadline exceeded
Oct 10 11:02:17 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:17.197478679Z" level=info msg="ExecSync for \"9653ce99f2077028784b7da866f74cf3ac82e5f02720698eb7598f3456ad2aa1\" with command [/bin/sh -c cat /tmp/live && /opt/java/openjdk/bin/jcmd 1 VM.version] and timeout 3 (s)"
Oct 10 11:02:18 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:18.222720588Z" level=info msg="Timeout received while waiting for exec process kill \"c260445a9ac301fe2cbd8484690496c66a16543e084b31fea31c6481e05ba1b7\" code 137 and error <nil>"
Oct 10 11:02:20 lpm-rtb-011 containerd[2929]: time="2019-10-10T11:02:20.230159197Z" level=info msg="Timeout received while waiting for exec process kill \"94db1a1c60c7c5ad0aef46949b42c219991522efcd388efe0107959ae195d71b\" code 137 and error <nil>"

but with:

        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /liveness
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5

everything works fine.

@deepaksood619 please reopen the issue

@jrivers96 - We just had this issue on EKS 1.17 with docker version 19.03.13 (from amazon linux 2 repo). Reverting back to docker 19.03.6 fixed it for us. It has something to do with the unix sockets being leaked when kubelet talks to dockerd to do exec probes.

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
uname -a
Linux lpm-rtb-055 5.3.1-050301-generic #201909210632 SMP Sat Sep 21 06:34:27 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

/sig cluster-lifecycle