kubernetes: Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
What happened:
I am getting the error "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded" when using an execAction liveness probe. Also, the container is not restarted when the probe fails.
livenessProbe:
  exec:
    command:
    - curl
    - -XGET
    - http://10.8.0.1:9101/
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 2
What you expected to happen: The liveness probe is marked as failed after failureThreshold consecutive failures and the pod is restarted.
How to reproduce it (as minimally and precisely as possible):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    name: zenalytix
  name: zenalytix
  namespace: zenalytix
spec:
  serviceName: zenalytix
  replicas: 1
  selector:
    matchLabels:
      name: zenalytix
  template:
    metadata:
      labels:
        name: zenalytix
    spec:
      containers:
      - name: zenalytix
        image: gcr.io/archiver/test:0.0.1
        imagePullPolicy: IfNotPresent
        workingDir: /root/zenalytix
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
            - NET_ADMIN
        command:
        - /bin/bash
        - -c
        - |
          mkdir -p /dev/net && mknod /dev/net/tun c 10 200 && chmod 600 /dev/net/tun && openvpn --config Docker/OpenVPN/mqtt-driver.ovpn & disown &&
          gunicorn zenalytix.wsgi -b 0.0.0.0:9104 --workers 2 -k gthread --threads 16 --timeout 300 --log-level info --limit-request-line 8190 --access-logfile -
        ports:
        - containerPort: 9104
        envFrom:
        - configMapRef:
            name: zenalytix-configmap
        volumeMounts:
        - mountPath: /root/zenalytix
          name: zenalytix-data
        livenessProbe:
          exec:
            command:
            - curl
            - -XGET
            - http://10.8.0.1:9101/
          initialDelaySeconds: 10
          periodSeconds: 30
          timeoutSeconds: 2
          successThreshold: 1
          failureThreshold: 2
      volumes:
      - name: zenalytix-data
        persistentVolumeClaim:
          claimName: zenalytix-pvc
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Managed Azure Kubernetes Service (AKS) - 1.14.6
- OS (e.g. cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Kernel (e.g. uname -a):
Linux zenalytix-0 4.15.0-1052-azure #57-Ubuntu SMP Tue Jul 23 19:07:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 20
- Comments: 51 (20 by maintainers)
Commits related to this issue
- Allow setting timeout on status command We use cilium status as a liveness check and sometimes (due to other issues), the daemon can fail to respond to the health endpoint within a reasonable amount ... — committed to ashrayjain/cilium by ashrayjain 5 years ago
- Allow setting timeout on status command We use cilium status as a liveness check and sometimes (due to other issues), the daemon can fail to respond to the health endpoint within a reasonable amount ... — committed to cilium/cilium by ashrayjain 5 years ago
I am having the same issue: the timeout is not taken into account, and the pod does not get restarted when the exec command simply never ends. This seems like quite a serious issue to me. I believe it should be reopened.
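For an exec command that never returns, one possible mitigation is to bound the command inside the container, so the probe does not depend on the kubelet enforcing timeoutSeconds. The following is only a sketch under assumptions: it reuses the probe from the original report and assumes the image ships the GNU coreutils `timeout` binary, which the issue does not confirm.

```yaml
livenessProbe:
  exec:
    command:
    # Assumption: coreutils `timeout` exists in the container image.
    # It stops curl after 2 seconds and exits with status 124,
    # so the probe ends with a non-zero exit code instead of hanging.
    - timeout
    - "2"
    - curl
    - -XGET
    - http://10.8.0.1:9101/
  initialDelaySeconds: 10
  periodSeconds: 30
  # Kept above the in-container limit so the inner timeout fires first.
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 2
```

With this shape, a hung endpoint should show up as an ordinary probe failure and count toward failureThreshold.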
I also ran into this problem. After reading the source code, I found the difference between the exec probe and the HTTP probe.
Workaround for the exec command:
My liveness probe ran a script. The script called a REST API that checks overall health. When the REST API was not responding or hung for any reason, we got this error: "Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded". I added the --max-time 10 option to curl, and now whenever the API times out the script treats it as a failure and the probe works fine.
@deepaksood619, any chance you could re-open this? A few of us seem to still be having issues.
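The `--max-time` workaround described above might look roughly like the sketch below. The commenter's actual script is not shown in the thread, so this example calls curl directly and borrows the URL from the original report. The `--fail` and `--silent` flags are additions for illustration, not something the commenter mentioned.

```yaml
livenessProbe:
  exec:
    command:
    - curl
    # Illustrative additions: exit non-zero on HTTP errors (4xx/5xx), no progress output.
    - --fail
    - --silent
    # From the comment: give up after 10 seconds; curl then exits with status 28.
    - --max-time
    - "10"
    # URL from the original report; replace with the real health endpoint.
    - http://10.8.0.1:9101/
  periodSeconds: 30
  failureThreshold: 2
```

Because curl itself gives up and returns a non-zero exit code, the probe command finishes on its own and is recorded as a failure rather than relying on the kubelet to cut it off.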
We have the same situation with ANY kind of liveness exec command, but not with an HTTP liveness request (!!!).
E.g., with
in logs:
or with:
in log:
but with:
all works fine
@deepaksood619 please reopen the issue
@jrivers96 - We just had this issue on EKS 1.17 with docker version 19.03.13 (from amazon linux 2 repo). Reverting back to docker 19.03.6 fixed it for us. It has something to do with the unix sockets being leaked when kubelet talks to dockerd to do exec probes.
/sig cluster-lifecycle