kubernetes: liveness/readiness probe is executed and fails while pod is terminating

What happened: the liveness/readiness probe fails while the pod is being terminated, and it happens only once per termination. The issue started after upgrading from v1.6.X to v1.7.

How to reproduce it (as minimally and precisely as possible): execute kubectl delete pod nginx-A1 to delete the pod. The status of nginx-A1 changes to Terminating, and right after that the liveness and readiness probes appear to be executed and fail, but only once. An Nginx reverse proxy is running in the pod, so I just use the httpGet method for the liveness and readiness probes.

Here is my Deployment config.

   ...
   spec:
      terminationGracePeriodSeconds: 60
        ...
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          timeoutSeconds: 3
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          timeoutSeconds: 3

Here is the Events log from kubectl describe pod nginx-A1:

Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----					-------------			--------	------		-------
  14s		14s		1	kubelet, *****	spec.containers{dnsmasq}	Normal		Killing		Killing container with id docker://dnsmasq:Need to kill Pod
  9s		9s		1	kubelet, *****	spec.containers{nginx}		Warning		Unhealthy	Liveness probe failed: Get http://100.*.*.*:8080/healthz: dial tcp 100.*.*.*:8080: getsockopt: connection refused
  9s		9s		1	kubelet, *****	spec.containers{nginx}		Warning		Unhealthy	Readiness probe failed: Get http://100.*.*.*:8080/healthz: dial tcp 100.*.*.*:8080: getsockopt: connection refused

Environment:

  • Kubernetes version: 1.7.2

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 23
  • Comments: 41 (16 by maintainers)

Most upvoted comments

@matthyx we’re running 1.16 and are being hit by this continuously when some of our Elixir apps shut down, so cherry-picking it into 1.18 would at least put it closer in our update path

https://github.com/kubernetes/kubernetes/pull/100525 https://github.com/kubernetes/kubernetes/pull/100526 https://github.com/kubernetes/kubernetes/pull/100527

/reopen
/remove-lifecycle rotten

We see this consistently with all pods that define a liveness or readiness probe. Whenever we roll out a new deployment, the pods being terminated emit a failed liveness/readiness probe AFTER they have been terminated. We have considered adding a preStop hook that just sleeps for 2-3 seconds (sketched below), but it seems like a band-aid for something that should not happen in the first place.

Is this an impossible-to-solve race condition between kubernetes moving parts?
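
For anyone considering that workaround, here is a minimal sketch of the preStop-sleep band-aid, assuming the nginx container and port from the Deployment above; the container name, sleep duration, and the presence of a shell in the image are assumptions on my part, not from the original report.

    # Band-aid only: delay SIGTERM so the container keeps answering /healthz
    # for a few seconds after the kubelet starts terminating the pod.
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: nginx                      # assumed container name
          lifecycle:
            preStop:
              exec:
                # requires a shell and the sleep binary inside the image
                command: ["sh", "-c", "sleep 5"]
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080

The sleep just keeps the container serving /healthz during the window in which a probe may still fire after termination begins; it does not address the underlying race.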

Should I consider a cherry-pick for 1.20 and 1.19? (maybe 1.18 too?)

Since the upgrade to 1.7, it seems our deployment rollouts have a higher failure rate. Occasionally a pod comes up but its readiness probe never gets started, and it stays in that state, blocking the entire rollout. I usually have to delete the pod so it is rescheduled and a fresh readiness probe is fired.

I wonder if these issues are related.

I am also facing this issue. We have pods that have a lot of cleanup to do during shutdown; it can take up to 5 minutes for them to terminate gracefully. During this time the livenessProbe detects failures and restarts the pod, which is not what we want. I am unable to keep the service that handles the liveness check running while the cleanup is happening. It would be better if the pod were immediately removed from the Service and the probes stopped while the shutdown is performed. Basically, this means k8s is never actually able to terminate the pod.
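
Not a fix for the probe race, but for the long-cleanup case above one common mitigation is to raise the grace period past the worst-case cleanup time and relax the liveness probe so it cannot trip during shutdown. A rough sketch with made-up numbers and a hypothetical container name (none of this is from the original comment):

    # Illustrative values only.
    spec:
      terminationGracePeriodSeconds: 360   # > worst-case ~5 min cleanup
      containers:
        - name: app                        # hypothetical container name
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 30
            failureThreshold: 12           # 30s x 12 = 6 min of failures before a restart

With failureThreshold × periodSeconds larger than the shutdown window, the kubelet never accumulates enough consecutive failures to restart the container while it is still cleaning up.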

kubernetes_version 1.19.3. Same issue.